Reposted from: http://www.cnblogs.com/lxf20061900/p/3658172.html
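Sometimes you want Flume to split the data it reads into separate stores. For example, you may want to store a source's data by business type, keeping the web, wap, and media content from one source apart. You may also want to drop or modify some of the data. In such cases, consider using an Interceptor.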
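Flume modifies and drops events through interceptors. An interceptor is a class that implements the org.apache.flume.interceptor.Interceptor interface, and that class defines the rules for modifying or dropping events. Flume supports chains of interceptors. In the source's configuration, the interceptors are given as a space-separated list of names, and each name is bound to the interceptor type to build. Interceptors are called in the order listed, and the list of events one interceptor returns is passed to the next one in the chain. To drop an event, an interceptor simply leaves it out of the list it returns. To drop every event, it returns an empty list.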
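The chaining rules above can be sketched in plain Java: interceptors are called in order, each returned list goes to the next interceptor, and an event is dropped by leaving it out. This is a minimal sketch under stated assumptions. SimpleEvent, BatchInterceptor, runChain, tagByPrefix, and dropType are hypothetical stand-ins defined here so the example compiles without Flume. The "type:payload" body format is also invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types only: this is not Flume's API, just enough structure to show
// how a chain of interceptors passes event lists from one to the next.
class InterceptorChainSketch {

  // Minimal event: a header map plus a byte[] body.
  static final class SimpleEvent {
    final Map<String, String> headers = new HashMap<>();
    final byte[] body;

    SimpleEvent(String text) {
      this.body = text.getBytes(StandardCharsets.UTF_8);
    }

    String bodyText() {
      return new String(body, StandardCharsets.UTF_8);
    }
  }

  // Mirrors the batch method: takes a list, returns the list to hand on.
  interface BatchInterceptor {
    List<SimpleEvent> intercept(List<SimpleEvent> events);
  }

  // Calls the interceptors in configured order; each output is the next input.
  static List<SimpleEvent> runChain(List<BatchInterceptor> chain, List<SimpleEvent> events) {
    List<SimpleEvent> current = events;
    for (BatchInterceptor interceptor : chain) {
      current = interceptor.intercept(current);
    }
    return current;
  }

  // Hypothetical interceptor: copies the text before ':' into a "type" header.
  static BatchInterceptor tagByPrefix() {
    return events -> {
      for (SimpleEvent e : events) {
        String text = e.bodyText();
        int colon = text.indexOf(':');
        e.headers.put("type", colon > 0 ? text.substring(0, colon) : "unknown");
      }
      return events;
    };
  }

  // Hypothetical interceptor: drops one type by leaving it out of the result.
  static BatchInterceptor dropType(String type) {
    return events -> {
      List<SimpleEvent> kept = new ArrayList<>();
      for (SimpleEvent e : events) {
        if (!type.equals(e.headers.get("type"))) {
          kept.add(e);
        }
      }
      return kept;
    };
  }

  public static void main(String[] args) {
    List<SimpleEvent> in = List.of(
        new SimpleEvent("web:GET /index"),
        new SimpleEvent("debug:cache miss"),
        new SimpleEvent("wap:GET /m"));
    List<SimpleEvent> out = runChain(List.of(tagByPrefix(), dropType("debug")), in);
    for (SimpleEvent e : out) {
      System.out.println(e.headers.get("type") + " -> " + e.bodyText());
    }
  }
}
```

Running main prints the web and wap events with their type header. The debug event is dropped by the second interceptor simply because that interceptor does not return it. An interceptor that returned an empty list would end the chain with no events at all.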
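First, an important object: the Event. An event is the smallest unit that Flume transports. Data read by a source is first wrapped in an event. The event is then sent to a channel, and a sink takes events from the channel and consumes them. An event has two parts: the headers (a Map<String, String>) and the body (a byte[]). The body holds the actual data. The headers carry metadata, and they are what the interceptors discussed here mostly operate on.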
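A minimal model of that structure follows. SketchEvent and SimpleSketchEvent are hypothetical stand-ins, declared so the code compiles without the Flume jar. Their four accessors follow the getHeaders/setHeaders/getBody/setBody methods of org.apache.flume.Event. In real code an event is usually built with org.apache.flume.event.EventBuilder.withBody(byte[], Map). The "bizType" header is an invented example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.flume.Event, declared here only so the sketch compiles
// without the Flume jar.
interface SketchEvent {
  Map<String, String> getHeaders();
  void setHeaders(Map<String, String> headers);
  byte[] getBody();
  void setBody(byte[] body);
}

// Plain implementation: headers start empty, body holds raw bytes.
class SimpleSketchEvent implements SketchEvent {
  private Map<String, String> headers = new HashMap<>();
  private byte[] body = new byte[0];

  @Override public Map<String, String> getHeaders() { return headers; }
  @Override public void setHeaders(Map<String, String> headers) { this.headers = headers; }
  @Override public byte[] getBody() { return body; }
  @Override public void setBody(byte[] body) { this.body = body; }

  // A source reading text lines would encode each line into the body like this.
  static SimpleSketchEvent fromLine(String line) {
    SimpleSketchEvent event = new SimpleSketchEvent();
    event.setBody(line.getBytes(StandardCharsets.UTF_8));
    return event;
  }
}

class EventSketchDemo {
  public static void main(String[] args) {
    SketchEvent event = SimpleSketchEvent.fromLine("web GET /index");
    event.getHeaders().put("bizType", "web");  // metadata an interceptor might add
    System.out.println(event.getHeaders() + " "
        + new String(event.getBody(), StandardCharsets.UTF_8));
  }
}
```

The body stays opaque bytes throughout. Anything downstream that needs the text has to decode it with the right charset.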
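Flume-NG ships with several built-in interceptors:
1. HostInterceptor: adds the IP address or hostname of the agent's host to the event headers;
2. TimestampInterceptor: adds a timestamp (milliseconds since the epoch) to the event headers;
3. RegexExtractorInterceptor: matches the event body against a configured regular expression and appends the matched groups to the event as headers. It also supports pluggable serializers that format the matched groups before they are added as headers;
4. RegexFilteringInterceptor: filters events selectively. It parses the event body as text and matches it against a configured regular expression, which can be used either to include or to exclude matching events. Unlike the previous interceptor, this one uses the regex to decide which events pass. RegexExtractorInterceptor instead extracts the parts of event.body that match the regex and stores them as header values;
5. StaticInterceptor: adds a header with a user-defined key and fixed value to every event.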
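The difference between items 3 and 4 is easiest to see side by side. The sketch below only mimics the described behaviour on plain strings. RegexInterceptorSketch and its methods are hypothetical, and excludeMatches stands in for RegexFilteringInterceptor's excludeEvents option. The sketch also assumes a substring match (Matcher.find()) rather than a whole-body match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Mimics, on plain strings, the behaviour described for the two regex interceptors.
// Not Flume code: names and signatures here are illustrative.
class RegexInterceptorSketch {

  // Filtering: decides whether each event passes at all.
  static List<String> filter(List<String> bodies, Pattern regex, boolean excludeMatches) {
    List<String> kept = new ArrayList<>();
    for (String body : bodies) {
      boolean matches = regex.matcher(body).find();
      if (matches != excludeMatches) {  // include matches, or exclude them
        kept.add(body);
      }
    }
    return kept;
  }

  // Extraction: every event passes; matched groups become header values.
  static Map<String, String> extract(String body, Pattern regex, List<String> headerNames) {
    Map<String, String> headers = new HashMap<>();
    Matcher m = regex.matcher(body);
    if (m.find()) {
      for (int g = 1; g <= m.groupCount() && g <= headerNames.size(); g++) {
        headers.put(headerNames.get(g - 1), m.group(g));
      }
    }
    return headers;
  }
}
```

filter changes which events survive. extract never removes an event; it only adds headers.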
All of these classes are in the org.apache.flume.interceptor package.
They are all fairly simple, so we use HostInterceptor to explain how interceptors work and how to write your own.
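All of these interceptors implement the org.apache.flume.interceptor.Interceptor interface, which has four methods and one nested interface:
1. public void initialize(): initialization before the interceptor runs; usually nothing is needed here (none of the interceptors above do anything in it);
2. public Event intercept(Event event): processes a single event;
3. public List<Event> intercept(List<Event> events): processes a batch of events; in the built-in interceptors this simply calls method 2 in a loop;
4. public void close(): does any cleanup; none of the interceptors above do anything in it either;
5. public interface Builder extends Configurable: builds the Interceptor object; Flume uses this Builder to obtain Interceptor instances.
To write your own interceptor, you must actually implement 2, 3, and 5. Methods 1 and 4 must still be present, because Java requires every interface method to be implemented, but they can be left empty.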
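A skeleton covering all five members might look like the following. To stay compilable without Flume, it uses a stand-in nested Event class and a Builder whose configure takes a Map<String, String> instead of org.apache.flume.Context. In a real project, the class would implement org.apache.flume.interceptor.Interceptor and its Builder would implement Interceptor.Builder. The business-type idea comes from the opening example (web, wap, media). The "type|payload" body format, the bizType default header, and the headerName property are assumptions made up for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical custom interceptor written against stand-in types (not Flume's).
class BizTypeInterceptor {

  // Stand-in event: headers plus raw body bytes.
  static final class Event {
    final Map<String, String> headers = new HashMap<>();
    final byte[] body;

    Event(String text) {
      body = text.getBytes(StandardCharsets.UTF_8);
    }
  }

  private final String headerName;

  // Only the Builder creates instances, as with HostInterceptor.
  private BizTypeInterceptor(String headerName) {
    this.headerName = headerName;
  }

  // 1. initialize(): nothing to set up.
  public void initialize() {}

  // 2. intercept(Event): assumed body format "<type>|<payload>", e.g. "web|GET /".
  public Event intercept(Event event) {
    String text = new String(event.body, StandardCharsets.UTF_8);
    int sep = text.indexOf('|');
    if (sep <= 0) {
      return null;  // malformed line: signal "drop" to the batch method
    }
    event.headers.put(headerName, text.substring(0, sep));
    return event;
  }

  // 3. intercept(List<Event>): loop over the batch, leaving out dropped events.
  public List<Event> intercept(List<Event> events) {
    List<Event> out = new ArrayList<>(events.size());
    for (Event e : events) {
      Event result = intercept(e);
      if (result != null) {
        out.add(result);
      }
    }
    return out;
  }

  // 4. close(): nothing to release.
  public void close() {}

  // 5. Builder: configured first, then asked to build the interceptor.
  public static class Builder {
    private String headerName = "bizType";

    public void configure(Map<String, String> context) {
      headerName = context.getOrDefault("headerName", "bizType");
    }

    public BizTypeInterceptor build() {
      return new BizTypeInterceptor(headerName);
    }
  }
}
```

One way to drop malformed lines is to return null from the single-event method and skip nulls in the batch method. The batch method then returns only the events that should continue down the chain.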
Now let's look at org.apache.flume.interceptor.HostInterceptor. Its complete source is:
```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.flume.interceptor;

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.apache.flume.interceptor.HostInterceptor.Constants.*;

/**
 * Simple Interceptor class that sets the host name or IP on all events
 * that are intercepted.<p>
 * The host header is named <code>host</code> and its format is either the FQDN
 * or IP of the host on which this interceptor is run.
 *
 *
 * Properties:<p>
 *
 * preserveExisting: Whether to preserve an existing value for 'host'
 * (default is false)<p>
 *
 * useIP: Whether to use IP address or fully-qualified hostname for 'host'
 * header value (default is true)<p>
 *
 * hostHeader: Specify the key to be used in the event header map for the
 * host name. (default is "host") <p>
 *
 * Sample config:<p>
 *
 * <code>
 * agent.sources.r1.channels = c1<p>
 * agent.sources.r1.type = SEQ<p>
 * agent.sources.r1.interceptors = i1<p>
 * agent.sources.r1.interceptors.i1.type = host<p>
 * agent.sources.r1.interceptors.i1.preserveExisting = true<p>
 * agent.sources.r1.interceptors.i1.useIP = false<p>
 * agent.sources.r1.interceptors.i1.hostHeader = hostname<p>
 * </code>
 *
 */
public class HostInterceptor implements Interceptor {

  private static final Logger logger = LoggerFactory
      .getLogger(HostInterceptor.class);

  private final boolean preserveExisting;
  private final String header;
  private String host = null;

  /**
   * Only {@link HostInterceptor.Builder} can build me
   */
  private HostInterceptor(boolean preserveExisting,
      boolean useIP, String header) {
    this.preserveExisting = preserveExisting;
    this.header = header;
    InetAddress addr;
    try {
      addr = InetAddress.getLocalHost();
      if (useIP) {
        host = addr.getHostAddress();
      } else {
        host = addr.getCanonicalHostName();
      }
    } catch (UnknownHostException e) {
      logger.warn("Could not get local host address. Exception follows.", e);
    }
  }

  @Override
  public void initialize() {
    // no-op
  }

  /**
   * Modifies events in-place.
   */
  @Override
  public Event intercept(Event event) {
    Map<String, String> headers = event.getHeaders();

    if (preserveExisting && headers.containsKey(header)) {
      return event;
    }
    if (host != null) {
      headers.put(header, host);
    }

    return event;
  }

  /**
   * Delegates to {@link #intercept(Event)} in a loop.
   * @param events
   * @return
   */
  @Override
  public List<Event> intercept(List<Event> events) {
    for (Event event : events) {
      intercept(event);
    }
    return events;
  }

  @Override
  public void close() {
    // no-op
  }

  /**
   * Builder which builds new instances of the HostInterceptor.
   */
  public static class Builder implements Interceptor.Builder {

    private boolean preserveExisting = PRESERVE_DFLT;
    private boolean useIP = USE_IP_DFLT;
    private String header = HOST;

    @Override
    public Interceptor build() {
      return new HostInterceptor(preserveExisting, useIP, header);
    }

    @Override
    public void configure(Context context) {
      preserveExisting = context.getBoolean(PRESERVE, PRESERVE_DFLT);
      useIP = context.getBoolean(USE_IP, USE_IP_DFLT);
      header = context.getString(HOST_HEADER, HOST);
    }
  }

  public static class Constants {
    public static String HOST = "host";

    public static String PRESERVE = "preserveExisting";
    public static boolean PRESERVE_DFLT = false;

    public static String USE_IP = "useIP";
    public static boolean USE_IP_DFLT = true;

    public static String HOST_HEADER = "hostHeader";
  }
}
```
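The Constants class holds the configuration key names and their default values.
The Builder class constructs HostInterceptor objects. Its configure(Context context) method first reads the interceptor's parameters from the configuration file, and build() then returns a HostInterceptor object. The parameters are:
1. preserveExisting: if the event's headers already contain the header this interceptor sets, whether to keep the existing value; true keeps it (default false);
2. useIP: whether to use the local IP address as the header value; true uses the IP, false uses the canonical hostname (default true);
3. header: the key used in the event headers, set through the hostHeader property (default host).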
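The fall-back-to-default behaviour of configure() can be reproduced with a small stand-in for Context. This is a hypothetical sketch: HostBuilderConfigSketch, its Context class, and Settings are not Flume types. Only the key names and defaults (preserveExisting/false, useIP/true, hostHeader/"host") come from the Constants class above. The sketch assumes that the context passed to configure() already has the interceptors.i1. prefix stripped, as the Builder's use of bare keys like "useIP" suggests.

```java
import java.util.Map;

// Hypothetical stand-ins for Flume's Context and HostInterceptor.Builder.
class HostBuilderConfigSketch {

  // Read-only view of the interceptor's own properties (prefix already removed).
  static final class Context {
    private final Map<String, String> params;

    Context(Map<String, String> params) {
      this.params = params;
    }

    boolean getBoolean(String key, boolean dflt) {
      String v = params.get(key);
      return v == null ? dflt : Boolean.parseBoolean(v.trim());
    }

    String getString(String key, String dflt) {
      return params.getOrDefault(key, dflt);
    }
  }

  // The three values the real Builder keeps in fields before build().
  static final class Settings {
    boolean preserveExisting;
    boolean useIP;
    String header;
  }

  // Same pattern as Builder.configure(): every key falls back to its default.
  static Settings configure(Context context) {
    Settings s = new Settings();
    s.preserveExisting = context.getBoolean("preserveExisting", false);
    s.useIP = context.getBoolean("useIP", true);
    s.header = context.getString("hostHeader", "host");
    return s;
  }
}
```

If only useIP and hostHeader are set, preserveExisting silently keeps its default, so a missing line in the configuration file never causes an error here.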
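HostInterceptor itself:
1. Besides assigning fields, the constructor looks up the local host's IP address or hostname, depending on useIP. If the lookup fails, it logs a warning and host stays null;
2. intercept(Event event) is where the header is set. It gets the headers map, and if preserveExisting is true and headers.containsKey(header) is true, it returns the event unchanged. Otherwise, provided host is not null, it sets the header with headers.put(header, host);
3. intercept(List<Event> events) calls the method in step 2 on each event in a loop.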
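Steps 1 and 2 can be isolated from Flume as shown below. HostHeaderLogicSketch is a hypothetical helper. Its apply method makes the same decision as intercept(Event), but on a bare header map. Its resolveLocalHost method mirrors the constructor's lookup. Neither method is part of Flume.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Flume) isolating HostInterceptor's two steps.
class HostHeaderLogicSketch {

  // Step 1, as in the constructor: IP or canonical hostname, or null on failure.
  static String resolveLocalHost(boolean useIP) {
    try {
      InetAddress addr = InetAddress.getLocalHost();
      return useIP ? addr.getHostAddress() : addr.getCanonicalHostName();
    } catch (UnknownHostException e) {
      return null;  // the real class logs a warning and leaves host null
    }
  }

  // Step 2, as in intercept(Event): same checks, applied to a bare header map.
  static Map<String, String> apply(Map<String, String> headers, String header,
      String host, boolean preserveExisting) {
    if (preserveExisting && headers.containsKey(header)) {
      return headers;  // keep the value set earlier in the flow
    }
    if (host != null) {
      headers.put(header, host);  // add, or overwrite an existing value
    }
    return headers;
  }

  public static void main(String[] args) {
    Map<String, String> headers = new HashMap<>();
    headers.put("host", "upstream-agent");
    apply(headers, "host", resolveLocalHost(true), true);
    System.out.println(headers);  // value kept because preserveExisting is true
  }
}
```

preserveExisting matters mainly in multi-hop setups. There, it lets the first agent's host survive instead of being overwritten by each later agent.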
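The other interceptors work in much the same way. When you configure a source's interceptors and one of them is a custom interceptor, set its type parameter to the fully qualified class name followed by $Builder, for example com.MyInterceptor$Builder.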
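The $ comes from Java itself: the binary name of a static nested class is Outer$Nested, and that is the form Class.forName() accepts. The sketch assumes that Flume turns the type value into a Builder in roughly this way, by loading the class by name and then calling its public no-arg constructor. MyInterceptor and BuilderLookupSketch are hypothetical.

```java
// Hypothetical custom interceptor: only the nested Builder matters for this sketch.
class MyInterceptor {
  public static class Builder {
    public String describe() {
      return "builder for " + MyInterceptor.class.getSimpleName();
    }
  }
}

class BuilderLookupSketch {
  // Loads a class by its binary name and calls its public no-arg constructor.
  static Object instantiate(String typeName) {
    try {
      return Class.forName(typeName).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + typeName, e);
    }
  }

  // True if the name resolves to a loadable class.
  static boolean loadable(String typeName) {
    try {
      Class.forName(typeName);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String outer = MyInterceptor.class.getName();  // "com.MyInterceptor" if in package com
    System.out.println(loadable(outer + "$Builder"));  // binary name of the nested class
    System.out.println(loadable(outer + ".Builder"));  // source-style name does not resolve
    Object builder = instantiate(outer + "$Builder");
    System.out.println(((MyInterceptor.Builder) builder).describe());
  }
}
```

It follows that the Builder needs to be a public static nested class with a public no-arg constructor. A non-static inner class could not be created this way, because it needs an enclosing instance.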
Once the headers are set this way, a channel selector can use them later in the flow to route events to separate storage.
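For the web/wap/media split from the opening paragraph, the routing step is typically Flume's multiplexing channel selector. It is configured on the source with selector.type = multiplexing, plus selector.header, selector.mapping.<value>, and selector.default. The class below is a simplified, hypothetical model of that lookup, not Flume's implementation. The bizType header and the channel names c1 to c4 are invented.

```java
import java.util.List;
import java.util.Map;

// Simplified model of a multiplexing channel selector (not Flume's class):
// route on the value of one header, falling back to default channels.
class MultiplexingSelectorSketch {
  private final String header;
  private final Map<String, List<String>> mapping;
  private final List<String> defaultChannels;

  MultiplexingSelectorSketch(String header, Map<String, List<String>> mapping,
      List<String> defaultChannels) {
    this.header = header;
    this.mapping = mapping;
    this.defaultChannels = defaultChannels;
  }

  // Returns the channels an event with these headers would be written to.
  List<String> channelsFor(Map<String, String> eventHeaders) {
    String value = eventHeaders.get(header);
    List<String> channels = (value == null) ? null : mapping.get(value);
    return (channels != null) ? channels : defaultChannels;
  }

  public static void main(String[] args) {
    MultiplexingSelectorSketch selector = new MultiplexingSelectorSketch(
        "bizType",
        Map.of("web", List.of("c1"), "wap", List.of("c2"), "media", List.of("c3")),
        List.of("c4"));
    System.out.println(selector.channelsFor(Map.of("bizType", "wap")));
    System.out.println(selector.channelsFor(Map.of()));
  }
}
```

Each channel can then feed its own sink, so web, wap, and media data end up in separate stores. Events with a missing or unmapped header go to the default channel instead of being lost.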