1. How do we route the logs of different projects to different channels?
2. How should we understand a topology with one HDFS sink and one logger sink?
3. How do we add an extra parameter to the Log4jExtAppender.java class?
The previous posts only dealt with the logs of a single project; now let us consider collecting logs from multiple projects. I made a copy of the flumedemo project, renamed it flumedemo2, and added a WriteLog2.java class whose JSON output is slightly different: the "reporter-api" in requestUrl was changed to "image-api", so its output can be told apart from that of the WriteLog class. The code is as follows:
package com.besttone.flume;

import java.util.Date;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class WriteLog2 {
    protected static final Log logger = LogFactory.getLog(WriteLog2.class);

    /**
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            logger.info(new Date().getTime());
            logger.info("{\"requestTime\":"
                    + System.currentTimeMillis()
                    + ",\"requestParams\":{\"timestamp\":1405499314238,\"phone\":\"02038824941\",\"cardName\":\"測試商家名稱\",\"provinceCode\":\"440000\",\"cityCode\":\"440106\"},\"requestUrl\":\"/image-api/reporter/reporter12/init.do\"}");
            Thread.sleep(2000);
        }
    }
}
The requirement is this: the log4j logs of the flumedemo project must be written to HDFS, while the log4j logs of the flumedemo2 project must go to the agent's own log.
We still use the log4jappender to configure log4j to send its output to the Flume source. The requirement clearly calls for two sinks now: one HDFS sink and one logger sink. So the topology should look like this:
To implement such a topology we need channel selectors, so that the logs of different projects travel through different channels to different sinks.
The official documentation lists two types of channel selectors:
Replicating Channel Selector (default)
Multiplexing Channel Selector
The difference between the two: a Replicating selector sends every event arriving from the source to all channels, while a Multiplexing selector can choose which channels to send to. For our example, with Replicating the logs of both demo and demo2 would be sent to channel1 and channel2 simultaneously, which clearly contradicts the requirement: demo's logs should go only to channel1, and demo2's only to channel2.
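In Flume properties terms the two behaviors differ only in the selector lines. A minimal sketch of the replicating case (the default), using the same tier1 agent and channel names as the configuration later in this post:

```properties
# Replicating (the default): every event from source1 is copied to ALL listed channels.
tier1.sources.source1.channels = channel1 channel2
tier1.sources.source1.selector.type = replicating
```

With this in place both channel1 and channel2 would receive every event, which is exactly the behavior we need to avoid here.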
So we choose the Multiplexing Channel Selector. Here we ran into a tricky problem: Multiplexing decides which channel to dispatch an event to by inspecting the value of a specified header key, but demo and demo2 run on the same server. If they ran on different servers, we could add a host interceptor on source1 (introduced in the previous post) and route events by the host header. On the same server, however, the host cannot distinguish where a log came from, so we must find a way to add a key to the header that identifies the log's origin.
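For reference, if the two projects did run on separate servers, the host-based routing mentioned above could be sketched like this (a hypothetical fragment; the host interceptor ships with Flume and stamps each event's headers with the agent host):

```properties
# Sketch: tag every event with the sending host, then route on that header.
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = host
tier1.sources.source1.interceptors.i1.useIP = true
tier1.sources.source1.selector.type = multiplexing
tier1.sources.source1.selector.header = host
```

Since both projects share one server here, this approach would produce the same host value for all events, which is why a custom header key is needed instead.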
Suppose the header contained a key flume.client.log4j.logger.source; by setting its value to app1 for demo and app2 for demo2, we could use the following configuration:
tier1.sources.source1.channels=channel1 channel2
tier1.sources.source1.selector.type=multiplexing
tier1.sources.source1.selector.header=flume.client.log4j.logger.source
tier1.sources.source1.selector.mapping.app1=channel1
tier1.sources.source1.selector.mapping.app2=channel2
to route the logs of different projects to different channels.
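The routing decision the multiplexing selector makes can be illustrated with a small self-contained sketch. This is an illustration of the concept only, not Flume's actual ChannelSelector implementation; the class and method names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of multiplexing: look up a configured header key in the
// event headers and return the channel name mapped to its value.
public class MultiplexingSketch {
    private final String headerKey;
    private final Map<String, String> mapping = new HashMap<>();

    public MultiplexingSketch(String headerKey) {
        this.headerKey = headerKey;
    }

    // Corresponds to a selector.mapping.<value>=<channel> line.
    public void addMapping(String value, String channel) {
        mapping.put(value, channel);
    }

    // Decide which channel an event with these headers should go to.
    public String route(Map<String, String> eventHeaders) {
        return mapping.get(eventHeaders.get(headerKey));
    }

    public static void main(String[] args) {
        MultiplexingSketch selector =
                new MultiplexingSketch("flume.client.log4j.logger.source");
        selector.addMapping("app1", "channel1");
        selector.addMapping("app2", "channel2");

        Map<String, String> headers = new HashMap<>();
        headers.put("flume.client.log4j.logger.source", "app1");
        System.out.println(selector.route(headers)); // prints channel1
    }
}
```

An event whose header value is app1 lands on channel1, one with app2 lands on channel2; that is the whole trick the rest of this post builds on.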
Following this line of thought, we hit a snag: log4jappender has no such parameter to set. What now? A look at the log4jappender source showed that it is easy to extend with an extra parameter, so I copied the log4jappender code into a new class named Log4jExtAppender.java and added a parameter called source. The code is as follows:
package com.besttone.flume;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;
import org.apache.avro.specific.SpecificRecord;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.FlumeException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.clients.log4jappender.Log4jAvroHeaders;
import org.apache.flume.event.EventBuilder;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.helpers.LogLog;
import org.apache.log4j.spi.LoggingEvent;

/**
 * Appends Log4j events to an external Flume client which is described by the
 * Log4j configuration file. The appender takes two required parameters:
 * <p>
 * <strong>Hostname</strong> : This is the hostname of the first hop at which
 * Flume (through an AvroSource) is listening for events.
 * </p>
 * <p>
 * <strong>Port</strong> : This is the port on the above host where the Flume
 * Source is listening for events.
 * </p>
 * A sample log4j properties file which appends to a source would look like:
 *
 * <pre>
 * <p>
 * log4j.appender.out2 = com.besttone.flume.Log4jExtAppender
 * log4j.appender.out2.Port = 25430
 * log4j.appender.out2.Hostname = foobarflumesource.com
 * log4j.logger.org.apache.flume.clients.log4jappender = DEBUG,out2</p>
 * </pre>
 * <p>
 * <i>Note: Change the last line to the package of the class(es) that will do
 * the appending. For example, if classes from the package com.bar.foo are
 * appending, the last line would be:</i>
 * </p>
 *
 * <pre>
 * <p>log4j.logger.com.bar.foo = DEBUG,out2</p>
 * </pre>
 */
public class Log4jExtAppender extends AppenderSkeleton {

  private String hostname;
  private int port;
  private String source;

  public String getSource() {
    return source;
  }

  public void setSource(String source) {
    this.source = source;
  }

  private boolean unsafeMode = false;
  private long timeout = RpcClientConfigurationConstants.DEFAULT_REQUEST_TIMEOUT_MILLIS;
  private boolean avroReflectionEnabled;
  private String avroSchemaUrl;

  RpcClient rpcClient = null;

  /**
   * If this constructor is used programmatically rather than from a log4j
   * conf you must set the <tt>port</tt> and <tt>hostname</tt> and then call
   * <tt>activateOptions()</tt> before calling <tt>append()</tt>.
   */
  public Log4jExtAppender() {
  }

  /**
   * Sets the hostname and port. Even if these are passed the
   * <tt>activateOptions()</tt> function must be called before calling
   * <tt>append()</tt>, else <tt>append()</tt> will throw an Exception.
   *
   * @param hostname The first hop where the client should connect to.
   * @param port The port to connect on the host.
   */
  public Log4jExtAppender(String hostname, int port, String source) {
    this.hostname = hostname;
    this.port = port;
    this.source = source;
  }

  /**
   * Append the LoggingEvent, to send to the first Flume hop.
   *
   * @param event The LoggingEvent to be appended to the flume.
   * @throws FlumeException if the appender was closed, or the hostname and
   *           port were not setup, there was a timeout, or there was a
   *           connection error.
   */
  @Override
  public synchronized void append(LoggingEvent event) throws FlumeException {
    // If rpcClient is null, it means either this appender object was never
    // setup by setting hostname and port and then calling activateOptions
    // or this appender object was closed by calling close(), so we throw an
    // exception to show the appender is no longer accessible.
    if (rpcClient == null) {
      String errorMsg = "Cannot Append to Appender! Appender either closed or"
          + " not setup correctly!";
      LogLog.error(errorMsg);
      if (unsafeMode) {
        return;
      }
      throw new FlumeException(errorMsg);
    }

    if (!rpcClient.isActive()) {
      reconnect();
    }

    // Client created first time append is called.
    Map<String, String> hdrs = new HashMap<String, String>();
    hdrs.put(Log4jAvroHeaders.LOGGER_NAME.toString(), event.getLoggerName());
    hdrs.put(Log4jAvroHeaders.TIMESTAMP.toString(),
        String.valueOf(event.timeStamp));

    // Add the log source to the event headers.
    if (this.source == null || this.source.equals("")) {
      this.source = "unknown";
    }
    hdrs.put("flume.client.log4j.logger.source", this.source);

    // To get the level back simply use
    // LoggerEvent.toLevel(hdrs.get(Integer.parseInt(
    //   Log4jAvroHeaders.LOG_LEVEL.toString()))
    hdrs.put(Log4jAvroHeaders.LOG_LEVEL.toString(),
        String.valueOf(event.getLevel().toInt()));

    Event flumeEvent;
    Object message = event.getMessage();
    if (message instanceof GenericRecord) {
      GenericRecord record = (GenericRecord) message;
      populateAvroHeaders(hdrs, record.getSchema(), message);
      flumeEvent = EventBuilder.withBody(
          serialize(record, record.getSchema()), hdrs);
    } else if (message instanceof SpecificRecord || avroReflectionEnabled) {
      Schema schema = ReflectData.get().getSchema(message.getClass());
      populateAvroHeaders(hdrs, schema, message);
      flumeEvent = EventBuilder.withBody(serialize(message, schema), hdrs);
    } else {
      hdrs.put(Log4jAvroHeaders.MESSAGE_ENCODING.toString(), "UTF8");
      String msg = layout != null ? layout.format(event) : message.toString();
      flumeEvent = EventBuilder.withBody(msg, Charset.forName("UTF8"), hdrs);
    }

    try {
      rpcClient.append(flumeEvent);
    } catch (EventDeliveryException e) {
      String msg = "Flume append() failed.";
      LogLog.error(msg);
      if (unsafeMode) {
        return;
      }
      throw new FlumeException(msg + " Exception follows.", e);
    }
  }

  private Schema schema;
  private ByteArrayOutputStream out;
  private DatumWriter<Object> writer;
  private BinaryEncoder encoder;

  protected void populateAvroHeaders(Map<String, String> hdrs, Schema schema,
      Object message) {
    if (avroSchemaUrl != null) {
      hdrs.put(Log4jAvroHeaders.AVRO_SCHEMA_URL.toString(), avroSchemaUrl);
      return;
    }
    LogLog.warn("Cannot find ID for schema. Adding header for schema, "
        + "which may be inefficient. Consider setting up an Avro Schema Cache.");
    hdrs.put(Log4jAvroHeaders.AVRO_SCHEMA_LITERAL.toString(),
        schema.toString());
  }

  private byte[] serialize(Object datum, Schema datumSchema)
      throws FlumeException {
    if (schema == null || !datumSchema.equals(schema)) {
      schema = datumSchema;
      out = new ByteArrayOutputStream();
      writer = new ReflectDatumWriter<Object>(schema);
      encoder = EncoderFactory.get().binaryEncoder(out, null);
    }
    out.reset();
    try {
      writer.write(datum, encoder);
      encoder.flush();
      return out.toByteArray();
    } catch (IOException e) {
      throw new FlumeException(e);
    }
  }

  // This function should be synchronized to make sure one thread
  // does not close an appender another thread is using, and hence risking
  // a null pointer exception.
  /**
   * Closes underlying client. If <tt>append()</tt> is called after this
   * function is called, it will throw an exception.
   *
   * @throws FlumeException if errors occur during close
   */
  @Override
  public synchronized void close() throws FlumeException {
    // Any append calls after this will result in an Exception.
    if (rpcClient != null) {
      try {
        rpcClient.close();
      } catch (FlumeException ex) {
        LogLog.error("Error while trying to close RpcClient.", ex);
        if (unsafeMode) {
          return;
        }
        throw ex;
      } finally {
        rpcClient = null;
      }
    } else {
      String errorMsg = "Flume log4jappender already closed!";
      LogLog.error(errorMsg);
      if (unsafeMode) {
        return;
      }
      throw new FlumeException(errorMsg);
    }
  }

  @Override
  public boolean requiresLayout() {
    // This method is named quite incorrectly in the interface. It should
    // probably be called canUseLayout or something. According to the docs,
    // even if the appender can work without a layout, if it can work with
    // one, this method must return true.
    return true;
  }

  /**
   * Set the first flume hop hostname.
   *
   * @param hostname The first hop where the client should connect to.
   */
  public void setHostname(String hostname) {
    this.hostname = hostname;
  }

  /**
   * Set the port on the hostname to connect to.
   *
   * @param port The port to connect on the host.
   */
  public void setPort(int port) {
    this.port = port;
  }

  public void setUnsafeMode(boolean unsafeMode) {
    this.unsafeMode = unsafeMode;
  }

  public boolean getUnsafeMode() {
    return unsafeMode;
  }

  public void setTimeout(long timeout) {
    this.timeout = timeout;
  }

  public long getTimeout() {
    return this.timeout;
  }

  public void setAvroReflectionEnabled(boolean avroReflectionEnabled) {
    this.avroReflectionEnabled = avroReflectionEnabled;
  }

  public void setAvroSchemaUrl(String avroSchemaUrl) {
    this.avroSchemaUrl = avroSchemaUrl;
  }

  /**
   * Activate the options set using <tt>setPort()</tt> and
   * <tt>setHostname()</tt>.
   *
   * @throws FlumeException if the <tt>hostname</tt> and <tt>port</tt>
   *           combination is invalid.
   */
  @Override
  public void activateOptions() throws FlumeException {
    Properties props = new Properties();
    props.setProperty(RpcClientConfigurationConstants.CONFIG_HOSTS, "h1");
    props.setProperty(RpcClientConfigurationConstants.CONFIG_HOSTS_PREFIX
        + "h1", hostname + ":" + port);
    props.setProperty(
        RpcClientConfigurationConstants.CONFIG_CONNECT_TIMEOUT,
        String.valueOf(timeout));
    props.setProperty(
        RpcClientConfigurationConstants.CONFIG_REQUEST_TIMEOUT,
        String.valueOf(timeout));
    try {
      rpcClient = RpcClientFactory.getInstance(props);
      if (layout != null) {
        layout.activateOptions();
      }
    } catch (FlumeException e) {
      String errormsg = "RPC client creation failed! " + e.getMessage();
      LogLog.error(errormsg);
      if (unsafeMode) {
        return;
      }
      throw e;
    }
  }

  /**
   * Make it easy to reconnect on failure.
   *
   * @throws FlumeException
   */
  private void reconnect() throws FlumeException {
    close();
    activateOptions();
  }
}
I then packaged this class into a jar, Log4jExtAppender.jar, and dropped it into the lib directories of both flumedemo and flumedemo2.
The log4j.properties of flumedemo now looks like this:
log4j.rootLogger=INFO

log4j.category.com.besttone=INFO,flume,console,LogFile

#log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume = com.besttone.flume.Log4jExtAppender
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = false
log4j.appender.flume.Source = app1

log4j.appender.console= org.apache.log4j.ConsoleAppender
log4j.appender.console.Target= System.out
log4j.appender.console.layout= org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern= %d{yyyy-MM-dd HH:mm:ss} %5p %c{1}: %L - %m%n

log4j.appender.LogFile= org.apache.log4j.DailyRollingFileAppender
log4j.appender.LogFile.File= logs/app.log
log4j.appender.LogFile.MaxFileSize=10KB
log4j.appender.LogFile.Append= true
log4j.appender.LogFile.Threshold= DEBUG
log4j.appender.LogFile.layout= org.apache.log4j.PatternLayout
log4j.appender.LogFile.layout.ConversionPattern= %-d{yyyy-MM-dd HH:mm:ss} [%t:%r] - [%5p] %m%n
And flumedemo2's:
log4j.rootLogger=INFO

log4j.category.com.besttone=INFO,flume,console,LogFile

#log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume = com.besttone.flume.Log4jExtAppender
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = false
log4j.appender.flume.Source = app2

log4j.appender.console= org.apache.log4j.ConsoleAppender
log4j.appender.console.Target= System.out
log4j.appender.console.layout= org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern= %d{yyyy-MM-dd HH:mm:ss} %5p %c{1}: %L - %m%n

log4j.appender.LogFile= org.apache.log4j.DailyRollingFileAppender
log4j.appender.LogFile.File= logs/app.log
log4j.appender.LogFile.MaxFileSize=10KB
log4j.appender.LogFile.Append= true
log4j.appender.LogFile.Threshold= DEBUG
log4j.appender.LogFile.layout= org.apache.log4j.PatternLayout
log4j.appender.LogFile.layout.ConversionPattern= %-d{yyyy-MM-dd HH:mm:ss} [%t:%r] - [%5p] %m%n
The original log4j.appender.flume was changed from org.apache.flume.clients.log4jappender.Log4jAppender to my reimplemented com.besttone.flume.Log4jExtAppender, which adds the source parameter.
Then flumedemo sets log4j.appender.flume.Source = app1, and flumedemo2 sets log4j.appender.flume.Source = app2.
Run flumedemo's WriteLog class and flumedemo2's WriteLog2 class, then check the contents on HDFS and in the agent's log file: HDFS holds only app1's logs and the log file holds only app2's logs. The feature works as required.
The complete flume.conf is as follows:
tier1.sources=source1
tier1.channels=channel1 channel2
tier1.sinks=sink1 sink2

tier1.sources.source1.type=avro
tier1.sources.source1.bind=0.0.0.0
tier1.sources.source1.port=44444
tier1.sources.source1.channels=channel1 channel2
tier1.sources.source1.selector.type=multiplexing
tier1.sources.source1.selector.header=flume.client.log4j.logger.source
tier1.sources.source1.selector.mapping.app1=channel1
tier1.sources.source1.selector.mapping.app2=channel2
tier1.sources.source1.interceptors=i1 i2
tier1.sources.source1.interceptors.i1.type=regex_filter
tier1.sources.source1.interceptors.i1.regex=\\{.*\\}
tier1.sources.source1.interceptors.i2.type=timestamp

tier1.channels.channel1.type=memory
tier1.channels.channel1.capacity=10000
tier1.channels.channel1.transactionCapacity=1000
tier1.channels.channel1.keep-alive=30
tier1.channels.channel2.type=memory
tier1.channels.channel2.capacity=10000
tier1.channels.channel2.transactionCapacity=1000
tier1.channels.channel2.keep-alive=30

tier1.sinks.sink1.type=hdfs
tier1.sinks.sink1.channel=channel1
tier1.sinks.sink1.hdfs.path=hdfs://master68:8020/flume/events/%y-%m-%d
tier1.sinks.sink1.hdfs.round=true
tier1.sinks.sink1.hdfs.roundValue=10
tier1.sinks.sink1.hdfs.roundUnit=minute
tier1.sinks.sink1.hdfs.fileType=DataStream
tier1.sinks.sink1.hdfs.writeFormat=Text
tier1.sinks.sink1.hdfs.rollInterval=0
tier1.sinks.sink1.hdfs.rollSize=10240
tier1.sinks.sink1.hdfs.rollCount=0
tier1.sinks.sink1.hdfs.idleTimeout=60

tier1.sinks.sink2.type=logger
tier1.sinks.sink2.channel=channel2