[Flume] HDFS Sink Configuration Parameters Explained

| Name | Default | Description |
| --- | --- | --- |
| channel | – | |
| type | – | The component type name; must be hdfs |
| hdfs.path | – | HDFS directory path (e.g. hdfs://namenode/flume/webdata/) |
| hdfs.filePrefix | FlumeData | Prefix for the file names the HDFS sink creates. For example, if set to MyFile, generated files will be named /hdfspath/MyFile.<suffix> |
| hdfs.fileSuffix | – | Suffix to append to generated file names (e.g. .txt to produce plain-text file names) |
| hdfs.inUsePrefix | – | Prefix for the temporary file that the sink writes to while the file is still open |
| hdfs.inUseSuffix | .tmp | Suffix for the temporary file while it is being written to HDFS |
| hdfs.rollInterval | 30 | Number of seconds to wait before rolling the current file (0 = never roll based on time interval) |
| hdfs.rollSize | 1024 | File size that triggers a roll, in bytes (0 = never roll based on file size) |
| hdfs.rollCount | 10 | Number of events written to a file before it is rolled (0 = never roll based on number of events) |
| hdfs.idleTimeout | 0 | Timeout after which inactive files are closed (0 = disable automatic closing of idle files) |
| hdfs.batchSize | 100 | Number of events written to a file before it is flushed to HDFS. The sink treats each batch of batchSize events as one transaction, flushing them to HDFS together and then committing |
| hdfs.codeC | – | Compression codec, one of: gzip, bzip2, lzo, lzop, snappy. Must be set when fileType is CompressedStream |
| hdfs.fileType | SequenceFile | File format: currently SequenceFile, DataStream or CompressedStream. (1) DataStream will not compress the output file; do not set codeC. (2) CompressedStream requires hdfs.codeC to be set to an available codec. This parameter selects the concrete HDFSWriter implementation: the three formats map to HDFSSequenceFile, HDFSDataStream and HDFSCompressedDataStream respectively |
| hdfs.maxOpenFiles | 5000 | Allow only this number of open files; if the number is exceeded, the oldest file is closed. The sink keeps one connection (a BucketWriter) per open HDFS file, and beyond this limit it closes the longest-open BucketWriter |
| hdfs.minBlockReplicas | – | Minimum number of replicas per HDFS block. If not specified, the default Hadoop configuration on the classpath is used |
| hdfs.writeFormat | Writable | Format for sequence file records, one of Text or Writable. Set to Text before creating data files with Flume, otherwise those files cannot be read by either Apache Impala (incubating) or Apache Hive |
| hdfs.callTimeout | 10000 | Number of milliseconds allowed for HDFS operations such as open, write, flush and close. Increase this value if many HDFS operations are timing out |
| hdfs.threadsPoolSize | 10 | Number of threads per HDFS sink for HDFS I/O operations (open, write, etc.) |
| hdfs.rollTimerPoolSize | 1 | Number of threads per HDFS sink for scheduling timed file rolling |
| hdfs.kerberosPrincipal | – | Kerberos user principal for accessing secure HDFS |
| hdfs.kerberosKeytab | – | Kerberos keytab for accessing secure HDFS |
| hdfs.proxyUser | – | |
| hdfs.round | false | Whether the timestamp should be rounded down (if true, affects all time-based escape sequences except %t) |
| hdfs.roundValue | 1 | Rounded down to the highest multiple of this (in the unit configured via hdfs.roundUnit) less than the current time |
| hdfs.roundUnit | second | The unit of the round-down value: second, minute or hour |
| hdfs.timeZone | Local Time | Name of the timezone used for resolving the directory path, e.g. America/Los_Angeles |
| hdfs.useLocalTimeStamp | false | Use the local time (instead of the timestamp from the event header) when replacing the escape sequences |
| hdfs.closeTries | 0 | Number of times the sink must try renaming a file after initiating a close attempt. If set to 1, the sink will not retry a failed rename (due to, for example, NameNode or DataNode failure) and may leave the file in an open state with a .tmp extension. If set to 0, the sink will keep trying until the file is eventually renamed (no limit on the number of attempts). The file may still remain open if the close call fails, but the data will be intact; in that case the file will be closed only after a Flume restart |
| hdfs.retryInterval | 180 | Time in seconds between consecutive attempts to close a file. Each close call costs multiple RPC round-trips to the NameNode, so setting this too low can put heavy load on the NameNode. If set to 0 or less, the sink will not retry after the first failed close attempt and may leave the file open or with a ".tmp" extension |
| serializer | TEXT | Other possible options include avro_event or the fully-qualified class name of an implementation of the EventSerializer.Builder interface |
| serializer.* | | |
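As a concrete illustration of how these parameters fit together, below is a minimal sketch of an HDFS sink configuration. The agent, sink and channel names (`a1`, `k1`, `c1`) and the HDFS path are hypothetical; the parameter keys themselves are the ones from the table above.

```properties
# Hypothetical agent a1 with one HDFS sink k1 bound to channel c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1

# Bucket files by day and hour using timestamp escape sequences
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/%Y-%m-%d/%H
a1.sinks.k1.hdfs.filePrefix = webdata
a1.sinks.k1.hdfs.fileSuffix = .txt

# Write uncompressed plain text instead of the default SequenceFile;
# per the table, do not set hdfs.codeC with DataStream
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

# Roll every 10 minutes or every 128 MB, whichever comes first;
# disable event-count-based rolling
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0

# Round the path timestamp down to the hour, using local time
# rather than the timestamp from the event header
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = hour
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

Note the interaction called out in the table: `DataStream` plus `writeFormat = Text` produces plain-text files that Hive and Impala can read directly, while the default `SequenceFile`/`Writable` combination does not.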