Serializer interface source article: https://blogs.apache.org/flume/entry/streaming_data_into_apache_hbase
Reference blog: https://blog.csdn.net/m0_37739193/article/details/72868456
Goal: have Flume take data out of the event and use it as the HBase rowkey.
Flume receives the data and writes it straight into HBase; the data must not be staged to intermediate files along the way.
Flume uses an HTTP source as the entry point and an HBase sink for the import, while a file channel persists Flume's in-flight data to local disk (so that if the cluster fails, the data is backed up locally).
The incoming data format is http:10.0.0.1_{asdasd}, i.e. url_data.
The result stored in HBase is:
rowkey: currentTime_url
value: data
That is, the incoming data must be split: the URL becomes one part of the rowkey, the current time the other, and the data is stored as the value.
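The splitting and rowkey-building logic can be sketched in isolation before looking at the serializer itself (class and variable names here are illustrative only):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class RowkeyDemo {
    public static void main(String[] args) {
        // Incoming event body, in the url_data format described above
        String body = "http:10.0.0.1_{asdasd}";
        // Split on the first underscore only, so any underscores
        // inside the data payload are preserved
        String[] parts = body.split("_", 2);
        String url = parts[0];    // "http:10.0.0.1"
        String value = parts[1];  // "{asdasd}"
        // rowkey = currentTime_url, matching the scheme above
        String ts = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
        String rowkey = ts + "_" + url;
        System.out.println(rowkey + " -> " + value);
    }
}
```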
Steps:
1. Implement Flume's HbaseEventSerializer interface so the rowkey can be set, and package it as a jar.
The Java source is given below.
2. Put the jar into Flume's /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/lib directory.
3. Flume configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = http
a1.sources.r1.port = 44444
a1.sources.r1.bind = 10.0.0.183
# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.channel = c1
a1.sinks.k1.table = httpdata
a1.sinks.k1.columnFamily = a
a1.sinks.k1.serializer = com.hbase.Rowkey
# Use a file channel that persists events to local disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/x/oyzm_test/flu-hbase/checkpoint/
a1.channels.c1.useDualCheckpoints = false
a1.channels.c1.dataDirs = /home/x/oyzm_test/flu-hbase/flumedir/
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
4. Create the table in HBase: create 'httpdata','a'
5. Flume start command:
flume-ng agent -c . -f /mysoftware/flume-1.7.0/conf/hbase_simple.conf -n a1 -Dflume.root.logger=INFO,console
6. Command to post data to Flume:
curl -X POST -d '[{"body":"http:10.0.0.1_{asdasd}"}]' http://10.0.0.183:44444
Resulting row in HBase:
20181108104034_http:10.0.0.183 column=a:data, timestamp=1541644834926, value={asdasd}
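To make the row layout concrete, the rowkey shown above can be decomposed back into its timestamp and URL parts (a standalone sketch, not part of the pipeline):

```java
public class RowkeyParseDemo {
    public static void main(String[] args) {
        // Rowkey layout written by the serializer: yyyyMMddHHmmss_url
        String rowkey = "20181108104034_http:10.0.0.183";
        // Split on the first underscore: timestamp, then the URL part
        String[] parts = rowkey.split("_", 2);
        System.out.println("time=" + parts[0] + " url=" + parts[1]);
    }
}
```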
Java source:
package com.hbase;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.conf.ComponentConfiguration;
import org.apache.flume.sink.hbase.HbaseEventSerializer;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
public class Rowkey implements HbaseEventSerializer {
    // Default column family; overwritten by initialize() with the
    // columnFamily configured on the sink
    private byte[] colFam = "cf".getBytes();
    // The event currently being serialized
    private Event currentEvent;

    @Override
    public void initialize(Event event, byte[] colFam) {
        this.currentEvent = event;
        this.colFam = colFam;
    }

    @Override
    public void configure(Context context) {}

    @Override
    public void configure(ComponentConfiguration conf) {}

    // Build the Put: rowkey, column qualifier, and value
    @Override
    public List<Row> getActions() {
        // Event body format: url_value
        String eventStr = new String(currentEvent.getBody());
        // Split on the first underscore only, so underscores in the
        // value part are preserved
        String[] parts = eventStr.split("_", 2);
        String url = parts[0];
        String value = parts[1];
        // Current system time, formatted as yyyyMMddHHmmss
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmss");
        // rowkey = currentTime_url
        byte[] currentRowKey = (df.format(new Date()) + "_" + url).getBytes();
        // HBase put: column family, qualifier "data", value
        // e.g. column=a:data, value={asdasd}
        List<Row> puts = new ArrayList<Row>();
        Put putReq = new Put(currentRowKey);
        putReq.addColumn(colFam, "data".getBytes(), value.getBytes());
        puts.add(putReq);
        return puts;
    }

    @Override
    public List<Increment> getIncrements() {
        // No counter increments needed for this use case
        return new ArrayList<Increment>();
    }

    // Release references
    @Override
    public void close() {
        colFam = null;
        currentEvent = null;
    }
}
pom file (dependencies):
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.flume.flume-ng-sinks</groupId>
<artifactId>flume-ng-hbase-sink</artifactId>
<version>1.7.0</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.8</version>
<scope>system</scope>
<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
</dependencies>