HBase MapReduce in Depth

Using HBase's Java API, we can implement MapReduce jobs that work against HBase, for example importing data from the local file system into an HBase table with MapReduce, or reading raw data out of HBase and then analyzing it with MapReduce.

Official HBase MapReduce

View what an HBase MapReduce job needs in order to run; the mapredcp command prints the HBase jars that must be on the classpath:

$ bin/hbase mapredcp

Exporting the environment variables

  1. Export the environment variables (takes effect only for the current session; run the following at the command line):
$ export HBASE_HOME=/opt/module/hbase-1.3.1
$ export HADOOP_HOME=/opt/module/hadoop-2.7.2
$ export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`
  2. To make them permanent, configure /etc/profile:
export HBASE_HOME=/opt/module/hbase-1.3.1
export HADOOP_HOME=/opt/module/hadoop-2.7.2
and add the following to hadoop-env.sh (note: place it after the for loop):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase/lib/*

Running the official MapReduce jobs

Case 1: count how many rows of data the student table contains

$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar rowcounter student
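
For comparison, the same count can be computed through the HBase client API without launching a MapReduce job. The following is a minimal sketch, assuming the HBase 1.3.1 client jars and an hbase-site.xml are on the classpath (the class name is illustrative); the official rowcounter job runs the same scan as a distributed MapReduce job, which scales much better for large tables:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class ClientRowCount {
	public static void main(String[] args) throws IOException {
		Configuration conf = HBaseConfiguration.create();
		try (Connection conn = ConnectionFactory.createConnection(conf);
		     Table table = conn.getTable(TableName.valueOf("student"))) {
			Scan scan = new Scan();
			//Fetch only the first cell of every row; that is enough for counting
			scan.setFilter(new FirstKeyOnlyFilter());
			long count = 0;
			try (ResultScanner scanner = table.getScanner(scan)) {
				for (Result r : scanner) {
					count++;
				}
			}
			System.out.println("Rows in student: " + count);
		}
	}
}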

Case 2: import local data into HBase with MapReduce

  1. Create a tsv-format file locally, fruit.tsv (fields separated by a single tab, which is importtsv's default separator):
1001	Apple	Red
1002	Pear	Yellow
1003	Pineapple	Yellow
  2. Create the HBase table:
hbase(main):001:0> create 'fruit','info'
  3. Create the input_fruit directory in HDFS and upload fruit.tsv:
$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -mkdir /input_fruit/
$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -put fruit.tsv /input_fruit/
  4. Run the importtsv MapReduce job to load the file into HBase's fruit table:
$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop102:9000/input_fruit
  5. Use the scan command to check the imported result (a Java read-back sketch follows these steps):
hbase(main):001:0> scan 'fruit'
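
The imported rows can also be read back through the Java client API. A minimal sketch, again assuming the cluster configuration is on the classpath (the class name is illustrative), which fetches row 1001:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckFruit {
	public static void main(String[] args) throws IOException {
		Configuration conf = HBaseConfiguration.create();
		try (Connection conn = ConnectionFactory.createConnection(conf);
		     Table table = conn.getTable(TableName.valueOf("fruit"))) {
			Get get = new Get(Bytes.toBytes("1001"));
			Result result = table.get(get);
			//Read back the two columns written by importtsv
			String name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
			String color = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("color")));
			System.out.println("1001 -> name=" + name + ", color=" + color);
		}
	}
}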

Custom HBase MapReduce 1

Goal: migrate part of the data in the fruit table into the fruit_mr table via MapReduce.
Step-by-step implementation:

  1. Build the ReadFruitMapper class, which reads data from the fruit table:
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
public class ReadFruitMapper extends TableMapper<ImmutableBytesWritable, Put> {
	@Override
	protected void map(ImmutableBytesWritable key, Result value, Context context) 
	throws IOException, InterruptedException {
		//Extract the name and color from fruit; in effect, read each row into a Put object
		Put put = new Put(key.get());
		//Iterate over the cells of this row
		for(Cell cell: value.rawCells()){
			//Keep only the column family: info
			if("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))){
				//Column: name
				if("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
					//Add the cell to the Put object
					put.add(cell);
				//Column: color
				}else if("color".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
					//Add the cell to the Put object
					put.add(cell);
				}
			}
		}
		//Write each row read from fruit to the context as the map output
		context.write(key, put);
	}
}
  2. Build the WriteFruitMRReducer class, which writes the data read from the fruit table into the fruit_mr table:
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;
public class WriteFruitMRReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) 
	throws IOException, InterruptedException {
		//Write each row of data read from fruit into the fruit_mr table
		for(Put put: values){
			context.write(NullWritable.get(), put);
		}
	}
}
  3. Build Fruit2FruitMRRunner (extends Configured implements Tool) to assemble and run the job:
//Assemble the job (this method lives in Fruit2FruitMRRunner; the class also needs imports such as
//org.apache.hadoop.conf.Configuration, org.apache.hadoop.mapreduce.Job,
//org.apache.hadoop.hbase.client.Scan and org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil)
	public int run(String[] args) throws Exception {
		//Get the Configuration
		Configuration conf = this.getConf();
		//Create the job
		Job job = Job.getInstance(conf, this.getClass().getSimpleName());
		job.setJarByClass(Fruit2FruitMRRunner.class);

		//Configure the job's scan: caching of 500 fetches 500 rows per RPC,
		//and disabling block caching keeps a full-table MR scan from evicting the RegionServer block cache
		Scan scan = new Scan();
		scan.setCacheBlocks(false);
		scan.setCaching(500);

		//Set the Mapper; note it is the one in the mapreduce package, not the mapred package (the latter is the legacy API)
		TableMapReduceUtil.initTableMapperJob(
		"fruit", //name of the source table
		scan, //scan controller
		ReadFruitMapper.class, //the Mapper class
		ImmutableBytesWritable.class, //the Mapper output key type
		Put.class, //the Mapper output value type
		job //the job to configure
		);
		//Set the Reducer
		TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRReducer.class, job);
		//Set the number of reduce tasks; at least 1
		job.setNumReduceTasks(1);

		boolean isSuccess = job.waitForCompletion(true);
		if(!isSuccess){
			throw new IOException("Job running with error");
		}
		return isSuccess ? 0 : 1;
	}
  4. Call and run the job from the main method (defined in the same class; requires org.apache.hadoop.hbase.HBaseConfiguration and org.apache.hadoop.util.ToolRunner):
public static void main(String[] args) throws Exception {
	Configuration conf = HBaseConfiguration.create();
	int status = ToolRunner.run(conf, new Fruit2FruitMRRunner(), args);
	System.exit(status);
}
  5. Package the jar and run the job:
$ /opt/module/hadoop-2.7.2/bin/yarn jar ~/softwares/jars/hbase-0.0.1-SNAPSHOT.jar com.z.hbase.mr1.Fruit2FruitMRRunner

Tip: before running the job, create the target table (fruit_mr here) in advance if it does not already exist; a programmatic sketch follows these tips.
Tip: Maven package command: -P local clean package or -P dev clean package install (to bundle third-party jars into the jar, the maven-shade-plugin is required).
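
For reference, the target table can also be created programmatically with the HBase 1.x Admin API rather than from the shell. A minimal sketch (the class name is illustrative):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateFruitMrTable {
	public static void main(String[] args) throws IOException {
		Configuration conf = HBaseConfiguration.create();
		try (Connection conn = ConnectionFactory.createConnection(conf);
		     Admin admin = conn.getAdmin()) {
			TableName tableName = TableName.valueOf("fruit_mr");
			if (!admin.tableExists(tableName)) {
				//Describe the table with its single column family, info
				HTableDescriptor descriptor = new HTableDescriptor(tableName);
				descriptor.addFamily(new HColumnDescriptor("info"));
				admin.createTable(descriptor);
			}
		}
	}
}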

Custom HBase MapReduce 2

Goal: write data from HDFS into an HBase table.
Step-by-step implementation:

  1. Build the ReadFruitFromHDFSMapper class, which reads file data from HDFS:
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class ReadFruitFromHDFSMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		//The line of data read from HDFS
		String lineValue = value.toString();
		//Split each line on \t and store the fields in a String array
		String[] values = lineValue.split("\t");

		//Pick out the fields according to their meaning
		String rowKey = values[0];
		String name = values[1];
		String color = values[2];

		//Initialize the row key
		ImmutableBytesWritable rowKeyWritable = new ImmutableBytesWritable(Bytes.toBytes(rowKey));

		//Initialize the Put object
		Put put = new Put(Bytes.toBytes(rowKey));

		//Arguments: column family, column, value
		//(addColumn is the HBase 1.x replacement for the deprecated add(byte[], byte[], byte[]) overload)
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(name));
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("color"), Bytes.toBytes(color));

		context.write(rowKeyWritable, put);
	}
}
  2. Build the WriteFruitMRFromTxtReducer class:
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteFruitMRFromTxtReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
		//Write each row of data into the fruit_mr table (the target table configured in the runner below)
		for(Put put: values){
			context.write(NullWritable.get(), put);
		}
	}
}
  3. Create Txt2FruitRunner to assemble the job:
//Assemble the job (inside Txt2FruitRunner, which, like Fruit2FruitMRRunner above, extends Configured and implements Tool)
	public int run(String[] args) throws Exception {
		//Get the Configuration
		Configuration conf = this.getConf();

		//Create the job
		Job job = Job.getInstance(conf, this.getClass().getSimpleName());
		job.setJarByClass(Txt2FruitRunner.class);
		Path inPath = new Path("hdfs://hadoop102:9000/input_fruit/fruit.tsv");
		FileInputFormat.addInputPath(job, inPath);

		//Set the Mapper
		job.setMapperClass(ReadFruitFromHDFSMapper.class);
		job.setMapOutputKeyClass(ImmutableBytesWritable.class);
		job.setMapOutputValueClass(Put.class);

		//Set the Reducer
		TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRFromTxtReducer.class, job);

		//Set the number of reduce tasks; at least 1
		job.setNumReduceTasks(1);

		boolean isSuccess = job.waitForCompletion(true);
		if (!isSuccess) {
			throw new IOException("Job running with error");
		}

		return isSuccess ? 0 : 1;
	}
  4. Call and run the job:
public static void main(String[] args) throws Exception {
	Configuration conf = HBaseConfiguration.create();
	int status = ToolRunner.run(conf, new Txt2FruitRunner(), args);
	System.exit(status);
}
  5. Package and run:
$ /opt/module/hadoop-2.7.2/bin/yarn jar hbase-0.0.1-SNAPSHOT.jar com.liujh.hbase.mr2.Txt2FruitRunner

Tip: before running the job, create the target table in advance if it does not already exist.
Tip: Maven package command: -P local clean package or -P dev clean package install (to bundle third-party jars into the jar, the maven-shade-plugin is required).
