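With HBase's Java API, we can run MapReduce jobs that read from or write to HBase: for example, using MapReduce to import data from the local file system into an HBase table, or reading raw data from HBase and then analyzing it with MapReduce.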

Official HBase-MapReduce

Print the classpath that HBase MapReduce jobs need

$ bin/hbase mapredcp

Importing the environment variables

Import the environment variables (temporary: run the following on the command line)

$ export HBASE_HOME=/opt/module/hbase-1.3.1
$ export HADOOP_HOME=/opt/module/hadoop-2.7.2
$ export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`

Permanent: configure them in /etc/profile

export HBASE_HOME=/opt/module/hbase-1.3.1
export HADOOP_HOME=/opt/module/hadoop-2.7.2
Then add the following to hadoop-env.sh (note: place it after the for loop):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase-1.3.1/lib/*

Running the official MapReduce jobs

Example 1: count the number of rows in the student table

$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar rowcounter student

Example 2: use MapReduce to import local data into HBase

Create a local file in TSV format: fruit.tsv

1001	Apple	Red
1002	Pear	Yellow
1003	Pineapple	Yellow

Create the HBase table

hbase(main):001:0> create 'fruit','info'

Create the input_fruit directory in HDFS and upload fruit.tsv

$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -mkdir /input_fruit/
$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -put fruit.tsv /input_fruit/

Run the MapReduce job to load the data into the HBase fruit table

$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop102:9000/input_fruit
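
To make the column mapping more concrete, here is a minimal plain-Java sketch of how a spec such as HBASE_ROW_KEY,info:name,info:color can be paired, position by position, with the tab-separated fields of one line. It is not the ImportTsv source code: the class ColumnMappingSketch, the method mapLine, and the "ROW" map key are hypothetical names chosen for this illustration, and rejecting lines with the wrong field count is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; this is not the ImportTsv implementation.
final class ColumnMappingSketch {

    // Pairs each tab-separated field with the column named at the same
    // position in the spec. The HBASE_ROW_KEY entry marks the row key and is
    // stored under the map key "ROW" (a name used by this sketch only).
    static Map<String, String> mapLine(String columnSpec, String line) {
        String[] columns = columnSpec.split(",");
        // A limit of -1 keeps trailing empty fields so positions stay aligned.
        String[] fields = line.split("\t", -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                "expected " + columns.length + " fields but got " + fields.length);
        }
        Map<String, String> mapped = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            String target = "HBASE_ROW_KEY".equals(columns[i]) ? "ROW" : columns[i];
            mapped.put(target, fields[i]);
        }
        return mapped;
    }

    public static void main(String[] args) {
        String spec = "HBASE_ROW_KEY,info:name,info:color";
        // prints {ROW=1001, info:name=Apple, info:color=Red}
        System.out.println(mapLine(spec, "1001\tApple\tRed"));
    }
}
```

The takeaway is that the order of entries in -Dimporttsv.columns must match the order of the fields in the file; a line with a missing or extra tab no longer lines up with the spec.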

Use the scan command to check the imported data

hbase(main):001:0> scan 'fruit'

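Custom HBase-MapReduce 1

Goal: use MapReduce to migrate part of the data in the fruit table into the fruit_mr table.
Steps:

Build the ReadFruitMapper class, which reads the data from the fruit table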

import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
public class ReadFruitMapper extends TableMapper<ImmutableBytesWritable, Put> {
	@Override
	protected void map(ImmutableBytesWritable key, Result value, Context context) 
	throws IOException, InterruptedException {
		// Extract fruit's name and color: copy each row that is read into a Put object.
		Put put = new Put(key.get());
		// Iterate over every cell in the row
		for(Cell cell: value.rawCells()){
			// Keep only cells in the column family: info
			if("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))){
				// Column: name
				if("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
					// Add this cell to the Put object
					put.add(cell);
					// Column: color
				}else if("color".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
					// Add this cell to the Put object
					put.add(cell);
				}
			}
		}
		// Write each row read from fruit to the context as the map output
		context.write(key, put);
	}
}
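
The selection rule the mapper applies can be shown without an HBase cluster. The following is a minimal plain-Java model, assuming a hypothetical Cell class that only stands in for org.apache.hadoop.hbase.Cell; CellFilterSketch, keepNameAndColor, and the extra "price" column are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the mapper's filter; not HBase code.
final class CellFilterSketch {

    // Hypothetical stand-in for org.apache.hadoop.hbase.Cell.
    static final class Cell {
        final String family;
        final String qualifier;
        final String value;

        Cell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Keeps only cells from the "info" family whose qualifier is "name" or "color".
    static List<Cell> keepNameAndColor(List<Cell> row) {
        List<Cell> kept = new ArrayList<>();
        for (Cell cell : row) {
            boolean wantedFamily = "info".equals(cell.family);
            boolean wantedQualifier =
                "name".equals(cell.qualifier) || "color".equals(cell.qualifier);
            if (wantedFamily && wantedQualifier) {
                kept.add(cell);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("info", "name", "Apple"));
        row.add(new Cell("info", "color", "Red"));
        row.add(new Cell("info", "price", "3")); // hypothetical extra column
        for (Cell cell : keepNameAndColor(row)) {
            System.out.println(cell.family + ":" + cell.qualifier + "=" + cell.value);
        }
    }
}
```

In this model, any column other than info:name and info:color is dropped before the row is written out, which is what "migrating part of the data" amounts to here.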

Build the WriteFruitMRReducer class, which writes the data read from the fruit table into the fruit_mr table

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;
public class WriteFruitMRReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) 
	throws IOException, InterruptedException {
		// Write every row that was read to the fruit_mr table
		for(Put put: values){
			context.write(NullWritable.get(), put);
		}
	}
}

Build Fruit2FruitMRRunner (extends Configured implements Tool) to assemble and run the Job; the run method below belongs inside that class

	// Assemble the Job
	public int run(String[] args) throws Exception {
		// Get the Configuration
		Configuration conf = this.getConf();
		// Create the Job
		Job job = Job.getInstance(conf, this.getClass().getSimpleName());
		job.setJarByClass(Fruit2FruitMRRunner.class);

		// Configure the Job
		Scan scan = new Scan();
		scan.setCacheBlocks(false);
		scan.setCaching(500);

		// Set up the Mapper. Note: import from the mapreduce package, not the mapred package; the latter is the old API
		TableMapReduceUtil.initTableMapperJob(
		"fruit", // name of the source table
		scan, // Scan instance that controls the read
		ReadFruitMapper.class, // Mapper class
		ImmutableBytesWritable.class, // Mapper output key type
		Put.class, // Mapper output value type
		job // the Job to configure
		);
		// Set up the Reducer
		TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRReducer.class, job);
		// Number of reduce tasks, at least 1
		job.setNumReduceTasks(1);

		boolean isSuccess = job.waitForCompletion(true);
		if(!isSuccess){
			throw new IOException("Job running with error");
		}
		return isSuccess ? 0 : 1;
	}

Call the Job from the main method

public static void main( String[] args ) throws Exception{
	Configuration conf = HBaseConfiguration.create();
	int status = ToolRunner.run(conf, new Fruit2FruitMRRunner(), args);
	System.exit(status);
}

Package and run the job

$ /opt/module/hadoop-2.7.2/bin/yarn jar ~/softwares/jars/hbase-0.0.1-SNAPSHOT.jar com.z.hbase.mr1.Fruit2FruitMRRunner

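Tip: if the table the data will be imported into does not exist, create it before running the job.
Tip: Maven packaging arguments: -P local clean package, or -P dev clean package install (bundling third-party jars into the package requires the maven-shade-plugin)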

Custom HBase-MapReduce 2

Goal: write data stored in HDFS into an HBase table.
Steps:

Build ReadFruitFromHDFSMapper, which reads the file data from HDFS

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class ReadFruitFromHDFSMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
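		// One line of data read from HDFS
		String lineValue = value.toString();
		// Split each line on \t and store the fields in a String array
		String[] values = lineValue.split("\t");
		// Skip malformed lines that have fewer than three fields
		if (values.length < 3) {
			return;
		}

		// Pick out the values according to their meaning in the data
		String rowKey = values[0];
		String name = values[1];
		String color = values[2];

		// Initialize the rowKey
		ImmutableBytesWritable rowKeyWritable = new ImmutableBytesWritable(Bytes.toBytes(rowKey));

		// Initialize the Put object
		Put put = new Put(Bytes.toBytes(rowKey));

		// Arguments: column family, column qualifier, value
		// (addColumn replaces the deprecated Put.add(byte[], byte[], byte[]))
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(name));
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("color"), Bytes.toBytes(color));

		context.write(rowKeyWritable, put);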
	}
}
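
Because this mapper relies on every line having exactly three tab-separated fields, it helps to see how String.split treats malformed lines. The sketch below is a hedged, stdlib-only illustration; TsvLineCheck and parseFruitLine are hypothetical names, and treating empty fields as invalid is a choice made for this example rather than something the original job does.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the mapper above.
final class TsvLineCheck {

    // Splits on tabs, keeping empty fields (limit -1), and returns null when
    // the line does not consist of exactly three non-empty fields.
    static String[] parseFruitLine(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 3) {
            return null;
        }
        for (String field : fields) {
            if (field.isEmpty()) {
                return null;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // A well-formed line yields its three fields.
        System.out.println(Arrays.toString(parseFruitLine("1001\tApple\tRed")));
        // A doubled tab produces an empty third field plus a fourth field.
        System.out.println(Arrays.toString(parseFruitLine("1002\tPear\t\tYellow")));
        // With the default split, a trailing empty field is silently dropped,
        // so only two elements remain.
        System.out.println("1003\tPineapple\t".split("\t").length);
    }
}
```

With the default split("\t"), a doubled tab would put an empty string where the color is expected, and a line ending in a tab would leave the array one element short, so a check of this kind can be placed before the fields are indexed.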

Build the WriteFruitMRFromTxtReducer class

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteFruitMRFromTxtReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
		// Write every row that was read to the output HBase table
		for(Put put: values){
			context.write(NullWritable.get(), put);
		}
	}
}

Build Txt2FruitRunner to assemble the Job

public int run(String[] args) throws Exception {
	// Get the Configuration
	Configuration conf = this.getConf();

	// Create the Job
	Job job = Job.getInstance(conf, this.getClass().getSimpleName());
	job.setJarByClass(Txt2FruitRunner.class);
	Path inPath = new Path("hdfs://hadoop102:9000/input_fruit/fruit.tsv");
	FileInputFormat.addInputPath(job, inPath);

	// Set up the Mapper
	job.setMapperClass(ReadFruitFromHDFSMapper.class);
	job.setMapOutputKeyClass(ImmutableBytesWritable.class);
	job.setMapOutputValueClass(Put.class);

	// Set up the Reducer
	TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRFromTxtReducer.class, job);

	// Number of reduce tasks, at least 1
	job.setNumReduceTasks(1);

	boolean isSuccess = job.waitForCompletion(true);
	if(!isSuccess){
		throw new IOException("Job running with error");
	}

	return isSuccess ? 0 : 1;
}

Call and run the Job

public static void main(String[] args) throws Exception {
	Configuration conf = HBaseConfiguration.create();
	int status = ToolRunner.run(conf, new Txt2FruitRunner(), args);
	System.exit(status);
}

Package and run

$ /opt/module/hadoop-2.7.2/bin/yarn jar hbase-0.0.1-SNAPSHOT.jar com.liujh.hbase.mr2.Txt2FruitRunner

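Tip: if the table the data will be imported into does not exist, create it before running the job.
Tip: Maven packaging arguments: -P local clean package, or -P dev clean package install (bundling third-party jars into the package requires the maven-shade-plugin)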

Jianshu: https://www.jianshu.com/u/0278602aea1d
CSDN: https://blog.csdn.net/u012387141
