Importing Data into HBase with MapReduce

Original

maclaren001

2018-08-24 10:39

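In day-to-day development you may need to write a MapReduce job that reads data from HDFS and loads it into HBase. There are two common ways to do this, described in turn below.

Method 1:

Set the job's OutputFormatClass to TableOutputFormat, build a Put object for each record, and set Put as the job's OutputValueClass.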

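		// ConfSource is a helper that is not shown in this article.
		Configuration conf = ConfSource.getHBaseConf();
		// Job.getInstance replaces the deprecated new Job(conf, name) constructor.
		Job j = Job.getInstance(conf, "Import table " + tbName + " into hbase table:bigtable from " + path);
		j.setMapperClass(Sync2HBaseMapper.class);
		// Write each emitted Put straight to the table named by OUTPUT_TABLE.
		j.setOutputFormatClass(TableOutputFormat.class);
		j.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "bigtable");
		j.setOutputKeyClass(ImmutableBytesWritable.class);
		j.setOutputValueClass(Put.class);
		// Map-only job: the mappers emit the Puts, so there is no reduce phase.
		j.setNumReduceTasks(0);
		j.setJarByClass(Sync2HBaseJob.class);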
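
The job setup above does not show what Sync2HBaseMapper does with each input line. As a rough illustration only, the parsing half of such a mapper can be sketched with the plain JDK. The input layout used here (tab-separated, first field is the row key, remaining fields are `qualifier=value`) and the class name `LineToCells` are assumptions made for this sketch, not something the original code specifies. In a real mapper the row key and cells would then go into a Put, which is emitted as the output value.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the input layout is an assumption, not taken from the article.
class LineToCells {

    // Row key bytes: the first tab-separated field of the line.
    static byte[] rowKey(String line) {
        String first = line.split("\t", -1)[0];
        return first.getBytes(StandardCharsets.UTF_8);
    }

    // Remaining fields, each of the form "qualifier=value", kept in input order.
    static Map<String, String> cells(String line) {
        String[] fields = line.split("\t", -1);
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed fields instead of failing the whole task
            }
            out.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "user42\tname=alice\tcity=taipei";
        String key = new String(rowKey(line), StandardCharsets.UTF_8);
        System.out.println(key + " -> " + cells(line));
    }
}
```

Skipping malformed fields is a design choice for the sketch; a production mapper might instead count them with a job counter so bad input is visible.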

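However, this approach performs poorly when the data volume is large and the cluster has many nodes. Each task writes its output to HBase over the network while the job is still running (in the job above these are the map tasks, because the number of reducers is set to 0), which creates heavy network overhead. Transaction control is also difficult: if the job fails partway through, the rows already written stay in the table. For these reasons the method is only suitable when the data volume is small.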

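Method 2:

Use the HFileOutputFormat2 class to generate HFiles. HFile is the storage format for KeyValue data in HBase, a binary file format stored on Hadoop (HDFS). A StoreFile is just a lightweight wrapper around an HFile, so underneath every StoreFile is an HFile. The generated HFiles are written to a specified HDFS directory, and the completebulkload command then loads them into HBase quickly.

Compared with the time the MapReduce job takes, the running time of completebulkload is almost negligible. On my machine with 16 cores and 128 GB of memory, the MapReduce job took 20 minutes for a 600 MB data source, while loading the result into HBase with completebulkload took only a few seconds.

Note, however, that after completebulkload runs the HFiles are no longer in the HDFS output directory, because the load normally moves them into HBase's own storage rather than copying them. If you need them afterwards, back them up first.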

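		Configuration conf = ConfSource.getHBaseConf();
		// Job.getInstance replaces the deprecated new Job(conf, name) constructor.
		Job job = Job.getInstance(conf, "Import into hbase table "
				+ confClz.getHbaseTable() + " from "
				+ confClz.getDownloadPath());
		job.setJarByClass(Sync2HBaseJobViaHFile.class);
		FileInputFormat.setInputPaths(job, confClz.getDownloadPath());
		job.setMapperClass(Sync2HBaseMapper.class);
		// HTable(conf, name) is the older client API, deprecated since HBase 1.0
		// in favour of tables obtained from a Connection.
		HTable table = new HTable(conf, confClz.getHbaseTable());
		// Sorts the cells of each row so the HFile is written in order.
		job.setReducerClass(PutSortReducer.class);
		// The HFiles are written to this directory, not into the table itself.
		Path outputDir = new Path(confClz.getHfilePath());
		FileOutputFormat.setOutputPath(job, outputDir);
		// The mapper emits (row key, Put) pairs.
		job.setMapOutputKeyClass(ImmutableBytesWritable.class);
		job.setMapOutputValueClass(Put.class);
		// Sets the output format, partitioner and reducer count to match the table's regions.
		HFileOutputFormat2.configureIncrementalLoad(job, table);
		TableMapReduceUtil.addDependencyJars(job);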
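
The job above needs a sorting reducer such as PutSortReducer because an HFile stores its cells in sorted order, and HBase compares row keys byte by byte as unsigned values rather than as numbers or locale-aware strings. The following is a minimal JDK-only sketch of that ordering. The class `RowKeyOrder` and its methods are hypothetical names made up for this illustration and are not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: it only models the row-key ordering.
class RowKeyOrder {

    // Compare two row keys byte by byte, treating each byte as unsigned (0..255).
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A key that is a prefix of another key sorts first.
        return a.length - b.length;
    }

    // Return the given keys sorted in unsigned lexicographic byte order.
    static List<String> sorted(List<String> keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String k : keys) {
            raw.add(k.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort(RowKeyOrder::compare);
        List<String> out = new ArrayList<>();
        for (byte[] r : raw) {
            out.add(new String(r, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [row1, row10, row2]: "row10" sorts before "row2".
        System.out.println(sorted(Arrays.asList("row10", "row2", "row1")));
    }
}
```

One practical consequence is that numeric suffixes in row keys sort as text, so zero-padding them (for example `row02`, `row10`) may be worth considering if numeric order matters for scans.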
