Using MapReduce to Parse Files in HDFS, Generate HFiles, and Import Them into HBase (Part 3)

Generating HFiles with a MapReduce job is the fastest way to import large volumes of data into HBase.

The process consists of two parts: generating the HFiles, and importing them into HBase.

Part 1: Generating the HFiles

1. The main program, ConvertToHFiles.java


import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConvertToHFiles extends Configured implements Tool {

    private static final Log LOG = LogFactory.getLog(ConvertToHFiles.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new ConvertToHFiles(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        try {
            Configuration conf = HBaseConfiguration.create();
            conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
            conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

            String inputPath = args[0];
            String outputPath = args[1];
            final TableName tableName = TableName.valueOf(args[2]);

            // Create the HBase connection; the table and region locator are
            // needed to configure the job against the table's regions.
            Connection connection = ConnectionFactory.createConnection(conf);
            Table table = connection.getTable(tableName);

            // Create the job
            Job job = Job.getInstance(conf, "ConvertToHFiles: Convert File to HFiles");
            job.setInputFormatClass(TextInputFormat.class);
            job.setJarByClass(ConvertToHFiles.class);

            job.setMapperClass(ConvertToHFilesMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);

            // Set up total-order partitioning and sorting so each reducer
            // writes HFiles for exactly one region of the target table.
            HFileOutputFormat2.configureIncrementalLoad(job, table, connection.getRegionLocator(tableName));

            FileInputFormat.setInputPaths(job, inputPath);
            HFileOutputFormat2.setOutputPath(job, new Path(outputPath));

            boolean success = job.waitForCompletion(true);
            table.close();
            connection.close();
            if (success) {
                LOG.info("Success");
                return 0;
            }
            LOG.error("Failure");
        } catch (Exception e) {
            e.printStackTrace();
        }
        return 1;
    }
}
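
Note that the target table must already exist, with the column family f that the mapper below writes to, because configureIncrementalLoad derives its partitioning from the table's current regions. A minimal sketch of creating it with the HBase 1.x client API that this post already uses (the class name CreateTargetTable is just for illustration):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTargetTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Create the table with the single column family "f" used by the mapper.
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf(args[0]));
            desc.addFamily(new HColumnDescriptor("f"));
            admin.createTable(desc);
        }
    }
}

The job itself can then be launched with something like hadoop jar your-jar.jar ConvertToHFiles <inputPath> <outputPath> <tableName>, matching the three arguments read in run().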

2. The mapper, ConvertToHFilesMapper.java

import java.io.IOException;
import java.util.ArrayList;

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ConvertToHFilesMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Cell> {

    public static final byte[] CF = Bytes.toBytes("f");
    public static final ImmutableBytesWritable rowKey = new ImmutableBytesWritable();
    static ArrayList<byte[]> qualifiers = new ArrayList<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        context.getCounter("Convert", "mapper").increment(1);

        // The column qualifiers; here there are three columns.
        // Clear the static list first so entries are not duplicated
        // if setup runs more than once (e.g. with JVM reuse).
        qualifiers.clear();
        qualifiers.add(Bytes.toBytes("name"));
        qualifiers.add(Bytes.toBytes("xxx"));
        qualifiers.add(Bytes.toBytes("score"));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // Fields are comma-separated; the first field becomes the row key.
        String[] line = value.toString().split(",");

        // Hash the natural key so rows spread evenly across regions.
        byte[] rowKeyBytes = DigestUtils.md5Hex(line[0]).getBytes();
        rowKey.set(rowKeyBytes);

        // Count occurrences of each distinct value of the third field.
        context.getCounter("Convert", line[2]).increment(1);

        // The remaining fields map one-to-one onto the qualifiers above.
        for (int i = 0; i < line.length - 1; i++) {
            KeyValue kv = new KeyValue(rowKeyBytes, CF, qualifiers.get(i), Bytes.toBytes(line[i + 1]));
            context.write(rowKey, kv);
        }
    }
}
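
To make the mapping concrete: given an invented sample line 1001,Tom,math,90, the mapper derives one hashed row key and emits three KeyValues:

    input line:  1001,Tom,math,90
    row key:     md5Hex("1001")   (a 32-character hex string)
    emitted:     (rowKey, f:name  = "Tom")
                 (rowKey, f:xxx   = "math")
                 (rowKey, f:score = "90")

Hashing the natural key randomizes the row-key order, which spreads writes across regions; the shuffle that configureIncrementalLoad sets up re-sorts all KeyValues into the total order that HFiles require.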

The job then produces, under the out directory, a _SUCCESS marker file and a folder named after the column family; the HFiles themselves sit inside that folder.
(Screenshots omitted: the out directory, and the HFiles under the column-family folder.)
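
As a rough illustration, with column family f the layout looks like this (the HFile names are generated hex strings, shown here as a placeholder):

    out/_SUCCESS
    out/f/<generated-hex-name>   (one or more HFiles per region)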

Part 2: Importing the Generated HFiles into HBase

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HFile2HBase {

    public static void main(String[] args) {
        String tableName = args[0];
        String outputDir = args[1];

        // Configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.x.xx");
        conf.set("hbase.metrics.showTableName", "false");

        Path dir = new Path(outputDir);

        // Bulk-load the generated HFiles into HBase
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(TableName.valueOf(tableName));
             RegionLocator regionLocator = conn.getRegionLocator(TableName.valueOf(tableName))) {

            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            // Run the bulk load: atomically moves the HFiles into the
            // table's regions.
            loader.doBulkLoad(dir, admin, table, regionLocator);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
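
For what it's worth, this step can also be run without any custom code, using the bulk-load tool bundled with HBase (the output directory /out and table name student below are placeholders):

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /out student

It drives the same LoadIncrementalHFiles logic as the class above.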

And that's it; scan the table in HBase and the data should be there.

One question I ran into: some blog posts claim that HFile bulk loading is only suitable for the initial data load, i.e. when the table is empty (or is emptied before each load). Yet when I loaded a second, different batch of HFiles into the same table, it also succeeded and the data grew. This actually appears to be expected behavior rather than some recent HBase change: LoadIncrementalHFiles is built for incremental loads (hence the name), splitting any HFile that straddles a region boundary before moving the files into the regions, so the target table does not need to be empty.
