爲了解決大量小圖片在HDFS中存儲時存在的問題,將小圖片存儲到SequenceFile中,然後通過MapReduce函數對SequenceFile文件進行操作。
通過設置job的輸入文件格式,讀取SequenceFile中的數據,代碼如下:
package com.wang;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;
public class Parral_Pyramid {
    // Record counter incremented by the mapper.
    // NOTE(review): a static counter is only meaningful when the job runs in a
    // single local JVM; in a distributed job each map task has its own copy of
    // this field — confirm this is intended for local debugging only.
    static int i = 0;

    /**
     * Driver for a map-only job that reads (Text, Text) records out of a
     * SequenceFile on HDFS and writes them back to a new SequenceFile.
     * Input and output paths are currently hard-coded to the master HDFS.
     *
     * @param args unused
     * @throws Exception if job configuration or submission fails
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // Register the jar containing this driver class.
        job.setJarByClass(Parral_Pyramid.class);
        // Input is the SequenceFile produced by a previous job.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        // Mapper configuration; with zero reducers these are also the job's
        // final output key/value types.
        job.setMapperClass(Image_Mapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // Fully qualified on purpose: the SequenceFileOutputFormat imported at
        // the top of the file is the old mapred API and would not compile here.
        job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);
        // Map-only job: records go straight from the mapper to the output format.
        job.setNumReduceTasks(0);
        FileInputFormat.setInputPaths(job, new Path("hdfs://master:9000/wang/result1.seq/part-r-00000"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://master:9000/wang/result2.seq"));
        // Propagate job success/failure to the process exit code instead of
        // silently discarding the return value of waitForCompletion().
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    /**
     * Mapper that receives each (key, value) record stored in the input
     * SequenceFile. It currently only logs each record; nothing is emitted
     * via context.write(...), so the job's output SequenceFile stays empty
     * until a write is added.
     */
    static class Image_Mapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            i++;
            // key/value here are exactly the key/value pairs stored in the SequenceFile.
            System.out.println("now_key:" + key.toString() + "value=" + value);
        }
    }
}