5. Multiple-file input and output in MapReduce

1. Old API:

org.apache.hadoop.mapred.lib.MultipleOutputFormat, together with org.apache.hadoop.mapred.lib.MultipleOutputs and org.apache.hadoop.mapred.lib.MultipleInputs

MultipleOutputFormat allows the output data to be written to different output files.

MultipleOutputs creates multiple OutputCollectors. Each OutputCollector can have its own OutputFormat and types for the key/value pair. Your MapReduce program will decide what to output to each OutputCollector.

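2. New API:

org.apache.hadoop.mapreduce.lib.output.MultipleOutputs and org.apache.hadoop.mapreduce.lib.input.MultipleInputs

The new MultipleOutputs combines the functionality of the two old-API output classes above; there is no longer a separate MultipleOutputFormat.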

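MultipleInputs:

By default a job calls job.setInputFormatClass once, so it uses a single InputFormat to process a single data format.

To read files of different formats from different directories in the same job, you could write your own MultiInputFormat. Hadoop already provides MultipleInputs for this, however: it lets you assign an InputFormat and a matching Mapper class to each input path.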

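Both mappers must emit the same key/value types so that a single reducer can consume them (CustomWritable stands for your own Writable value class):

MapA extends Mapper<LongWritable, Text, Text, CustomWritable>

MapB extends Mapper<LongWritable, Text, Text, CustomWritable>

Reducer<Text, CustomWritable, Text, Text>

    public static void main(String[] args) throws IOException {
        String file_1 = args[0];   // args[0]: input path handled by MapA
        String file_2 = args[1];   // args[1]: input path handled by MapB
        String outPath = args[2];  // args[2]: output path

        // new Job(conf) is deprecated in Hadoop 2 in favor of Job.getInstance(conf).
        Job job = new Job(new Configuration());

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(CustomWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(outPath));

        // Bind each input path to its own InputFormat and Mapper.
        MultipleInputs.addInputPath(job, new Path(file_1), TextInputFormat.class, MapA.class);
        MultipleInputs.addInputPath(job, new Path(file_2), TextInputFormat.class, MapB.class);

        ...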
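
To make the per-path dispatch concrete, here is a small plain-Java sketch (no Hadoop dependency) of the idea behind MultipleInputs: each input path is bound to its own record parser, and every parser emits the same key/value shape so that one grouping step can merge the results. The class name MultipleInputsSketch, the sample paths, and the record layouts are all hypothetical; this only illustrates the concept and is not how Hadoop implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java illustration only; nothing here is a Hadoop API.
class MultipleInputsSketch {

    // Hypothetical "MapA": parses "id,name" lines and tags the value.
    static String[] mapA(String line) {
        String[] f = line.split(",");
        return new String[] { f[0], "name=" + f[1] };
    }

    // Hypothetical "MapB": parses "id<TAB>score" lines and tags the value.
    static String[] mapB(String line) {
        String[] f = line.split("\t");
        return new String[] { f[0], "score=" + f[1] };
    }

    // Runs each path's records through the mapper bound to that path,
    // then groups the emitted values by key, roughly as the shuffle would.
    static Map<String, List<String>> run(Map<String, List<String>> inputs,
                                         Map<String, Function<String, String[]>> mappers) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
            Function<String, String[]> mapper = mappers.get(e.getKey());
            for (String line : e.getValue()) {
                String[] kv = mapper.apply(line);
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return grouped;
    }

    // Builds two made-up inputs and binds one mapper to each path.
    static Map<String, List<String>> demo() {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("/data/names", Arrays.asList("1,alice", "2,bob"));
        inputs.put("/data/scores", Arrays.asList("1\t90", "2\t75"));

        Map<String, Function<String, String[]>> mappers = new LinkedHashMap<>();
        mappers.put("/data/names", MultipleInputsSketch::mapA);
        mappers.put("/data/scores", MultipleInputsSketch::mapB);
        return run(inputs, mappers);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Tagging each value with its origin ("name=", "score=") is one common way to let the reducer tell the two sources apart; a custom Writable with a source field would play the same role in a real job.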

MultipleOutputs:

1. Writing to multiple files or directories:

The driver needs no extra changes; just add the following code to the Mapper or Reducer class:

    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mos = new MultipleOutputs<Text, IntWritable>(context);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();  // flush and close every file opened through mos
    }

You can then call mos.write(key, value, baseOutputPath) in place of context.write(key, value).

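When mos is used in the Mapper or Reducer, the default part-m-00* or part-r-00* files are still created, but they are empty (size 0) if every record is written through mos. In addition, only records written with context.write in the Mapper are passed on to the Reducer; records written through mos go straight to the output files.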
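
As a rough illustration of where those records end up, the sketch below mimics in plain Java how a base output path becomes a file name. It assumes the usual FileOutputFormat naming scheme (base name, then -m- or -r- for the task type, then a five-digit partition number); the class name, method name, and sample paths are hypothetical and not part of Hadoop.

```java
import java.util.Locale;

// Plain-Java illustration only; OutputNameSketch is not a Hadoop class.
class OutputNameSketch {

    // Assumed naming scheme: <base>-<m|r>-<five-digit partition>.
    static String uniqueFile(String baseOutputPath, boolean mapTask, int partition) {
        return String.format(Locale.ROOT, "%s-%s-%05d",
                baseOutputPath, mapTask ? "m" : "r", partition);
    }

    public static void main(String[] args) {
        // Default output of map task 0.
        System.out.println(uniqueFile("part", true, 0));
        // A base path with a directory component, written by reduce task 3.
        System.out.println(uniqueFile("fruit/apple", false, 3));
        // A base path ending in "/" leaves the file name itself empty before the suffix.
        System.out.println(uniqueFile("fruit/", true, 0));
    }
}
```

Under this scheme a base path that ends in "/" produces a file whose name begins with the dash, inside the named directory. If the empty default part files are unwanted, Hadoop's LazyOutputFormat is commonly used to avoid creating them.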

2. Writing output in multiple formats:

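    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class TestwithMultipleOutputs extends Configured implements Tool {

        public static class MapClass extends Mapper<LongWritable, Text, Text, IntWritable> {

            private MultipleOutputs<Text, IntWritable> mos;

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                mos = new MultipleOutputs<Text, IntWritable>(context);
            }

            // Each input line is expected to have three fields separated by "-": key-int-text.
            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                String[] tokens = line.split("-");
                // (1) key and integer value go to the "MOSInt" named output
                mos.write("MOSInt", new Text(tokens[0]), new IntWritable(Integer.parseInt(tokens[1])));
                // (2) key and text value go to the "MOSText" named output
                mos.write("MOSText", new Text(tokens[0]), new Text(tokens[2]));
                // (3) a named output can also be written to a specified file or directory
                mos.write("MOSText", new Text(tokens[0]), value, tokens[0] + "/");
            }

            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                mos.close();
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            Job job = new Job(conf, "word count with MultipleOutputs");
            job.setJarByClass(TestwithMultipleOutputs.class);

            Path in = new Path(args[0]);
            Path out = new Path(args[1]);
            FileInputFormat.setInputPaths(job, in);
            FileOutputFormat.setOutputPath(job, out);

            job.setMapperClass(MapClass.class);
            job.setNumReduceTasks(0);  // map-only job

            // Declare each named output with its own OutputFormat and key/value types.
            MultipleOutputs.addNamedOutput(job, "MOSInt", TextOutputFormat.class, Text.class, IntWritable.class);
            MultipleOutputs.addNamedOutput(job, "MOSText", TextOutputFormat.class, Text.class, Text.class);

            // Return the status so that ToolRunner and main decide the exit code.
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Configuration(), new TestwithMultipleOutputs(), args);
            System.exit(res);
        }
    }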
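
To see what the three writes in the mapper do with one record, the following plain-Java sketch imitates the routing for a single made-up input line, "apple-3-red". It assumes map task 0, the -m-00000 file suffix, and tab-separated key/value lines as TextOutputFormat produces by default; the class NamedOutputRoutingSketch and its methods are hypothetical and only imitate the behavior, they are not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; it imitates the routing, not the Hadoop classes.
class NamedOutputRoutingSketch {

    // Assumed suffix for files written by map task 0.
    static final String MAP_TASK_SUFFIX = "-m-00000";

    // Returns the simulated files (file name -> lines) produced for one input line.
    static Map<String, List<String>> route(String line) {
        Map<String, List<String>> files = new TreeMap<>();
        String[] tokens = line.split("-");
        // (1) named output "MOSInt": key and integer value
        write(files, "MOSInt", tokens[0], String.valueOf(Integer.parseInt(tokens[1])));
        // (2) named output "MOSText": key and text value
        write(files, "MOSText", tokens[0], tokens[2]);
        // (3) same record type, but under a base path derived from the key
        write(files, tokens[0] + "/", tokens[0], line);
        return files;
    }

    // Appends "key<TAB>value" to the file derived from the base path.
    static void write(Map<String, List<String>> files, String basePath, String key, String value) {
        files.computeIfAbsent(basePath + MAP_TASK_SUFFIX, k -> new ArrayList<>())
             .add(key + "\t" + value);
    }

    public static void main(String[] args) {
        route("apple-3-red").forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

In this sketch the first two writes land in files named after the named output, while the third lands in a directory named after the key, which is the effect of passing a base output path.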
