- Purpose
- Use cases
- Example
Purpose
- Copy a file from HDFS to the local map/reduce task side so that the map or reduce code can use it
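Use cases
- Merging (joining) a large file with a small file, e.g. a 10 GB large file and a 10 MB small file; the two inputs may have completely different formats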
Example
- Driver (main function) code
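
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);  // replaces the deprecated new Job(conf)
        // Pass the HDFS location of the small file to every task through the job configuration.
        job.getConfiguration().set("xyz", "fileHdfsLocation");
    }
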
- Mapper or reducer class
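
    import java.io.IOException;
    import java.util.HashSet;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.LineReader;

    // xxx stands for the output key/value types, which are left unspecified here.
    // The input value type is Text so that it matches the map() signature below.
    public static class LogMapper extends Mapper<Object, Text, xxx, xxx> {
        private static HashSet<String> smallCollection = null;

        // Runs once per task: read the small HDFS file into memory.
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            smallCollection = new HashSet<String>();
            Path fileIn = new Path(context.getConfiguration().get("xyz"));
            FileSystem hdfs = fileIn.getFileSystem(context.getConfiguration());
            FSDataInputStream hdfsReader = hdfs.open(fileIn);
            Text line = new Text();
            LineReader lineReader = new LineReader(hdfsReader);
            try {
                while (lineReader.readLine(line) > 0) {
                    // you can do something here
                    System.out.println(line.toString());
                    smallCollection.add(line.toString());
                }
            } finally {
                lineReader.close();  // also closes the underlying hdfsReader stream
            }
        }

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // use this HashSet
        }
    }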
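
The `map` body above is left as a comment. As a rough illustration of how the in-memory set might be used to join the two inputs, here is a minimal sketch in plain Java with no Hadoop types, so it can be run on its own. The class name `SmallSetJoinSketch`, its method names, the sample records, and the idea that the join key is the first tab-separated field of each large-file record are all assumptions made for this example and are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it only imitates what setup() and map() do.
public class SmallSetJoinSketch {

    // Plays the role of setup(): read every line of the small input into a set.
    public static Set<String> loadSmallCollection(BufferedReader reader) {
        Set<String> keys = new HashSet<>();
        reader.lines().forEach(keys::add);
        return keys;
    }

    // Plays the role of the check inside map().
    // Assumption: the join key is the first tab-separated field of the record.
    public static boolean matches(String record, Set<String> smallCollection) {
        String key = record.split("\t", 2)[0];
        return smallCollection.contains(key);
    }

    // Streams over the large input and keeps only records whose key is in the set.
    public static List<String> join(List<String> largeRecords, Set<String> smallCollection) {
        List<String> kept = new ArrayList<>();
        for (String record : largeRecords) {
            if (matches(record, smallCollection)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data: the small side is one key per line,
        // the large side is key<TAB>payload, i.e. a different format.
        Set<String> small = loadSmallCollection(
                new BufferedReader(new StringReader("u1\nu3\n")));
        List<String> large = Arrays.asList("u1\tlogin", "u2\tlogout", "u3\tclick");
        for (String record : join(large, small)) {
            System.out.println(record);
        }
    }
}
```

With the sample data in `main`, only the `u1` and `u3` records are kept. In a real mapper the same membership test would sit inside `map`, with `context.write` taking the place of the output list. Each task holds its own copy of the set, which is why the approach suits a small side of around 10 MB rather than two large inputs.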