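MapReduce is Hadoop's framework for distributed data processing and aggregation. Normally, every run means packaging the program and uploading it to the cluster, which makes debugging very inconvenient. I am writing this post to share how to debug a MapReduce program locally.
My local development environment is OS X 10.11.4 with Hadoop 2.6.4; the cluster runs CentOS 6.7.
Code snippet from the main method of the MapReduce program: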
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
………………
FileInputFormat.setInputPaths(job, new Path("/Users/admin/Desktop/Telephone_Summary"));
FileOutputFormat.setOutputPath(job, new Path("/Users/admin/Desktop/mapreduceTestOutput3"));
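// Run the job inside the local JVM against the local file system.
// Set these before Job.getInstance(conf), because the Job copies the Configuration.
conf.set("mapreduce.framework.name", "local"); // Hadoop 2.x key; "mapred.job.tracker" is the deprecated 1.x name
conf.set("fs.defaultFS", "file:///"); // "fs.default.name" with the value "local" is deprecated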
………………
FileInputFormat.setInputPaths(job, new Path("/Users/admin/Desktop/Telephone_Summary"));
FileOutputFormat.setOutputPath(job, new Path("/Users/admin/Desktop/mapreduceTestOutput3"));
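// Run the job inside the local JVM, but read and write files on HDFS.
conf.set("mapreduce.framework.name", "local"); // Hadoop 2.x key; "mapred.job.tracker" is the deprecated 1.x name
conf.set("fs.defaultFS", "hdfs://Hadoop:9000");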
………………
FileInputFormat.setInputPaths(job, new Path("/Users/admin/Telephone_Summary")); // path on HDFS
FileOutputFormat.setOutputPath(job, new Path("/Users/admin/mapreduceTestOutput")); // path on HDFS
// Submit the job from the local machine to the YARN cluster.
conf.set("fs.defaultFS", "hdfs://Hadoop:9000");
conf.set("mapreduce.framework.name", "yarn"); // run on YARN rather than in the local JVM
conf.set("yarn.resourcemanager.hostname", "Hadoop"); // host name of the ResourceManager
conf.set("yarn.resourcemanager.address", "172.16.124.130:8032");
System.setProperty("hadoop.home.dir","/Users/admin/Downloads/systemSoftware/Linux/hadoop-2.6.4");
Job job = Job.getInstance(conf);
job.setJar("/Users/admin/Desktop/hadoopBasic.jar");
………………
FileInputFormat.setInputPaths(job, new Path("/Users/admin/Telephone_Summary"));
FileOutputFormat.setOutputPath(job, new Path("/Users/admin/mapreduceTestOutput"));
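The setups above (local run on local files, local run on HDFS files, and submission to YARN) differ only in a handful of configuration properties, so it can help to keep them in one place. The following is a minimal sketch and not code from the original program: `DebugModeSettings`, `Mode`, and `forMode` are hypothetical names, it assumes the Hadoop 2.x key `mapreduce.framework.name` is what selects local or YARN execution, and it returns a plain `Map` so that it compiles without the Hadoop jars. The host name and port are the ones used in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hadoop): collects the properties each debug setup needs. */
final class DebugModeSettings {

    /** The three ways of running the job described in this post. */
    enum Mode { LOCAL_FILES, LOCAL_WITH_HDFS, SUBMIT_TO_YARN }

    /** Returns the property names and values to apply with conf.set(key, value). */
    static Map<String, String> forMode(Mode mode) {
        Map<String, String> props = new LinkedHashMap<>();
        switch (mode) {
            case LOCAL_FILES:
                // Local JVM, local file system.
                props.put("mapreduce.framework.name", "local");
                props.put("fs.defaultFS", "file:///");
                break;
            case LOCAL_WITH_HDFS:
                // Local JVM, input and output on HDFS.
                props.put("mapreduce.framework.name", "local");
                props.put("fs.defaultFS", "hdfs://Hadoop:9000");
                break;
            case SUBMIT_TO_YARN:
                // Job is handed to the cluster's ResourceManager.
                props.put("mapreduce.framework.name", "yarn");
                props.put("fs.defaultFS", "hdfs://Hadoop:9000");
                props.put("yarn.resourcemanager.hostname", "Hadoop");
                break;
        }
        return props;
    }
}
```

In main, each entry would be applied with conf.set(key, value) before calling Job.getInstance(conf), because the Job works on its own copy of the Configuration. Submitting to YARN additionally needs the job.setJar(...) call, since the cluster has to receive the packaged classes.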
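# log4j.properties on the project's classpath: print Hadoop's log output to the console while debugging
hadoop.root.logger=DEBUG, console
log4j.rootLogger=DEBUG, console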
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n