Path problem when using MapReduce to read data from HBase and write it to HDFS

        While using MapReduce to read HBase data for computation and then write the results to HDFS, I ran into a path problem at the output step that puzzled me for a long time. I finally understood and solved it today, so I am recording it here in the hope that it helps anyone who hits the same issue.

       The original code is shown below, and it produced the exception that follows. At first I was baffled: how could an HDFS path conflict with a local Windows path? Why was a local path being read at all? After searching online and reading the HBase source, I found that the six-argument TableMapReduceUtil.initTableMapperJob overload defaults its addDependencyJars flag to true, which adds the HBase dependency jars from the local classpath (on Windows, paths like F:/HBaselib/metrics-core-2.2.0.jar) to the job's distributed cache; the job submitter then tries to resolve those local paths against HDFS and fails. The fix is at the very bottom: simply pass false instead of the default true.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Origin_job {

	public static void main(String[] args) throws ClassNotFoundException, InterruptedException, IOException {

		long starttime = System.currentTimeMillis();
		String tablename = "test";
		Path outpath = new Path("/user/sky/output/");

		Configuration conf = new Configuration();
		conf.set("hbase.zookeeper.quorum", "node1");
		conf.set("hbase.zookeeper.property.clientPort", "2181");
		conf.set("dfs.permissions.enabled", "false");
		conf.set("fs.defaultFS", "hdfs://node1:8020");
		conf.set("yarn.resourcemanager.hostname", "node1");
		FileSystem fs = FileSystem.get(conf);

		Job job = Job.getInstance(conf); // instantiate a Job
		job.setJobName("ReadHbase");     // job name
		job.setJarByClass(Origin_job.class); // job entry class

		Scan scan = new Scan();
//		scan.setCaching(500);
//		scan.setCacheBlocks(false);
		TableMapReduceUtil.initTableMapperJob(tablename, scan, Origin_Mapper.class, Text.class, Text.class, job);
//		job.setReducerClass(Origin_Reducer.class); // the Reducer the job would run
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		if (fs.exists(outpath)) {
			fs.delete(outpath, true); // clear a stale output directory
		}
		FileOutputFormat.setOutputPath(job, outpath);

		boolean f = job.waitForCompletion(true); // wait and check whether the job succeeded
		if (f) {
			System.out.println("job completed successfully!");
		} else {
			System.out.println("job failed!!");
		}
		System.out.println(System.currentTimeMillis() - starttime + " ms");
	}
}
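The driver above references an Origin_Mapper class that the post does not show. As a minimal sketch of what such a TableMapper might look like (the row-parsing logic here is an assumption for illustration, not the original code), it must emit Text/Text pairs to match the driver's initTableMapperJob call:

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Hypothetical mapper: emits the HBase row key as the output key and the
// values of all cells in the row, joined by commas, as the output value.
public class Origin_Mapper extends TableMapper<Text, Text> {

	@Override
	protected void map(ImmutableBytesWritable key, Result value, Context context)
			throws IOException, InterruptedException {
		StringBuilder sb = new StringBuilder();
		for (Cell cell : value.rawCells()) {
			if (sb.length() > 0) {
				sb.append(',');
			}
			sb.append(Bytes.toString(CellUtil.cloneValue(cell)));
		}
		context.write(new Text(Bytes.toString(key.get(), key.getOffset(), key.getLength())),
				new Text(sb.toString()));
	}
}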

The exception log is as follows:

2017-06-08 11:17:12,640 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-06-08 11:17:14,475 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1019)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-06-08 11:17:14,475 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-06-08 11:17:14,760 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2017-06-08 11:17:14,853 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(259)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-06-08 11:17:14,869 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(441)) - Cleaning up the staging area file:/tmp/hadoop-shuke/mapred/staging/root778107143/.staging/job_local778107143_0001
Exception in thread "main" java.lang.IllegalArgumentException: Pathname /F:/HBaselib/metrics-core-2.2.0.jar from hdfs://node1:8020/F:/HBaselib/metrics-core-2.2.0.jar is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1068)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at Origin_MR.Origin_job.main(Origin_job.java:58


The fix is as follows: adding a false argument is all it takes.

TableMapReduceUtil.initTableMapperJob(tablename, scan, Origin_Mapper.class, Text.class, Text.class, job, false);
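For context, the final boolean of this seven-argument overload is addDependencyJars. The sketch below, which reuses the tablename, scan, and job variables from the driver above (the tmpjars print is purely illustrative), shows why the six-argument form fails when submitting from Windows:

// The six-argument overload used in the original driver internally calls the
// seven-argument overload with addDependencyJars = true, which copies the
// HBase jars found on the local classpath into the job's "tmpjars" property
// for the distributed cache.
TableMapReduceUtil.initTableMapperJob(tablename, scan,
		Origin_Mapper.class, Text.class, Text.class, job);

// Illustrative diagnostic: on a Windows client this prints local jar paths
// like the F:/HBaselib/metrics-core-2.2.0.jar seen in the exception, which
// the job submitter then resolves against fs.defaultFS (hdfs://node1:8020)
// and rejects as an invalid DFS filename.
System.out.println(job.getConfiguration().get("tmpjars"));

With addDependencyJars set to false nothing is staged from the local classpath, so the HBase jars must already be available on the cluster; for a job submitted from a Windows IDE against a remote cluster that is usually the simpler arrangement.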

Reference: http://www.it610.com/article/3388630.htm
