Common Hadoop Errors and Solutions

1. Error: java.io.IOException: Incompatible clusterIDs (often occurs after the namenode is reformatted)
2014-04-29 14:32:53,877 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
java.io.IOException: Incompatible clusterIDs in /data/dfs/data: namenode clusterID = CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb; datanode clusterID = CID-ff0faa40-2940-4838-b321-98272eb0dee3
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
    at java.lang.Thread.run(Thread.java:722)
2014-04-29 14:32:53,885 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
2014-04-29 14:32:53,889 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421)
2014-04-29 14:32:55,897 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

Cause: each namenode format creates a new clusterID, but the data directory still contains the ID from the previous format. Formatting clears the namenode's data without touching the datanodes' data, so the datanodes fail at startup. The rule is: before every format, clear everything under the data directories.
Solution: stop the cluster, delete everything under the data directory on the problem node (the dfs.data.dir directory configured in hdfs-site.xml), then reformat the namenode.
An easier alternative: stop the cluster, then edit the clusterID in the datanode's /dfs/data/current/VERSION file so it matches the namenode's, as sketched below.
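A minimal sketch of that second fix, assuming the name directory is /data/dfs/name and the data directory is /data/dfs/data (the path in the log above); check dfs.namenode.name.dir and dfs.data.dir in your configuration before running anything:

# On the namenode: read its clusterID
grep clusterID /data/dfs/name/current/VERSION
# e.g. clusterID=CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb

# On each affected datanode: overwrite the datanode's clusterID with the
# namenode's value, then restart the datanode
sed -i 's/^clusterID=.*/clusterID=CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb/' /data/dfs/data/current/VERSION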

2. Error: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container
14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to: Application application_1398704073313_0021 failed 2 times due to Error launching appattempt_1398704073313_0021_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1398762692768 found 1398711306590
    at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
. Failing the application
14/04/29 02:45:07 INFO mapreduce.Job: Counters: 0
Cause: clock skew between the namenode and the datanodes; the container token had expired because the nodes' clocks disagreed.
Solution: synchronize the datanodes' clocks with the namenode. On every server run: ntpdate time.nist.gov, and confirm that the sync succeeded.
Better still, add a line like this to /etc/crontab on every server:
0 2 * * * root ntpdate time.nist.gov && hwclock -w
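To check the offset against the time server without actually stepping the clock, ntpdate's query mode can be used first (assuming ntpdate is installed):

# Query only: prints the measured offset but does not adjust the clock
ntpdate -q time.nist.gov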
3. Error: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write
2014-05-06 14:28:09,386 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing READ_BLOCK operation src: /192.168.1.191:48854 dest: /192.168.1.191:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.191:50010 remote=/192.168.1.191:48854]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:722)

Cause: I/O timeout.

Solution:
Edit the Hadoop configuration file hdfs-site.xml and add the two properties dfs.datanode.socket.write.timeout and dfs.socket.timeout:
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>6000000</value>
</property>

<property>
<name>dfs.socket.timeout</name>
<value>6000000</value>
</property>
Note: the timeout values are in milliseconds; 0 means no limit.

4. Error: DataXceiver error processing WRITE_BLOCK operation
2014-05-06 15:21:30,378 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.1.193:34147 dest: /192.168.1.191:50010
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:435)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:693)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:569)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:722)

Cause: the file operation outlived its lease; in effect, the file was deleted while the data stream was still writing to it.
Solution:
Edit hdfs-site.xml (this is the 2.x property name; on 1.x it is dfs.datanode.max.xcievers):
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
Copy the file to every datanode and restart the datanodes, as sketched below.
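A sketch of that distribution step, assuming a Hadoop 2.x layout under /usr/local/hadoop, passwordless ssh, and placeholder hostnames; adjust all three to your cluster:

# Push the updated hdfs-site.xml to each datanode and bounce the daemon
for host in hadoop-datanode1 hadoop-datanode2 hadoop-datanode3; do
  scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml "$host":/usr/local/hadoop/etc/hadoop/
  ssh "$host" '/usr/local/hadoop/sbin/hadoop-daemon.sh stop datanode; /usr/local/hadoop/sbin/hadoop-daemon.sh start datanode'
done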
5. Error: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
2014-05-07 12:21:41,820 WARN [Thread-115] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:860)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:925)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
Cause: the write cannot proceed. My environment has 3 datanodes and the replication factor is set to 3, so every write pipelines through all 3 machines. The default replace-datanode-on-failure.policy is DEFAULT: when the cluster has 3 or more datanodes, the client looks for another datanode to replace a failed one. With only 3 machines there is no spare, so as soon as one datanode has a problem, the write never succeeds.
Solution: edit hdfs-site.xml and add or modify the following two properties:
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>NEVER</value>
</property>
dfs.client.block.write.replace-datanode-on-failure.enable controls whether the client applies the replacement policy at all when a write fails; the default of true is fine.
For dfs.client.block.write.replace-datanode-on-failure.policy, DEFAULT tries to swap in a replacement datanode when there are 3 or more replicas, but with 2 replicas it simply keeps writing without a replacement. On a cluster of exactly 3 datanodes there is no spare node, so a single unresponsive node breaks the write; setting the policy to NEVER turns the replacement off.
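Before relaxing the policy, it is worth confirming how many datanodes are actually live. A quick check, assuming the hdfs CLI is on the PATH (the report's wording varies a little between 2.x releases):

# Look for the live-datanode count and the per-node entries
hdfs dfsadmin -report | grep -iE 'live|available|^Name:'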
6. Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for ...
14/05/08 18:24:59 INFO mapreduce.Job: Task Id : attempt_1399539856880_0016_m_000029_2, Status : FAILED
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1399539856880_0016_m_000029_2_spill_0.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:769)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Container killed by the ApplicationMaster.
Cause: one of two things: either hadoop.tmp.dir or the data directory has run out of space.
Solution: I checked my dfs status and data usage was under 40%, so I concluded that hadoop.tmp.dir was out of space, preventing the job's temporary files from being created. core-site.xml had no hadoop.tmp.dir configured, so the default /tmp directory was in use; anything there is also lost whenever the server reboots, so it had to be changed. Add:
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
Then reformat the namenode: hadoop namenode -format
and restart. To confirm which filesystem is actually filling up, see the sketch below.
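A quick way to see which filesystem is the one filling up, assuming the mount points /tmp and /data from this setup and passwordless ssh (hostnames are placeholders):

# Compare usage of the default tmp location and the new hadoop.tmp.dir
df -h /tmp /data
# Or across the whole cluster:
for host in hadoop-master hadoop-datanode1 hadoop-datanode2; do
  ssh "$host" df -h /tmp /data
done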
7. Error: java.io.IOException: Spill failed
2014-06-19 10:00:32,181 INFO [org.apache.hadoop.mapred.MapTask] - Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@17bda0f2
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1447)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1997)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.MROutputFiles.getSpillFileForWrite(MROutputFiles.java:146)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
Cause: the local (non-HDFS) disk is out of space. (I was debugging the program in MyEclipse, and the local tmp directory had filled up.)
Solution: free up or add disk space.
8. Error: java.io.IOException: Spill failed (disk fills up during the run)
2014-06-23 10:21:01,479 INFO [IPC Server handler 3 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1403488126955_0002_m_000000_0 is : 0.30801716
2014-06-23 10:21:01,512 FATAL [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1403488126955_0002_m_000000_0 - exited : java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1063)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:180)
    at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1403488126955_0002_m_000000_0_spill_53.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
2014-06-23 10:21:01,513 INFO [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed [same stack trace as above]
2014-06-23 10:21:01,514 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed [same stack trace as above]
2014-06-23 10:21:01,516 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403488126955_0002_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
The error is plain enough: out of disk space. The puzzling part was that logging in to each node showed disk usage under 40%, with plenty of space left.
It took a long time to work out what had happened: one map task produced unusually large output, and before it failed, disk usage climbed steadily until the disk hit 100% and the error was thrown. The task then failed, its space was released, and the attempt was rescheduled on another node. Because the space had already been freed, the disks looked half empty even though the error complained of no space.
The lesson: monitoring while the job runs matters.
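A crude way to catch this kind of transient spike is to sample disk usage on every node while the job runs. A sketch, where the hostnames, mount point, and interval are all assumptions to adjust:

# Print each node's root-filesystem usage once a minute
while true; do
  for host in hadoop-datanode1 hadoop-datanode2 hadoop-datanode3; do
    echo "$(date '+%H:%M:%S') $host $(ssh "$host" df -h / | awk 'NR==2 {print $5}')"
  done
  sleep 60
done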
9. Error: java.io.IOException: There appears to be a gap in the edit log
2015-04-07 23:12:39,837 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2015-04-07 23:12:39,838 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 41.
    at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:184)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:647)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
2015-04-07 23:12:39,842 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
Cause: the namenode metadata is corrupted and needs repair.
Solution: recover the namenode:
hadoop namenode -recover
Choose c (continue) at each prompt, and it usually comes back fine.
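Since -recover can discard unrecoverable edits, it is prudent to back up the namenode metadata directory first. A minimal sketch, assuming the name directory is /data/dfs/name (check dfs.namenode.name.dir in your hdfs-site.xml):

# Snapshot the name directory before attempting recovery
cp -r /data/dfs/name /data/dfs/name.bak.$(date +%Y%m%d)
# Then run the interactive recovery and answer c (continue) at the prompts
hadoop namenode -recover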


