Hadoop's bundled startup script start-dfs.sh starts the journalnodes after the namenodes, which is the root cause of the problem described below.
[root@slave1 ~]# cd /apps/hadoop-2.8.0/sbin/
[root@slave1 sbin]# vi start-dfs.sh
The original startup order in the script is: namenodes, then datanodes, then quorumjournal nodes:
#---------------------------------------------------------
# namenodes
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$NAMENODES" \
--script "$bin/hdfs" start namenode $nameStartOpt
#---------------------------------------------------------
# datanodes (using default slaves file)
#---------------------------------------------------------
# secondary namenodes (if any)
#---------------------------------------------------------
# quorumjournal nodes (if any)
SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)
case "$SHARED_EDITS_DIR" in
qjournal://*)
  JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')
  echo "Starting journal nodes [$JOURNAL_NODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --hostnames "$JOURNAL_NODES" \
    --script "$bin/hdfs" start journalnode ;;
esac
#---------------------------------------------------------
# ZK Failover controllers, if auto-HA is enabled
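As an aside, the sed expression in the quorumjournal block derives the journalnode host list from dfs.namenode.shared.edits.dir: it strips the qjournal:// scheme and the trailing journal ID, turns semicolons into spaces, and drops the port numbers. A quick illustration, using a made-up three-node URI rather than this cluster's actual value:

echo "qjournal://master.hadoop:8485;slave1:8485;slave2:8485/mycluster" \
  | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g'
# prints: master.hadoop slave1 slave2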
If the cluster is started in this order, errors like the following may appear in the NameNode log:
2018-07-14 10:46:59,455 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.hadoop/192.168.1.2:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-07-14 10:46:59,457 WARN org.apache.hadoop.ipc.Client: Failed to connect to server: master.hadoop/192.168.1.2:8485: retries get failed due to exceeded maximum allowed retries number: 10
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:681)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:777)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:409)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1542)
at org.apache.hadoop.ipc.Client.call(Client.java:1373)
at org.apache.hadoop.ipc.Client.call(Client.java:1337)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy11.getEditLogManifest(Unknown Source)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.getEditLogManifest(QJournalProtocolTranslatorPB.java:246)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$13.call(IPCLoggerChannel.java:556)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$13.call(IPCLoggerChannel.java:553)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The NameNode is trying to fetch the edit-log manifest from a JournalNode on port 8485 before that JournalNode is running, so the connection is refused.
Solution 1:
Start the cluster in the following order, so that the JournalNodes are already listening before the NameNode contacts them (a wrapper script sketch follows the list):
1. Start ZooKeeper (run on every node):
zkServer.sh start
2. Start the JournalNode (run on every node):
hadoop-daemon.sh start journalnode
3. Start HDFS:
start-dfs.sh
4. Start YARN:
start-yarn.sh
5. Start the YARN process on the standby node individually (start-yarn.sh only starts the ResourceManager on the local node):
yarn-daemon.sh start resourcemanager
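For convenience, the five steps can be wrapped in one script run from the master node. This is only a sketch: the host list, passwordless ssh between nodes, and the zkServer.sh/Hadoop scripts being on each node's PATH are all assumptions to adapt to your cluster.

#!/usr/bin/env bash
# Sketch: bring up an HA cluster in the order described above.
ALL_NODES="master.hadoop slave1 slave2"    # placeholder host list -- substitute your own
RM_STANDBY="slave1"                        # placeholder standby ResourceManager host

for h in $ALL_NODES; do
    ssh "$h" "zkServer.sh start"                   # 1. ZooKeeper on every node
done
for h in $ALL_NODES; do
    ssh "$h" "hadoop-daemon.sh start journalnode"  # 2. JournalNode on every node
done
start-dfs.sh                                       # 3. HDFS
start-yarn.sh                                      # 4. YARN (local ResourceManager + NodeManagers)
ssh "$RM_STANDBY" "yarn-daemon.sh start resourcemanager"  # 5. standby ResourceManager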
Solution 2:
Change the startup order of namenode and journalnode in start-dfs.sh, under the sbin directory of the Hadoop installation, by cutting the journalnode startup block and pasting it before the namenode startup block:
[root@slave1 ~]# cd /apps/hadoop-2.8.0/sbin/
[root@slave1 sbin]# vi start-dfs.sh
#---------------------------------------------------------
# quorumjournal nodes (if any)
SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)
case "$SHARED_EDITS_DIR" in
qjournal://*)
  JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')
  echo "Starting journal nodes [$JOURNAL_NODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --hostnames "$JOURNAL_NODES" \
    --script "$bin/hdfs" start journalnode ;;
esac
#---------------------------------------------------------
# namenodes
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$NAMENODES" \
--script "$bin/hdfs" start namenode $nameStartOpt
#---------------------------------------------------------
# datanodes (using default slaves file)
#---------------------------------------------------------
# secondary namenodes (if any)
#---------------------------------------------------------
# ZK Failover controllers, if auto-HA is enabled
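After saving the change, start-dfs.sh prints the journal-node line before the namenode line, and the "Retrying connect to server ... :8485" messages should no longer appear during startup. A quick sanity check with jps (hostnames elided, PIDs illustrative):

[root@slave1 sbin]# ./start-dfs.sh
Starting journal nodes [...]
Starting namenodes on [...]
...
[root@slave1 sbin]# jps
2345 JournalNode
2501 NameNode
2688 DataNode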