CentOS 7: Fully Distributed Hadoop 3.x Deployment

1. Minimal CentOS 7 installation

1.1 Install net-tools (for ifconfig) and vim

  yum install net-tools vim

1.2 Update the system

    yum update

1.3 Configure a static IP

1.3.1 List the NICs (the ifcfg-enp* files are the NIC configuration files)

ls /etc/sysconfig/network-scripts/

1.3.2 Configure the NIC (in VirtualBox, attach a host-only adapter and give it a static IP)

vi /etc/sysconfig/network-scripts/ifcfg-enp*
# Enable the host-only NIC by cloning the existing config
cd /etc/sysconfig/network-scripts/
cp ifcfg-enp0s3  ifcfg-enp0s8

Change the NIC to a static IP:
1. Set BOOTPROTO to static
2. Set NAME to enp0s8
3. Change the UUID (any value works, as long as it differs from the original)
4. Add IPADDR; pick an address yourself, the host uses it to reach the VM.
5. Add NETMASK=255.255.255.0 (the subnet mask)
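For reference, a minimal sketch of the resulting ifcfg-enp0s8 (the IP address is an example matching the 192.168.56.x addressing used later in this guide; generate your own UUID, e.g. with uuidgen):

    TYPE=Ethernet
    BOOTPROTO=static
    DEVICE=enp0s8
    NAME=enp0s8
    ONBOOT=yes
    # Any UUID that differs from the original NIC's
    UUID=b1a6f2c3-0d4e-4f5a-8b6c-7d8e9f0a1b2c
    IPADDR=192.168.56.11
    NETMASK=255.255.255.0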

1.3.3 Restart the network service

service network restart

1.3.4 Set the hostname (it can also be specified during installation)

vim /etc/hostname

1.4 Configure /etc/hosts so machines can be reached by name

vim /etc/hosts
# Copy to the other machines
scp /etc/hosts  root@192.168.56.12:/etc/hosts

Add an entry for each node, as sketched below.
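Based on the 192.168.56.x addresses used throughout this guide, the entries should look like:

    192.168.56.11 node1
    192.168.56.12 node2
    192.168.56.13 node3
    192.168.56.14 node4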

1.5 Configure passwordless SSH login and generate the key files

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
    # Copy the public key to a remote server
    cat ~/.ssh/id_rsa.pub | ssh root@192.168.56.101  "cat - >> ~/.ssh/authorized_keys"

    # For mutual passwordless login, also run the following
    scp .ssh/authorized_keys  root@192.168.56.14:~/.ssh/authorized_keys
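A quick way to verify key-based login (no password prompt should appear; 192.168.56.12 is node2 in this guide's addressing):

    ssh root@192.168.56.12 hostname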

2. Install the JDK

2.1 Download the JDK

2.2 Extract the downloaded JDK under /opt

    cd /opt/
    tar -xzvf server-jre-8u161-linux-x64.tar.gz
    # Create a symlink
    ln -sf jdk1.8.0_161/ jdk

2.3 Add the JDK to the environment variables

    vim /etc/profile
    # Add the following
    export JAVA_HOME=/opt/jdk
    export PATH=.:$PATH:$JAVA_HOME/bin
    # Apply the changes
    source /etc/profile

2.4 Verify the JDK installation

    java -version


3. Install Hadoop

3.1 Download Hadoop

3.2 Place the downloaded Hadoop tarball in /opt

    # 1. Extract Hadoop
    tar -xzvf hadoop-3.0.0.tar.gz
    # 2. Create a symlink
    ln -sf hadoop-3.0.0 hadoop

3.3 Install ZooKeeper

3.3.1 Download ZooKeeper

3.3.2 Copy ZooKeeper to the machines that need it

    scp /opt/zookeeper-3.4.11.tar.gz node2:/opt/

3.3.3 Extract ZooKeeper

    tar -xzvf zookeeper-3.4.11.tar.gz

3.3.4 Create a symlink

    ln -sf zookeeper-3.4.11 zookeeper

3.3.5 Configure environment variables

    vim /etc/profile
    # Add the following (no spaces around =)
    export ZOOKEEPER_HOME=/opt/zookeeper
    export PATH=$PATH:$ZOOKEEPER_HOME/bin

3.3.6 Configure the ZooKeeper ensemble

    cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
    # Append the following to the end of zoo.cfg (node2, node3, node4 are the server hostnames).
    # For details see: http://zookeeper.apache.org/doc/r3.4.11/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
    tickTime=2000
    # Data directory
    dataDir=/opt/data/zookeeper
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=node2:2888:3888
    server.2=node3:2888:3888
    server.3=node4:2888:3888

3.3.7 Copy the configuration file to the other nodes

    # Repeat for each of the other ZooKeeper nodes
    scp /opt/zookeeper/conf/zoo.cfg node2:/opt/zookeeper/conf/

3.3.8 Create the node ID: add a myid file in the configured dataDir

    # On node2, which is server.1 in zoo.cfg
    echo "1" > /opt/data/zookeeper/myid

3.3.9 Start ZooKeeper (its bin directory is already on the PATH)

    zkServer.sh start

3.3.10 Verify the start-up

    jps

If jps lists a QuorumPeerMain process, ZooKeeper started successfully.
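Each server's role can also be queried directly; in a healthy ensemble one node reports leader and the others follower:

    zkServer.sh status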

3.3.11 (Optional) Configure ZooKeeper to start at boot on CentOS 7

  1. Create a unit file named zookeeper.service under /etc/systemd/system/ with the following content:
[Unit]
Description=zookeeper
After=syslog.target network.target

[Service]
Type=forking
# ZooKeeper log directory; can also be set in zkServer.sh
Environment=ZOO_LOG_DIR=/opt/data/zookeeper/logs
# JDK path; can also be set in zkServer.sh
Environment=JAVA_HOME=/opt/jdk
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
Restart=always
User=root
Group=root

[Install]
WantedBy=multi-user.target
  2. Reload the service definitions

systemctl daemon-reload

  3. Start ZooKeeper

systemctl start zookeeper

  4. Enable start at boot

systemctl enable zookeeper

  5. Check the status

systemctl status zookeeper

Problem:

    nohup: failed to run command `java': No such file or directory

Solution:

ZooKeeper cannot find Java. Configure the environment variable; for example, add the following to zkServer.sh:

    JAVA_HOME=/opt/jdk

Or specify it in zookeeper.service:

    Environment=JAVA_HOME=/opt/jdk

3.4 Configure Hadoop (fully distributed)

References:
1. Hadoop HDFS HA configuration: http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
2. Hadoop YARN ResourceManager HA configuration: http://hadoop.apache.org/docs/r3.0.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

3.4.1 Configure the Hadoop environment variables

    # Add the Hadoop environment variables to /etc/profile (no spaces around =)
    export HADOOP_HOME=/opt/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    # Apply the changes
    source /etc/profile
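A quick sanity check that the variables are in effect:

    hadoop version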

3.4.2 Hadoop role layout:

| Node  | NN | DN | ZK | ZKFC | JN | RM | NM |
|-------|----|----|----|------|----|----|----|
| node1 | 1  |    |    | 1    |    | 1  |    |
| node2 | 1  | 1  | 1  | 1    | 1  | 1  | 1  |
| node3 |    | 1  | 1  |      | 1  |    | 1  |
| node4 |    | 1  | 1  |      | 1  |    | 1  |

ZooKeeper has already been configured above, so no further ZooKeeper setup is needed here.

3.4.3 Edit the Hadoop environment file hadoop-env.sh

    # Set the Java environment variables
    export JAVA_HOME=/opt/jdk
    export HADOOP_HOME=/opt/hadoop

3.4.4 Configure highly available HDFS (see the official documentation)

  1. Configure hdfs-site.xml as follows:
<configuration>
    <property>
        <!-- Logical name for the nameservice; any name works -->
        <name>dfs.nameservices</name>
        <value>hbzx</value>
    </property>
    <property>
        <!-- Disable permission checking -->
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <!-- NameNode IDs; separate multiple IDs with commas -->
        <name>dfs.ha.namenodes.hbzx</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID]: hostname and RPC port of this NameNode -->
        <name>dfs.namenode.rpc-address.hbzx.nn1</name>
        <value>node1:9820</value>
    </property>
    <property>
        <!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID]: hostname and RPC port of this NameNode -->
        <name>dfs.namenode.rpc-address.hbzx.nn2</name>
        <value>node2:9820</value>
    </property>
    <property>
        <!-- dfs.namenode.http-address.[nameservice ID].[name node ID]: HTTP address this NameNode listens on -->
        <name>dfs.namenode.http-address.hbzx.nn1</name>
        <value>node1:9870</value>
    </property>
    <property>
        <!-- dfs.namenode.http-address.[nameservice ID].[name node ID]: HTTP address this NameNode listens on -->
        <name>dfs.namenode.http-address.hbzx.nn2</name>
        <value>node2:9870</value>
    </property>

    <property>
        <!-- Shared edits directory for the NameNodes: the JournalNode hosts and their ports -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node2:8485;node3:8485;node4:8485/hbzx</value>
    </property>

    <property>
        <!-- Failover proxy provider class for the HA NameNodes -->
        <name>dfs.client.failover.proxy.provider.hbzx</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
        <!-- Use passwordless SSH for fencing -->
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <property>
        <!-- Where the JournalNodes store their data -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/data/journal/node/local/data</value>
    </property>

    <property>
        <!-- Enable automatic NameNode failover -->
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

</configuration>
  2. Configure core-site.xml
<configuration>
    <property>
        <!-- Default HA filesystem URI for Hadoop clients -->
        <name>fs.defaultFS</name>
        <value>hdfs://hbzx</value>
    </property>
    <property>
        <!-- Base directory for Hadoop data; the namenode and datanode storage paths
            both derive from it. Use an absolute path, without a file:/ prefix.
            namenode default path: file://${hadoop.tmp.dir}/dfs/name
            datanode default path: file://${hadoop.tmp.dir}/dfs/data
        -->
        <name>hadoop.tmp.dir</name>
        <value>/opt/data/hadoop/</value>
    </property>

    <property>
        <!-- The ZooKeeper quorum nodes -->
        <name>ha.zookeeper.quorum</name>
        <value>node2:2181,node3:2181,node4:2181</value>
    </property>

</configuration>
  3. Configure yarn-site.xml. The first properties below are single-node defaults; for multi-node details see the official documentation:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

    <property>
        <!-- Enable YARN ResourceManager high availability -->
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- Unique ID for the cluster -->
        <name>yarn.resourcemanager.cluster-id</name>
        <value>hbzx</value>
    </property>
    <property>
        <!--  ResourceManager ID -->
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <!-- Host of ResourceManager rm1 -->
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node1</value>
    </property>
    <property>
        <!-- Host of ResourceManager rm2 -->
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node2</value>
    </property>
    <property>
        <!-- HTTP address of ResourceManager rm1 -->
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>node1:8088</value>
    </property>
    <property>
        <!-- HTTP address of ResourceManager rm2 -->
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>node2:8088</value>
    </property>
    <property>
        <!-- The ZooKeeper quorum nodes -->
        <name>yarn.resourcemanager.zk-address</name>
        <value>node2:2181,node3:2181,node4:2181</value>
    </property>

    <property>
        <!-- Enable automatic detection of node memory and CPU (minimum memory is 1 GB) -->
        <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
        <value>true</value>
    </property>
</configuration>
  4. Configure mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
  5. Copy the configuration files to the other machines

# Run from /opt/hadoop/etc/hadoop
scp ./* node4:/opt/hadoop/etc/hadoop/
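To push the files to every other node in one pass, a small loop works (a sketch, assuming the files were edited on node1 and the node2-node4 hostnames from /etc/hosts):

    cd /opt/hadoop/etc/hadoop
    for host in node2 node3 node4; do
        scp ./* "$host":/opt/hadoop/etc/hadoop/
    done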

3.5 Start HDFS

3.5.1 Start ZooKeeper first

    zkServer.sh start

3.5.2 Format the ZooKeeper state (run on one of the NameNodes)

hdfs zkfc -formatZK

If the command runs without errors, the format succeeded.

3.5.3 Start the JournalNodes; this must be done on every JournalNode host

hdfs --daemon start journalnode

Use jps to check that the JournalNode started; a JournalNode process should appear in the output.

3.5.4 Format the NameNode

hdfs namenode -format
# With multiple nameservices, the target can be specified: hdfs namenode -format xxx

If no ERROR lines appear in the log output, the format succeeded.

3.5.5 Start this NameNode so the other NameNode can sync from it

hdfs --daemon start namenode

After starting, use jps to check that the NameNode process is running.

3.5.6 Sync the other NameNode

  1. For a NameNode configured for high availability, run the following on the NameNode that needs to sync:
hdfs namenode -bootstrapStandby

  2. For a NameNode not configured for high availability, sync with:

hdfs namenode -initializeSharedEdits

3.5.7 Configure the DataNodes

Edit the workers file ($HADOOP_HOME/etc/hadoop/workers) and add the DataNode hosts:

node2
node3
node4

3.5.8 Start HDFS

start-dfs.sh

Check the result with jps on each node.

Access HDFS through a browser:
http://192.168.56.11:9870
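With HDFS up, YARN can be started the same way; start-yarn.sh is in $HADOOP_HOME/sbin (already on the PATH), and the ResourceManager web UI listens on the port 8088 configured above:

    start-yarn.sh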

4. Configure log aggregation and the JobHistory Server

4.1 yarn-site.xml: configure the ResourceManager web address

<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rmhost:8088</value>
</property>

4.2 mapred-site.xml: configure the JobHistory Server

<property>
    <name>mapreduce.jobhistory.address</name>
    <value>rmhost:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>rmhost:19888</value>
</property>
<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
</property>
<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
</property>

Note: the JobHistory Server must be started separately:

mapred --daemon start historyserver

4.3 yarn-site.xml: configure log aggregation

<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Directory for aggregated logs -->
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/user/container/logs</value>
</property> 
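Once jobs finish, their aggregated logs can be fetched with the yarn CLI (substitute a real application id for the placeholder):

    yarn logs -applicationId <application-id>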

Troubleshooting

1. zkfc format error


java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-02-06 11:34:01,218 ERROR ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
2018-02-06 11:34:01,461 INFO zookeeper.ClientCnxn: Opening socket connection to server node2/192.168.56.12:2181. Will not attempt to authenticate using SASL (unknown error)

Solution:

Stop the firewall and disable it at boot:

systemctl stop firewalld.service     # stop firewalld
systemctl disable firewalld.service  # disable firewalld at boot

2. NameNode format fails, repeatedly retrying connections

2018-02-06 11:43:58,061 INFO ipc.Client: Retrying connect to server: node2/192.168.56.12:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-02-06 11:43:58,062 INFO ipc.Client: Retrying connect to server: node4/192.168.56.14:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-02-06 11:43:58,062 INFO ipc.Client: Retrying connect to server: node3/192.168.56.13:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Solution:

Start the JournalNodes; each node must be started individually:

hdfs --daemon start journalnode

Use jps to check that a JournalNode process is running.

3. HDFS start-up error

Starting namenodes on [node1 node2]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting journal nodes [node2 node3 node4]
ERROR: Attempting to operate on hdfs journalnode as root
ERROR: but there is no HDFS_JOURNALNODE_USER defined. Aborting operation.
Starting ZK Failover Controllers on NN hosts [node1 node2]
ERROR: Attempting to operate on hdfs zkfc as root
ERROR: but there is no HDFS_ZKFC_USER defined. Aborting operation.

Solution:
Add the following near the top of start-dfs.sh and stop-dfs.sh:

# Note: no spaces around the equals signs
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root

4. YARN start-up error

Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

Solution:

Add the following at the top of start-yarn.sh:

# Note: no spaces around the equals signs
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

5. NodeManager start-up error

2018-02-06 15:22:36,169 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:259)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:451)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:894)

Solution:
Let the NodeManager auto-detect memory and CPU by adding the following to yarn-site.xml:

    <property>
        <!-- Enable automatic detection of node memory and CPU -->
        <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
        <value>true</value>
    </property>

6. NodeManager starts and then shuts down

2018-02-06 16:50:31,210 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Received SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: NodeManager from  node4 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager. Node capabilities are <memory:256, vCores:1>; minimums are 1024mb and 1 vcores
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:259)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:451)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:894)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Received SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: NodeManager from  node4 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager. Node capabilities are <memory:256, vCores:1>; minimums are 1024mb and 1 vcores
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:375)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:253)
    ... 6 more

Solution: increase the VM's memory; a NodeManager requires at least 1024 MB of memory and 1 vCore.

7. HDFS safe mode is on

Solution: leave safe mode manually (in Hadoop 3, hdfs dfsadmin replaces the deprecated hadoop dfsadmin):

hdfs dfsadmin -safemode leave
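To check the current state first:

hdfs dfsadmin -safemode get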