I. Prepare the Environment
1. Three physical servers: 192.168.2.222 (master), 192.168.2.223 (slave1), 192.168.2.224 (slave2)
2. Linux distribution: Debian 9
3. JDK version: 1.8
4. Hadoop version: 3.1+
II. Configure Passwordless SSH Login
The slaves and the master need passwordless SSH access to each other. Start by configuring the master.
1. Edit the SSH daemon configuration file on the master server
vim /etc/ssh/sshd_config
Change PermitRootLogin no or PermitRootLogin without-password to PermitRootLogin yes, then restart the SSH daemon so the change takes effect.
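If you prefer a non-interactive edit, here is a minimal sketch, assuming a stock Debian 9 sshd_config where the directive may still be commented out:
# rewrite the PermitRootLogin directive in place, then restart sshd
sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl restart ssh   # on Debian the service name is "ssh"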
2. Generate a key pair
ssh-keygen -t rsa
The -t flag selects the key type; here we use rsa. Press Enter at the save-location prompt to accept the default /root/.ssh/id_rsa, and press Enter again at the passphrase prompt to leave it empty.
3. Append the public key to .ssh/authorized_keys
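The prompts can also be skipped entirely; this one-liner generates the same passphrase-less key non-interactively:
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa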
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
4. Copy the public key to the two slave servers
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
Once the master is done, repeat the same steps on slave1 and slave2.
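A quick check: run these from the master; each should print the remote hostname without prompting for a password.
ssh [email protected] hostname
ssh [email protected] hostname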
III. Deploy the Hadoop Cluster
1. Edit the hosts file on all three servers
vim /etc/hosts
192.168.2.222 hadoop1
192.168.2.223 hadoop2
192.168.2.224 hadoop3
2. Set the hostname on each of the three servers
vim /etc/hostname
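On Debian 9 (systemd) the hostname can also be set without editing the file or rebooting; run the matching command on each node:
hostnamectl set-hostname hadoop1   # on 192.168.2.222
hostnamectl set-hostname hadoop2   # on 192.168.2.223
hostnamectl set-hostname hadoop3   # on 192.168.2.224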
3. Plan the master/slave layout
The cluster is one master with two slaves:

Role | IP | Hostname | User | HDFS | YARN |
---|---|---|---|---|---|
master | 192.168.2.222 | hadoop1 | hadoop | NameNode | ResourceManager, NodeManager |
slave1 | 192.168.2.223 | hadoop2 | hadoop | DataNode | NodeManager |
slave2 | 192.168.2.224 | hadoop3 | hadoop | DataNode | NodeManager |
4. Create the Hadoop data directories
mkdir -p /opt/data/hadoop/{datanode,hdfs,journal,log,namenode,tmp}
scp -r /opt/data/hadoop/ [email protected]:/opt/data/
scp -r /opt/data/hadoop/ [email protected]:/opt/data/
5. Extract the Hadoop archive
tar -xvf hadoop-3.1.1.tar.gz -C /usr/local
6. Create the workers file (Hadoop 3.x renamed the old slaves file to workers) and list the two slave hostnames in it
vim /usr/local/hadoop-3.1.1/etc/hadoop/workers
hadoop2
hadoop3
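Equivalently, the file can be written in one shot with a heredoc:
cat > /usr/local/hadoop-3.1.1/etc/hadoop/workers <<'EOF'
hadoop2
hadoop3
EOF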
7. Edit hadoop-env.sh
vim /usr/local/hadoop-3.1.1/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export JAVA_HOME=/usr/local/jdk1.8.0_201
export HADOOP_HOME=/usr/local/hadoop-3.1.1
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_LOG_DIR=/opt/data/hadoop/log
8. Edit core-site.xml
vim /usr/local/hadoop-3.1.1/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000/</value>
  </property>
  <!-- ZooKeeper quorum address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>192.168.2.222:2181,192.168.2.223:2181,192.168.2.224:2181</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
9. Edit hdfs-site.xml
vim /usr/local/hadoop-3.1.1/etc/hadoop/hdfs-site.xml
<configuration>
  <!-- Number of replicas per HDFS block; the default is 3 -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/data/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/data/hadoop/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop1:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- dfs.permissions.enabled is the current name of the old dfs.permissions key -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- SecondaryNameNode location; ideally a node other than the NameNode -->
  <!-- <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop2:50090</value>
  </property> -->
</configuration>
10. Edit yarn-site.xml
First run hadoop classpath (/usr/local/hadoop-3.1.1/bin/hadoop classpath) and save its output; you will paste it into yarn.application.classpath below.
vim /usr/local/hadoop-3.1.1/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop1:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop1:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>paste the output of the hadoop classpath command here</value>
  </property>
</configuration>
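Hand-edited XML is easy to mistype, so it is worth validating before moving on. A minimal check, assuming xmllint is available (the libxml2-utils package on Debian):
xmllint --noout /usr/local/hadoop-3.1.1/etc/hadoop/yarn-site.xml && echo OK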
11. Edit mapred-site.xml
vim /usr/local/hadoop-3.1.1/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
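On Hadoop 3.x, MapReduce jobs also need to know where the MapReduce framework lives; if jobs later fail with missing-class errors in the MRAppMaster, a commonly used addition inside the <configuration> block above is (HADOOP_MAPRED_HOME pointing at our install path):
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.1.1</value>
</property>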
12. Copy the whole Hadoop directory to the two slaves
cd /usr/local
scp -r hadoop-3.1.1 [email protected]:/usr/local
scp -r hadoop-3.1.1 [email protected]:/usr/local
IV. Start the Hadoop Cluster
1. Format the NameNode on the master (do this only once; reformatting generates a new clusterID that will no longer match DataNodes that have already registered)
/usr/local/hadoop-3.1.1/bin/hdfs namenode -format
2. Start the NameNode on the master (the hadoop-daemon.sh and yarn-daemon.sh scripts are deprecated in Hadoop 3.x in favor of the --daemon option)
/usr/local/hadoop-3.1.1/bin/hdfs --daemon start namenode
3. Start YARN on the master
/usr/local/hadoop-3.1.1/bin/yarn --daemon start resourcemanager
/usr/local/hadoop-3.1.1/bin/yarn --daemon start nodemanager
4. Start the DataNode on each of the two slave servers
/usr/local/hadoop-3.1.1/bin/hdfs --daemon start datanode
5. Check the running Java processes
/usr/local/jdk1.8.0_201/bin/jps
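If everything started correctly, jps on the master should list something like the following (PIDs are illustrative):
2101 NameNode
2345 ResourceManager
2503 NodeManager
2688 Jps
Each slave should show DataNode, NodeManager, and Jps.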
Alternatively, start everything at once with /usr/local/hadoop-3.1.1/sbin/start-all.sh (or start-dfs.sh followed by start-yarn.sh).
6. Configure environment variables
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
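With the PATH in place, a quick sanity check:
hadoop version   # should print the Hadoop 3.1.1 release and build information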
HDFS web UI: http://192.168.2.222:9870
YARN web UI: http://192.168.2.222:18088
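Finally, a small smoke test exercises both HDFS and YARN; these are all standard Hadoop CLI commands:
hdfs dfsadmin -report           # both DataNodes should be listed as live
hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/hosts /test  # write a small file into HDFS
hdfs dfs -ls /test
yarn node -list                 # all NodeManagers should report RUNNING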