Table of Contents
1 Runtime Environment
1.1 Software Environment
- Four nodes
- 64-bit CentOS 7.0
- JVM: 64-bit JDK 1.8 or later preinstalled
1.2 IP and hostname assignments
IP | HOSTNAME |
---|---|
192.168.36.134 | hadoop01 |
192.168.36.135 | hadoop02 |
192.168.36.136 | hadoop03 |
192.168.36.138 | hadoop04 |
2 Installation Preparation
2.1 Preparing the virtual machines
- Prepare four virtual machine nodes
Set the hostnames
- On each of the four nodes, edit /etc/hostname as root and set its content to that node's hostname
- The hostnames are hadoop01, hadoop02, hadoop03, hadoop04
vi /etc/hostname
2.2 Disabling the firewall
- Stop the firewall on all four nodes with the following commands
systemctl stop firewalld.service //stop the firewall
systemctl disable firewalld.service //keep the firewall from starting at boot
systemctl status firewalld //check the firewall status
2.3 Editing the hosts file
- Edit /etc/hosts on each of the four nodes and add the entries below; the file is identical on all four nodes
- Afterwards, use ping [hostname] to verify the configuration
vi /etc/hosts
192.168.36.134 hadoop01
192.168.36.135 hadoop02
192.168.36.136 hadoop03
192.168.36.138 hadoop04
ping hadoop04
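The hosts entries above can also be appended with a short loop instead of editing the file by hand. This is a minimal sketch that writes to a stand-in path (/tmp/demo_hosts) rather than the real /etc/hosts:

```shell
# Append each IP/hostname pair to a hosts file.
# /tmp/demo_hosts is a stand-in for /etc/hosts (which needs root).
hosts_file=/tmp/demo_hosts
: > "$hosts_file"                      # start from an empty file for the demo
while read -r ip name; do
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
done <<'EOF'
192.168.36.134 hadoop01
192.168.36.135 hadoop02
192.168.36.136 hadoop03
192.168.36.138 hadoop04
EOF
cat "$hosts_file"
```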
2.4 Configuring clock synchronization
- Synchronize the clocks of the four nodes
//check the current time
[lan@hadoop01 ~]$ date
- If the clocks differ, synchronize against a network time source
//sync with a network time server
ntpdate time.nuri.net
2.5 Configuring passwordless SSH login
- Log in to hadoop01 as the lan user and generate a key pair
[lan@hadoop01 ~]$ ssh-keygen -t rsa
//press Enter three times to accept the defaults
- Log in to hadoop02, generate a key pair, and copy the public key to hadoop01
[lan@hadoop02 ~]$ ssh-keygen -t rsa
[lan@hadoop02 ~]$ scp ~/.ssh/id_rsa.pub lan@hadoop01:~/.ssh/id_rsa.pub02
- Log in to hadoop03, generate a key pair, and copy the public key to hadoop01
[lan@hadoop03 ~]$ ssh-keygen -t rsa
[lan@hadoop03 ~]$ scp ~/.ssh/id_rsa.pub lan@hadoop01:~/.ssh/id_rsa.pub03
- Log in to hadoop04, generate a key pair, and copy the public key to hadoop01
[lan@hadoop04 ~]$ ssh-keygen -t rsa
[lan@hadoop04 ~]$ scp ~/.ssh/id_rsa.pub lan@hadoop01:~/.ssh/id_rsa.pub04
- Log in to hadoop01 and combine all of the public keys
- Be sure to fix the file permissions afterwards
[lan@hadoop01 ~]$ cd ~/.ssh
[lan@hadoop01 .ssh]$ cat id_rsa.pub >> authorized_keys
[lan@hadoop01 .ssh]$ cat id_rsa.pub02 >> authorized_keys
[lan@hadoop01 .ssh]$ cat id_rsa.pub03 >> authorized_keys
[lan@hadoop01 .ssh]$ cat id_rsa.pub04 >> authorized_keys
[lan@hadoop01 .ssh]$ chmod 600 authorized_keys //restrict the file permissions
- From hadoop01, distribute the combined key file to the other nodes
[lan@hadoop01 .ssh]$ scp ~/.ssh/authorized_keys lan@hadoop02:~/.ssh/
[lan@hadoop01 .ssh]$ scp ~/.ssh/authorized_keys lan@hadoop03:~/.ssh/
[lan@hadoop01 .ssh]$ scp ~/.ssh/authorized_keys lan@hadoop04:~/.ssh/
//test passwordless login
ssh hadoop02
- Note: every login and file transfer above still prompts for the lan user's password on the target node
- At this point the configuration is complete and all nodes can log in to one another over SSH without a password
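The aggregation step on hadoop01 can be sketched end to end. The snippet below uses placeholder key files under /tmp/demo_ssh (a stand-in for the real ~/.ssh) with fake key material, just to show the concatenation and the permission fix:

```shell
# Stand-in for ~/.ssh on hadoop01; the key contents are fake placeholders.
ssh_dir=/tmp/demo_ssh
mkdir -p "$ssh_dir"
# Create files standing in for id_rsa.pub, id_rsa.pub02, .pub03, .pub04.
for suffix in "" 02 03 04; do
  echo "ssh-rsa AAAAexamplekey$suffix lan@cluster" > "$ssh_dir/id_rsa.pub$suffix"
done
# Combine every collected public key into authorized_keys.
cat "$ssh_dir"/id_rsa.pub* > "$ssh_dir/authorized_keys"
# sshd refuses keys from files with loose permissions, hence chmod 600.
chmod 600 "$ssh_dir/authorized_keys"
```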
2.6 Installing the JDK
- Note: all Hadoop components run on the JVM, so the JDK must be installed before any other component
- JDK 1.8 is recommended; download it from the Oracle website
- After downloading, upload the archive to the user's home directory on the server
- As root, extract the JDK archive into /usr/java/
[root@hadoop01 ~]# mkdir /usr/java/
[root@hadoop01 ~]# mv /home/lan/jdk-8u144-linux-x64.tar.gz /usr/java
[root@hadoop01 ~]# cd /usr/java/
[root@hadoop01 java]# tar -zxvf jdk-8u144-linux-x64.tar.gz
- Configure the environment variables as the lan user
[lan@hadoop01 ~]$ vim .bash_profile
- Add the following lines
export JAVA_HOME=/usr/java/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH
- Reload the profile, then verify
[lan@hadoop01 ~]$ source ~/.bash_profile
[lan@hadoop01 ~]$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
- Repeat the steps above on each of the other nodes
3 Installing the Other Components
3.1 Installing ZooKeeper
- Note: ZooKeeper is configured only on hadoop01; the configured installation directory is then distributed from hadoop01 to the other nodes, which avoids repeating the same steps
- Extract the package zookeeper-3.4.6.tar.gz
[lan@hadoop01 ~]$ tar -zxvf zookeeper-3.4.6.tar.gz
- Edit the configuration file
- Rename /home/lan/zookeeper-3.4.6/conf/zoo_sample.cfg to zoo.cfg
- In the conf directory, run:
[lan@hadoop01 conf]$ mv zoo_sample.cfg zoo.cfg
- Add the following lines to zoo.cfg
[lan@hadoop01 conf]$ vi zoo.cfg
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop04:2888:3888
- Create the data directory
- Create /tmp/zookeeper and a myid file inside it
[lan@hadoop01 ~]$ mkdir /tmp/zookeeper
[lan@hadoop01 ~]$ cd /tmp/zookeeper
[lan@hadoop01 zookeeper]$ vim myid
- Write this node's number into the file
1
- Distribute the ZooKeeper directory
[lan@hadoop01 ~]$ scp -r ~/zookeeper-3.4.6 lan@hadoop02:~/
[lan@hadoop01 ~]$ scp -r ~/zookeeper-3.4.6 lan@hadoop04:~/
- Create the data directory and myid file on the other nodes
ssh lan@hadoop02
mkdir /tmp/zookeeper/
vim /tmp/zookeeper/myid
//set the number in the file to 2
2
ssh lan@hadoop04
mkdir /tmp/zookeeper/
vim /tmp/zookeeper/myid
//set the number in the file to 3
3
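The per-node myid assignment can be expressed as one function keyed on the hostname, mirroring the server.N numbering in zoo.cfg. In this sketch, /tmp/demo_zk stands in for /tmp/zookeeper and the hostname is passed explicitly rather than taken from $(hostname):

```shell
# Map a hostname to its server.N id (same numbering as zoo.cfg above)
# and write the myid file into the given data directory.
assign_myid() {
  host=$1
  datadir=$2
  case "$host" in
    hadoop01) id=1 ;;
    hadoop02) id=2 ;;
    hadoop04) id=3 ;;
    *) echo "no server.N entry for $host" >&2; return 1 ;;
  esac
  mkdir -p "$datadir"
  echo "$id" > "$datadir/myid"
}
# Demo run for hadoop02 against a stand-in data directory.
assign_myid hadoop02 /tmp/demo_zk
cat /tmp/demo_zk/myid
```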
- On hadoop01, hadoop02 and hadoop04, add the following lines to the environment variables
export ZOOKEEPER_HOME=/home/lan/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH
- Start ZooKeeper
- Run the following on hadoop01, hadoop02 and hadoop04; hadoop01 is shown as the example
[lan@hadoop01 ~]$ zkServer.sh start
- Check the process
[lan@hadoop01 ~]$ jps
17683 QuorumPeerMain
17701 Jps
- Once ZooKeeper is running on all three nodes, check its status
[lan@hadoop01 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/lan/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
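When checking several nodes, it can be handy to pull just the Mode line out of the status output. The sample text below is the captured output from above, used as a stand-in for a live zkServer.sh status call:

```shell
# Captured `zkServer.sh status` output, standing in for a live call.
status_output='JMX enabled by default
Using config: /home/lan/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower'
# Extract the value after "Mode: " (leader or follower).
printf '%s\n' "$status_output" | awk -F': ' '/^Mode:/ {print $2}' > /tmp/demo_zk_mode
cat /tmp/demo_zk_mode
```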
3.2 Installing Hadoop
- The Hadoop configuration falls into two parts: HDFS and YARN
3.2.1 HDFS
- Upload and extract the package (only on hadoop01; the other nodes later receive the configured directory from hadoop01)
- Upload hadoop-2.7.7.tar.gz to the server
- Extract it
[lan@hadoop01 ~]$ tar -zxvf hadoop-2.7.7.tar.gz
- Edit the configuration files
- Edit core-site.xml
[lan@hadoop01 ~]$ vim ~/hadoop-2.7.7/etc/hadoop/core-site.xml
- Replace its contents with the following
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://beh</value>
<final>false</final>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/lan/hadoopdata</value>
<final>false</final>
</property>
<!-- ZooKeeper nodes that take part in the active-NameNode election -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop04:2181</value>
<final>false</final>
</property>
</configuration>
- Edit hdfs-site.xml
[lan@hadoop01 ~]$ vim ~/hadoop-2.7.7/etc/hadoop/hdfs-site.xml
- Replace its contents with the following
<configuration>
<property>
<name>dfs.nameservices</name>
<value>beh</value>
<final>false</final>
</property>
<!-- Logical names for the two NameNodes -->
<property>
<name>dfs.ha.namenodes.beh</name>
<value>nn1,nn2</value>
<final>false</final>
</property>
<!-- Address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.beh.nn1</name>
<value>hadoop01:9000</value>
<final>false</final>
</property>
<property>
<name>dfs.namenode.http-address.beh.nn1</name>
<value>hadoop01:50070</value>
<final>false</final>
</property>
<!-- Address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.beh.nn2</name>
<value>hadoop02:9000</value>
<final>false</final>
</property>
<property>
<name>dfs.namenode.http-address.beh.nn2</name>
<value>hadoop02:50070</value>
<final>false</final>
</property>
<!-- Shared edits directory on the JournalNode quorum -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop01:8485;hadoop02:8485;hadoop04:8485/beh</value>
<final>false</final>
</property>
<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled.beh</name>
<value>true</value>
<final>false</final>
</property>
<!-- Proxy provider class the HDFS client uses to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.beh</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<final>false</final>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/lan/metadata/journal</value>
<final>false</final>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<final>false</final>
</property>
<!-- Private key used for SSH fencing -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/lan/.ssh/id_rsa</value>
<final>true</final>
</property>
<!-- Number of block replicas (not the number of DataNodes) -->
<property>
<name>dfs.replication</name>
<value>2</value>
<final>false</final>
</property>
</configuration>
- Edit slaves
[lan@hadoop01 ~]$ vim ~/hadoop-2.7.7/etc/hadoop/slaves
- Set its contents to the DataNode hostnames
hadoop03
hadoop04
3.2.2 YARN
- Edit mapred-site.xml (first create it from the template: cp mapred-site.xml.template mapred-site.xml)
[lan@hadoop01 ~]$ vim ~/hadoop-2.7.7/etc/hadoop/mapred-site.xml
- Replace its contents with the following
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Edit yarn-site.xml
[lan@hadoop01 ~]$ vim ~/hadoop-2.7.7/etc/hadoop/yarn-site.xml
- Replace its contents with the following
<configuration>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id for the ResourceManagers -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>beh</value>
</property>
<!-- Logical ids of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop02</value>
</property>
<!-- ZooKeeper ensemble address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop01:2181,hadoop02:2181,hadoop04:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop01:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop02:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop02:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop02:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop02:8031</value>
</property>
</configuration>
- Edit the environment scripts (hadoop-env.sh, yarn-env.sh)
[lan@hadoop01 ~]$ vim ~/hadoop-2.7.7/etc/hadoop/hadoop-env.sh
[lan@hadoop01 ~]$ vim ~/hadoop-2.7.7/etc/hadoop/yarn-env.sh
- In each file, set JAVA_HOME as follows
export JAVA_HOME=/usr/java/jdk1.8.0_144
3.2.3 Distributing the configuration
- Distribute the configured directory to all the other nodes
[lan@hadoop01 ~]$ scp -r ~/hadoop-2.7.7 lan@hadoop02:~/
[lan@hadoop01 ~]$ scp -r ~/hadoop-2.7.7 lan@hadoop03:~/
[lan@hadoop01 ~]$ scp -r ~/hadoop-2.7.7 lan@hadoop04:~/
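The three scp commands above can be generated from a node list. This dry-run sketch only prints the commands (no cluster is assumed here); removing the echo would execute them:

```shell
# Print one scp command per worker node instead of running it.
# Drop the `echo` to perform the real copy on a live cluster.
src="$HOME/hadoop-2.7.7"
for node in hadoop02 hadoop03 hadoop04; do
  echo scp -r "$src" "lan@$node:~/"
done > /tmp/demo_scp_cmds
cat /tmp/demo_scp_cmds
```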
3.2.4 Starting HDFS
- Start the JournalNodes (process name: JournalNode) on every node configured for them (hadoop01, hadoop02, hadoop04) with the following command
hadoop-daemon.sh start journalnode
- Format the HA state in ZooKeeper, on hadoop01
[lan@hadoop01 ~]$ hdfs zkfc -formatZK
- Format and start the NameNode on hadoop01 (process name: NameNode)
//format
[lan@hadoop01 ~]$ hdfs namenode -format
//start the namenode
[lan@hadoop01 ~]$ hadoop-daemon.sh start namenode
- Bootstrap and start the standby NameNode on hadoop02
//bootstrap from the active NameNode
[lan@hadoop02 ~]$ hdfs namenode -bootstrapStandby
//start the namenode
[lan@hadoop02 ~]$ hadoop-daemon.sh start namenode
- Start the zkfc service on hadoop01 and hadoop02 (process name: DFSZKFailoverController); one of the two NameNodes then becomes active
[lan@hadoop01 ~]$ hadoop-daemon.sh start zkfc
[lan@hadoop02 ~]$ hadoop-daemon.sh start zkfc
- Start the DataNodes (process name: DataNode), from hadoop01
[lan@hadoop01 ~]$ hadoop-daemons.sh start datanode
3.2.5 Verifying HDFS
- Open a browser and visit 192.168.36.134:50070 and 192.168.36.135:50070; two NameNodes appear, one active and one standby
- Kill the active NameNode process; the standby NameNode automatically becomes active
- Kill the NameNode process on hadoop01
[lan@hadoop01 ~]$ jps
17683 QuorumPeerMain
19364 Jps
18487 JournalNode
19179 DFSZKFailoverController
18669 NameNode
[lan@hadoop01 ~]$ kill -9 18669
[lan@hadoop01 ~]$ jps
17683 QuorumPeerMain
18487 JournalNode
19417 Jps
19179 DFSZKFailoverController
- The node that was standby is now active; restart the killed NameNode on hadoop01
[lan@hadoop01 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/lan/hadoop-2.7.7/logs/hadoop-lan-namenode-hadoop01.out
[lan@hadoop01 ~]$ jps
19473 NameNode
17683 QuorumPeerMain
18487 JournalNode
19562 Jps
19179 DFSZKFailoverController
3.2.6 Starting YARN
- Start it on hadoop01 (this script starts the ResourceManager on hadoop01 and all NodeManager processes)
[lan@hadoop01 ~]$ start-yarn.sh
- Start the standby ResourceManager on hadoop02
[lan@hadoop02 ~]$ yarn-daemon.sh start resourcemanager
3.2.7 Verifying YARN
- Open a browser and visit 192.168.36.134:8088 and 192.168.36.135:8088
3.3 Shutting down the cluster
- Stop YARN
//first on hadoop01
[lan@hadoop01 ~]$ stop-yarn.sh
//then stop the resourcemanager on hadoop02
[lan@hadoop02 ~]$ yarn-daemon.sh stop resourcemanager
- Stop HDFS
- On hadoop01, run
[lan@hadoop01 ~]$ stop-dfs.sh
- Stop zkfc
- On hadoop01 and hadoop02 respectively
[lan@hadoop01 ~]$ hadoop-daemon.sh stop zkfc
[lan@hadoop02 ~]$ hadoop-daemon.sh stop zkfc
- Stop ZooKeeper
- On hadoop01, hadoop02 and hadoop04 respectively
[lan@hadoop01 ~]$ zkServer.sh stop
[lan@hadoop02 ~]$ zkServer.sh stop
[lan@hadoop04 ~]$ zkServer.sh stop