In this post we build a Hadoop 2.2.0 cluster hands-on, on RedHat 6.2, a mainstream server operating system. All installation media used here came from the Internet; please download everything before you begin.
I. Environment Planning
Role     | Hostname | IP address    |
Namenode | master   | 192.168.200.2 |
Datanode | slave1   | 192.168.200.3 |
Datanode | slave2   | 192.168.200.4 |
Datanode | slave3   | 192.168.200.5 |
Datanode | slave4   | 192.168.200.6 |

Software | Version        |
OS       | RedHat 6.2 x64 |
Hadoop   | Hadoop 2.2.0   |
JDK      | JDK 1.7 (Linux)|
II. Base Environment Configuration
1. Install the operating system and perform basic configuration
After planning each server's role, install the operating system and configure the network.
(Details omitted.)
(1) Once the OS is installed, disable the firewall service and SELinux on every node.
service iptables stop
chkconfig iptables off
cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
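Rather than editing the file by hand on each node, the SELINUX line can be switched with sed. A minimal sketch, run here against a throwaway copy of the config so it is safe to execute anywhere; on a real RHEL 6 node the target is /etc/selinux/config, and a reboot is needed for the change to take effect:

```shell
# Work on a temp copy for illustration; point at /etc/selinux/config for real.
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"
# Flip whatever SELINUX mode is set to disabled.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
result=$(grep '^SELINUX=' "$cfg")
echo "$result"
rm -f "$cfg"
```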
(2) Copy the Hadoop and JDK packages to each server.
[root@master home]# ls
jdk-7u67-linux-x64.rpm hadoop-2.2.0.tar.gz
(3) Set the hostname and network configuration on each server.
cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=slave2
cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=f2:85:cd:9a:30:0d
NM_CONTROLLED=yes
ONBOOT=yes
IPADDR=192.168.200.4
BOOTPROTO=none
NETMASK=255.255.255.0
TYPE=Ethernet
GATEWAY=192.168.200.254
IPV6INIT=no
USERCTL=no
(4) Configure /etc/hosts on every server.
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.2 master
192.168.200.3 slave1
192.168.200.4 slave2
192.168.200.5 slave3
192.168.200.6 slave4
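The five entries follow the planning table exactly, so a small loop can generate them, which avoids typos if the cluster grows. A sketch, assuming the 192.168.200.x addressing above:

```shell
# Emit the cluster's /etc/hosts entries from the planning table.
hosts=""
i=2
for h in master slave1 slave2 slave3 slave4; do
  hosts="${hosts}192.168.200.$i $h
"
  i=$((i+1))
done
printf '%s' "$hosts"
```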
2. Create a user
Running Hadoop as root all the time is bad practice, so create a dedicated user for running and managing Hadoop.
The master and all slave nodes must have the same user and group, i.e. create the hdtest user and group on every server in the cluster.
Create the user with the following commands:
useradd hdtest
passwd hdtest
Copy hadoop-2.2.0.tar.gz into the hdtest user's home directory and change its owner and group accordingly.
3. Install the JDK
This build uses JDK 1.7: download the Linux JDK 1.7 package from the official site and copy it to every server.
Install it as root:
rpm -ivh jdk-7u67-linux-x64.rpm
4. Configure environment variables
Hadoop will be installed as the hdtest user, so the variables below go into hdtest's profile.
The environment variables must be configured on the master and on every slave node.
4.1 Configure the Java environment variables
[root@master ~]# find / -name java
………………
/usr/java/jdk1.7.0_67/bin/java
……………………
[root@master home]# su - hdtest
[hdtest@master ~]$ cat .bash_profile
# .bash_profile
…………
PATH=$PATH:$HOME/bin
export PATH
export JAVA_HOME=/usr/java/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:./
export HADOOP_HOME=/home/hdtest/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin/
export JAVA_LIBRARY_PATH=/home/hdtest/hadoop-2.2.0/lib/native
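To confirm the profile is correct without logging out and back in, it can be sourced and spot-checked. A sketch using a temporary file as a stand-in for ~/.bash_profile:

```shell
# Write the same exports to a temp file and source it.
prof=$(mktemp)
cat > "$prof" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/home/hdtest/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
EOF
. "$prof"
# Spot-check the two variables everything else hangs off.
echo "$JAVA_HOME"
echo "$HADOOP_HOME"
rm -f "$prof"
```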
4.2 Configure passwordless SSH between nodes
Generate a key pair as the hdtest user on every node:
[hdtest@master .ssh]$ ssh-keygen -t rsa
[hdtest@slave1 .ssh]$ ssh-keygen -t rsa
[hdtest@slave2 .ssh]$ ssh-keygen -t rsa
[hdtest@slave3 .ssh]$ ssh-keygen -t rsa
[hdtest@slave4 .ssh]$ ssh-keygen -t rsa
[hdtest@slave2 .ssh]$ ll
total 16
-rw------- 1 hdtest hdtest 1675 Sep 4 14:53 id_rsa
-rw-r--r-- 1 hdtest hdtest 395 Sep 4 14:53 id_rsa.pub
-rw-r--r-- 1 hdtest hdtest 783 Sep 4 14:58 known_hosts
Copy each node's public key to one machine (the master) and merge them:
[hdtest@slave1 .ssh]$ scp id_rsa.pub 192.168.200.2:/home/hdtest/.ssh/slave1.pub
[hdtest@slave2 .ssh]$ scp id_rsa.pub 192.168.200.2:/home/hdtest/.ssh/slave2.pub
[hdtest@slave3 .ssh]$ scp id_rsa.pub 192.168.200.2:/home/hdtest/.ssh/slave3.pub
[hdtest@slave4 .ssh]$ scp id_rsa.pub 192.168.200.2:/home/hdtest/.ssh/slave4.pub
[hdtest@master .ssh]$ cat *.pub >> authorized_keys
Copy the merged authorized_keys file from the master to every other node:
scp authorized_keys slave1:/home/hdtest/.ssh/
scp authorized_keys slave2:/home/hdtest/.ssh/
scp authorized_keys slave3:/home/hdtest/.ssh/
scp authorized_keys slave4:/home/hdtest/.ssh/
Fix the file permissions on every node:
[hdtest@master ~]$ chmod 700 .ssh/
[hdtest@master .ssh]$ chmod 600 authorized_keys
Once the steps above are done, test the setup:
[hdtest@master .ssh]$ ssh slave1
Last login: Thu Sep 4 15:58:39 2014 from master
[hdtest@slave1 ~]$ ssh slave3
Last login: Thu Sep 4 15:58:42 2014 from master
[hdtest@slave3 ~]$
If you can ssh into each server without being prompted for a password, the configuration is complete.
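Passwordless login needs to work in every direction, not just from the master. A dry-run sketch of a full-mesh check (remove the echo to actually run it; `-o BatchMode=yes` makes ssh fail instead of prompting when a key is missing):

```shell
NODES="master slave1 slave2 slave3 slave4"
count=0
for src in $NODES; do
  for dst in $NODES; do
    # Dry run: print the command instead of executing it.
    echo ssh "$src" "ssh -o BatchMode=yes $dst true"
    count=$((count+1))
  done
done
echo "$count checks"
```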
III. Install Hadoop
Before installing Hadoop, create a few directories:
[hdtest@master ~]$ pwd
/home/hdtest
mkdir dfs/name -p
mkdir dfs/data -p
mkdir mapred/local -p
mkdir mapred/system
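The same four directories can be created in one loop on every node; sketched here under a temporary root so it is safe to run anywhere (on the real nodes the root is /home/hdtest):

```shell
# Create the NameNode, DataNode, and MapReduce working directories.
root=$(mktemp -d)
for d in dfs/name dfs/data mapred/local mapred/system; do
  mkdir -p "$root/$d"
done
ls "$root/dfs" "$root/mapred"
```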
1. Edit the configuration files
Every server needs the same configuration, so configure one machine and then just copy the files to the rest. In core-site.xml and mapred-site.xml, every node points at the master's hostname, since the master is the entry point to the cluster.
[hdtest@master hadoop]$ pwd
/home/hdtest/hadoop-2.2.0/etc/hadoop
(1) The core-site.xml configuration file
[hdtest@master hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>io.native.lib.available</name>
<value>true</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
<final>true</final>
</property>
</configuration>
(2) The hdfs-site.xml configuration file
[hdtest@master hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hdtest/dfs/name</value>
<description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hdtest/dfs/data</value>
<description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.</description>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Number of block replicas.</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
(3) The mapred-site.xml configuration file
[hdtest@master hadoop]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific propertyoverrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://master:9001</value>
<final>true</final>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>file:/home/hdtest/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>file:/home/hdtest/mapred/local</value>
</property>
</configuration>
(4) The yarn-site.xml configuration file
[hdtest@master hadoop]$ cat yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8080</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8081</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8082</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- Site specific YARN configuration properties -->
</configuration>
(5) Environment script files
Edit the hadoop-env.sh, yarn-env.sh, and mapred-env.sh files,
setting the following path in each:
export JAVA_HOME=/usr/java/jdk1.7.0_67
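Setting the same JAVA_HOME in all three scripts is easy to do with sed. A sketch against temp copies so it runs anywhere; on the real cluster, run it inside /home/hdtest/hadoop-2.2.0/etc/hadoop:

```shell
dir=$(mktemp -d)
for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
  # Stand-in for the stock line that reads JAVA_HOME from the environment.
  echo 'export JAVA_HOME=${JAVA_HOME}' > "$dir/$f"
  # Pin JAVA_HOME to the installed JDK in each script.
  sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.7.0_67|' "$dir/$f"
done
cat "$dir/hadoop-env.sh"
```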
2. Hadoop cluster topology files
Only the namenode machine needs these files. In this setup the master serves only as the namenode; in general the namenode should be a dedicated machine and not double as a datanode.
[hdtest@master hadoop]$ pwd
/home/hdtest/hadoop-2.2.0/etc/hadoop
[hdtest@master hadoop]$ cat masters
192.168.200.2
[hdtest@master hadoop]$ cat slaves
192.168.200.3
192.168.200.4
192.168.200.5
192.168.200.6
3. Distribute the Hadoop package
With the configuration done, push the hadoop directory to every slave node:
[hdtest@master ~]$ scp -r hadoop-2.2.0 slave1:/home/hdtest/
[hdtest@master ~]$ scp -r hadoop-2.2.0 slave2:/home/hdtest/
[hdtest@master ~]$ scp -r hadoop-2.2.0 slave3:/home/hdtest/
[hdtest@master ~]$ scp -r hadoop-2.2.0 slave4:/home/hdtest/
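The four copies are identical except for the target host, so a loop does the job; shown here as a dry run (drop the echo to actually copy):

```shell
sent=0
for h in slave1 slave2 slave3 slave4; do
  # Dry run: print each scp command instead of executing it.
  echo scp -r /home/hdtest/hadoop-2.2.0 "$h:/home/hdtest/"
  sent=$((sent+1))
done
```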
4. Format the namenode
Run the format command on the master node:
[hdtest@master bin]$ pwd
/home/hdtest/hadoop-2.2.0/bin
[hdtest@master bin]$ ./hadoop namenode -format
Successful formatting is indicated in the command output.
When re-formatting, the system prompts:
Re-format filesystem in /home/hadoop/tmp/dfs/name ? (Y or N) You must answer with an uppercase Y; a lowercase y is not rejected as invalid input, but the format will fail.
5. Start the Hadoop services
Start Hadoop with the following command; it only needs to be run on the namenode.
[hdtest@master sbin]$ pwd
/home/hdtest/hadoop-2.2.0/sbin
[hdtest@master sbin]$ ./start-all.sh (stop-all.sh stops the services)
Successful startup is indicated in the output, and can be verified with the jps command.
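As a rough guide to what jps should report on each machine (assuming the default Hadoop 2.2 daemon layout, with the secondary namenode co-located on the master):

```shell
# Expected daemons per role after start-all.sh.
expected_master="NameNode SecondaryNameNode ResourceManager"
expected_slave="DataNode NodeManager"
echo "master: $expected_master"
for h in slave1 slave2 slave3 slave4; do
  echo "$h: $expected_slave"
done
```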
6. Verification
1. Verify from the command line
(1) Check that the ports are listening
[hdtest@master ~]$ netstat -ntpl
(2) Check cluster status with hadoop dfsadmin -report
[hdtest@master sbin]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hdtest/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/09/05 10:48:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 167811284992 (156.29GB)
Present Capacity: 137947226112 (128.47 GB)
DFS Remaining: 137947127808 (128.47 GB)
DFS Used: 98304 (96 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)
Live datanodes:
Name: 192.168.200.5:50010 (slave3)
Hostname: slave3
Decommission Status : Normal
Configured Capacity: 41952821248 (39.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7465213952 (6.95 GB)
DFS Remaining: 34487582720 (32.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.21%
Last contact: Fri Sep 05 10:48:23 CST 2014
Name: 192.168.200.3:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 41952821248 (39.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7465467904 (6.95 GB)
DFS Remaining: 34487328768 (32.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.21%
Last contact: Fri Sep 05 10:48:24 CST 2014
Name: 192.168.200.6:50010 (slave4)
Hostname: slave4
Decommission Status : Normal
Configured Capacity: 41952821248 (39.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7467925504 (6.96 GB)
DFS Remaining: 34484871168 (32.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.20%
Last contact: Fri Sep 05 10:48:24 CST 2014
Name: 192.168.200.4:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 41952821248 (39.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7465451520 (6.95 GB)
DFS Remaining: 34487345152 (32.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.21%
Last contact: Fri Sep 05 10:48:22 CST 2014
2. Access via browser
The HDFS web UI is at http://192.168.200.2:50070/
The YARN cluster UI is at http://192.168.200.2:8088/cluster