1. Environment preparation
Virtual machine: VMware Workstation
OS: CentOS-5.5-x86_64-bin
JDK archive: jdk-7u80-linux-x64.gz
Hadoop archive: hadoop-2.7.3.tar.gz
This guide sets up a 3-node cluster: one master node and two slave nodes. All nodes must be on the same LAN.
2. Environment setup (apply the same configuration on all three nodes)
1) Set the hostname and map each node's IP to its hostname
# vi /etc/hosts
#127.0.0.1   localhost bigdata01 localhost4 localhost4.localdomain4
#::1         localhost bigdata01 localhost6 localhost6.localdomain6
192.168.88.128 bigdata01
192.168.88.129 bigdata02
192.168.88.131 bigdata03
# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=bigdata03   (set the matching hostname on each node)
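As a sketch, the three host mappings above can be generated in one step. This writes to a local `hosts.new` file for review rather than directly to `/etc/hosts`, which is an assumption of this example rather than a step from the guide:

```shell
#!/bin/sh
# Sketch: generate the IP-to-hostname mappings used in this guide.
# Writing to ./hosts.new instead of /etc/hosts so it can be reviewed first.
HOSTS_FILE=./hosts.new
cat > "$HOSTS_FILE" <<'EOF'
192.168.88.128 bigdata01
192.168.88.129 bigdata02
192.168.88.131 bigdata03
EOF
# After review, append to /etc/hosts on each of the three nodes:
#   cat hosts.new >> /etc/hosts
cat "$HOSTS_FILE"
```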
2) Disable the firewall
Stop it immediately:
# service iptables stop
Keep it disabled after reboot:
# chkconfig iptables off
Check the firewall status:
# service iptables status
iptables: Firewall is not running.
3) Set up passwordless SSH among the three nodes
a. Generate a key pair; this produces two files, id_rsa and id_rsa.pub:
# ssh-keygen -t rsa
b. Copy the public key to the corresponding path on the other two nodes, naming it authorized_keys:
# scp ./id_rsa.pub [email protected]:/root/.ssh/authorized_keys
c. Each node's public key must be appended to the authorized_keys file of the other two nodes, and to the local authorized_keys as well, so that all three nodes can reach each other without a password.
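The merge in step c is easier to get right if the three public keys are first collected in one place, concatenated into a single authorized_keys, and then pushed back to every node. A minimal local sketch of the merge step; the `id_rsa_bigdata0N.pub` filenames and key contents are placeholders for the keys collected from each node:

```shell
#!/bin/sh
# Sketch: merge the public keys collected from all three nodes into one
# authorized_keys file, which would then be copied to every node's ~/.ssh/.
# The *.pub files below are placeholders standing in for the real keys.
mkdir -p keys
echo "ssh-rsa AAAA...key1 root@bigdata01" > keys/id_rsa_bigdata01.pub
echo "ssh-rsa AAAA...key2 root@bigdata02" > keys/id_rsa_bigdata02.pub
echo "ssh-rsa AAAA...key3 root@bigdata03" > keys/id_rsa_bigdata03.pub
cat keys/id_rsa_bigdata0*.pub > keys/authorized_keys
chmod 600 keys/authorized_keys
# Then, for each node:
#   scp keys/authorized_keys root@<node>:/root/.ssh/authorized_keys
wc -l < keys/authorized_keys
```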
4) Set the Hadoop environment variables
# vi /root/.bash_profile
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
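Since the same edit is made on three nodes, it can help to append the lines only when they are not already present. A sketch, using a local `bash_profile.test` file instead of `/root/.bash_profile` (that filename is an assumption of this example):

```shell
#!/bin/sh
# Sketch: append the HADOOP_HOME settings idempotently, so re-running the
# script on a node does not duplicate the lines.
PROFILE=./bash_profile.test   # stand-in for /root/.bash_profile
touch "$PROFILE"
if ! grep -q 'HADOOP_HOME' "$PROFILE"; then
  cat >> "$PROFILE" <<'EOF'
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
EOF
fi
cat "$PROFILE"
```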
3. Install the JDK
1) Extract the jdk-7u80-linux-x64.gz archive
# tar -zxvf jdk-7u80-linux-x64.gz
# mv jdk1.7.0_80 /usr/java
2) Configure the environment variables
# vi /root/.bash_profile
export JAVA_HOME=/usr/java/jdk1.7.0_80
PATH=$JAVA_HOME/bin:$PATH
3) Apply the environment variables:
# source ~/.bash_profile
4) Test
# java -version
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
4. Modify the Hadoop configuration files, located in the hadoop-2.7.3/etc/hadoop directory
Files to modify:
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
hadoop-env.sh
yarn-env.sh
slaves
Preparation before editing the configuration files
Create the following directories under hadoop-2.7.3:
# mkdir tmp
# mkdir hdfs
Create the name and data directories under hdfs:
# cd hdfs
# mkdir name
# mkdir data
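The directory steps above can be collapsed into a single `mkdir -p`, which also creates missing parents. A sketch using a local `./hadoop-2.7.3` stand-in path:

```shell
#!/bin/sh
# Sketch: create tmp, hdfs/name, and hdfs/data in one command.
# Run from the directory containing hadoop-2.7.3; a local stand-in here.
HADOOP_DIR=./hadoop-2.7.3
mkdir -p "$HADOOP_DIR/tmp" "$HADOOP_DIR/hdfs/name" "$HADOOP_DIR/hdfs/data"
ls -R "$HADOOP_DIR"
```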
1) core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/hadoop-2.7.3/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata01:9000</value>
  </property>
</configuration>
2) hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>file:/opt/hadoop/hadoop-2.7.3/hdfs/name</value>
    <description>Where the namenode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:/opt/hadoop/hadoop-2.7.3/hdfs/data</value>
    <description>Physical storage location of data blocks on the datanode</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Replication factor; the default is 3, and it should not exceed the number of datanodes</description>
  </property>
</configuration>
3) mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
4) yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>bigdata01:8099</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>bigdata01:8031</value>
  </property>
</configuration>
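A malformed tag in any of the *-site.xml files will make the daemons fail at startup, so it can be worth checking that each edited file is well-formed XML before continuing. A sketch using Python's standard-library parser (assumed to be available; `xmllint` would work equally well where installed), run here against a locally generated test file:

```shell
#!/bin/sh
# Sketch: verify that an edited *-site.xml file is well-formed XML
# before starting the cluster. A non-zero exit means a parse error.
cat > core-site.xml.test <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata01:9000</value>
  </property>
</configuration>
EOF
python3 -c "import xml.etree.ElementTree as ET; ET.parse('core-site.xml.test')" \
  && echo "core-site.xml.test: well-formed"
```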
5) hadoop-env.sh
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.7.0_80
6) yarn-env.sh
#export JAVA_HOME=/home/y/libexec/jdk1.7.0/
export JAVA_HOME=/usr/java/jdk1.7.0_80
7) slaves
Add the slave (DataNode) nodes:
bigdata02
bigdata03
5. Copy the entire hadoop-2.7.3 directory to the other two nodes
# scp -r ./hadoop-2.7.3 root@bigdata02:/opt/hadoop/
# scp -r ./hadoop-2.7.3 root@bigdata03:/opt/hadoop/
6. Format the namenode
# bin/hdfs namenode -format
7. Start the NameNode and DataNode daemons
# sbin/start-dfs.sh
8. Start the ResourceManager and NodeManager daemons
# sbin/start-yarn.sh
9. Check the running Java processes
Master node:
# /usr/java/jdk1.7.0_80/bin/jps
2764 SecondaryNameNode
2918 ResourceManager
2574 NameNode
10809 Jps
Slave nodes:
# /usr/java/jdk1.7.0_80/bin/jps
2559 NodeManager
5805 Jps
2452 DataNode
After startup, the following daemons are running:
HDFS daemons
Master node: NameNode, SecondaryNameNode
Slave nodes: DataNode
YARN daemons
Master node: ResourceManager
Slave nodes: NodeManager
Testing access from a browser
1. Hadoop's web UI listens on port 50070 by default. Use the following URL to reach it:
http://localhost:50070/ (replace localhost with the master's hostname or IP)
2. The overview of all applications in the cluster is served on port 8088 by default:
http://localhost:8088/ (replace localhost with the master's hostname or IP; the port is whatever yarn.resourcemanager.webapp.address is set to in yarn-site.xml, which is 8099 in this guide)