1. Environment: four VMware virtual machines running CentOS 6.5 x86_64:
(1) ip: 192.168.169.10, hostname: master  # Hadoop cluster master, 1 core / 1 GB RAM
(2) ip: 192.168.169.11, hostname: slave1  # Hadoop cluster slave1, 1 core / 2 GB RAM
(3) ip: 192.168.169.12, hostname: slave2  # Hadoop cluster slave2, 1 core / 2 GB RAM
(4) ip: 192.168.169.13, hostname: slave3  # Hadoop cluster slave3, 1 core / 2 GB RAM
2. First make sure every host can ping every other host. Steps:
(1) Configure the hostname and gateway: vi /etc/sysconfig/network
(2) Configure the network interface: vi /etc/sysconfig/network-scripts/ifcfg-eth0
(3) Map IPs to hostnames: vi /etc/hosts; once this file is correct on one machine, it can be copied to the other nodes with scp.
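For the four machines above, /etc/hosts on every node would contain the following mappings:

```
192.168.169.10  master
192.168.169.11  slave1
192.168.169.12  slave2
192.168.169.13  slave3
```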
3. Configure passwordless SSH login:
(1) Generate a key pair on the master: ssh-keygen -t rsa -P ''
(2) Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) Restrict the permissions of authorized_keys: chmod 600 ~/.ssh/authorized_keys
(4) In /etc/ssh/sshd_config, enable the following settings:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
(5) Restart the SSH service: service sshd restart
(6) Copy /etc/ssh/sshd_config to the other hosts in the cluster: scp /etc/ssh/sshd_config [email protected]:/etc/ssh/
(7) Copy the master's public key to each of the other hosts (e.g. scp ~/.ssh/id_rsa.pub root@slave1:~/), then append it to the authorized keys there: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
(8) Restrict the permissions of authorized_keys: chmod 600 ~/.ssh/authorized_keys
(9) Delete the copied public key: rm -f ~/id_rsa.pub
(10) Restart the SSH service: service sshd restart
Note: steps (7) through (10) are performed on each of the other hosts.
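The per-slave steps above can be collapsed into one loop run from the master. A sketch, assuming root SSH access and that the slave hostnames resolve via /etc/hosts:

```shell
# Push the master's public key to each slave and install it there.
# Run on the master; each scp/ssh prompts for that slave's root password once.
for h in slave1 slave2 slave3; do
  scp ~/.ssh/id_rsa.pub "root@$h:/root/"
  ssh "root@$h" "mkdir -p /root/.ssh \
    && cat /root/id_rsa.pub >> /root/.ssh/authorized_keys \
    && chmod 600 /root/.ssh/authorized_keys \
    && rm -f /root/id_rsa.pub"
done
```

On CentOS 6 the stock ssh-copy-id tool does the same job (ssh-copy-id root@slave1, and so on), and also creates ~/.ssh with the right permissions if it is missing.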
4. Install the JDK in /usr/java/ on every node:
(1) Run the installer: rpm -ivh jdk-7u25-linux-x64.rpm
(2) Configure environment variables: append the lines below to the end of /etc/profile, run source /etc/profile to reload it, then run java -version to confirm the change took effect.
Variables:
export JAVA_HOME=/usr/java/jdk1.7.0_25
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
5. Install and configure Hadoop 2.2.0; download the binary release directly from the Apache website:
(1) Directory layout:
Hadoop install directory:
/usr/hadoop/hadoop-2.2.0
Data storage directory, created on every node, holding the cluster data:
/usr/hadoop/storage/hadoop-2.2.0/hdfs
Directory on the master only, holding the filesystem metadata:
/usr/hadoop/storage/hadoop-2.2.0/hdfs/name
Directory on every slave, holding the actual data blocks:
/usr/hadoop/storage/hadoop-2.2.0/hdfs/data
Log directory on every node:
/usr/hadoop/storage/hadoop-2.2.0/logs
Temp directory on every node:
/usr/hadoop/storage/hadoop-2.2.0/tmp
(2) Create these directories on every node (except /usr/hadoop/storage/hadoop-2.2.0/hdfs/name, which is created only on the master):
mkdir -p /usr/hadoop/
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/hdfs
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/hdfs/name
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/hdfs/data
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/logs
mkdir -p /usr/hadoop/storage/hadoop-2.2.0/tmp
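Since mkdir -p creates parent directories as needed, the whole layout can also be created with a single command. The sketch below uses a scratch root so it can run unprivileged; on the cluster nodes ROOT would be /usr/hadoop/storage/hadoop-2.2.0:

```shell
# ROOT defaults to a throwaway path for illustration; override it to the
# real storage root when running on the actual nodes.
ROOT="${ROOT:-/tmp/hadoop-2.2.0-layout}"
mkdir -p "$ROOT/hdfs/name" "$ROOT/hdfs/data" "$ROOT/logs" "$ROOT/tmp"
ls "$ROOT"
```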
(3) Unpack hadoop-2.2.0.tar.gz: tar -zvxf hadoop-2.2.0.tar.gz
(4) On every node, add the Hadoop root directory to the environment variables:
export HADOOP_HOME=/usr/hadoop/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=/usr/hadoop/storage/hadoop-2.2.0/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
(5) Reload the profile on every node: source /etc/profile
(6) The files to configure are: core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, hadoop-env.sh, yarn-env.sh, and mapred-env.sh
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/storage/hadoop-2.2.0/tmp/hadoop-${user.name}</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/hadoop/storage/hadoop-2.2.0/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/hadoop/storage/hadoop-2.2.0/hdfs/data</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>5</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
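Once the four XML files are saved, a value can be spot-checked without starting any daemons; the hdfs getconf utility ships with the 2.2.0 distribution:

```shell
hdfs getconf -confKey fs.defaultFS
```

This should echo the hdfs://master:9000/ value configured in core-site.xml above.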
hadoop-env.sh, yarn-env.sh, mapred-env.sh: set JAVA_HOME in each; if the line is commented out, uncomment it.
export JAVA_HOME=/usr/java/jdk1.7.0_25
Edit the slaves file: add one slave hostname per line.
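For this cluster, the slaves file (under $HADOOP_HOME/etc/hadoop, alongside the XML files) would read:

```
slave1
slave2
slave3
```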
(7) Sync the /usr/hadoop/hadoop-2.2.0 directory to the other nodes:
scp -r /usr/hadoop/hadoop-2.2.0/ root@slave1:/usr/hadoop/
scp -r /usr/hadoop/hadoop-2.2.0/ root@slave2:/usr/hadoop/
scp -r /usr/hadoop/hadoop-2.2.0/ root@slave3:/usr/hadoop/
(8) Disable the firewall on every node:
chkconfig iptables off
chkconfig ip6tables off
(9) Format the NameNode: hdfs namenode -format (the older hadoop namenode -format form still works but is deprecated in 2.x)
(10) Start the HDFS daemons: start-dfs.sh
If the following prompt appears:
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 45:83:90:4f:7f:6b:d7:1b:22:be:70:4a:38:67:92:c3.
Are you sure you want to continue connecting (yes/no)?
answer yes and the cluster will start normally.
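The prompt can also be avoided on future restarts by pre-accepting the host key on the master; a small convenience sketch (0.0.0.0 here is the default secondary-namenode address reported by start-dfs.sh):

```shell
ssh-keyscan 0.0.0.0 >> ~/.ssh/known_hosts
```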
(11) Start YARN: start-yarn.sh
(12) Use the jps command to list the running Java processes.
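Assuming both start scripts succeeded, the jps listing would look roughly like this (process IDs omitted; exact order varies):

```
# on master
NameNode
SecondaryNameNode
ResourceManager
Jps

# on each slave
DataNode
NodeManager
Jps
```

A missing process usually means its daemon failed to start; check the logs under /usr/hadoop/storage/hadoop-2.2.0/logs.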
(13) Check the NameNode status in the web UI:
http://192.168.169.10:50070/
(14) Stop YARN: stop-yarn.sh
(15) Stop HDFS: stop-dfs.sh
I built this cluster by hand purely to learn Hadoop; it is a test setup, with no guarantee of high availability or complete correctness.