Setting Up Distributed Hadoop

Install from hadoop-2.5.0.tar.gz; download: http://pan.baidu.com/s/1pKWe1L5

  • Cluster plan: three servers: hadoop-senior.orange.com, hadoop-senior.banana.com, and hadoop-senior.pear.com
Hostname  Roles
banana    resourcemanager, datanode, nodemanager
orange    namenode, datanode, nodemanager
pear      secondarynamenode, datanode, nodemanager, historyserver
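Every node must be able to resolve the three hostnames. A minimal /etc/hosts sketch, assuming hypothetical addresses on the 192.168.181.0/24 network that appears later in ntp.conf (substitute your real IPs); it writes to a temp file here for illustration:

```shell
# Hypothetical IPs; replace with your own. On a real cluster, append
# these lines to /etc/hosts on all three nodes instead of a temp file.
cat > /tmp/hosts.example <<'EOF'
192.168.181.101 hadoop-senior.orange.com
192.168.181.102 hadoop-senior.banana.com
192.168.181.103 hadoop-senior.pear.com
EOF
grep -c 'hadoop-senior' /tmp/hosts.example   # prints 3
```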
  • Change the time zone (run on orange):
    # mv /etc/localtime /etc/localtime_bak
    # ln -s /usr/share/zoneinfo/Asia/Shanghai  /etc/localtime

  • Sync the time and save it to the hardware clock:
    # ntpdate asia.pool.ntp.org
    # hwclock -w

  • Edit ntp.conf
    # vi /etc/ntp.conf
                    // Remove the leading # and change the subnet to your own
    		restrict 192.168.181.0 mask 255.255.255.0 nomodify notrap
    		// Comment out the following three lines with #
    		#server 0.centos.pool.ntp.org
    		#server 1.centos.pool.ntp.org
    		#server 2.centos.pool.ntp.org
    		// Remove the leading #
    		server  127.127.1.0     # local clock
    		fudge   127.127.1.0 stratum 10
# service ntpd restart
# chkconfig ntpd on

  • Set up time sync on the slaves (run on banana and pear):
# crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop-senior.orange.com
# service crond restart
  • Configure passwordless SSH login, first on orange:
# ssh-keygen -t rsa
Press Enter at every prompt.
# ssh-copy-id [email protected]
Type yes at the first prompt,
then enter the password at the second.
Likewise run:
# ssh-copy-id [email protected]
# ssh-copy-id [email protected]
Repeat the same steps on banana and pear.
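The per-host commands above can be sketched as one loop; `echo` only prints each command here, so drop it to actually distribute the key:

```shell
# Print the ssh-copy-id command for each node (remove "echo" to execute for real).
for host in hadoop-senior.orange.com hadoop-senior.banana.com hadoop-senior.pear.com; do
  echo ssh-copy-id root@"$host"
done
```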
  • Configure the Hadoop distributed environment:
Set JAVA_HOME in the following three .sh files:
[root@hadoop-senior hadoop]# grep -n "/usr/java/jdk1.7.0_79" *
hadoop-env.sh:25:export JAVA_HOME=/usr/java/jdk1.7.0_79
mapred-env.sh:16:export JAVA_HOME=/usr/java/jdk1.7.0_79
yarn-env.sh:23:export JAVA_HOME=/usr/java/jdk1.7.0_79
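The three edits above can be scripted with sed; a sketch on a stand-in copy of hadoop-env.sh (run the same sed against the real files in etc/hadoop, adding mapred-env.sh and yarn-env.sh to the list):

```shell
cd "$(mktemp -d)"
# Stand-in for etc/hadoop/hadoop-env.sh; the real file ships with a JAVA_HOME line.
printf 'export JAVA_HOME=${JAVA_HOME}\n' > hadoop-env.sh
for f in hadoop-env.sh; do   # add mapred-env.sh yarn-env.sh on the real cluster
  sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.7.0_79|' "$f"
done
grep '^export JAVA_HOME' hadoop-env.sh
# prints: export JAVA_HOME=/usr/java/jdk1.7.0_79
```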
Edit the following four .xml configuration files (per the cluster plan):
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-senior.orange.com:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data</value>
    </property>
</configuration>
hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop-senior.orange.com:50070</value>
    </property>
	<property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior.pear.com:50090</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>true</value>
    </property>
</configuration>
yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior.banana.com</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
</configuration>
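A quick check of the retention value above: 86400 seconds is exactly one day of aggregated-log retention:

```shell
# 86400 s / 3600 s per hour = 24 hours
echo "$(( 86400 / 3600 )) hours"   # prints: 24 hours
```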
mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-senior.pear.com:10020</value>
    </property>
	<property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-senior.pear.com:19888</value>
    </property>
</configuration>
Add the three hostnames to the slaves file:
hadoop-senior.orange.com
hadoop-senior.banana.com
hadoop-senior.pear.com
  • Copy the Hadoop install directory to banana and pear:
# scp -r hadoop-2.5.0/ [email protected]:/opt/modules/
# scp -r hadoop-2.5.0/ [email protected]:/opt/modules/


Format the NameNode:
# /opt/modules/hadoop-2.5.0/bin/hdfs namenode -format
Start the corresponding services per the cluster plan.
Run on orange:
# /opt/modules/hadoop-2.5.0/sbin/start-dfs.sh
Run on banana:
# /opt/modules/hadoop-2.5.0/sbin/start-yarn.sh

Run on pear:
# /opt/modules/hadoop-2.5.0/sbin/mr-jobhistory-daemon.sh start historyserver
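After startup, `jps` on each node should roughly match the cluster plan. A sketch that just prints the expected daemon list per node (names taken from the plan above); compare it against the real `jps` output on each host:

```shell
# Expected daemons per node, derived from the cluster plan; verify with `jps` on each host.
printf '%s\n' \
  'orange: NameNode DataNode NodeManager' \
  'banana: ResourceManager DataNode NodeManager' \
  'pear: SecondaryNameNode DataNode NodeManager JobHistoryServer'
```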

  • Test: upload a file and run wordcount:
# /opt/modules/hadoop-2.5.0/bin/hdfs dfs -put sort.txt /input/
# /opt/modules/hadoop-2.5.0/bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/sort.txt /output
# /opt/modules/hadoop-2.5.0/bin/hdfs dfs -text /output/part*
[beifeng@hadoop-senior hadoop-2.5.0]$ ./bin/hdfs dfs -text /output/part*
abcs	1
ddfs	1
hadoop	3
word	1

  • Benchmark HDFS:
Write test:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
This writes 10 files of 1000 MB each; by default they go to /benchmarks/TestDFSIO in HDFS.
Read test:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
Clean up the test data:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -clean
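For sizing, the write test above produces roughly 10 GB of raw data, and the cluster must hold it at the replication factor of 3 configured in hdfs-site.xml:

```shell
# 10 files x 1000 MB, stored 3x by HDFS replication
files=10; size_mb=1000; replication=3
echo "raw: $(( files * size_mb )) MB, on disk with replication: $(( files * size_mb * replication )) MB"
# prints: raw: 10000 MB, on disk with replication: 30000 MB
```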

Benchmark MapReduce:
1. Generate some random data (remove the earlier /input and /output first; a MapReduce job's output directory must not already exist)
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar teragen 100 /input/
2. Run terasort to sort the data
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar terasort /input /output
3. Run teravalidate to verify the terasort output
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar teravalidate /output /output2


