1. Prepare the servers (I used CentOS 6.x)
Server 192.168.0.20 runs: active NameNode, ResourceManager
Server 192.168.0.21 runs: standby NameNode, NodeManager, JournalNode, DataNode
Server 192.168.0.199 runs: NodeManager, JournalNode, DataNode
Server 192.168.0.186 runs: NodeManager, JournalNode, DataNode
2. Synchronize the server clocks
date -R
yum install ntp
/usr/sbin/ntpdate <IP of the server to sync against> (e.g. ntpdate time.nist.gov)
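After the one-time ntpdate run, the nodes can be kept in sync by re-running it periodically. A crontab entry along the following lines would resync hourly; using 192.168.0.20 as the in-house time source is my assumption here, any reachable NTP server works:

```
# /etc/crontab -- resync once an hour; the upstream 192.168.0.20 is illustrative
0 * * * * root /usr/sbin/ntpdate 192.168.0.20 >/dev/null 2>&1
```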
3. Disable the firewall (iptables)
Check the firewall status:
/etc/init.d/iptables status
Stop the firewall:
/etc/init.d/iptables stop
4. Change the hostname (for the reasons, see http://blog.csdn.net/shirdrn/article/details/6562292)
vim /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=localhost.localdomain
Set the HOSTNAME value in /etc/sysconfig/network to localhost, or to a hostname of your choosing, and make sure that name maps to the correct IP address in /etc/hosts. Then restart the network service:
/etc/rc.d/init.d/network restart
5. Configure the /etc/hosts file
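This step is where the hostnames used throughout the rest of this guide get their addresses. Assuming hadoop-dn1 and hadoop-dn2 map to 192.168.0.199 and 192.168.0.186 respectively (the role list in step 1 does not state this pairing explicitly), /etc/hosts on every node would contain:

```
192.168.0.20   hadoop-nn1
192.168.0.21   hadoop-nn2
192.168.0.199  hadoop-dn1
192.168.0.186  hadoop-dn2
```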
6. Set up passwordless SSH login (adapted from http://haitao.iteye.com/blog/1744272)
SSH configuration
Host hadoop-nn1 needs passwordless login to hadoop-nn1, hadoop-nn2, hadoop-dn1, and hadoop-dn2.
First make sure the firewall is disabled on every host.
On hadoop-nn1, run:
a. $cd ~/.ssh
b. $ssh-keygen -t rsa - press Enter at every prompt; the key is saved in .ssh/id_rsa.
c. $cp id_rsa.pub authorized_keys
After this step, you should normally be able to log in to the local machine without a password, i.e. ssh localhost prompts for nothing.
d. $scp authorized_keys [email protected]:/home/xiaojin/.ssh - copy the newly created authorized_keys file to each of the other hosts.
e. $chmod 600 authorized_keys - in the .ssh directory on each remote host, restrict the file's permissions.
Normally, once these steps are done, SSH connections from host A to host A or host B only ask for a password on the very first login. Note, however, that the first login by IP and the first login by hostname are tracked separately: if your Hadoop configuration uses hostnames, do the first login with the hostname once, and no password will be needed afterwards.
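The key-generation part (steps b, c, and e) can be sketched as one script. The scratch directory below is purely illustrative, to show the files produced; on a real node you would run these commands in ~/.ssh and then scp authorized_keys to the other hosts as in step d:

```shell
# Illustration of steps b, c, e in a scratch dir; real setup uses ~/.ssh.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$KEYDIR/id_rsa" -q    # step b: -N "" skips the passphrase prompts
cp "$KEYDIR/id_rsa.pub" "$KEYDIR/authorized_keys" # step c: authorize our own public key
chmod 600 "$KEYDIR/authorized_keys"               # step e: sshd refuses looser permissions
ls -l "$KEYDIR"
```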
Possible problems:
a. During ssh login you see: Agent admitted failure to sign using the key.
$ssh-add
b. If there is no error message and password login works but passwordless login does not, run the following on the host being connected to (if A connects to B, run it on B):
$chmod o-w ~/
$chmod 700 ~/.ssh
$chmod 600 ~/.ssh/authorized_keys
c. If passwordless login still fails after step b, try the following:
$ps -Af | grep agent - check whether an ssh agent is already running; if so, kill it first and then run the next command to start a fresh agent; if not, run the next command directly:
$ssh-agent - if that still does not help, restart the ssh service:
$sudo service sshd restart
7. Install the JDK
vim /etc/profile # edit the configuration file
# add the following lines
# set java environment
JAVA_HOME=/home/xiaojin/jdk1.8.0_73
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
# apply the profile: source /etc/profile
# download and extract the tar file
tar -zxvf hadoop-2.7.0.tar.gz
10. Hadoop directory layout
bin: Hadoop executables
sbin: Hadoop startup and administration scripts
11. Edit the Hadoop configuration files
# In <hadoop dir>/etc/hadoop/hadoop-env.sh, only the JDK path needs changing; leave the rest as-is:
export JAVA_HOME=/home/xiaojin/jdk1.8.0_73
#hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-nn1:8020</value>
<!-- Hadoop HA: points at the active NameNode; fs.default.name is the deprecated form of this key -->
</property>
</configuration>
#/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-nn2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-nn2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
</configuration>
#hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>hadoop-server</value>
<description>Comma-separated list of nameservices.</description>
</property>
<property>
<name>dfs.ha.namenodes.hadoop-server</name>
<value>nn1,nn2</value>
<description>The prefix for a given nameservice, containing a comma-separated list of NameNodes for that nameservice (e.g. EXAMPLENAMESERVICE).</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-server.nn1</name>
<value>hadoop-nn1:8020</value>
<description>RPC address for namenode1 of hadoop-server.</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-server.nn2</name>
<value>hadoop-nn2:8020</value>
<description>RPC address for namenode2 of hadoop-server.</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-server.nn1</name>
<value>hadoop-nn1:50070</value>
<description>The address and base port on which the dfs namenode1 web UI will listen.</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-server.nn2</name>
<value>hadoop-nn2:50070</value>
<description>The address and base port on which the dfs namenode2 web UI will listen.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/xiaojin/hadoop/hdfs/name</value>
<description>Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop-nn2:8485;hadoop-dn1:8485;hadoop-dn2:8485/hadoop-journal</value>
<description>The shared edits directory for the HA cluster: a quorum-journal URI listing the JournalNodes. The active NameNode writes the edit log here and the standby reads it to keep the namespaces synchronized.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/xiaojin/hadoop/hdfs/data</value>
<description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>false</value>
<description>Whether automatic failover is enabled. See the HDFS High Availability documentation for details on automatic HA configuration.</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/xiaojin/hadoop/hdfs/journal/</value>
</property>
</configuration>
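One caveat about the hdfs-site.xml above: since dfs.ha.automatic-failover.enabled is false, failover is driven manually with hdfs haadmin (see step 12). If clients ever address the cluster by the logical nameservice URI (hdfs://hadoop-server) instead of a concrete NameNode, a failover proxy provider must also be configured for that nameservice, for example:

```
<property>
<name>dfs.client.failover.proxy.provider.hadoop-server</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```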
#hadoop/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-nn1</value>
<description>The hostname of the RM.</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
<description>The address of the applications manager interface in the RM.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
<description>The address of the scheduler interface.</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
<description>The http address of the RM web application.</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
<description>The https address of the RM web application.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
<description>The address of the RM admin interface</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
<description>The class to use as the resource scheduler.</description>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
<description>fair-scheduler conf location.</description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/xiaojin/hadoop/yarn/local</value>
<description>List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Whether to enable log aggregation</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
<description>Where to aggregate logs to.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>The valid service name should only contain a-zA-Z0-9_ and cannot start with a number.</description>
</property>
</configuration>
#hadoop/etc/hadoop/slaves
hadoop-nn2
hadoop-dn1
hadoop-dn2
#hadoop/etc/hadoop/fairscheduler.xml
<?xml version="1.0" ?>
<allocations>
<queue name="infrastructure">
<minResources>102400 mb, 50 vcores</minResources>
<maxResources>153600 mb, 100 vcores</maxResources>
<maxRunningApps>200</maxRunningApps>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
<aclSubmitApps>root,xiaojin,yarn,search,hdfs</aclSubmitApps>
</queue>
<queue name="tool">
<minResources>102400 mb, 30 vcores </minResources>
<maxResources>153600 mb, 50 vcores </maxResources>
</queue>
<queue name="sentiment">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153500 mb, 50 vcores</maxResources>
</queue>
</allocations>
12. Start the Hadoop cluster
Starting the Hadoop cluster:
Step 1:
On [nn1], start the JournalNodes:
sbin/hadoop-daemons.sh start journalnode
Step 2:
On [nn1], format the NameNode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
Step 3:
On [nn2], sync the metadata from nn1:
bin/hdfs namenode -bootstrapStandby
Step 4:
Start [nn2]:
sbin/hadoop-daemon.sh start namenode
After these four steps, nn1 and nn2 are both in standby state.
Step 5:
Switch [nn1] to active:
bin/hdfs haadmin -transitionToActive nn1
Step 6:
On [nn1], start all DataNodes:
sbin/hadoop-daemons.sh start datanode
Starting the YARN services:
On [nn1], start YARN:
sbin/start-yarn.sh
Stopping the Hadoop cluster:
On [nn1], run:
sbin/stop-dfs.sh
13. Run the MapReduce example program: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 2 100000
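The pi example estimates π by sampling points in the unit square (the Hadoop job actually uses a quasi-random Halton sequence; the plain Monte Carlo sketch below only illustrates the idea, not the job's code):

```shell
# Toy Monte Carlo pi estimate -- illustrative only, not what the Hadoop job runs.
awk 'BEGIN {
  srand(42); n = 100000                  # fixed seed, 100k sample points
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()               # random point in the unit square
    if (x*x + y*y <= 1) inside++         # count points inside the quarter circle
  }
  printf "pi is approximately %.3f\n", 4 * inside / n
}'
```

With 100,000 samples the estimate typically lands near 3.14; the Hadoop job's two arguments (2 maps, 100000 samples per map) control the same trade-off at cluster scale.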