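This guide installs Hadoop from Cloudera's RPM repository.
Environment:
192.168.255.132 test01.linuxjcq.com => master
192.168.255.133 test02.linuxjcq.com => slave01
192.168.255.134 test03.linuxjcq.com => slave02
The /etc/hosts file on every host contains the entries above, and each host has a basic Java environment set up. The Java package used is OpenJDK.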
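Before installing anything, it can help to confirm that every node really carries the three host entries. The following is a minimal sketch; check_hosts is a hypothetical helper written for this guide, not part of Hadoop or CDH.

```shell
#!/bin/sh
# check_hosts FILE: report whether FILE maps each cluster hostname to its IP.
# Hypothetical helper; the IPs and names are the ones listed in the environment above.
check_hosts() {
    hosts_file=$1
    status=0
    while read -r ip name; do
        # Succeed when a line starts with the IP and lists the name as one of its fields.
        if awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "ok      $name"
        else
            echo "missing $name"
            status=1
        fi
    done <<EOF
192.168.255.132 test01.linuxjcq.com
192.168.255.133 test02.linuxjcq.com
192.168.255.134 test03.linuxjcq.com
EOF
    return $status
}
```

Running check_hosts /etc/hosts on each node prints one line per hostname and returns a non-zero status if any entry is missing.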
1. Install the Cloudera repository
wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm -P /usr/local/src
yum localinstall --nogpgcheck /usr/local/src/cdh3-repository-1.0-1.noarch.rpm
rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
2. Install the Hadoop packages
yum install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-secondarynamenode hadoop-0.20-datanode hadoop-0.20-jobtracker hadoop-0.20-tasktracker hadoop-0.20-source
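The packages split Hadoop by function:
source: hadoop-0.20-source
base: hadoop-0.20
namenode: hadoop-0.20-namenode
secondarynamenode: hadoop-0.20-secondarynamenode
datanode: hadoop-0.20-datanode
jobtracker: hadoop-0.20-jobtracker
tasktracker: hadoop-0.20-tasktracker
Installation also adds two users and one group by default:
the hdfs user operates the HDFS file system
the mapred user runs MapReduce jobs
Both users belong to the hadoop group; there is no hadoop user.
Steps 1 and 2 must be carried out on every node.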
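A quick way to confirm that the packages created the accounts described above is to ask the system which groups each user belongs to. This is a sketch only; in_group and check_accounts are hypothetical helper names, and the check relies on the standard id command.

```shell
#!/bin/sh
# in_group USER GROUP: succeed if USER exists and belongs to GROUP.
# Hypothetical helper built on the standard `id -nG` command.
in_group() {
    groups=$(id -nG "$1" 2>/dev/null) || return 1
    for g in $groups; do
        if [ "$g" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# check_accounts: report whether hdfs and mapred are members of hadoop.
check_accounts() {
    for user in hdfs mapred; do
        if in_group "$user" hadoop; then
            echo "$user: member of hadoop"
        else
            echo "$user: missing or not in hadoop"
        fi
    done
}
```

Calling check_accounts on a node after step 2 prints one line for each of the two users.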
3. Configure the master node
a. Create the configuration
Cloudera manages the active configuration directory through the alternatives tool.
cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.my_cluster
Copy the configuration files.
alternatives --display hadoop-0.20-conf
alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50
The two commands above display the current configuration and then install the new one.
alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - status is auto.
link currently points to /etc/hadoop-0.20/conf.my_cluster
/etc/hadoop-0.20/conf.empty - priority 10
/etc/hadoop-0.20/conf.my_cluster - priority 50
Current `best' version is /etc/hadoop-0.20/conf.my_cluster.
Confirm that the new configuration is installed.
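The confirmation can also be scripted by pulling the active directory out of the alternatives output. The sketch below assumes the output format shown above, in particular the "link currently points to" line; current_conf is a hypothetical helper name.

```shell
#!/bin/sh
# current_conf: read `alternatives --display` output on stdin and print
# the directory named on the "link currently points to" line.
# Hypothetical helper; it assumes the output format shown in this guide.
current_conf() {
    sed -n 's/^ *link currently points to //p'
}
```

For example, alternatives --display hadoop-0.20-conf | current_conf should print /etc/hadoop-0.20/conf.my_cluster once the new configuration is active.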
b. Set the Java home directory
- vi hadoop-env.sh
- export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
JAVA_HOME is the Java installation directory; OpenJDK can be used.
c. Configure core-site.xml
- vi core-site.xml
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://test01.linuxjcq.com:9000/</value>
- </property>
- </configuration>
Clients use this URI to access the HDFS file system.
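To double-check what a site file actually sets, a property value can be read back from the XML. This is a minimal sketch, assuming each <name> and <value> element fits on a single line with the name preceding its value, as in the files in this guide; get_property is a hypothetical helper and not a Hadoop command.

```shell
#!/bin/sh
# get_property FILE NAME: print the <value> that follows <name>NAME</name>.
# Hypothetical helper; assumes every <name> and <value> element sits on one line.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name.
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # Print the value belonging to the wanted name and stop.
            value = $0
            sub(/.*<value>[ \t]*/, "", value)
            sub(/[ \t]*<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}
```

With the core-site.xml above, get_property core-site.xml fs.default.name prints the hdfs:// URI; the same helper works for hdfs-site.xml and mapred-site.xml.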
d. Configure hdfs-site.xml
- vi /etc/hadoop/hdfs-site.xml
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>2</value>
- </property>
- <property>
- <name>dfs.name.dir</name>
- <value>/data/hadoop/hdfs/name</value>
- </property>
- <property>
- <name>dfs.data.dir</name>
- <value>/data/hadoop/hdfs/data</value>
- </property>
- </configuration>
e. Configure mapred-site.xml
- <configuration>
- <property>
- <name>mapred.system.dir</name>
- <value>/mapred/system</value>
- </property>
- <property>
- <name>mapred.local.dir</name>
- <value>/data/hadoop/mapred/local</value>
- </property>
- <property>
- <name>mapred.job.tracker</name>
- <value>test01.linuxjcq.com:9001</value>
- </property>
- </configuration>
f. Set the secondarynamenode and datanode hosts
secondarynamenode
- vi /etc/hadoop/masters
- test02.linuxjcq.com
datanode
- vi /etc/hadoop/slaves
- test02.linuxjcq.com
- test03.linuxjcq.com
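The two host lists can also be written without an editor. The sketch below assumes the datanode list belongs in the slaves file next to masters, which is the usual Hadoop 0.20 convention; write_node_lists is a hypothetical helper and the directory argument is whichever configuration directory is in use.

```shell
#!/bin/sh
# write_node_lists CONF_DIR: write the masters and slaves files for this cluster.
# Hypothetical helper; hostnames are the ones used throughout this guide.
write_node_lists() {
    conf_dir=$1
    # masters lists the secondarynamenode host.
    printf '%s\n' test02.linuxjcq.com > "$conf_dir/masters"
    # slaves lists the datanode hosts, one per line.
    printf '%s\n' test02.linuxjcq.com test03.linuxjcq.com > "$conf_dir/slaves"
}
```

For instance, write_node_lists /etc/hadoop-0.20/conf.my_cluster would populate the configuration created in step a.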
g. Create the required directories
Create dfs.name.dir and dfs.data.dir:
mkdir -p /data/hadoop/hdfs/{name,data}
Create mapred.local.dir:
mkdir -p /data/hadoop/mapred/local
Set the owner of dfs.name.dir and dfs.data.dir to hdfs and the group to hadoop, with mode 0700:
chown -R hdfs:hadoop /data/hadoop/hdfs/{name,data}
chmod -R 0700 /data/hadoop/hdfs/{name,data}
Set the owner of mapred.local.dir to mapred and the group to hadoop, with mode 0755:
chown -R mapred:hadoop /data/hadoop/mapred/local
chmod -R 0755 /data/hadoop/mapred/local
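The ownership and modes set above can be checked in one pass. This is a small sketch that assumes GNU stat, as shipped with coreutils on Red Hat style systems; show_perms is a hypothetical helper name.

```shell
#!/bin/sh
# show_perms DIR...: print "owner:group mode path" for each directory.
# Hypothetical helper; relies on GNU stat's -c format option.
show_perms() {
    for dir in "$@"; do
        # %U owner name, %G group name, %a octal mode, %n path
        stat -c '%U:%G %a %n' "$dir"
    done
}
```

After step g, show_perms /data/hadoop/hdfs/name /data/hadoop/hdfs/data /data/hadoop/mapred/local should report hdfs:hadoop 700 for the first two directories and mapred:hadoop 755 for the third.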
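4. Configure the secondarynamenode and datanode nodes
Repeat steps a-g of section 3 on each of them; the datanodes need the dfs.data.dir and mapred.local.dir directories from step g as well.
5. Format the namenode on the master node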
sudo -u hdfs hadoop namenode -format
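6. Start the nodes
On the master, start the namenode:
service hadoop-0.20-namenode start
On the secondarynamenode node, start the secondarynamenode:
service hadoop-0.20-secondarynamenode start
On each datanode, start the datanode:
service hadoop-0.20-datanode start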
7. Create the HDFS /tmp directory and mapred.system.dir
sudo -u hdfs hadoop fs -mkdir /mapred/system
sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
sudo -u hdfs hadoop fs -chmod 700 /mapred/system
mapred.system.dir must be created before the jobtracker starts.
sudo -u hdfs hadoop dfs -mkdir /tmp
sudo -u hdfs hadoop dfs -chmod -R 1777 /tmp
8. Start MapReduce
On the datanode nodes, run:
service hadoop-0.20-tasktracker start
On the namenode node, start the jobtracker:
service hadoop-0.20-jobtracker start
9. Enable the services at boot
namenode node: enable namenode and jobtracker, and disable the other services
chkconfig hadoop-0.20-namenode on
chkconfig hadoop-0.20-jobtracker on
chkconfig hadoop-0.20-secondarynamenode off
chkconfig hadoop-0.20-tasktracker off
chkconfig hadoop-0.20-datanode off
datanode nodes: enable datanode and tasktracker
chkconfig hadoop-0.20-namenode off
chkconfig hadoop-0.20-jobtracker off
chkconfig hadoop-0.20-secondarynamenode off
chkconfig hadoop-0.20-tasktracker on
chkconfig hadoop-0.20-datanode on
secondarynamenode node: enable secondarynamenode
chkconfig hadoop-0.20-secondarynamenode on
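Because test02 is listed both as the secondarynamenode and as a datanode, its boot settings combine two of the lists above. The sketch below generates the chkconfig lines for any combination of roles; chkconfig_plan and the role names namenode, datanode and secondarynamenode are hypothetical and simply mirror the three lists in this step.

```shell
#!/bin/sh
# chkconfig_plan ROLE...: print the chkconfig commands for a node holding the given roles.
# Hypothetical helper; it only prints commands and changes nothing itself.
chkconfig_plan() {
    enabled=""
    for role in "$@"; do
        case $role in
            namenode)          enabled="$enabled namenode jobtracker" ;;
            datanode)          enabled="$enabled datanode tasktracker" ;;
            secondarynamenode) enabled="$enabled secondarynamenode" ;;
            *) echo "unknown role: $role" >&2; return 1 ;;
        esac
    done
    for svc in namenode jobtracker secondarynamenode tasktracker datanode; do
        state=off
        # A service is switched on only if one of the roles enabled it.
        case " $enabled " in
            *" $svc "*) state=on ;;
        esac
        echo "chkconfig hadoop-0.20-$svc $state"
    done
}
```

On test02, for example, chkconfig_plan datanode secondarynamenode prints five commands that can be reviewed and then piped to sh as root.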
Notes:
These Hadoop packages run as independent services and do not need ssh. Alternatively, ssh can be configured so that start-all.sh and stop-all.sh manage the services.