I. CDH4 Installation
Download the CDH4 repository package for your operating system from https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation; the OS used here is Red Hat 5.4.
Step 1a: Optionally Add a Repository Key
rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
Step 2: Install CDH4 with MRv1
yum -y install hadoop-0.20-mapreduce-jobtracker
yum -y install hadoop-0.20-mapreduce-tasktracker (on each TaskTracker node; the service is started in Step 10 below)
Step 3: Install CDH4 with YARN
yum -y install hadoop-yarn-resourcemanager
yum -y install hadoop-hdfs-namenode
yum -y install hadoop-hdfs-secondarynamenode
yum -y install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum -y install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
yum -y install hadoop-client
Note: install the JDK and PostgreSQL beforehand.
II. CDH4 Configuration
1. Configure network hosts
(1). Configure host names
To ensure the hosts can reach each other by name, make sure the entries in /etc/hosts and the hostname in /etc/sysconfig/network are consistent with the actual IP addresses.
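For example, entries like the following keep names and addresses consistent (the host names and IP addresses here are hypothetical):

```
# /etc/hosts -- identical on every node; names must match the real IPs
192.168.1.10  namenode-host.company.com  namenode-host
192.168.1.11  datanode1.company.com      datanode1

# /etc/sysconfig/network -- on namenode-host
HOSTNAME=namenode-host.company.com
```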
(2). Copy the Hadoop configuration
cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
(3). Customize the configuration files
/etc/hadoop/conf/core-site.xml
fs.default.name (the old name, deprecated but still supported) or fs.defaultFS specifies the NameNode's file system URI.
Example:
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode-host.company.com/</value>
</property>
/etc/hadoop/conf/hdfs-site.xml
dfs.permissions.superusergroup specifies the UNIX group whose members are treated as HDFS superusers.
Example:
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
(4). Configure local storage directories
①./etc/hadoop/conf/hdfs-site.xml
NameNode:
dfs.name.dir (the old name, deprecated but still supported) or dfs.namenode.name.dir specifies the directories where the NameNode stores its metadata and edit logs. Cloudera recommends specifying at least two directories, one of which is on an NFS mount.
Example:
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
DataNode:
dfs.data.dir (the old name, deprecated but still supported) or dfs.datanode.data.dir specifies the directories where the DataNode stores blocks. Cloudera recommends mounting each of these directories on a separate disk.
Example:
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
</property>
②. Create the directories used above
mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
③. The correct final permissions are:
dfs.name.dir or dfs.namenode.name.dir | hdfs:hdfs | drwx------
-----------------------------------------------------------------
dfs.data.dir or dfs.datanode.data.dir | hdfs:hdfs | drwx------
④. Note: the Hadoop daemons automatically set the correct permissions on dfs.data.dir / dfs.datanode.data.dir for you. For dfs.name.dir / dfs.namenode.name.dir, however, the permissions are left at the file-system default, usually drwxr-xr-x (755). Use either of the following commands to set the dfs.name.dir / dfs.namenode.name.dir directories to drwx------:
chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn
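The two chmod commands above are equivalent here: on a directory with the default 755 mode, removing read and execute from group and others yields 700. A quick local sanity check, using a throwaway directory /tmp/nn_perm_demo (an illustration path only, not part of the cluster layout):

```shell
# Create a demo directory and give it the typical default mode (755).
mkdir -p /tmp/nn_perm_demo
chmod 755 /tmp/nn_perm_demo

# Remove read/execute from group and others, as in the second command above.
chmod go-rx /tmp/nn_perm_demo

# Print the resulting octal mode: 700, i.e. drwx------
stat -c %a /tmp/nn_perm_demo
```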
(5). Format the NameNode
service hadoop-hdfs-namenode init
(6). Configure the remote NameNode storage directory
mount -t nfs -o tcp,soft,intr,timeo=10,retrans=10 <server>:<export> <mount_point>
For an HA (high-availability) cluster, use instead:
mount -t nfs -o tcp,soft,intr,timeo=50,retrans=12 <server>:<export> <mount_point>
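To make the mount persistent across reboots, an equivalent /etc/fstab entry could look like the following (keep <server>:<export> and <mount_point> as placeholders for your environment; the options shown are the non-HA set from above):

```
<server>:<export>  <mount_point>  nfs  tcp,soft,intr,timeo=10,retrans=10  0  0
```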
2. Deploy an MRv1 MapReduce cluster
(1).Step 1: Configuring Properties for MRv1 Clusters
/etc/hadoop/conf/mapred-site.xml
mapred.job.tracker specifies the host name and (optionally) port of the JobTracker's RPC server, as <HOST>:<PORT>. The value must be a host name, not an IP address.
Example:
<property>
<name>mapred.job.tracker</name>
<value>jobtracker-host.company.com:8021</value>
</property>
(2).Step 2: Configure Local Storage Directories for Use by MRv1 Daemons
/etc/hadoop/conf/mapred-site.xml
mapred.local.dir specifies the directories used for temporary data and intermediate files.
Example:
<property>
<name>mapred.local.dir</name>
<value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local,/data/4/mapred/local</value>
</property>
Create these directories:
mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
Set the owner and group:
chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
(3).Step 3: Configure a Health Check Script for DataNode Processes
For the health check, the official documentation provides this script:
#!/bin/bash
if ! jps | grep -q DataNode ; then
echo ERROR: datanode not up
fi
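To actually use the script, save it to a path of your choice (the path below is hypothetical), make it executable with chmod +x, and point the MRv1 health checker at it in mapred-site.xml via the standard mapred.healthChecker.script.path property:

```xml
<property>
  <name>mapred.healthChecker.script.path</name>
  <value>/usr/lib/hadoop/bin/check_datanode.sh</value>
</property>
```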
(4).Step 4: Deploy your Custom Configuration to your Entire Cluster
Set the configuration on each node:
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
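On a node where the hadoop-conf alternative has not yet been registered, --set alone may fail; on Red Hat-based systems it can first be registered with --install (the priority 50 is an arbitrary choice), and the active configuration can be verified with --display:

```
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
alternatives --display hadoop-conf
```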
(5).Step 5: Start HDFS
for service in /etc/init.d/hadoop-hdfs-*
> do
> sudo $service start
> done
(6).Step 6: Create the HDFS /tmp Directory
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Note: this /tmp directory is created in HDFS, not on the local file system; it serves as the root for HDFS paths derived from hadoop.tmp.dir.
(7).Step 7: Create MapReduce /var directories
sudo -u hdfs hadoop fs -mkdir /var
sudo -u hdfs hadoop fs -mkdir /var/lib
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
(8).Step 8: Verify the HDFS File Structure
Check the HDFS file structure:
sudo -u hdfs hadoop fs -ls -R /
(9).Step 9: Create and Configure the mapred.system.dir Directory in HDFS
①.sudo -u hdfs hadoop fs -mkdir /mapred/system
sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
②. The correct permissions are:
mapred.system.dir | mapred:hadoop | drwx------
/ | hdfs:hadoop | drwxr-xr-x
(10).Step 10: Start MapReduce
On each TaskTracker node:
sudo service hadoop-0.20-mapreduce-tasktracker start
On the JobTracker node:
sudo service hadoop-0.20-mapreduce-jobtracker start
(11).Step 11: Create a Home Directory for each MapReduce User
Create a home directory for each MapReduce user:
sudo -u hdfs hadoop fs -mkdir /user/<user>
sudo -u hdfs hadoop fs -chown <user> /user/<user>