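Setting up a single-node pseudo-distributed Hadoop YARN environment in a virtual machine on Windows 10
This article uses CentOS 6.5 as the guest OS, VirtualBox as the virtual machine software, and Hadoop 2.6.2 (the commands below use the hadoop-2.5.2 tarball; adjust the version in the paths to match your download):
I. Set up SSH and networking
1. Configure passwordless SSH login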
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
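Re-running the append command adds the same key a second time. A minimal sketch of an idempotent version follows, assuming the default ~/.ssh layout; the function name add_authorized_key is hypothetical and not part of OpenSSH.

```shell
# add_authorized_key PUBKEY_FILE [SSH_DIR]
# Appends the public key to SSH_DIR/authorized_keys unless it is already
# present, then applies the permissions sshd expects (700 on the directory,
# 600 on the file). SSH_DIR defaults to ~/.ssh.
add_authorized_key() {
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$ssh_dir/authorized_keys" || return 1
    # -x: whole-line match, -F: fixed string, -f: read the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Called as add_authorized_key ~/.ssh/id_rsa.pub, after which ssh localhost should log in without asking for a password.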
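2. Edit the hosts file
Linux uses /etc/hosts to map IP addresses to host names. Open the file with:
vi /etc/hosts
For example, if google's IP were 10.23.56.238, you could append this line to the end of the file:
10.23.56.238 google.com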
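The same edit can be scripted instead of done by hand in vi. This is only a sketch: add_host_entry is a hypothetical helper, and the optional third argument exists so it can be tried on a copy of the file before touching /etc/hosts (which requires root).

```shell
# add_host_entry IP HOSTNAME [HOSTS_FILE]
# Appends "IP HOSTNAME" to the hosts file unless a non-comment line
# already lists HOSTNAME as one of its names.
add_host_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }                       # ignore comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$hosts_file"; then
        return 0                                        # already mapped, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

For instance, add_host_entry 10.23.56.238 google.com reproduces the example entry above.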
II. Configure Hadoop
1. Download and install Hadoop
#mkdir -p /opt/yarn
#cd /opt/yarn
#tar xvzf hadoop-2.5.2.tar.gz
2. Set JAVA_HOME
This article uses the bundled OpenJDK as an example:
#echo "export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/jre" > /etc/profile.d/java.sh
#source /etc/profile.d/java.sh
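The exact OpenJDK directory name differs between CentOS updates, so it is worth confirming that the path written to java.sh actually exists. A small sketch of such a check (check_java_home is a hypothetical helper name):

```shell
# check_java_home
# Succeeds only if JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java under $JAVA_HOME/bin" >&2
        return 1
    fi
    echo "JAVA_HOME OK: $JAVA_HOME"
}
```

If the check fails, listing /usr/lib/jvm shows which JDK directories are installed on the machine.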
3. Create the users and group
#groupadd hadoop
#useradd -g hadoop yarn
#useradd -g hadoop hdfs
#useradd -g hadoop mapred
4. Create the data and log directories
Hadoop needs data and log directories with different ownership and permissions:
#mkdir -p /var/data/hadoop/hdfs/nn
#mkdir -p /var/data/hadoop/hdfs/snn
#mkdir -p /var/data/hadoop/hdfs/dn
#chown hdfs:hadoop /var/data/hadoop/hdfs -R
#mkdir -p /var/log/hadoop/yarn
#chown yarn:hadoop /var/log/hadoop/yarn -R
Change to the YARN installation directory:
#cd /opt/yarn/hadoop-2.5.2
#mkdir logs
#chmod g+w logs
#chown yarn:hadoop . -R
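The directory creation in this step can be collected into one function. The sketch below assumes the same paths and owners as the commands above; make_hadoop_dirs and its optional PREFIX argument are hypothetical, added so the layout can be rehearsed under a scratch directory without root.

```shell
# make_hadoop_dirs [PREFIX]
# Creates the HDFS data directories and the YARN log directory.
# With a PREFIX the tree is created under it and ownership is left alone.
make_hadoop_dirs() {
    prefix=${1:-}
    for d in /var/data/hadoop/hdfs/nn \
             /var/data/hadoop/hdfs/snn \
             /var/data/hadoop/hdfs/dn \
             /var/log/hadoop/yarn; do
        mkdir -p "$prefix$d" || return 1
    done
    # chown needs root and the hdfs/yarn users from step 3, so it only runs
    # for the real (unprefixed) layout when invoked as root
    if [ -z "$prefix" ] && [ "$(id -u)" -eq 0 ]; then
        chown -R hdfs:hadoop /var/data/hadoop/hdfs || return 1
        chown -R yarn:hadoop /var/log/hadoop/yarn || return 1
    fi
}
```

Running make_hadoop_dirs /tmp/rehearsal as a normal user shows the resulting tree; running it as root with no argument creates the real directories.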
5. Configure core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>hdfs</value>
</property>
</configuration>
6. Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/var/data/hadoop/hdfs/nn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:/var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:/var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/var/data/hadoop/hdfs/dn</value>
</property>
</configuration>
7. Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
8. Configure yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
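A mistyped property name in these files is easy to miss, because Hadoop simply ignores it. The following sketch reads a value back out of a site file with awk; get_property is a hypothetical helper, and it assumes the simple layout used above, with each name element appearing before its value element.

```shell
# get_property FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop site file.
# Prints nothing if the property is not present.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            val = $0
            sub(/.*<value>/, "", val); sub(/<\/value>.*/, "", val)
            print val
            exit
        }' "$1"
}
```

For example, get_property etc/hadoop/hdfs-site.xml dfs.replication should print 1 with the configuration above. Once Hadoop is installed, hdfs getconf -confKey dfs.replication reports the value the daemons actually resolve.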
9. Adjust the Java heap size
Hadoop uses environment variables to set the heap size of each Hadoop process. They are defined in the etc/hadoop/*-env.sh files.
Edit etc/hadoop/hadoop-env.sh:
export HADOOP_HEAPSIZE="500"
export HADOOP_NAMENODE_INIT_HEAPSIZE="500"
Then edit yarn-env.sh:
JAVA_HEAP_MAX=-Xmx500m
YARN_HEAPSIZE=500
10. Format HDFS
Switch to the hdfs user and change to the bin directory:
#su - hdfs
#cd /opt/yarn/hadoop-2.5.2/bin
#./hdfs namenode -format
11. Start the HDFS services
#cd ../sbin
#./hadoop-daemon.sh start namenode
starting namenode,logging to /opt/yarn/hadoop-2.5.2/logs/hadoop-hdfs-namenode-limulus.out
#./hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /opt/yarn/hadoop-2.5.2/logs/hadoop-hdfs-secondarynamenode-limulus.out
#./hadoop-daemon.sh start datanode
starting datanode,logging to /opt/yarn/hadoop-2.5.2/logs/hadoop-hdfs-datanode-limulus.out
To stop an HDFS service, for example the datanode:
#./hadoop-daemon.sh stop datanode
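Starting and stopping the three HDFS daemons one at a time gets repetitive. A possible wrapper is sketched below; hdfs_daemons and the HADOOP_SBIN variable are hypothetical names, and stopping in the reverse of the start order is a choice made for this sketch rather than something the steps above require.

```shell
# Location of hadoop-daemon.sh; defaults to the install path used in this article
HADOOP_SBIN=${HADOOP_SBIN:-/opt/yarn/hadoop-2.5.2/sbin}

# hdfs_daemons start|stop
# Runs hadoop-daemon.sh for namenode, secondarynamenode and datanode,
# stopping them in the reverse of the start order.
hdfs_daemons() {
    action=$1
    case $action in
        start) order="namenode secondarynamenode datanode" ;;
        stop)  order="datanode secondarynamenode namenode" ;;
        *) echo "usage: hdfs_daemons start|stop" >&2; return 2 ;;
    esac
    for daemon in $order; do
        "$HADOOP_SBIN/hadoop-daemon.sh" "$action" "$daemon" || return 1
    done
}
```

Run it as the hdfs user, for example hdfs_daemons start. Afterwards the JDK's jps tool should list NameNode, SecondaryNameNode and DataNode.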
12. Start the YARN services
$exit
logout
#su - yarn
$cd /opt/yarn/hadoop-2.5.2/sbin
$./yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/yarn/hadoop-2.5.2/logs/yarn-yarn-resourcemanager-limulus.out
$./yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/yarn/hadoop-2.5.2/logs/yarn-yarn-nodemanager-limulus.out
To stop a YARN service, for example the nodemanager:
$./yarn-daemon.sh stop nodemanager
13. Verify the running services through the web interface
NameNode
firefox http://localhost:50070
ResourceManager
firefox http://localhost:8088
III. Run the MapReduce example program
#su - hdfs
$cd /opt/yarn/hadoop-2.5.2/bin
$export YARN_EXAMPLES=/opt/yarn/hadoop-2.5.2/share/hadoop/mapreduce
$./yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.5.2.jar pi 16 1000
……
Estimated value of Pi is 3.142500000000000000000
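When the job runs from a script, the estimate can be pulled out of the output. A sketch, assuming the result line has exactly the form shown above; pi_estimate is a hypothetical helper name.

```shell
# pi_estimate
# Reads job output on stdin and prints only the number from the
# "Estimated value of Pi is ..." line.
pi_estimate() {
    sed -n 's/^Estimated value of Pi is \([0-9.]*\).*$/\1/p'
}
```

Usage would be to pipe the job into it, for example ./yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.5.2.jar pi 16 1000 | pi_estimate.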
To list all available examples, run the jar without a program name:
$./yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.5.2.jar