First Experience with Hadoop 2.2.0

It has been two months since the stable Hadoop 2.2.0 release came out on October 15. I finally found some time recently to set up a 3-node cluster and try running Map/Reduce jobs on YARN.

Every time I set up a Hadoop test cluster I run into some problems; it has almost never worked on the first try. This time was no exception: I hit a few issues, but Google quickly helped me sort them out.


Before installing Hadoop 2.2.0, be sure to figure out whether the OS is 32-bit or 64-bit. The Apache community release is compiled on 32-bit Linux, so if you run Hadoop nodes on 64-bit machines, loading the native libraries will fail, and you need to download the source code and compile Hadoop yourself to produce a deployable package. If the OS is 32-bit, things are much simpler: just use the community release directly. To compile the Hadoop 2.2.0 source, you need maven, protoc 2.5.0, and so on set up first. For details see: hadoop2.2.0 centos 編譯安裝詳解.
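
To check the word size, and to sketch what the rebuild looks like (the Maven profile below is the one described in the Hadoop BUILDING.txt; verify the exact flags there, since I am quoting them from memory):

# 32-bit prints i686/i386, 64-bit prints x86_64
uname -m

# rough sketch of building a native 64-bit deployment package from source
# (requires maven and protoc 2.5.0 to be installed first)
cd hadoop-2.2.0-src
mvn package -Pdist,native -DskipTests -Dtar
# the resulting tarball should land under hadoop-dist/target/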


This time I used three spare machines from the server room:

OS: 64-bit CentOS 6.4

[192.168.1.216]  RAM 32G / CPU 2*4 cores / SATA 300G        (NameNode/SecondaryNameNode/DataNode/ResourceManager)

[192.168.1.217]  RAM 8G / CPU 2*4 cores / SATA 500G          (DataNode/NodeManager)

[192.168.1.218]  RAM 8G / CPU 2*4 cores / SATA 500G          (DataNode/NodeManager)

(1) Set the hostnames

Append the following to /etc/hosts on all 3 machines:

192.168.1.216 lrts216
192.168.1.217 lrts217
192.168.1.218 lrts218
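
A quick sanity check that name resolution now works on each machine:

# run on lrts216; each machine should resolve the other two by name
ping -c 1 lrts217
ping -c 1 lrts218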

A: Set the hostname on 192.168.1.216

On 192.168.1.216, run as root: hostname lrts216

Also edit /etc/sysconfig/network:

NETWORKING=yes
HOSTNAME=lrts216

B: Set the hostname on 192.168.1.217

On 192.168.1.217, run as root: hostname lrts217

Also edit /etc/sysconfig/network:

NETWORKING=yes
HOSTNAME=lrts217

C: Set the hostname on 192.168.1.218

On 192.168.1.218, run as root: hostname lrts218

Also edit /etc/sysconfig/network:

NETWORKING=yes
HOSTNAME=lrts218

(2) Configure passwordless SSH login

Create a hadoop group and a hadoop user on all 3 machines, with the home directory uniformly set to /home/hadoop.

Run the following commands as root on each of the 3 machines:

groupadd hadoop

useradd -g hadoop -d /home/hadoop hadoop
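
One step that is easy to forget: on CentOS a freshly created account stays locked until it has a password, and the scp steps below need one to prompt for. So also run, as root on each machine:

passwd hadoop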

lrts216:

Log in to lrts216 as the hadoop user and run the following command in /home/hadoop:

ssh-keygen -t rsa

Just press Enter at every prompt; this creates two files, id_rsa.pub and id_rsa, under /home/hadoop/.ssh/.

Go into /home/hadoop/.ssh and copy id_rsa.pub to lrts217 and lrts218 with scp:

scp id_rsa.pub hadoop@lrts217:/home/hadoop/.ssh/lrts216keys

scp id_rsa.pub hadoop@lrts218:/home/hadoop/.ssh/lrts216keys

Naturally, you will have to enter passwords here.

lrts217:

Log in to lrts217 as the hadoop user and run the following command in /home/hadoop:

ssh-keygen -t rsa

Just press Enter at every prompt; this creates two files, id_rsa.pub and id_rsa, under /home/hadoop/.ssh/.

Go into /home/hadoop/.ssh and copy id_rsa.pub to lrts216 with scp:

scp id_rsa.pub hadoop@lrts216:/home/hadoop/.ssh/lrts217keys

Naturally, you will have to enter a password here.

lrts218:

Log in to lrts218 as the hadoop user and run the following command in /home/hadoop:

ssh-keygen -t rsa

Just press Enter at every prompt; this creates two files, id_rsa.pub and id_rsa, under /home/hadoop/.ssh/.

Go into /home/hadoop/.ssh and copy id_rsa.pub to lrts216 with scp:

scp id_rsa.pub hadoop@lrts216:/home/hadoop/.ssh/lrts218keys

Naturally, you will have to enter a password here.

The steps above generated an RSA key pair on each of lrts216, lrts217, and lrts218, copied lrts216's id_rsa.pub over to lrts217 and lrts218, and copied the id_rsa.pub files of lrts217 and lrts218 over to lrts216. The following steps are still needed before we are done.

lrts216:

Log in to lrts216 as hadoop, go into /home/hadoop/.ssh, and run the following commands:

cat id_rsa.pub >> authorized_keys
cat lrts217keys >> authorized_keys
cat lrts218keys >> authorized_keys
chmod 644 authorized_keys

lrts217:

Log in to lrts217 as hadoop, go into /home/hadoop/.ssh, and run the following commands:

cat id_rsa.pub >> authorized_keys
cat lrts216keys >> authorized_keys
chmod 644 authorized_keys

lrts218:

Log in to lrts218 as hadoop, go into /home/hadoop/.ssh, and run the following commands:

cat id_rsa.pub >> authorized_keys
cat lrts216keys >> authorized_keys
chmod 644 authorized_keys

At this point, the hadoop user on lrts216 can SSH to lrts217 and lrts218 without a password, and the hadoop user on lrts217 or lrts218 can likewise SSH to lrts216 without a password.

This SSH setup is very easy to get partly wrong, and if it is, starting Hadoop later is bound to fail with all sorts of unresolvable-hostname errors. Whenever you see a network-looking error, check this configuration first.
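
A minimal check, from lrts216 as the hadoop user (plus the mirror-image checks from lrts217/lrts218 back to lrts216):

# both should print the remote hostname without any password prompt
ssh lrts217 hostname
ssh lrts218 hostname

# if a password is still requested, sshd is usually objecting to permissions
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys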

(3) Download Hadoop 2.2.0

Download hadoop-2.2.0.tar.gz from the official Hadoop site.

Go into /home/hadoop/ and run: wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz to download it.

Unpack it with tar -zxvf hadoop-2.2.0.tar.gz; this produces the directory /home/hadoop/hadoop-2.2.0.

(4) Configure the JDK (same on all 3 machines)

I used jdk-7u45-linux-x64.gz.

The final JAVA_HOME directory is /usr/local/jdk1.7.0_45.

Log in to each of the 3 machines as root, edit /etc/profile, and append:

export JAVA_HOME=/usr/local/jdk1.7.0_45
export PATH=$PATH:$JAVA_HOME/jre/bin:$JAVA_HOME/bin

Log in again (or run source /etc/profile), then run java -version. If you see output like the following, it worked. (If you can't even get this far, don't bother with Hadoop yet; you have some way to go!)

[root@lrts216 ~]# java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

In fact, everything up to this point is exactly the same as for earlier versions of Hadoop. I am writing it all down again simply because I hit problems more or less every time I do this, so recording the details serves to reinforce my memory. I previously wrote a summary of building a Hadoop cluster on Redhat AS6, and the first half of that is essentially identical.

From here on, things start to differ from the older versions.

(5) Configure environment variables (same on all 3 machines)

Edit the file /home/hadoop/.bash_profile on each machine and append at the end:

# Hadoop
export HADOOP_PREFIX="/home/hadoop/hadoop-2.2.0"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"


Modify the following two files (changing what needs to be changed and adding what needs to be added):

/home/hadoop/hadoop-2.2.0/etc/hadoop/hadoop-env.sh

/home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-env.sh

Original content:

# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}

Change it to:

# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.7.0_45

And append the following:

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
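
A quick way to confirm the variables took effect (reload the profile first):

source ~/.bash_profile
which hadoop     # should point into /home/hadoop/hadoop-2.2.0/bin
hadoop version   # should report Hadoop 2.2.0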

(6) Configure the Hadoop XML files (same on all 3 machines)

The configuration files mentioned from here on all refer to the identically named files under hadoop-2.2.0/etc/hadoop.

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://lrts216:9000</value>
	</property>

	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>

	<property>
		<name>hadoop.tmp.dir</name>
		<value>file:/home/hadoop/hadoop-2.2.0/tmp</value>
		<description>Abase for other temporary directories.</description>
	</property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>lrts218:50090</value>
	</property>

	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/home/hadoop/hadoop-2.2.0/dfs/name</value>
	</property>

	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/home/hadoop/hadoop-2.2.0/dfs/data</value>
	</property>

	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>

	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>

	<property>
		<name>yarn.resourcemanager.address</name>
		<value>lrts216:8032</value>
	</property>

	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>lrts216:8030</value>
	</property>

	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>lrts216:8031</value>
	</property>

	<property>
		<name>yarn.resourcemanager.admin.address</name>
		<value>lrts216:8033</value>
	</property>

	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>lrts216:8088</value>
	</property>
</configuration>

slaves

lrts216
lrts217
lrts218
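
One file the listing above does not show is mapred-site.xml. In 2.2.0, mapreduce.framework.name defaults to local mode, and the wordcount job in section (8) clearly runs through the ResourceManager, so make sure a minimal version like the following is also in place (a sketch of the usual setting, following the same layout as the files above):

mapred-site.xml

<?xml version="1.0"?>
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>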

(7) Start HDFS and YARN

First, go into hadoop-2.2.0/bin/ and run ./hdfs namenode -format to format the NameNode.

Then go into hadoop-2.2.0/sbin/ and run ./start-dfs.sh and ./start-yarn.sh to start HDFS and YARN.

If startup succeeded, running jps on the lrts216 node shows Java processes like the following:

[hadoop@lrts216 sbin]$ jps
5859 NodeManager
5741 ResourceManager
4163 NameNode
4467 SecondaryNameNode
31747 Jps
4290 DataNode
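
On the two worker nodes the list should be shorter, just DataNode and NodeManager. A quick way to check from lrts216 (using the full jps path, since a non-interactive ssh session may not load /etc/profile):

ssh lrts217 /usr/local/jdk1.7.0_45/bin/jps
ssh lrts218 /usr/local/jdk1.7.0_45/bin/jps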

Go into /home/hadoop/hadoop-2.2.0 and run the following command to see an overview of the cluster:

[hadoop@lrts216 hadoop-2.2.0]$ pwd
/home/hadoop/hadoop-2.2.0
[hadoop@lrts216 hadoop-2.2.0]$ bin/hdfs dfsadmin -report
13/12/21 13:48:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 710826487808 (662.01 GB)
Present Capacity: 604646686720 (563.12 GB)
DFS Remaining: 604646334464 (563.12 GB)
DFS Used: 352256 (344 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 192.168.1.216:50010 (lrts216)
Hostname: lrts216
Decommission Status : Normal
Configured Capacity: 227023720448 (211.43 GB)
DFS Used: 176128 (172 KB)
Non DFS Used: 79267975168 (73.82 GB)
DFS Remaining: 147755569152 (137.61 GB)
DFS Used%: 0.00%
DFS Remaining%: 65.08%
Last contact: Sat Dec 21 13:48:55 CST 2013


Name: 192.168.1.218:50010 (lrts218)
Hostname: lrts218
Decommission Status : Normal
Configured Capacity: 483802767360 (450.58 GB)
DFS Used: 176128 (172 KB)
Non DFS Used: 26911825920 (25.06 GB)
DFS Remaining: 456890765312 (425.51 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.44%
Last contact: Sat Dec 21 13:48:54 CST 2013


Dead datanodes:
Name: 192.168.1.217:50010 (lrts217)
Hostname: lrts217
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Sat Dec 21 13:19:59 CST 2013
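
Note that the report lists lrts217 as dead: its DataNode had stopped heartbeating about half an hour before this report was taken. In my experience that usually means the daemon died or a firewall is in the way; the first things I would try (an educated guess, not something I dug into here) are restarting it on lrts217 with sbin/hadoop-daemon.sh start datanode and checking iptables.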

(8) Run the wordcount example

[hadoop@lrts216 bin]$ hdfs dfs -mkdir -p /lrts/zhangzk
13/12/19 22:20:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@lrts216 bin]$ hdfs dfs -ls /
13/12/19 22:20:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-12-19 22:20 /lrts
[hadoop@lrts216 bin]$ hdfs dfs -put hdfs.cmd /lrts/zhangzk/
13/12/19 22:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@lrts216 bin]$ ll
total 252
-rwxr--r--. 1 hadoop hadoop 78809 Oct  7 14:38 container-executor
-rwxr--r--. 1 hadoop hadoop  4857 Oct  7 14:38 hadoop
-rwxr--r--. 1 hadoop hadoop  7530 Oct  7 14:38 hadoop.cmd
-rwxr--r--. 1 hadoop hadoop  7954 Oct  7 14:38 hdfs
-rwxr--r--. 1 hadoop hadoop  5138 Oct  7 14:38 hdfs.cmd
-rwxr--r--. 1 hadoop hadoop  4989 Oct  7 14:38 mapred
-rwxr--r--. 1 hadoop hadoop  5560 Oct  7 14:38 mapred.cmd
-rwxr--r--. 1 hadoop hadoop  1776 Oct  7 14:38 rcc
-rwxr--r--. 1 hadoop hadoop 95102 Oct  7 14:38 test-container-executor
-rwxr--r--. 1 hadoop hadoop  8548 Oct  7 14:38 yarn
-rwxr--r--. 1 hadoop hadoop  8322 Oct  7 14:38 yarn.cmd
[hadoop@lrts216 bin]$ hadoop jar /home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /lrts/zhangzk /lrts/out
13/12/19 22:24:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/19 22:24:56 INFO client.RMProxy: Connecting to ResourceManager at lrts216/192.168.1.216:8032
13/12/19 22:24:57 INFO input.FileInputFormat: Total input paths to process : 1
13/12/19 22:24:57 INFO mapreduce.JobSubmitter: number of splits:1
13/12/19 22:24:57 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/19 22:24:57 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/12/19 22:24:57 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/19 22:24:57 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/19 22:24:57 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/19 22:24:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1387462237922_0001
13/12/19 22:24:58 INFO impl.YarnClientImpl: Submitted application application_1387462237922_0001 to ResourceManager at lrts216/192.168.1.216:8032
13/12/19 22:24:58 INFO mapreduce.Job: The url to track the job: http://lrts216:8088/proxy/application_1387462237922_0001/
13/12/19 22:24:58 INFO mapreduce.Job: Running job: job_1387462237922_0001
13/12/19 22:25:06 INFO mapreduce.Job: Job job_1387462237922_0001 running in uber mode : false
13/12/19 22:25:06 INFO mapreduce.Job:  map 0% reduce 0%
13/12/19 22:25:11 INFO mapreduce.Job:  map 100% reduce 0%
13/12/19 22:25:17 INFO mapreduce.Job:  map 100% reduce 100%
13/12/19 22:25:17 INFO mapreduce.Job: Job job_1387462237922_0001 completed successfully
13/12/19 22:25:18 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=4715
                FILE: Number of bytes written=167555
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=5244
                HDFS: Number of bytes written=3660
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=2410
                Total time spent by all reduces in occupied slots (ms)=2941
        Map-Reduce Framework
                Map input records=171
                Map output records=523
                Map output bytes=6735
                Map output materialized bytes=4715
                Input split bytes=106
                Combine input records=523
                Combine output records=264
                Reduce input groups=264
                Reduce shuffle bytes=4715
                Reduce input records=264
                Reduce output records=264
                Spilled Records=528
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=35
                CPU time spent (ms)=750
                Physical memory (bytes) snapshot=398831616
                Virtual memory (bytes) snapshot=1765736448
                Total committed heap usage (bytes)=354942976
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=5138
        File Output Format Counters 
                Bytes Written=3660
[hadoop@lrts216 bin]$ hdfs dfs -ls /lrts/out
13/12/19 22:26:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2013-12-19 22:25 /lrts/out/_SUCCESS
-rw-r--r--   2 hadoop supergroup       3660 2013-12-19 22:25 /lrts/out/part-r-00000
[hadoop@lrts216 bin]$ hdfs dfs -cat /lrts/out/part-r-00000
13/12/19 22:27:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
"%1"    2
"%HADOOP_BIN_PATH:~-1%" 1
"--config"      2
"AS     1
"License");     1
"\"     1
%*      2
%1      1
%1,     1
%2      1
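
One gotcha when rerunning the job: MapReduce refuses to write to an output directory that already exists, so delete /lrts/out first:

hdfs dfs -rm -r /lrts/out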

Installation references:

http://www.yongbok.net/blog/how-to-install-hadoop-2-2-0-pseudo-distributed-mode/

http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

Since the OS I use is 64-bit CentOS, I ran into some problems here and had to compile Hadoop myself.

The reference material is as follows:

http://blog.csdn.net/w13770269691/article/details/16883663

http://blog.csdn.net/bamuta/article/details/13506843

http://blog.csdn.net/lalaguozhe/article/details/10580727

http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/NativeLibraries.html


