Hadoop 2.2.0 cluster setup on Linux

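Apache Hadoop 2.2.0, the new generation of Hadoop, lifts the Hadoop 1.x limit of at most 4,000 machines per cluster and effectively addresses the OOM (out-of-memory) problems that were common before. Its new computing framework, YARN, has been called the operating system of Hadoop: it stays compatible with the original MapReduce computing model and can also support other parallel computing models.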

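Suppose we want to build a two-node Hadoop 2.2.0 cluster. One node, with the hostname master, acts as both master and slave and runs the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager daemons. The other node, named slave1, acts as a slave and runs the DataNode and NodeManager daemons.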

1. Get the Hadoop binary package or source package from http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/ : use hadoop-2.2.0.tar.gz or hadoop-2.2.0-src.tar.gz

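2. Create a user with the same name on every machine, for example hduser, and install Java (1.6 or 1.7)

Unpack the package, for example into the directory /home/hduser/hadoop-2.2.0

If you want to compile the source code, see steps 3, 4, and 5 below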

----------------for compile source file-----------------------

3. Download protobuf 2.5.0: https://code.google.com/p/protobuf/downloads/list , and download the latest Maven: http://maven.apache.org/download.cgi

Compile protobuf 2.5.0:

tar -xvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/opt/protoc/
make && make install
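Because protobuf was installed under a non-standard prefix, the shell will not find protoc by itself. The following is a minimal sketch, assuming the /opt/protoc/ prefix used above, that puts the compiler on the PATH for the current session before the Hadoop build is started. The LD_LIBRARY_PATH line is an assumption that only matters if protoc cannot locate its shared library.

```shell
# Assumes protobuf was installed with --prefix=/opt/protoc/ as in the step above.
export PATH=/opt/protoc/bin:$PATH
# Assumption: only needed when protoc cannot find libprotoc at run time.
export LD_LIBRARY_PATH=/opt/protoc/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Report the compiler version if it is reachable, otherwise say so.
if command -v protoc >/dev/null 2>&1; then
    protoc --version
else
    echo "protoc not found on PATH"
fi
```

For a 2.5.0 build the version check should report libprotoc 2.5.0. To make the setting permanent, the two export lines can be added to the shell profile of the build user.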

4. Install the required packages

On an RPM-based Linux:

yum install gcc
yum install gcc-c++
yum install make
yum install cmake
yum install openssl-devel
yum install ncurses-devel

On a Debian-based Linux:

sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install make
sudo apt-get install cmake
sudo apt-get install libssl-dev
sudo apt-get install libncurses5-dev

5. Compile the hadoop-2.2.0 source:

mvn clean install -DskipTests

mvn package -Pdist,native -DskipTests -Dtar

6. If you already have a built package (for example hadoop-2.2.0.tar.gz), install and configure it as follows.

Log in to the master machine as hduser:

6.1 Install ssh

For example on Ubuntu Linux:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

To be able to ssh from master to the slaves without a passphrase, copy the authorized keys to each slave: scp ~/.ssh/authorized_keys slave1:/home/hduser/.ssh/
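Passwordless login often still fails after the keys are in place because sshd rejects key files with loose permissions. A small sketch of the usual fix follows; tighten_ssh_perms is a hypothetical helper name, not part of OpenSSH, and it assumes the default ~/.ssh location unless a directory is passed in.

```shell
# Hypothetical helper: restrict the ssh directory and key list to the owner.
# $1 = ssh directory (defaults to ~/.ssh of the current user)
tighten_ssh_perms() {
    ssh_dir=${1:-$HOME/.ssh}
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Run tighten_ssh_perms as hduser on master and on slave1. Afterwards, ssh slave1 hostname issued from master should print slave1 without asking for a passphrase.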

6.2 Set JAVA_HOME in hadoop-env.sh and yarn-env.sh in hadoop_home/etc/hadoop
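One way to do this without opening an editor is sketched below. set_java_home is a hypothetical helper, and the JDK path in the usage note is an assumption that has to be replaced with the real installation directory. The export line is placed at the top of each script so that it is already set when the rest of the script runs, assuming the stock scripts do not later overwrite it with a different value.

```shell
# Hypothetical helper: put an explicit JAVA_HOME at the top of both env scripts.
# $1 = Hadoop configuration directory, $2 = JDK installation directory
set_java_home() {
    conf_dir=$1
    java_home=$2
    for f in hadoop-env.sh yarn-env.sh; do
        scratch=$(mktemp)
        # New export line first, followed by the original file content.
        { echo "export JAVA_HOME=$java_home"; cat "$conf_dir/$f"; } > "$scratch"
        # Write back through redirection so the file keeps its permissions.
        cat "$scratch" > "$conf_dir/$f"
        rm -f "$scratch"
    done
}
```

Example call, with an assumed JDK location: set_java_home /home/hduser/hadoop-2.2.0/etc/hadoop /usr/lib/jvm/java-7-openjdk-amd64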

6.3 Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml in hadoop_home/etc/hadoop

A sample core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hduser/temp</value>
    </property>
</configuration>

A sample hdfs-site.xml :

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hduser/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hduser/dfs/data</value>
    </property>
</configuration>

A sample mapred-site.xml :

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/home/hduser/temp/hadoop-yarn/staging</value>
    </property>
</configuration>

A sample yarn-site.xml :

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries</description>
        <name>yarn.application.classpath</name>
        <value>
            hadoop_home/etc/hadoop,
            hadoop_home/share/hadoop/common/*,
            hadoop_home/share/hadoop/common/lib/*,
            hadoop_home/share/hadoop/hdfs/*,
            hadoop_home/share/hadoop/hdfs/lib/*,
            hadoop_home/share/hadoop/mapreduce/*,
            hadoop_home/share/hadoop/mapreduce/lib/*,
            hadoop_home/share/hadoop/yarn/*,
            hadoop_home/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
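A typo in one of these files usually only shows up when a daemon refuses to start, so it can help to read the values back before going further. The sketch below is a rough text-based check, not an XML parser: get_prop is a hypothetical helper, and it only handles properties whose name and value each sit on a single line, as in the core-site.xml, hdfs-site.xml, and mapred-site.xml samples above.

```shell
# Hypothetical helper: print the value of one property from a *-site.xml file.
# $1 = configuration file, $2 = property name
# Limitation: <name> and <value> must each be written on one line.
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
        }
        /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            if (n == want) print v
        }
    ' "$1"
}
```

For example, get_prop hadoop_home/etc/hadoop/core-site.xml fs.defaultFS should print hdfs://master:9000 for the sample file above.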

6.4 Edit the slaves file in hadoop_home/etc/hadoop so that it contains the following:

master

slave1

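Once the above is done, on the master machine as user hduser, use scp to copy the hadoop-2.2.0 directory and its contents to the same path on the other machines. The -r option is required because a directory is being copied:

scp -r /home/hduser/hadoop-2.2.0 slave1:/home/hduser/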
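With more than one slave the copy is easier to drive from the slaves file itself. The following sketch only prints the scp commands so they can be reviewed first; copy_cmds is a hypothetical helper, and it assumes the installation lives at the same path on every machine.

```shell
# Hypothetical helper: print one recursive scp command per remote host.
# $1 = slaves file, $2 = local Hadoop directory, $3 = this machine's hostname
copy_cmds() {
    slaves_file=$1
    src=$2
    self=$3
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue          # skip blank lines
        [ "$host" = "$self" ] && continue   # the local copy already exists
        echo "scp -r $src $host:$(dirname "$src")/"
    done < "$slaves_file"
}
```

Calling copy_cmds /home/hduser/hadoop-2.2.0/etc/hadoop/slaves /home/hduser/hadoop-2.2.0 master prints the commands; piping that output to sh runs them.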

7. Format HDFS (normally done only once, unless HDFS has failed). Run the following commands in order:

cd /home/hduser/hadoop-2.2.0/bin/
./hdfs namenode -format
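Since formatting wipes the NameNode metadata, a guard against running it twice can be useful. This is a minimal sketch: needs_format is a hypothetical helper that relies on a formatted name directory containing current/VERSION, and the path in the usage note is the dfs.namenode.name.dir value from the sample hdfs-site.xml.

```shell
# Hypothetical helper: succeed only when the name directory has not been formatted.
# $1 = the directory configured as dfs.namenode.name.dir
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}
```

Used as needs_format /home/hduser/dfs/name && ./hdfs namenode -format, the format command runs only on a fresh name directory.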

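8. Start and stop the Hadoop cluster (this can be done repeatedly; normally the cluster is left running once started, otherwise the run information of applications is lost)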

[hadoop@master bin]$ cd ../sbin/
[hadoop@master sbin]$ ./start-all.sh

9. Verify:

HDFS web UI: http://master:50070

RM (ResourceManager) UI: http://master:8088
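Besides the web pages, the JDK tool jps lists the running Java processes on a node, which gives a quick check that every daemon from the cluster layout is up. The sketch below compares that list against the expected names; missing_daemons is a hypothetical helper that reads jps-style output (process id, then class name) from standard input.

```shell
# Hypothetical helper: print every expected daemon that is absent from jps output.
# stdin = output of jps, arguments = expected daemon names
missing_daemons() {
    seen=$(awk '{print $2}')
    for daemon in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$seen" | grep -qx "$daemon" || echo "$daemon"
    done
}
```

On master: jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager. On slave1: jps | missing_daemons DataNode NodeManager. No output means all expected daemons were found.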

10. Run the wordcount example

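1) Create the input directory with hdfs dfs -mkdir -p /user/yarn/wordcount/input

Do not create /user/yarn/wordcount/output beforehand: the wordcount job creates the output directory itself and fails if it already exists.

2) Create a text file test.txt with the following content: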

hello world

hello hadoop!

Then upload the file to HDFS with hdfs dfs -put test.txt /user/yarn/wordcount/input

3) In the hadoop_home/bin directory, run:

yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/yarn/wordcount/input /user/yarn/wordcount/output

On success, the corresponding application in the RM UI should show the status SUCCEEDED, and the file part-r-00000 should appear in /user/yarn/wordcount/output
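To know what part-r-00000 should contain, the same count can be reproduced locally with standard tools. This is only an approximation of the example job, assuming it splits on whitespace and writes one word, a tab, and the count per line in byte order; count_words is a hypothetical helper.

```shell
# Hypothetical helper: whitespace-separated word count in "word<TAB>count" form.
# stdin = text to count
count_words() {
    tr -s ' \t' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{print $2 "\t" $1}'
}
```

Running count_words < test.txt and hdfs dfs -cat /user/yarn/wordcount/output/part-r-00000 should then show the same lines. Note that punctuation stays attached to a word, so hadoop! is counted as its own token.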

tonyhuang_google_com
