Installing and Configuring a Hadoop 2.0.4 Cluster on Ubuntu

Hadoop 2 manages HDFS and YARN separately. This separation makes it easier for HDFS to support HA or Federation and to scale out linearly, which in turn keeps an HDFS cluster highly available. Viewed from the other side, HDFS can serve as a general-purpose distributed storage system and provide a convenient foundation for third-party distributed computing frameworks, whether YARN-style frameworks or others such as Spark. YARN is essentially MapReduce V2: the JobTracker of Hadoop 1.x is split into two parts, one responsible for resource management (the ResourceManager) and the other for job scheduling and monitoring (the Scheduler inside the ResourceManager together with per-application ApplicationMasters).

Installation and Configuration

1. Directory Structure

Download the hadoop-2.0.4 package and unpack it; the resulting directory structure looks like this:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
  • The etc directory
The configuration files for both HDFS and YARN live under etc/hadoop, and each of the following files can be edited there:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ ls etc/hadoop/
capacity-scheduler.xml      hadoop-metrics.properties  httpfs-site.xml             ssl-client.xml.example
configuration.xsl           hadoop-policy.xml          log4j.properties            ssl-server.xml.example
container-executor.cfg      hdfs-site.xml              mapred-env.sh               yarn-env.sh
core-site.xml               httpfs-env.sh              mapred-queues.xml.template  yarn-site.xml
hadoop-env.sh               httpfs-log4j.properties    mapred-site.xml.template
hadoop-metrics2.properties  httpfs-signature.secret    slaves
  • The bin directory
The bin directory holds the tools used to manage HDFS and YARN, as shown below:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ ls bin
container-executor  hadoop  hdfs  mapred  rcc  test-container-executor  yarn
Now configure Hadoop. Note that Hadoop 2.x no longer ships a ready-made mapred-site.xml (only mapred-site.xml.template, as the listing above shows); the cluster-wide resource and scheduling settings that used to live there are now handled by yarn-site.xml.
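If MapReduce jobs should actually be submitted to YARN rather than run with the local runner, a commonly used extra step (not part of the original setup described here, so treat it as an optional sketch) is to create mapred-site.xml from the template and point the framework at YARN:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Then add the following property to etc/hadoop/mapred-site.xml:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Submit MapReduce jobs to YARN instead of running them locally.</description>
</property>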

2. HDFS Installation and Configuration

Configure etc/hadoop/core-site.xml with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
     <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000/</value>
                <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/tmp/hadoop-${user.name}</value>
                <description>A base for other temporary directories.</description>
        </property>
</configuration>
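Because fs.defaultFS above and the YARN addresses below refer to the nodes by hostname, every machine in the cluster must be able to resolve master, slave01 and slave02. A minimal sketch is to add entries to /etc/hosts on each node; the IP addresses here are placeholders, not values from the original setup:
192.168.0.100    master
192.168.0.101    slave01
192.168.0.102    slave02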
Configure etc/hadoop/hdfs-site.xml with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
     <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/shirdrn/storage/hadoop2/hdfs/name</value>
                <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/shirdrn/storage/hadoop2/hdfs/data1,/home/shirdrn/storage/hadoop2/hdfs/data2,/home/shirdrn/storage/hadoop2/hdfs/data3</value>
                <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
        </property>
     <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/shirdrn/storage/hadoop2/hdfs/tmp/hadoop-${user.name}</value>
                <description>A base for other temporary directories.</description>
        </property>
</configuration>
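Before formatting HDFS, it helps to create the local directories referenced above and to list the DataNode hosts in etc/hadoop/slaves (one of the files in the etc/hadoop listing earlier), so that start-dfs.sh knows where to launch DataNodes. A sketch, assuming the same paths and the slave01/slave02 hostnames used later:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ mkdir -p /home/shirdrn/storage/hadoop2/hdfs/name
shirdrn@slave01:~/programs/hadoop2/hadoop-2.0.4-alpha$ mkdir -p /home/shirdrn/storage/hadoop2/hdfs/data1 /home/shirdrn/storage/hadoop2/hdfs/data2 /home/shirdrn/storage/hadoop2/hdfs/data3
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ cat etc/hadoop/slaves
slave01
slave02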

3. YARN Installation and Configuration

Configure etc/hadoop/yarn-site.xml with the following content:
<?xml version="1.0"?>

<configuration>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
    <description>host is the hostname of the resource manager and
    port is the port on which the NodeManagers contact the Resource Manager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
    <description>host is the hostname of the resourcemanager and port is the port
    on which the Applications in the cluster talk to the Resource Manager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on
    which the clients can talk to the Resource Manager. </description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nodemanager/local</value>
    <description>the local directories used by the nodemanager</description>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:8034</value>
    <description>the nodemanagers bind to this port</description>
  </property> 

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory available on the NodeManager for containers, in MB</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>${hadoop.tmp.dir}/nodemanager/remote</value>
    <description>directory on hdfs where the application logs are moved to </description>
  </property>

   <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${hadoop.tmp.dir}/nodemanager/logs</value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run </description>
  </property>
</configuration>
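The DataNodes and NodeManagers on slave01 and slave02 read the same configuration files, so the edited etc/hadoop directory has to be present on every node. A sketch, assuming the installation directories shown in the shell prompts (the slaves use ~/programs/hadoop2 rather than ~/cloud/hadoop2):
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ scp etc/hadoop/* slave01:~/programs/hadoop2/hadoop-2.0.4-alpha/etc/hadoop/
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ scp etc/hadoop/* slave02:~/programs/hadoop2/hadoop-2.0.4-alpha/etc/hadoop/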


Starting the Cluster

  • Starting the HDFS cluster
First, format HDFS by running:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ bin/hdfs namenode -format
If formatting completes normally, no exceptions appear in the logs and you can go on to start the cluster services.
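As an extra sanity check (a sketch, assuming the dfs.namenode.name.dir configured above), the freshly formatted namespace directory should now contain a current/ subdirectory with a VERSION file:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ ls /home/shirdrn/storage/hadoop2/hdfs/name/current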
Start the HDFS cluster with:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/start-dfs.sh
The following processes should now be visible on the master node:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ jps
17238 Jps
16845 NameNode
17128 SecondaryNameNode
And on the slave nodes:
shirdrn@slave01:~/programs$ jps
4865 Jps
4753 DataNode

shirdrn@slave02:~/programs$ jps
4867 DataNode
4971 Jps
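Besides jps, the NameNode can also report which DataNodes have registered; the report lists each live DataNode together with its capacity:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ bin/hdfs dfsadmin -report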
  • Starting the YARN cluster
Once the configuration is in place, starting the YARN cluster is straightforward; it only takes a few scripts.
Start the ResourceManager with:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/yarn-daemon.sh start resourcemanager
A new ResourceManager process now shows up:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ jps
16845 NameNode
17128 SecondaryNameNode
17490 Jps
17284 ResourceManager
Then start the NodeManager process on each slave node:
shirdrn@slave01:~/programs/hadoop2/hadoop-2.0.4-alpha$ sbin/yarn-daemon.sh start nodemanager
shirdrn@slave02:~/programs/hadoop2/hadoop-2.0.4-alpha$ sbin/yarn-daemon.sh start nodemanager
The jps command now shows an additional NodeManager process on each slave node:
shirdrn@slave01:~/programs/hadoop2/hadoop-2.0.4-alpha$ jps
5544 DataNode
5735 NodeManager
5904 Jps

shirdrn@slave02:~/programs/hadoop2/hadoop-2.0.4-alpha$ jps
5544 DataNode
5735 NodeManager
5904 Jps
Or, check the startup log of the corresponding daemon to confirm it came up successfully:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ tail -100f logs/yarn-shirdrn-resourcemanager-master.log

shirdrn@slave01:~/programs/hadoop2/hadoop-2.0.4-alpha$ tail -100f logs/yarn-shirdrn-nodemanager-slave01.log
Alternatively, the whole Hadoop cluster (both HDFS and YARN) can be started with a single script that launches all of the related processes:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-namenode-master.out
slave02: starting datanode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-datanode-slave02.out
slave01: starting datanode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-datanode-slave01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-resourcemanager-master.out
slave01: starting nodemanager, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-nodemanager-slave01.out
slave02: starting nodemanager, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-nodemanager-slave02.out
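To shut the cluster down again, the matching stop scripts under sbin/ can be used; as with startup, the separate scripts are preferred over the deprecated combined stop-all.sh:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/stop-yarn.sh
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/stop-dfs.sh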

Verifying the Cluster

Finally, verify that the cluster can run jobs by executing one of the bundled Hadoop examples:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out
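If the job completes successfully, its output should show up in HDFS under the out directory; job progress can also be followed in the ResourceManager web UI, which by default listens on port 8088 unless yarn.resourcemanager.webapp.address is overridden:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ bin/hdfs dfs -ls out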

