Hadoop Installation Tutorial

    Hadoop, an open-source distributed computing framework, has become more and more popular in China; large companies such as Taobao and Baidu already run Hadoop at scale. Having picked up a little Hadoop knowledge earlier, in this post I will walk through a distributed Hadoop installation.

  1. Preparation before installation
    1. Java JDK 1.6.x
    2. ssh and sshd, with passwordless SSH access working between all of the servers (a minimal setup sketch follows below)
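
    A minimal sketch of one way to set up passwordless SSH (assuming the same account, hrj as used in the commands later in this post, exists on every node; adjust hostnames and paths to your environment):

      # Generate a key pair without a passphrase (do this on myna5 and on myna6, since both run start scripts)
      hrj$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
      # Append the public key to authorized_keys on every node (ssh-copy-id does this for you)
      hrj$ for host in myna5 myna6 myna7 myna8; do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; done
      # Verify: this should print the remote hostname without prompting for a password
      hrj$ ssh myna7 hostname
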
  2. Download Hadoop: http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz
  3. This package needs to be downloaded and extracted on every server in the Hadoop cluster (see the sketch below).
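
    A minimal sketch for fetching and unpacking the release on a node (assuming wget is available and Hadoop is installed under the home directory, matching the HADOOP_HOME set below):

      # Repeat on every node: myna5, myna6, myna7, myna8
      hrj$ cd ~
      hrj$ wget http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz
      hrj$ tar -xzf hadoop-0.21.0.tar.gz
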
  4. We plan to install the cluster on four servers, named myna[5-8]: myna5 is the NameNode and myna6 is the JobTracker, so myna[5-6] are the masters and the remaining two, myna[7-8], are the slaves. A note on hostname resolution follows below.
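
    Note that the hostnames myna5 through myna8 must resolve to the right addresses on every node. If DNS does not already provide this, one option is to add entries to /etc/hosts on each server; the IP addresses below are placeholders only, replace them with your own:

      # /etc/hosts (same entries on all four nodes; example addresses only)
      192.168.1.5    myna5
      192.168.1.6    myna6
      192.168.1.7    myna7
      192.168.1.8    myna8
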
  5. Configure the environment variables on these four servers:
  6. export JAVA_HOME=/path/to/bin/java
    export HADOOP_HOME=~/hadoop-0.21.0
  7. Configure myna5 (NameNode) and myna6 (JobTracker):
    1. ~/hadoop-0.21.0/conf/core-site.xml:
    2. <property>
          <name>fs.default.name</name>
          <value>hdfs://myna5:54320</value>
          <description>The name of the default file system.    A URI whose
          scheme and authority determine the FileSystem implementation.    The
          uri's scheme determines the config property (fs.SCHEME.impl) naming
          the FileSystem implementation class.    The uri's authority is used to
          determine the host, port, etc. for a filesystem.</description>
      </property>

    3. ~/hadoop-0.21.0/conf/hdfs-site.xml:
    4. <property>
          <name>hadoop.tmp.dir</name>
          <value>/home/hrj/hadooptmp/hadoop-${user.name}</value>
          <description>A base for other temporary directories.</description>
      </property>
      <property>
               <name>dfs.upgrade.permission</name>
               <value>777</value>
      </property>

      <property>
               <name>dfs.umask</name>
               <value>022</value>
      </property>

    5. ~/hadoop-0.21.0/conf/mapred-site.xml:
    6. <property>
          <name>mapred.job.tracker</name>
          <value>myna6:54321</value>
          <description>The host and port that the MapReduce job tracker runs
          at.    If "local", then jobs are run in-process as a single map
          and reduce task.
          </description>
      </property>
      <property>
          <name>mapred.compress.map.output</name>
          <value>true</value>
          <description>Should the outputs of the maps be compressed before being
                                   sent across the network. Uses SequenceFile compression.
          </description>
      </property>
      <property>
               <name>mapred.child.java.opts</name>
               <value>-Xmx1024m</value>
      </property>

    7. Only the conf/masters file on myna5 needs to be configured:
    8. myna6

    9. Configure the conf/slaves file on myna5 and myna6 (a sketch for creating both files follows below):
    10. myna7
      myna8
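
    A minimal sketch for writing these two files (conf/masters and conf/slaves are the file names expected by the start scripts):

      # On myna5 only
      hrj$ echo "myna6" > ~/hadoop-0.21.0/conf/masters
      # On both myna5 and myna6
      hrj$ printf "myna7\nmyna8\n" > ~/hadoop-0.21.0/conf/slaves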

  8. Configure myna7 and myna8 (slaves):
    1. ~/hadoop-0.21.0/conf/core-site.xml:
    2. <property>
              <name>fs.default.name</name>
              <value>hdfs://myna5:54320</value>
              <description>The name of the default file system.        A URI whose
              scheme and authority determine the FileSystem implementation.        The
              uri's scheme determines the config property (fs.SCHEME.impl) naming
              the FileSystem implementation class.        The uri's authority is used to
              determine the host, port, etc. for a filesystem.</description>
      </property>

    3. ~/hadoop-0.21.0/conf/hdfs-site.xml:
    4. <property>
              <name>hadoop.tmp.dir</name>
              <value>/disk/hadooptmp/hadoop-${user.name}</value>
      </property>
      <property>
              <name>dfs.data.dir</name>
              <value>/home/hadoopdata,/disk/hadoopdata</value>
      </property>
      <property>
               <name>dfs.upgrade.permission</name>
               <value>777</value>
      </property>
      <property>
               <name>dfs.umask</name>
               <value>022</value>
      </property>
    5. ~/hadoop-0.21.0/conf/mapred-site.xml:
    6. <property>
          <name>mapred.job.tracker</name>
          <value>myna6:54321</value>
          <description>The host and port that the MapReduce job tracker runs
          at.    If "local", then jobs are run in-process as a single map
          and reduce task.
          </description>
      </property>
      <property>
          <name>mapred.compress.map.output</name>
          <value>true</value>
          <description>Should the outputs of the maps be compressed before being
                                   sent across the network. Uses SequenceFile compression.
          </description>
      </property>
      <property>
               <name>mapred.child.java.opts</name>
               <value>-Xmx1024m</value>
      </property>
      <property>
              <name>mapred.tasktracker.map.tasks.maximum</name>
              <value>4</value>
      </property>
      <property>
              <name>mapred.tasktracker.reduce.tasks.maximum</name>
              <value>2</value>
      </property>
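
    The directories named in dfs.data.dir above (/home/hadoopdata and /disk/hadoopdata) and the base of hadoop.tmp.dir should exist and be writable by the Hadoop user on each slave before the daemons start. A minimal sketch, assuming the account hrj and that root privileges are needed to create directories directly under /home and /disk:

      # On myna7 and on myna8
      hrj$ sudo mkdir -p /home/hadoopdata /disk/hadoopdata /disk/hadooptmp
      hrj$ sudo chown -R hrj /home/hadoopdata /disk/hadoopdata /disk/hadooptmp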

  9. Edit ~/hadoop-0.21.0/conf/hadoop-env.sh on myna[5-8] and add the JAVA_HOME environment variable:
  10. export JAVA_HOME=/path/to/bin/java
  11. At this point the Hadoop cluster configuration is complete. Start Hadoop:
    1. Start HDFS on myna5
    2. hrj$ ~/hadoop-0.21.0/bin/hadoop namenode -format  
      hrj$ ~/hadoop-0.21.0/bin/start-dfs.sh

    3. Start MapReduce on myna6 (a verification sketch follows below)
    4. hrj$ ~/hadoop-0.21.0/bin/start-mapred.sh
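
    5. Once the start scripts return, a quick sanity check is to run the JDK's jps tool on each node and confirm the expected daemons are present (logs are under ~/hadoop-0.21.0/logs if something is missing):
    6. # On myna5: expect NameNode
      # On myna6: expect JobTracker (plus SecondaryNameNode, since conf/masters lists myna6)
      # On myna7 and myna8: expect DataNode and TaskTracker
      hrj$ jps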

  12. Once Hadoop is running, you can reach its web front ends:
    1. HDFS: http://myna5:50070
    2. MapReduce: http://myna6:50030
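
    A quick end-to-end smoke test is to upload a file to HDFS and run one of the bundled example jobs. A minimal sketch (the examples jar should be hadoop-mapred-examples-0.21.0.jar in the 0.21.0 tarball; check the exact name in your installation):

      # Create an input directory in HDFS and upload a small local file
      hrj$ ~/hadoop-0.21.0/bin/hadoop fs -mkdir input
      hrj$ ~/hadoop-0.21.0/bin/hadoop fs -put ~/hadoop-0.21.0/conf/core-site.xml input
      # Run the bundled wordcount example against it
      hrj$ ~/hadoop-0.21.0/bin/hadoop jar ~/hadoop-0.21.0/hadoop-mapred-examples-0.21.0.jar wordcount input output
      # Inspect the result
      hrj$ ~/hadoop-0.21.0/bin/hadoop fs -cat output/*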

For more detailed installation help and configuration parameters, please see the official English documentation at http://hadoop.apache.org/common/docs/r0.21.0/cluster_setup.html