Installing a Single-Node Hadoop Cluster

Setting Up a Single Node Cluster (based on vanilla Hadoop 2.6.5)

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

  • Supported Platforms

GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Windows is also a supported platform, but the following steps apply to Linux only. To set up Hadoop on Windows, see the wiki page: https://wiki.apache.org/hadoop/Hadoop2OnWindows

  • Required Software

1. Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons (see the quick check below).
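
Both prerequisites can be verified from a shell before proceeding (a convenience check added here, not part of the original guide; exact output varies by distribution):

$ java -version        # should report a JDK version supported by Hadoop
$ ssh -V               # confirms the OpenSSH client is installed
$ ps -e | grep sshd    # sshd should appear if the daemon is running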

Download

http://www.apache.org/dyn/closer.cgi/hadoop/common/

Preparing to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest

# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop

Try the following command:

$ bin/hadoop

This will display the usage documentation for the hadoop script.
You are now ready to start your Hadoop cluster in one of the three supported modes:
* Local (Standalone) Mode
* Pseudo-Distributed Mode
* Fully-Distributed Mode

Standalone Operation (Local Mode)

By default, Hadoop is configured to run in standalone (non-distributed) mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
$ cat output/*
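
If the job succeeds, the output directory holds the match counts. With the stock 2.6.5 configuration files the result is typically the single line below, though the exact matches depend on the XML files shipped with your release:

1       dfsadmin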

Pseudo-Distributed Operation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

  • etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

  • etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
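
On some systems sshd refuses keys when ~/.ssh or authorized_keys is too permissive; if login still prompts for a password, tightening the permissions usually resolves it (an extra step not in the original text):

$ chmod 0700 ~/.ssh
$ chmod 0600 ~/.ssh/authorized_keys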

Execution

The following instructions run a MapReduce job locally. To execute a job on YARN, see YARN on a Single Node below.

Format the filesystem:

$ bin/hdfs namenode -format

Start NameNode daemon and DataNode daemon:

$ sbin/start-dfs.sh
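
To confirm that the daemons actually came up, the JDK's jps tool lists the running Java processes (a verification step added here, not part of the original instructions):

$ jps
# Expect NameNode, DataNode, and SecondaryNameNode among the
# listed processes, each shown as "<pid> <ClassName>".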

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/

Make the HDFS directories required to execute MapReduce jobs:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
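
Here <username> stands for your own login name. Assuming the directory should belong to the user running the commands, the shell can substitute it automatically (a small convenience not in the original text):

$ bin/hdfs dfs -mkdir /user/$(whoami)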

Copy the input files into the distributed filesystem:

$ bin/hdfs dfs -put etc/hadoop input
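
Because /user/<username> is the HDFS working directory, the relative path input resolves there; a quick listing (added here as a sanity check) confirms the copy:

$ bin/hdfs dfs -ls input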

Run some of the examples provided:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'

Examine the output files. You can copy them from the distributed filesystem to the local filesystem and inspect them:

$ bin/hdfs dfs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:

$ bin/hdfs dfs -cat output/*

When you’re done, stop the daemons with:

$ sbin/stop-dfs.sh
YARN on a Single Node

You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and NodeManager daemon.

The following instructions assume that steps 1–4 of the above instructions have already been executed.

1. Configure parameters as follows:

  • etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

  • etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

2. Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
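
As with HDFS, the jps process list should now also include ResourceManager and NodeManager (a verification added here, not part of the original steps):

$ jps
# ResourceManager and NodeManager should appear alongside the HDFS daemons.
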
3. Browse the web interface for the ResourceManager; by default it is available at:

ResourceManager - http://localhost:8088/

4. Run a MapReduce job.
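
For example, the grep job from the standalone section can be rerun and will now execute on YARN. Remove the previous output directory first, since MapReduce will not overwrite it (the rm step is an addition to the original text):

$ bin/hdfs dfs -rm -r output
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -cat output/*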

When you’re done, stop the daemons with:

$ sbin/stop-yarn.sh