Hadoop Installation Guide

Installation Environment

OS: Ubuntu Linux 8.0

Java: Sun Java 1.6.0.20

Hadoop: hadoop-0.20.2

Some preparation is needed before installing Hadoop: the system must have Java and SSH installed, and sshd must be kept running so that the Hadoop control scripts can manage the Hadoop daemons remotely.

 

Step 1: Install SSH

We use OpenSSH. With your package sources configured, run:

sudo apt-get install openssh-server openssh-client

       If the installation fails at this point, a likely cause is that the already-installed client does not match the server package. First remove the client:

       sudo apt-get remove openssh-client

       Then run the install again:

       sudo apt-get install openssh-server

       Choose yes to continue and complete the installation.

       The following three commands stop, start, and restart SSH:

       /etc/init.d/ssh stop

       /etc/init.d/ssh start

       /etc/init.d/ssh restart

 

Step 2: Install Java

       Ubuntu installs Java by default, but it is not what we need: it is OpenJDK, not the Sun JDK.

       You can check which one you have with:

       java -version

       Get the Sun JDK with:

       sudo apt-get install sun-java6-jdk

       During installation the DLJ license agreement is shown; read it, agree, select OK, and press Enter to continue. Once the install finishes you will find a new directory, java-6-sun-1.6.0.20, under /usr/lib/jvm/.

 

Step 3: Configure SSH

       Hadoop starts all of the slaves over SSH. Run $ ssh localhost:

       Answer yes at the host-key prompt:

       You are then asked for a password. To set up passwordless login instead, run the following (without sudo, so the key belongs to your own user):

       ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

       Create the authorized_keys file by appending the new public key to it:

       cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

       After this, $ ssh localhost no longer asks for a password.

 

Step 4: Configure Hadoop

       Edit the file conf/hadoop-env.sh. Find the line that sets the Java path:

       # export JAVA_HOME=/usr/lib/j2sdk1.5-sun

       Change it to:

       export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20

 

Step 5: Hadoop in Standalone Mode

       Try running $ bin/hadoop; it prints the usage documentation for the Hadoop script.

       Create a directory to hold the test data:

       $ mkdir input

       Copy the .xml files from the Hadoop conf directory into the new directory as test data:

       $ cp conf/*.xml input

       This puts five .xml files into input in total.

       Run the command:

       $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

       It produces output like the following:

10/05/04 19:28:37 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/05/04 19:28:37 INFO mapred.FileInputFormat: Total input paths to process : 5
10/05/04 19:28:38 INFO mapred.JobClient: Running job: job_local_0001
10/05/04 19:28:38 INFO mapred.FileInputFormat: Total input paths to process : 5
10/05/04 19:28:38 INFO mapred.MapTask: numReduceTasks: 1
10/05/04 19:28:38 INFO mapred.MapTask: io.sort.mb = 100
10/05/04 19:28:39 INFO mapred.JobClient:  map 0% reduce 0%
10/05/04 19:28:40 INFO mapred.MapTask: data buffer = 79691776/99614720
10/05/04 19:28:40 INFO mapred.MapTask: record buffer = 262144/327680
10/05/04 19:28:40 INFO mapred.MapTask: Starting flush of map output
10/05/04 19:28:40 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
10/05/04 19:28:40 INFO mapred.LocalJobRunner: file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/input/core-site.xml:0+178
10/05/04 19:28:40 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/05/04 19:28:40 INFO mapred.MapTask: numReduceTasks: 1
10/05/04 19:28:40 INFO mapred.MapTask: io.sort.mb = 100
10/05/04 19:28:41 INFO mapred.JobClient:  map 100% reduce 0%
10/05/04 19:28:41 INFO mapred.MapTask: data buffer = 79691776/99614720
10/05/04 19:28:41 INFO mapred.MapTask: record buffer = 262144/327680
10/05/04 19:28:41 INFO mapred.MapTask: Starting flush of map output
10/05/04 19:28:41 INFO mapred.MapTask: Finished spill 0
10/05/04 19:28:41 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
10/05/04 19:28:41 INFO mapred.LocalJobRunner: file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/input/hadoop-policy.xml:0+4190
10/05/04 19:28:41 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
10/05/04 19:28:41 INFO mapred.MapTask: numReduceTasks: 1
10/05/04 19:28:41 INFO mapred.MapTask: io.sort.mb = 100
10/05/04 19:28:42 INFO mapred.MapTask: data buffer = 79691776/99614720
10/05/04 19:28:42 INFO mapred.MapTask: record buffer = 262144/327680
10/05/04 19:28:42 INFO mapred.MapTask: Starting flush of map output
10/05/04 19:28:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
10/05/04 19:28:42 INFO mapred.LocalJobRunner: file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/input/capacity-scheduler.xml:0+3936
10/05/04 19:28:42 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
10/05/04 19:28:42 INFO mapred.MapTask: numReduceTasks: 1
10/05/04 19:28:42 INFO mapred.MapTask: io.sort.mb = 100
10/05/04 19:28:42 INFO mapred.MapTask: data buffer = 79691776/99614720
10/05/04 19:28:42 INFO mapred.MapTask: record buffer = 262144/327680
10/05/04 19:28:42 INFO mapred.MapTask: Starting flush of map output
10/05/04 19:28:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
10/05/04 19:28:42 INFO mapred.LocalJobRunner: file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/input/mapred-site.xml:0+178
10/05/04 19:28:42 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
10/05/04 19:28:42 INFO mapred.MapTask: numReduceTasks: 1
10/05/04 19:28:42 INFO mapred.MapTask: io.sort.mb = 100
10/05/04 19:28:43 INFO mapred.MapTask: data buffer = 79691776/99614720
10/05/04 19:28:43 INFO mapred.MapTask: record buffer = 262144/327680
10/05/04 19:28:43 INFO mapred.MapTask: Starting flush of map output
10/05/04 19:28:43 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
10/05/04 19:28:43 INFO mapred.LocalJobRunner: file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/input/hdfs-site.xml:0+178
10/05/04 19:28:43 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000004_0' done.
10/05/04 19:28:43 INFO mapred.LocalJobRunner:
10/05/04 19:28:43 INFO mapred.Merger: Merging 5 sorted segments
10/05/04 19:28:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
10/05/04 19:28:43 INFO mapred.LocalJobRunner:
10/05/04 19:28:43 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
10/05/04 19:28:43 INFO mapred.LocalJobRunner:
10/05/04 19:28:43 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
10/05/04 19:28:43 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/grep-temp-1659003440
10/05/04 19:28:43 INFO mapred.LocalJobRunner: reduce > reduce
10/05/04 19:28:43 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
10/05/04 19:28:44 INFO mapred.JobClient:  map 100% reduce 100%
10/05/04 19:28:44 INFO mapred.JobClient: Job complete: job_local_0001
10/05/04 19:28:44 INFO mapred.JobClient: Counters: 13
10/05/04 19:28:44 INFO mapred.JobClient:   FileSystemCounters
10/05/04 19:28:44 INFO mapred.JobClient:     FILE_BYTES_READ=974719
10/05/04 19:28:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1032450
10/05/04 19:28:44 INFO mapred.JobClient:   Map-Reduce Framework
10/05/04 19:28:44 INFO mapred.JobClient:     Reduce input groups=1
10/05/04 19:28:44 INFO mapred.JobClient:     Combine output records=1
10/05/04 19:28:44 INFO mapred.JobClient:     Map input records=219
10/05/04 19:28:44 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/05/04 19:28:44 INFO mapred.JobClient:     Reduce output records=1
10/05/04 19:28:44 INFO mapred.JobClient:     Spilled Records=2
10/05/04 19:28:44 INFO mapred.JobClient:     Map output bytes=17
10/05/04 19:28:44 INFO mapred.JobClient:     Map input bytes=8660
10/05/04 19:28:44 INFO mapred.JobClient:     Combine input records=1
10/05/04 19:28:44 INFO mapred.JobClient:     Map output records=1
10/05/04 19:28:44 INFO mapred.JobClient:     Reduce input records=1
10/05/04 19:28:44 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
10/05/04 19:28:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/05/04 19:28:44 INFO mapred.FileInputFormat: Total input paths to process : 1
10/05/04 19:28:45 INFO mapred.JobClient: Running job: job_local_0002
10/05/04 19:28:45 INFO mapred.FileInputFormat: Total input paths to process : 1
10/05/04 19:28:45 INFO mapred.MapTask: numReduceTasks: 1
10/05/04 19:28:45 INFO mapred.MapTask: io.sort.mb = 100
10/05/04 19:28:45 INFO mapred.MapTask: data buffer = 79691776/99614720
10/05/04 19:28:45 INFO mapred.MapTask: record buffer = 262144/327680
10/05/04 19:28:45 INFO mapred.MapTask: Starting flush of map output
10/05/04 19:28:45 INFO mapred.MapTask: Finished spill 0
10/05/04 19:28:45 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
10/05/04 19:28:45 INFO mapred.LocalJobRunner: file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/grep-temp-1659003440/part-00000:0+111
10/05/04 19:28:45 INFO mapred.TaskRunner: Task 'attempt_local_0002_m_000000_0' done.
10/05/04 19:28:45 INFO mapred.LocalJobRunner:
10/05/04 19:28:45 INFO mapred.Merger: Merging 1 sorted segments
10/05/04 19:28:45 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
10/05/04 19:28:45 INFO mapred.LocalJobRunner:
10/05/04 19:28:46 INFO mapred.TaskRunner: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
10/05/04 19:28:46 INFO mapred.LocalJobRunner:
10/05/04 19:28:46 INFO mapred.TaskRunner: Task attempt_local_0002_r_000000_0 is allowed to commit now
10/05/04 19:28:46 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/home/sulliy/UESTC/Hadoop/Hadoop/hadoop-0.20.2/output
10/05/04 19:28:46 INFO mapred.LocalJobRunner: reduce > reduce
10/05/04 19:28:46 INFO mapred.TaskRunner: Task 'attempt_local_0002_r_000000_0' done.
10/05/04 19:28:46 INFO mapred.JobClient:  map 100% reduce 100%
10/05/04 19:28:46 INFO mapred.JobClient: Job complete: job_local_0002
10/05/04 19:28:46 INFO mapred.JobClient: Counters: 13
10/05/04 19:28:46 INFO mapred.JobClient:   FileSystemCounters
10/05/04 19:28:46 INFO mapred.JobClient:     FILE_BYTES_READ=640771
10/05/04 19:28:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=685225
10/05/04 19:28:46 INFO mapred.JobClient:   Map-Reduce Framework
10/05/04 19:28:46 INFO mapred.JobClient:     Reduce input groups=1
10/05/04 19:28:46 INFO mapred.JobClient:     Combine output records=0
10/05/04 19:28:46 INFO mapred.JobClient:     Map input records=1
10/05/04 19:28:46 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/05/04 19:28:46 INFO mapred.JobClient:     Reduce output records=1
10/05/04 19:28:46 INFO mapred.JobClient:     Spilled Records=2
10/05/04 19:28:46 INFO mapred.JobClient:     Map output bytes=17
10/05/04 19:28:46 INFO mapred.JobClient:     Map input bytes=25
10/05/04 19:28:46 INFO mapred.JobClient:     Combine input records=0
10/05/04 19:28:46 INFO mapred.JobClient:     Map output records=1
10/05/04 19:28:46 INFO mapred.JobClient:     Reduce input records=1

 

       The Map and Reduce statistics are given at the end.

       View the result with:

       cat output/*

      
