大數據入門(1)——安裝Hadoop

原創

imHou

2020-07-05 18:40

原文發表於我的個人網站：https://www.imhou.com/%e5%a4%a7%e6%95%b0%e6%8d%ae%e5%85%a5%e9%97%a81-%e5%ae%89%e8%a3%85hadoop/

環境準備：Ubuntu16、JDK 8、Hadoop3.1.2

Ubuntu的安裝這裏就不講了，JDK 的安裝，之前是直接用apt命令安裝的openjdk

1
2
3
4
5
6

// 搜索jdk版本
$ apt search openjdk
// 安裝jdk8
$ apt install openjdk-8-jdk
// 安裝好之後，查看版本號
$ java -version

因爲後續要用到Java 的安裝路徑，配置到Hadoop的環境中，所以要找到安裝在哪裏。

1
2
3
4
5
6
7
8
9

// 使用which 命令，查看java的可執行程序在哪裏
$ which java
/usr/bin/java
// 使用ls -l 命令查看java 程序的鏈接情況
$ ls -l /usr/bin/java
/usr/bin/java -> /etc/alternatives/java
// 再次使用ls -l 命令
$ ls -l /etc/alternatives/java
/etc/alternatives/java -> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

至此，發現java 的真實路徑就在/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

第二步，下載hadoop，地址：https://hadoop.apache.org/releases.html

我選擇了3.1.2 版本，binary download。使用wget 命令下載，然後解壓

1 2	$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz $ tar -xzvf hadoop-3.1.2.tar.gz

然後，就可以看到當前目錄下的文件夾 hadoop.3.1.2了。進入該文件夾

bin 單機執行程序
etc 配置文件
sbin 分佈式環境的執行程序
share/hadoop 所有引用的包，寫代碼時會用

編輯 ~/.bash_profile ，在文件末尾添加如下內容設置環境變量

1
2
3
4
5

HADOOP_HOME=/root/software/hadoop-3.1.2
export HADOOP_HOME

PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH

保存文件，然後運行如下命令使環境變量生效

1	$ source ~/.bash_profile

進入Hadoop安裝目錄，編輯 etc/hadoop/hadoop-env.sh 文件並保存

1 2	# set to the root of your Java installation export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

現在基本上hadoop的單機環境就安裝好了，在hadoop-3.1.2/share/hadoop/mapreduce 目錄下，有一個hadoop-mapreduce-examples-3.1.2.jar 示例程序。進入文件目錄，通過如下命令執行該程序：

1	hadoop jar hadoop-mapreduce-examples-3.1.2.jar

看到以下信息，說明hadoop 安裝成功了。

An example program must be given as the first argument.
Valid program names are: aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files. aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files. bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi. dbcount: An example job that count the pageview counts from a database. distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi. grep: A map/reduce program that counts the matches of a regex in the input. join: A job that effects a join over sorted, equally partitioned datasets multifilewc: A job that counts words from several files. pentomino: A map/reduce tile laying program to find solutions to pentomino problems. pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method. randomtextwriter: A map/reduce program that writes 10GB of random textual data per node. randomwriter: A map/reduce program that writes 10GB of random data per node. secondarysort: An example defining a secondary sort to the reduce. sort: A map/reduce program that sorts the data written by the random writer. sudoku: A sudoku solver. teragen: Generate data for the terasort terasort: Run the terasort teravalidate: Checking results of terasort wordcount: A map/reduce program that counts the words in the input files. wordmean: A map/reduce program that counts the average length of the words in the input files. wordmedian: A map/reduce program that counts the median length of the words in the input files. wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

此程序帶有很多的示例程序，其中有單詞計數wordcount，我們可以試試。

1	hadoop jar hadoop-mapreduce-examples-3.1.2.jar wordcount /root/data/input/data.txt /root/data/output/test1

看到如下內容，表明成功：

2019-08-08 11:10:47,100 INFO mapreduce.Job:  map 0% reduce 0%
2019-08-08 11:10:52,173 INFO mapreduce.Job:  map 100% reduce 0%
2019-08-08 11:10:58,210 INFO mapreduce.Job:  map 100% reduce 100%
2019-08-08 11:10:58,218 INFO mapreduce.Job: Job job_1565165510892_0005 completed successfully
2019-08-08 11:10:58,337 INFO mapreduce.Job: Counters: 53

大家可以看到，在wordcount 後面，帶了兩個路徑：/root/data/input/data.txt /root/data/output/test1 這兩個路徑分別是傳入的文件地址，輸出的文件夾。data.txt文件內容如下，可以自行創建編輯：

1
2
3

I love Chongqing
I love China
Chongqing is a province city of China

由於使本地環境，不具備HDFS分佈式文件系統，所以執行本地的文件。

最後，通過命令行可以看到test1文件下生成了兩個文件，然後_SUCCESS 和part-r-00000，使用cat part-r-00000 命令可以看到排好序的單詞計數信息：

1
2
3
4
5
6
7
8
9

China 2
Chongqing 2
I 2
a 1
city 1
is 1
love 2
of 1
province 1

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

大數據入門(1)——安裝Hadoop

[轉帖]使用NMT和pmap解決JVM資源泄漏問題原創

Python實現大麥網搶票的四大關鍵技術點解析

salesforce零基礎學習（一百三十八）零碎知識點小總結（十）

一款開源的.NET程序集反編譯、編輯和調試神器

關於接口協議，你必須要知道這些！

【2024-05-21】以茶會友

簡單談一下軟件需求

大數據入門(1)——安裝Hadoop

微信小程序的持續集成

國內幾個免費的Git私有倉庫

【已解決】Could not initialize class sun.awt.X11FontManager

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結