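1. Hadoop overview
Hadoop is a distributed-system infrastructure developed by the Apache Software Foundation. It lets users write distributed programs without knowing the low-level details of distribution, and use the combined power of a cluster for high-speed computation and storage.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be read as a stream (streaming access).
The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides computation over them.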
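The division of labour in MapReduce can be sketched on a single machine with ordinary shell tools. This is only a toy illustration of the map, shuffle, and reduce phases, not Hadoop code; the function names are made up for this example.

```bash
# Toy word count in the MapReduce style (illustration only, runs on one machine).

# map: split the input into words and emit one "<word><TAB>1" pair per word
map_phase() { tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'; }

# shuffle: sort so that identical keys end up next to each other
shuffle_phase() { LC_ALL=C sort; }

# reduce: sum the counts of each run of identical keys
reduce_phase() {
    awk -F '\t' '
        NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
        { key = $1; sum += $2 }
        END { if (NR > 0) print key "\t" sum }'
}

word_count() { map_phase | shuffle_phase | reduce_phase; }

printf 'to be or not to be\n' | word_count
```

In a real cluster the map and reduce steps run in parallel on many nodes and read their input from HDFS; the pipeline above only shows how the three phases fit together.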
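Advantages:
- High reliability: Hadoop's bit-level storage and processing of data can be trusted.
- High scalability: Hadoop distributes data and computation across clusters of available machines, and these clusters can easily be extended to thousands of nodes.
- High efficiency: Hadoop can move data dynamically between nodes and keep the nodes balanced, which keeps processing fast.
- High fault tolerance: Hadoop automatically keeps multiple copies of the data and automatically reassigns failed tasks.
- Low cost: compared with all-in-one appliances, commercial data warehouses, and data marts such as QlikView and Yonghong Z-Suite, Hadoop is open source, so a project's software cost drops considerably.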
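The Hadoop framework is written in Java, so it is well suited to running on Linux production platforms. Applications on Hadoop can also be written in other languages, such as C++.
[Figure: HDFS write data flow]
[Figure: HDFS read data flow]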
2. Environment: two ARM (aarch64) servers running CentOS 7.4
IP | Operating system | Hostname |
10.2.151.138 | centos7.4 | localhost-master |
10.2.151.140 | centos7.4 | localhost-slave1 |
3. Setting the hostname
(1) Check the current hostname
[root@localhost ~]# hostname
localhost
(2) Change the hostname (pick a name that suits your setup)
$:hostnamectl set-hostname xxxxx
(3) Configure the hosts file
$:vim /etc/hosts
# format: IP hostname
10.2.151.138 localhost-master
...
...
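The hosts entries can also be added with a small helper instead of an editor. This is a minimal sketch: `add_host_entry` is a hypothetical function name, the slave mapping is taken from the environment table above, and the demo writes to a scratch file rather than the real /etc/hosts.

```bash
# add_host_entry FILE IP NAME: append "IP NAME" to FILE unless NAME is already mapped there.
add_host_entry() {
    hosts_file=$1; ip=$2; name=$3
    # Skip comment lines, then look for NAME among the hostnames (fields 2..NF).
    if awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
        return 0    # NAME is already present, leave the file untouched
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Demo on a scratch file; use /etc/hosts (as root) only after checking the result.
demo_hosts=$(mktemp)
add_host_entry "$demo_hosts" 10.2.151.138 localhost-master
add_host_entry "$demo_hosts" 10.2.151.140 localhost-slave1
add_host_entry "$demo_hosts" 10.2.151.140 localhost-slave1   # repeated call changes nothing
cat "$demo_hosts"
rm -f "$demo_hosts"
```

Running the same helper on every node keeps the name-to-IP mapping identical across the cluster.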
4. Installing Java
$:mkdir /usr/java
$:tar -xvf jdk-8u161-linux-arm64-vfp-hflt.tar.gz -C /usr/java
$:vim /etc/profile
#java
export JAVA_HOME=/usr/java/jdk1.8.0_161/
export JRE_HOME=/usr/java/jdk1.8.0_161/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
Apply the configuration:
$:source /etc/profile
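A quick check that JAVA_HOME points at a usable JDK can save trouble later. The sketch below assumes the install path used above; `check_java_home` is a hypothetical helper name.

```bash
# check_java_home DIR: succeed only if DIR/bin/java exists and is executable.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "OK: found $1/bin/java"
    else
        echo "MISSING: $1/bin/java" >&2
        return 1
    fi
}

# Use the current JAVA_HOME, falling back to the path from this guide.
check_java_home "${JAVA_HOME:-/usr/java/jdk1.8.0_161}" \
    || echo "fix JAVA_HOME in /etc/profile, then run: source /etc/profile"
```

When the check passes, `java -version` should also report the installed JDK version.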
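5. Passwordless ssh login
(1) Install and start the ssh service
Check whether ssh and rsync are installed with the commands below; the output should look like this: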
$:rpm -qa | grep openssh
openssh-7.4p1-11.el7.aarch64
openssh-server-7.4p1-11.el7.aarch64
openssh-clients-7.4p1-11.el7.aarch64
$:rpm -qa | grep rsync
rsync-3.1.2-4.el7.aarch64
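If they are not installed, install them with yum:
yum install openssh-server openssh-clients # install SSH
yum install rsync # rsync is a remote data synchronization tool that quickly syncs files between hosts over LAN/WAN
service sshd restart # restart the service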
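(2) Configure passwordless login from the Master to all Slaves
a. How passwordless ssh login works
The Master (NameNode | JobTracker) acts as the client. To connect to a Slave server (DataNode | TaskTracker) with passwordless public-key authentication, generate a key pair on the Master, consisting of a public key and a private key, and then copy the public key to every Slave. When the Master connects to a Slave over SSH, the Slave generates a random number, encrypts it with the Master's public key, and sends it to the Master. The Master decrypts it with its private key and returns the decrypted number to the Slave. Once the Slave confirms that the number is correct, it allows the Master to connect. This is public-key authentication, and the user never has to type a password. The essential step is copying the Master's public key to the Slaves.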
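The challenge-response idea described above can be tried out locally with the openssl command-line tool. This is a toy sketch of the principle only: it assumes openssl is installed, the file names are made up, and it is not the actual message exchange that ssh performs on the wire.

```bash
# Challenge-response with an RSA key pair (toy illustration, everything stays in a scratch directory).
work=$(mktemp -d)

# Master: generate a key pair; only master.pub would be copied to the Slave.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$work/master.key" 2>/dev/null
openssl pkey -in "$work/master.key" -pubout -out "$work/master.pub"

# Slave: create a random challenge and encrypt it with the Master's public key.
openssl rand -hex 16 > "$work/challenge.txt"
openssl pkeyutl -encrypt -pubin -inkey "$work/master.pub" \
    -in "$work/challenge.txt" -out "$work/challenge.enc"

# Master: decrypt the challenge with the private key and send the result back.
openssl pkeyutl -decrypt -inkey "$work/master.key" \
    -in "$work/challenge.enc" -out "$work/response.txt"

# Slave: accept the connection only if the response equals the original challenge.
if cmp -s "$work/challenge.txt" "$work/response.txt"; then
    auth_result=accepted
else
    auth_result=rejected
fi
echo "$auth_result"
rm -rf "$work"
```

Only the holder of the private key can produce the matching response, which is why the private key never leaves the Master.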
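b. Generate a key pair on the master
Run the following command on the Master node: ssh-keygen -t rsa -P ''
This command generates a key pair with an empty passphrase. When it asks where to save the key, press Enter to accept the default path. The resulting key pair, id_rsa and id_rsa.pub, is stored in ~/.ssh by default (/root/.ssh for the root user).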
$: ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Ubr8vGxPKdtUHUSaTgjjC80nRoOzVuPMSz1rKu7iNpc root@localhost
The key's randomart image is:
+---[RSA 2048]----+
| .=. .o |
| o==+ . + |
| .X*oo + . |
| +oB+oo ..|
| . S.. o.. .|
| + o o |
| . * + |
| + E o.B |
| o.*o..+.o |
+----[SHA256]-----+
Check that the keys were generated:
[root@localhost ~]# ll -a
total 7237680
dr-xr-x---. 6 root root 4096 Aug 3 13:28 .
dr-xr-xr-x. 17 root root 264 Aug 3 11:36 ..
-rw-------. 1 root root 1774 Dec 30 2017 anaconda-ks.cfg
-rw-------. 1 root root 3194 Aug 3 10:05 .bash_history
-rw-r--r--. 1 root root 18 Dec 29 2013 .bash_logout
-rw-r--r--. 1 root root 176 Dec 29 2013 .bash_profile
-rw-r--r--. 1 root root 176 Dec 29 2013 .bashrc
drwx------. 3 root root 17 Aug 3 11:09 .cache
-rwxr-xr-x. 1 root root 7411329024 Mar 20 14:20 CentOS-7-aarch64-Everything.iso
-rw-r--r--. 1 root root 100 Dec 29 2013 .cshrc
-rwxrwxrwx. 1 root root 74 May 24 15:04 force-eth0-100Mbps.sh
-rwxr-xr-x. 1 root root 493 May 30 15:08 lvm-resize-sda.sh
drwx------. 2 root root 38 Aug 3 13:28 .ssh
-rw-r--r--. 1 root root 129 Dec 29 2013 .tcshrc
drwxr-xr-x. 2 root root 4096 Apr 19 10:41 updates
-rw-------. 1 root root 5360 Aug 3 12:43 .viminfo
[root@localhost ~]# cd .ssh/
[root@localhost .ssh]# ll
total 8
-rw-------. 1 root root 1679 Aug 3 13:28 id_rsa
-rw-r--r--. 1 root root 396 Aug 3 13:28 id_rsa.pub
c. On the master node, append id_rsa.pub to the authorized keys
[root@localhost .ssh]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost .ssh]# ll
total 12
-rw-r--r--. 1 root root 792 Aug 3 13:35 authorized_keys
-rw-------. 1 root root 1679 Aug 3 13:28 id_rsa
-rw-r--r--. 1 root root 396 Aug 3 13:28 id_rsa.pub
d. Set the permissions of the "authorized_keys" file
[root@localhost .ssh]# ll
total 12
-rw-r--r--. 1 root root 792 Aug 3 13:35 authorized_keys
-rw-------. 1 root root 1679 Aug 3 13:28 id_rsa
-rw-r--r--. 1 root root 396 Aug 3 13:28 id_rsa.pub
[root@localhost .ssh]# chmod 600 ~/.ssh/authorized_keys
[root@localhost .ssh]# ll
total 12
-rw-------. 1 root root 792 Aug 3 13:35 authorized_keys
-rw-------. 1 root root 1679 Aug 3 13:28 id_rsa
-rw-r--r--. 1 root root 396 Aug 3 13:28 id_rsa.pub
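Steps c and d can be combined into one repeatable helper. In the listing above authorized_keys is 792 bytes, twice the size of the 396-byte public key, which suggests the key was appended twice; the sketch below appends a key only when it is not already present. `install_pubkey` is a hypothetical name, and the demo uses a placeholder key in a scratch directory.

```bash
# install_pubkey PUBKEY_FILE SSH_DIR: add the key to SSH_DIR/authorized_keys once, with strict permissions.
install_pubkey() {
    pubkey_file=$1; ssh_dir=$2
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -F fixed strings, -x whole-line match, -q quiet, -f read the pattern from the key file
    if ! grep -Fxq -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Demo in a scratch directory with a placeholder key (not a real key).
demo_dir=$(mktemp -d)
printf 'ssh-rsa AAAAplaceholder root@localhost\n' > "$demo_dir/id_rsa.pub"
install_pubkey "$demo_dir/id_rsa.pub" "$demo_dir/.ssh"
install_pubkey "$demo_dir/id_rsa.pub" "$demo_dir/.ssh"   # second call does not duplicate the key
ls -l "$demo_dir/.ssh/authorized_keys"
rm -rf "$demo_dir"
```

On a real node the call would be `install_pubkey ~/.ssh/id_rsa.pub ~/.ssh`.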
e. As root, edit "/etc/ssh/sshd_config"
[root@localhost .ssh]# vim /etc/ssh/sshd_config
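Change the settings as follows:
RSAAuthentication yes # enable RSA authentication
PubkeyAuthentication yes # enable public/private key authentication
AuthorizedKeysFile .ssh/authorized_keys # path of the public key file (the file created above)
Restart the ssh service so the change takes effect: service sshd restart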
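Before restarting sshd it can help to confirm what the file actually sets. The sketch below prints the first PubkeyAuthentication value in a config file, since sshd uses the first value it finds for a keyword. `pubkey_auth_value` is a hypothetical helper, and the demo reads a scratch file rather than the real configuration.

```bash
# pubkey_auth_value FILE: print the first PubkeyAuthentication value in FILE,
# or "unset" when the keyword only appears in comments or not at all.
pubkey_auth_value() {
    awk 'tolower($1) == "pubkeyauthentication" { print tolower($2); found = 1; exit }
         END { if (!found) print "unset" }' "$1"
}

# Demo on a scratch file; on a real node pass /etc/ssh/sshd_config instead.
demo_conf=$(mktemp)
printf '#PubkeyAuthentication no\nPubkeyAuthentication yes\n' > "$demo_conf"
pubkey_auth_value "$demo_conf"
rm -f "$demo_conf"
```

As root, `sshd -t` additionally checks the whole configuration file for syntax errors before the restart.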
f. Verify that it works
[root@localhost ~]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:a1NFzC3BwML16Ic2ZDgqOjyrX9DWWFTaipmSU3AQC34.
ECDSA key fingerprint is MD5:d7:cd:5c:29:db:b0:b1:33:47:fe:9a:91:48:f1:32:5c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Fri Aug 3 12:42:53 2018 from 10.2.154.39
[root@localhost ~]# ls
anaconda-ks.cfg CentOS-7-aarch64-Everything.iso force-eth0-100Mbps.sh lvm-resize-sda.sh updates zhaochuang
[root@localhost ~]# exit
logout
Connection to localhost closed.
[root@localhost ~]#
g. Copy the public key to every Slave machine
[root@localhost ~]# scp ~/.ssh/id_rsa.pub root@10.2.151.140:~/
The authenticity of host '10.2.151.140 (10.2.151.140)' can't be established.
ECDSA key fingerprint is SHA256:a1NFzC3BwML16Ic2ZDgqOjyrX9DWWFTaipmSU3AQC34.
ECDSA key fingerprint is MD5:d7:cd:5c:29:db:b0:b1:33:47:fe:9a:91:48:f1:32:5c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.2.151.140' (ECDSA) to the list of known hosts.
root@10.2.151.140's password:
id_rsa.pub 100% 396 194.4KB/s 00:00
[root@localhost ~]#
h. Log in to the slave node and configure it
《1》Create the .ssh directory under ~/
$:mkdir ~/.ssh
$:chmod 700 ~/.ssh
《2》Append the key to the "authorized_keys" file
[root@localhost-slave1 ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost-slave1 ~]# chmod 600 ~/.ssh/authorized_keys
《3》Edit "/etc/ssh/sshd_config"
[root@localhost-slave1 ~]# vim /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Restart the ssh service: service sshd restart
《4》On the master, test whether passwordless ssh login to the slave works
[root@localhost ~]# ssh 10.2.151.140
Last login: Fri Aug 3 12:42:57 2018 from 10.2.154.39
[root@localhost-slave1 ~]#
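On the slave, delete the copied "id_rsa.pub" file from the ~/ directory: rm -f ~/id_rsa.pub
The master can now log in to the slave over ssh without a password. Repeat the steps above for any other slave nodes.
(3) Configure passwordless ssh login from the slave to the master
a. Create the Slave's own public and private keys, and append its public key to its own "authorized_keys" file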
[root@localhost-slave1 ~]# rm -f ~/id_rsa.pub
[root@localhost-slave1 ~]# ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:bb17EEH7OwT1/CR/UZRCt09ZOfeW36TDdUlMrqb1aU0 root@localhost-slave1
The key's randomart image is:
+---[RSA 2048]----+
| .o.++*|
| .+oO*|
| o.+=@|
| . ..o.BX|
| S o .*o=E|
| . =o=o*|
| ...o=.|
| .o. |
| .. |
+----[SHA256]-----+
[root@localhost-slave1 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost-slave1 ~]#
b. On the slave, use "scp" to copy the Slave's public key "id_rsa.pub" to the "~/" directory on the Master, then append it to the Master's "authorized_keys".
On the slave:
[root@localhost-slave1 ~]# scp ~/.ssh/id_rsa.pub root@10.2.151.138:~/
The authenticity of host '10.2.151.138 (10.2.151.138)' can't be established.
ECDSA key fingerprint is SHA256:a1NFzC3BwML16Ic2ZDgqOjyrX9DWWFTaipmSU3AQC34.
ECDSA key fingerprint is MD5:d7:cd:5c:29:db:b0:b1:33:47:fe:9a:91:48:f1:32:5c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.2.151.138' (ECDSA) to the list of known hosts.
root@10.2.151.138's password:
id_rsa.pub 100% 403 72.3KB/s 00:00
[root@localhost-slave1 ~]#
On the master:
[root@localhost ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost ~]# ll
total 7237648
-rw-------. 1 root root 1774 Dec 30 2017 anaconda-ks.cfg
-rwxr-xr-x. 1 root root 7411329024 Mar 20 14:20 CentOS-7-aarch64-Everything.iso
-rwxrwxrwx. 1 root root 74 May 24 15:04 force-eth0-100Mbps.sh
-rw-r--r--. 1 root root 403 Aug 3 14:21 id_rsa.pub
-rwxr-xr-x. 1 root root 493 May 30 15:08 lvm-resize-sda.sh
drwxr-xr-x. 2 root root 4096 Apr 19 10:41 updates
drwxr-xr-x. 2 root root 107 Aug 3 08:49 zhaochuang
[root@localhost ~]# rm -rf id_rsa.pub
c. Test passwordless login between the slave and the master
Passwordless login from the slave to the master:
[root@localhost-slave1 ~]# ssh 10.2.151.138
Last login: Fri Aug 3 13:44:30 2018 from ::1
[root@localhost ~]#
Passwordless login from the master to the slave:
[root@localhost ~]# ssh 10.2.151.140
Last login: Fri Aug 3 14:11:05 2018 from 10.2.151.138
[root@localhost-slave1 ~]#
The master and the slave can now log in to each other without a password. (Follow the same procedure for any other master and slave nodes.)
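With more than a couple of nodes, checking each login by hand gets tedious. The sketch below loops over a list of hosts and attempts a non-interactive login; `check_passwordless` and the `SSH_CMD` variable are names invented for this example, while `BatchMode` and `ConnectTimeout` are standard ssh client options.

```bash
# SSH_CMD can be overridden, for example with a stub, to exercise the loop without a network.
SSH_CMD=${SSH_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

# check_passwordless HOST...: try a non-interactive login to each host and report the result.
# Returns non-zero if any host could not be reached without a password.
check_passwordless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if $SSH_CMD "$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: passwordless login OK"
        else
            echo "$host: passwordless login FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Example call, to be run on each node of the cluster described in this guide:
#   check_passwordless 10.2.151.138 10.2.151.140
```

A host whose key has not been accepted yet also fails in batch mode, so it is worth logging in once by hand to confirm the fingerprint first.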
6. Installing Hadoop 2.7.4
(1) Unpack Hadoop 2.7.4 and create the tmp directory
$:mkdir /usr/hadoop
$:tar -xvf hadoop-2.7.4-aarch64.tar.gz -C /usr/hadoop
$:cd /usr/hadoop/hadoop-2.7.4
$:mv * ../
$:cd ../
$:rm -rf hadoop-2.7.4
$:mkdir tmp
(2) Edit the Hadoop configuration files
1) Configure "/etc/profile"
$:vim /etc/profile
#hadoop
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Apply the configuration: source /etc/profile
2) Configure hadoop-env.sh
$:cd /usr/hadoop/etc/hadoop
$:vim hadoop-env.sh
Append at the end of the file:
# set java environment
export JAVA_HOME=/usr/java/jdk1.8.0_161
3) Configure core-site.xml
$:vim core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<!-- file system properties -->
<property>
<name>fs.default.name</name>
<value>hdfs://10.2.151.138:9000</value>
</property>
</configuration>
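Because the same core-site.xml has to exist on every node, it can be generated from the master address instead of being typed by hand. This is a minimal sketch: `render_core_site` is a hypothetical helper, the address is the master from the environment table, and port 9000 and /usr/hadoop/tmp are the values used in this guide.

```bash
# render_core_site MASTER_IP [PORT] [TMP_DIR]: print a core-site.xml for the given NameNode address.
render_core_site() {
    master_ip=$1; port=${2:-9000}; tmp_dir=${3:-/usr/hadoop/tmp}
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${tmp_dir}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${master_ip}:${port}</value>
  </property>
</configuration>
EOF
}

# Print the file for review; redirect to /usr/hadoop/etc/hadoop/core-site.xml once it looks right.
render_core_site 10.2.151.138
```

Every node must point at the same NameNode address, so generating the file from one value avoids mismatched IPs between machines.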
4) Configure hdfs-site.xml
$:vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
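5) Configure mapred-site.xml (in Hadoop 2.7.4, create it from mapred-site.xml.template if it does not exist yet)
$:vim mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://10.2.151.138:9001</value>
</property>
</configuration>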
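6) Configure the masters file
Remove "localhost" and add the Master machine's IP: 10.2.151.138
7) Configure the slaves file (on the Master only)
Remove "localhost" and add the IP of every Slave machine in the cluster, one per line: 10.2.151.140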
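The slaves file is just a list with one node per line, so it can be written from a list of addresses. A minimal sketch, with `write_slaves` as a hypothetical helper name and a scratch file standing in for the real one:

```bash
# write_slaves FILE NODE...: overwrite FILE with one node per line.
write_slaves() {
    slaves_file=$1; shift
    : > "$slaves_file"    # truncate first, which also drops the default "localhost" line
    for node in "$@"; do
        printf '%s\n' "$node" >> "$slaves_file"
    done
}

# Demo on a scratch file; the real file in this guide is /usr/hadoop/etc/hadoop/slaves.
demo_slaves=$(mktemp)
write_slaves "$demo_slaves" 10.2.151.140
cat "$demo_slaves"
rm -f "$demo_slaves"
```

Adding a node later then only means passing one more address to the same call.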
7. Starting and verifying
1) Format the HDFS file system
cd /usr/hadoop/bin
hadoop namenode -format
2) Start Hadoop
Before starting, stop the firewall on every machine in the cluster; otherwise the DataNode may start and then shut down again.
systemctl stop firewalld
cd /usr/hadoop/sbin
./start-all.sh
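3) Verify Hadoop
(1) Run jps on each node to list the running Java processes.
(2) Run "hadoop dfsadmin -report"
This command shows the status of the Hadoop cluster.
(3) Open the web UI at "http://10.2.151.138:50070"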
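The jps check can be scripted so that a missing daemon stands out. The sketch below reads jps output on standard input and prints every expected daemon that is absent. `missing_daemons` is a hypothetical helper, and the daemon lists in the usage comments are an assumption based on what start-all.sh normally starts in Hadoop 2.x (HDFS plus YARN).

```bash
# missing_daemons NAME...: read `jps` output on stdin and print each expected daemon that is not listed.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <main class>"; compare the second column exactly.
        if ! printf '%s\n' "$jps_output" \
                | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$daemon"
        fi
    done
}

# Assumed usage on the master:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# Assumed usage on each slave:
#   jps | missing_daemons DataNode NodeManager
```

No output means every expected daemon is running; any name that is printed points to the log file to read under /usr/hadoop/logs.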