MapReduce configuration (mainly used for development, i.e. for running programs)
Two configuration files need to be modified, mapred-site.xml and yarn-site.xml. After setting them up on the nn01 node, they must be synced to the other hosts.
[root@nn01 hadoop]# cd /usr/local/hadoop/etc/hadoop/
[root@nn01 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@nn01 hadoop]# vim mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>    ---specifies which resource-management framework to use; here we choose yarn
    <value>yarn</value>
</property>
Note: mapreduce.framework.name supports only two values, local and yarn; use local on a single machine and yarn on a cluster.
[root@nn01 hadoop]# vim yarn-site.xml
<property>
    <name>yarn.resourcemanager.hostname</name>    ----specifies which host runs the ResourceManager
    <value>nn01</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>    ---the auxiliary service the NodeManager provides; mapreduce_shuffle is the shuffle service MapReduce jobs need
    <value>mapreduce_shuffle</value>
</property>
[root@nn01 ~]# for i in {61..63}; do scp -r /usr/local/hadoop/ 192.168.1.$i:/usr/local/hadoop/ ; done
[root@nn01 hadoop]# ./sbin/start-yarn.sh ---starts the ResourceManager and the NodeManagers
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nn01.out
node3: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node3.out
node1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node1.out
node2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node2.out
[root@nn01 hadoop]# jps
2647 ResourceManager
1256 NameNode
2906 Jps
1439 SecondaryNameNode
[root@nn01 hadoop]# ssh node1 jps
1315 NodeManager
1411 Jps
1018 DataNode
[root@nn01 hadoop]# ssh node2 jps
983 DataNode
1279 NodeManager
1375 Jps
[root@nn01 hadoop]# ssh node3 jps
1280 NodeManager
1376 Jps
984 DataNode
Verification
[root@nn01 hadoop]# /usr/local/hadoop/bin/yarn node -list
20/06/19 20:35:25 INFO client.RMProxy: Connecting to ResourceManager at nn01/192.168.1.60:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
node1:43208 RUNNING node1:8042 0
node3:40913 RUNNING node3:8042 0
node2:36472 RUNNING node2:8042 0
Word-frequency counting with Hadoop
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop --help #show the hadoop operations available
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs #show the filesystem subcommands and their options
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs -ls /
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs -mkdir /input
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2020-06-19 20:37 /input
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs -ls / #list what is in the / directory of the HDFS filesystem
Found 2 items
-rw-r--r-- 2 root supergroup 0 2020-06-19 20:43 /file1
drwxr-xr-x - root supergroup 0 2020-06-19 20:37 /input
[root@nn01 hadoop]# ls
bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share zhy zhy1
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs -put *.txt /input/ #upload the local *.txt files into the Hadoop filesystem
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs -get /input /tmp/ #download /input from the Hadoop filesystem to the local /tmp/
[root@nn01 hadoop]# cd /tmp/
[root@nn01 tmp]# ls
firefox_root input systemd-private-76f2aeadd70b469a8cc6143e521f002a-chronyd.service-OA3zeG
hadoop-root Jetty_nn01_50070_hdfs____.g8rurc systemd-private-c4b2b65a9c924fc19c10511b54b16054-chronyd.service-i7iO6z
hadoop-root-namenode.pid Jetty_nn01_50090_secondary____qe27dn systemd-private-ea72bc68d88548e7a5ce9e5e8c9f6031-chronyd.service-xlIB1q
hadoop-root-secondarynamenode.pid Jetty_nn01_8088_cluster____.bwtsq9 yarn-root-resourcemanager.pid
hsperfdata_root sh-thd-27940862161
[root@nn01 hadoop]# ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input /ouput #analyze the data: count the word frequencies of the files in /input on the Hadoop distributed filesystem and write the results to /ouput
[root@nn01 hadoop]# /usr/local/hadoop/bin/hadoop fs -cat /ouput/part-r-00000 #view the results of the analysis
Configuring node4
[root@node4 ~]# yum -y install java-1.8.0-openjdk-devel
[root@node4 ~]# mkdir /var/hadoop
[root@nn01 ~]# ssh-copy-id 192.168.1.64
[root@nn01 ~]# scp /etc/hosts 192.168.1.64:/etc/
hosts 100% 226 137.3KB/s 00:00
[root@nn01 ~]# cd /usr/local/hadoop/
[root@nn01 hadoop]# vim ./etc/hadoop/slaves
[root@nn01 hadoop]# cat ./etc/hadoop/slaves
node1
node2
node3
node4
[root@nn01 hadoop]# scp -r /usr/local/hadoop/ 192.168.1.64:/usr/local/hadoop/
[root@nn01 hadoop]# for i in node{1..3}; do scp /usr/local/hadoop/etc/hadoop/slaves root@$i:/usr/local/hadoop/etc/hadoop/ ; done
[root@node4 hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 60000000 #set the balancer bandwidth, in bytes per second
Balancer bandwidth is set to 60000000
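The bandwidth argument is in bytes per second, which is easy to misread; a quick sanity check of the value used above (just the arithmetic, nothing cluster-specific):

```shell
# -setBalancerBandwidth takes bytes per second; convert the value used
# above to MiB/s as a sanity check (integer division).
bw=60000000
mib=$((bw / 1024 / 1024))
echo "${mib} MiB/s"
```

About 57 MiB/s per DataNode, a reasonable cap for a 1 GbE lab network.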
[root@node4 hadoop]# ./sbin/start-balancer.sh #start the balancer so data is rebalanced across the enlarged cluster
starting balancer, logging to /usr/local/hadoop/logs/hadoop-root-balancer-node4.out
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
.
[root@node4 hadoop]# cd /usr/local/hadoop/
[root@node4 hadoop]# ./sbin/hadoop-daemon.sh start datanode #start the DataNode daemon
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-node4.out
[root@node4 hadoop]# /usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager #start the NodeManager daemon
starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node4.out
[root@node4 hadoop]# jps
1605 Jps
1256 DataNode
1503 NodeManager
[root@node2 hadoop]# ./sbin/start-dfs.sh ---optionally restart the HDFS cluster if the node list is not showing up correctly
[root@nn01 hadoop]# ./bin/hdfs dfsadmin -report ---the new node now appears
[root@nn01 hadoop]# ./sbin/start-yarn.sh ----likewise, optionally restart the YARN (MapReduce) cluster if its node list is stale
[root@nn01 hadoop]# /usr/local/hadoop/bin/yarn node -list --the new node appears in the list
[root@node4 ~]# jps
24704 Jps
1256 DataNode
1503 NodeManager
Removing a node
Basic approach: before a node is removed, its data must first be migrated to the other nodes. Suppose node4 is to be removed: first check node4's DFS usage; once its data has been moved to the other nodes, node4 can be taken offline. Decommissioning is driven from the management node nn01: the node to be decommissioned is listed in a configuration file there, and when the node list is refreshed, the NameNode reads that file and starts the decommission. To remove the node from the cluster entirely, clear the node4 entries from the configuration and refresh again. Note that this takes roughly 3-4 hours, and the migration takes longer when the node holds a lot of data. If the per-node usage differs by a few tens of KB after migration, that is negligible.
[root@nn01 hadoop]# ls
bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share zhy zhy1
[root@nn01 hadoop]# ./bin/hdfs dfsadmin -report
Name: 192.168.1.64:50010 (node4)
DFS Used: 8192 (8 KB)
Name: 192.168.1.61:50010 (node1)
DFS Used: 200704 (196 KB)
Name: 192.168.1.63:50010 (node3)
DFS Used: 172032 (168 KB)
Name: 192.168.1.62:50010 (node2)
DFS Used: 364544 (356 KB)
[root@nn01 hadoop]# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
    <name>dfs.hosts.exclude</name>    #path of the file listing the Hadoop nodes to be decommissioned
    <value>/usr/local/hadoop/etc/hadoop/exclude</value>
</property>
[root@nn01 hadoop]# cat /usr/local/hadoop/etc/hadoop/exclude #create this file and write in it the name of the node to remove
node4
[root@nn01 hadoop]# ./bin/hdfs dfsadmin -refreshNodes #refresh the node list
Refresh nodes successful
[root@nn01 hadoop]# ./bin/hdfs dfsadmin -report #check node4's status: Decommissioned means the migration succeeded. You can also watch the other nodes' stored data grow by roughly the amount node4 held; at that point node4 can be taken offline.
Name: 192.168.1.64:50010 (node4)
Decommission Status : Decommissioned
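When scripting this check, the per-node `Decommission Status` line can be parsed out of the report. A minimal sketch; the `is_decommissioned` helper name is ours, not a Hadoop command:

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and
# print yes/no depending on whether the named node is Decommissioned.
is_decommissioned() {
    awk -v node="$1" '
        index($0, "(" node ")") { found = 1 }
        found && /Decommission Status/ {
            print ($NF == "Decommissioned" ? "yes" : "no"); exit
        }'
}

# Demo against the sample report lines shown above
status=$(is_decommissioned node4 <<'EOF'
Name: 192.168.1.64:50010 (node4)
Decommission Status : Decommissioned
EOF
)
echo "$status"
```

In real use you would pipe the live report in: `./bin/hdfs dfsadmin -report | is_decommissioned node4`.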
Taking the node offline: stop the HDFS (DataNode) daemon and the YARN (NodeManager) daemon
[root@node4 ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh stop datanode
stopping datanode
[root@nn01 hadoop]# cat /usr/local/hadoop/etc/hadoop/slaves
node1
node2
node3
[root@nn01 hadoop]# vim /usr/local/hadoop/etc/hadoop/slaves
[root@nn01 hadoop]# > /usr/local/hadoop/etc/hadoop/exclude
[root@nn01 hadoop]# /usr/local/hadoop/bin/hdfs dfsadmin -refreshNodes
Refresh nodes successful
[root@nn01 hadoop]# /usr/local/hadoop/bin/hdfs dfsadmin -report #the node takes about 3-4 hours to disappear from the report
[root@node4 ~]# /usr/local/hadoop/sbin/yarn-daemon.sh stop nodemanager #it likewise takes 3-4 hours to drop out of the YARN node list
stopping nodemanager
[root@node4 ~]# /usr/local/hadoop/bin/yarn node -list
NFS configuration
Concept: operating on the Hadoop filesystem through its own commands is cumbersome for routine directory and file work; if the Hadoop filesystem can be mounted over NFS, these operations become much simpler.
Uses of the NFS gateway: users can browse the HDFS filesystem with an OS-native NFSv3 client, download files from HDFS to the local filesystem, and stream data directly through the mount point. File appends are supported, but random writes are not. The NFS gateway supports NFSv3 and allows HDFS to be mounted as part of the client's filesystem.
Configuration
[root@nn01 hadoop]# cat /etc/hosts
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.1.60 nn01
192.168.1.61 node1
192.168.1.62 node2
192.168.1.63 node3
192.168.1.64 node4
192.168.1.65 nfsgw
[root@nfsgw ~]# yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
[root@nfsgw ~]# cat /etc/hosts
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.1.60 nn01
192.168.1.61 node1
192.168.1.62 node2
192.168.1.63 node3
192.168.1.65 nfsgw
Configure the proxy user: add the proxy user on both the NameNode and the NFSGW; its UID, GID, and user name must be identical on both machines. If for some special reason the client's UID cannot match the NFS gateway's, a static mapping has to be configured in nfs.map. To avoid that complication, simply create the same new user on both machines.
[root@nfsgw ~]# groupadd -g 800 nfsuser
[root@nfsgw ~]# useradd -u 800 -g 800 -r -d /var/hadoop nfsuser
[root@nn01 hadoop]# groupadd -g 800 nfsuser
[root@nn01 hadoop]# useradd -u 800 -g 800 -r -d /var/hadoop nfsuser
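To confirm the identities really match, you can compare only the numeric fields of `id` output from each host. A sketch; the `same_identity` helper name is ours:

```shell
# Hypothetical check: compare only the numeric uid=/gid= fields of two
# `id` output strings, ignoring user names and supplementary groups.
same_identity() {
    a=$(printf '%s\n' "$1" | grep -oE '(uid|gid)=[0-9]+' | sort)
    b=$(printf '%s\n' "$2" | grep -oE '(uid|gid)=[0-9]+' | sort)
    [ "$a" = "$b" ]
}

# Demo with the users created above: uid 800 / gid 800 on both hosts
if same_identity "uid=800(nfsuser) gid=800(nfsuser)" \
                 "uid=800(nfsuser) gid=800(nfsuser)"; then
    echo "uid/gid match"
else
    echo "uid/gid mismatch"
fi
```

In real use you would feed it live output, e.g. `same_identity "$(id nfsuser)" "$(ssh nfsgw id nfsuser)"`.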
Grant the authorization in the cluster, on the management node nn01
1. Stop the cluster
[root@nn01 hadoop]# /usr/local/hadoop/sbin/stop-all.sh
2. Add the authorization to the configuration file core-site.xml
[root@nn01 hadoop]# vim /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
    <name>hadoop.proxyuser.nfsuser.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.nfsuser.hosts</name>
    <value>*</value>
</property>
3. Sync the configuration file to all machines in the cluster
[root@nn01 ~]# cd /root/.ssh
[root@nn01 .ssh]# ssh-copy-id 192.168.1.65
[root@nn01 hadoop]# for i in node{1..3}
> do
> scp /usr/local/hadoop/etc/hadoop/core-site.xml root@$i:/usr/local/hadoop/etc/hadoop/core-site.xml
> done
core-site.xml 100% 1199 651.0KB/s 00:00
core-site.xml 100% 1199 874.4KB/s 00:00
core-site.xml 100% 1199 2.3MB/s 00:00
4. Start the cluster and verify
[root@nn01 ~]# /usr/local/hadoop/sbin/start-dfs.sh
[root@nn01 ~]# jps
20899 Jps
20596 NameNode
20790 SecondaryNameNode
[root@nn01 ~]# ssh node1 jps
5856 DataNode
5934 Jps
[root@nn01 ~]# ssh node2 jps
6220 Jps
6142 DataNode
[root@nn01 ~]# ssh node3 jps
5641 Jps
5563 DataNode
[root@nn01 ~]# /usr/local/hadoop/bin/hdfs dfsadmin -report
Live datanodes (3):
Configuring the NFS gateway
Setup steps: start a fresh system and uninstall rpcbind and nfs-utils (the gateway provides its own portmap and nfs3 daemons)
configure /etc/hosts with the host-name-to-IP mappings of all NameNodes and DataNodes
install the Java runtime (java-1.8.0-openjdk-devel)
copy the NameNode's /usr/local/hadoop to this machine
[root@nfsgw ~]# mkdir /var/hadoop
[root@nn01 hadoop]# scp -r /usr/local/hadoop/ 192.168.1.65:/usr/local/hadoop/
Configure hdfs-site.xml
[root@nfsgw ~]# cd /usr/local/hadoop/etc/hadoop/
[root@nfsgw hadoop]# vim hdfs-site.xml
<property>
    <name>nfs.exports.allowed.hosts</name>
    <value>* rw</value>
</property>
<property>
    <name>nfs.dump.dir</name>
    <value>/var/nfstmp</value>
</property>
Notes: nfs.exports.allowed.hosts -- by default the export can be mounted by any client. For finer access control, set this property; its value is a sequence of machine-name and access-policy pairs separated by whitespace, with entries separated by ';'. A machine name may be a single host, a Java regular expression, or an IPv4 address.
Use rw or ro to give the exported directory read-write or read-only access; if no policy is given, the default is read-only.
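For example, a stricter policy than the wide-open `* rw` used above might look like this (the hosts and policies here are purely illustrative, not part of this cluster's setup):

```xml
<property>
    <name>nfs.exports.allowed.hosts</name>
    <!-- read-write for node4; read-only for the management node nn01 -->
    <value>node4 rw ; nn01 ro</value>
</property>
```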
nfs.dump.dir
This parameter sets the file dump directory. The NFS client often reorders writes, so sequential writes arrive at the NFS gateway out of order; this directory temporarily stores those out-of-order writes. For each file, the out-of-order writes are dumped once they accumulate past a threshold (e.g. 1 MB) in memory, so make sure the directory has enough space.
For example, if an application uploads 10 files of 100 MB each, about 1 GB is recommended for the dump directory so that every file can hit the worst case. Only the NFS gateway needs to be restarted after changing this property.
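The sizing rule above is just multiplication of concurrent uploads by maximum file size; as a sketch:

```shell
# Worst case from the example above: 10 uploads of 100 MB can each dump
# their out-of-order writes, so size nfs.dump.dir for the total.
uploads=10
file_mb=100
need_mb=$((uploads * file_mb))
echo "recommended nfs.dump.dir space: ${need_mb} MB"
```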
Starting the services
[root@nfsgw hadoop]# mkdir /var/nfstmp #create the dump directory and give nfsuser ownership of it
[root@nfsgw hadoop]# chown nfsuser.nfsuser /var/nfstmp
[root@nfsgw hadoop]# cd /usr/local/hadoop/
[root@nfsgw hadoop]# ll
drwxr-xr-x 2 root root 4096 Jun 22 16:12 logs
[root@nfsgw hadoop]# setfacl -m u:nfsuser:rwx logs #grant nfsuser write access to the logs directory via an ACL
[root@nfsgw hadoop]# getfacl logs
# file: logs
# owner: root
# group: root
user::rwx
user:nfsuser:rwx
group::r-x
mask::rwx
other::r-x
[root@nfsgw hadoop]# cd /usr/local/hadoop/
[root@nfsgw hadoop]# jps
25303 Jps
[root@nfsgw hadoop]# /usr/local/hadoop/sbin/hadoop-daemon.sh --script ./bin/hdfs start portmap
#portmap must be started first: nfs3 registers with portmap when it starts, so if portmap is ever restarted, nfs3 has to be restarted as well
starting portmap, logging to /usr/local/hadoop/logs/hadoop-root-portmap-nfsgw.out
[root@nfsgw hadoop]# jps
25376 Jps
25332 Portmap
[root@nfsgw hadoop]# sudo -u nfsuser id ##the nfs3 daemon must be run as nfsuser
uid=800(nfsuser) gid=800(nfsuser) groups=800(nfsuser)
[root@nfsgw hadoop]# sudo -u nfsuser /usr/local/hadoop/sbin/hadoop-daemon.sh --script ./bin/hdfs start nfs3
starting nfs3, logging to /usr/local/hadoop/logs/hadoop-nfsuser-nfs3-nfsgw.out
[root@nfsgw hadoop]# sudo -u nfsuser jps
25432 Nfs3
25484 Jps
Accessing from a client
Note: the NFS gateway currently supports only NFSv3 (vers=3) and only TCP as the transport (proto=tcp). NLM is not supported, so use nolock. Disable access-time updates with noatime and ACL extensions with noacl. The sync mount option is strongly recommended: it minimizes the reordered writes that cause unpredictable throughput, and leaving sync unspecified may lead to unreliable behavior when uploading large files.
[root@node4 ~]# yum -y install nfs-utils
[root@node4 ~]# showmount -e 192.168.1.65
Export list for 192.168.1.65:
/ *
[root@node4 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,noacl,noatime,sync 192.168.1.65:/ /mnt
[root@node4 ~]# cd /mnt/
[root@node4 mnt]# ls
file1 input ouput system tmp ---the HDFS filesystem
Mount automatically at boot
[root@node4 mnt]# vim /etc/fstab
/dev/vda1 / xfs defaults 0 0
192.168.1.65:/ /mnt/ nfs vers=3,proto=tcp,nolock,noatime,sync,noacl,_netdev 0 0