Hadoop cluster architecture
1. Start three Linux servers
IP:        192.168.108.135    192.168.108.136    192.168.108.137
Hostname:  www.hadoop1.org    www.hadoop2.org    www.hadoop3.org
Hardware:  2G RAM, 1 CPU      1G RAM, 1 CPU      1G RAM, 1 CPU
Role layout (which daemon runs on which host):
HDFS:
    NameNode (hadoop1)
    DataNode (all three)
    SecondaryNameNode (hadoop3)
YARN:
    ResourceManager (hadoop2)
    NodeManager (all three)
MapReduce:
    JobHistoryServer (hadoop1)
2. Configure hostname mappings (so that if an IP changes later, only the mapping needs updating, not every config file)
/etc/hosts
192.168.108.135 www.hadoop1.org www.hadoop1
192.168.108.136 www.hadoop2.org www.hadoop2
192.168.108.137 www.hadoop3.org www.hadoop3
Test that each host can be pinged:
ping www.hadoop1.org
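The three mapping lines above can be generated in one pass instead of typed by hand. A small sketch, assuming the same IP/hostname layout; `hosts_block` is a helper written for this note, not a standard tool:

```shell
# Generate the /etc/hosts entries for all three nodes from one IP list.
# Review the output before appending it to /etc/hosts on every machine.
hosts_block() {
  local i=1
  for ip in 192.168.108.135 192.168.108.136 192.168.108.137; do
    echo "$ip www.hadoop${i}.org www.hadoop${i}"
    i=$((i + 1))
  done
}
hosts_block
# On each node, append it with:  hosts_block >> /etc/hosts
```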
3. Configure the Java environment
See for reference:
http://12275610.blog.51cto.com/12265610/1917942
4. Configure Hadoop
hdfs
hadoop-env.sh
Set the JAVA_HOME environment variable in this file:
export JAVA_HOME=/opt/app/jdk1.7.0_79
core-site.xml
<configuration>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://www.hadoop1.org:8020</value>
</property>
<!-- Temporary data directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/app/hadoop-2.5.2/data/tmp</value>
</property>
<!-- Trash retention interval, in minutes -->
<property>
<name>fs.trash.interval</name>
<value>420</value>
</property>
</configuration>
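To sanity-check a value in one of these files without starting any daemon, a small helper can pull it out (on a running cluster, `bin/hdfs getconf -confKey fs.defaultFS` reports the effective value). A sketch; `conf_value` is written for this note and assumes the `<name>`/`<value>` pair sits on adjacent lines, as in the files shown here:

```shell
# Extract one property value from a Hadoop *-site.xml file.
conf_value() {  # usage: conf_value <file> <property-name>
  awk -v key="<name>$2</name>" '
    index($0, key) { found = 1; next }
    found && match($0, /<value>[^<]*<\/value>/) {
      # strip the surrounding <value>...</value> tags (7 and 8 chars)
      print substr($0, RSTART + 7, RLENGTH - 15); exit
    }' "$1"
}
# e.g.  conf_value etc/hadoop/core-site.xml fs.defaultFS
```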
hdfs-site.xml
<configuration>
<!-- Replication factor: three machines here, so keep 3 copies -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- SecondaryNameNode address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>www.hadoop3.org:50090</value>
</property>
</configuration>
slaves
# DataNode and NodeManager hosts, one per line (plain text file, not XML)
www.hadoop1.org
www.hadoop2.org
www.hadoop3.org
yarn
yarn-env.sh
Set the JAVA_HOME environment variable in this file:
export JAVA_HOME=/opt/app/jdk1.7.0_79
yarn-site.xml
<configuration>
<!-- Allow MapReduce programs to run on YARN (shuffle service) -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>www.hadoop2.org</value>
</property>
<!-- Memory each NodeManager offers to YARN: 4 GB (note: this exceeds the physical RAM of the VMs above; YARN does not verify it) -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<!-- CPU vcores each NodeManager offers to YARN: 4 -->
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log retention time, in seconds (640080 is about 7.4 days; 604800 would be exactly one week) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>640080</value>
</property>
</configuration>
slaves
# DataNode and NodeManager hosts, one per line (plain text file, not XML)
www.hadoop1.org
www.hadoop2.org
www.hadoop3.org
mapreduce
mapred-env.sh
mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server RPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>www.hadoop1.org:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>www.hadoop1.org:19888</value>
</property>
</configuration>
5. Distribute to the other servers
using the scp command.
1. First set up passwordless SSH login (on all three servers).
This generates several files under the .ssh directory in the current user's home directory.
2. ssh-keygen -t rsa (generate the key pair)
ssh-copy-id www.hadoop1.org
ssh-copy-id www.hadoop2.org
ssh-copy-id www.hadoop3.org
3. Test passwordless remote login
ssh www.hadoop1.org
ssh www.hadoop2.org
ssh www.hadoop3.org
4. Distribute to the other two servers
scp -r ./hadoop-2.5.2/ [email protected]:/opt/app
scp -r ./hadoop-2.5.2/ [email protected]:/opt/app
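The two scp commands above can be wrapped in a loop. A sketch; `distribute` is a helper written for this note, with an optional dry-run mode for inspecting the commands before copying anything:

```shell
# Copy the configured Hadoop directory to the other two nodes.
distribute() {  # usage: distribute [scp-command]
  local scp_cmd="${1:-scp}"
  for host in www.hadoop2.org www.hadoop3.org; do
    $scp_cmd -r ./hadoop-2.5.2/ "root@${host}:/opt/app"
  done
}
# distribute                # real copy
# distribute "echo scp"     # dry run: only print the commands
```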
6. Start the whole Hadoop cluster
1. Format HDFS (run once, on the NameNode host)
bin/hdfs namenode -format
2. Start HDFS
sbin/start-dfs.sh
[root@www hadoop-2.5.2]# sbin/start-dfs.sh
17/06/23 23:38:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [www.hadoop1.org]
www.hadoop1.org: starting namenode, logging to /opt/app/hadoop-2.5.2/logs/hadoop-root-namenode-www.hadoop1.org.out
www.hadoop3.org: starting datanode, logging to /opt/app/hadoop-2.5.2/logs/hadoop-root-datanode-www.hadoop3.org.out
www.hadoop1.org: starting datanode, logging to /opt/app/hadoop-2.5.2/logs/hadoop-root-datanode-www.hadoop1.org.out
www.hadoop2.org: starting datanode, logging to /opt/app/hadoop-2.5.2/logs/hadoop-root-datanode-www.hadoop2.org.out
Starting secondary namenodes [www.hadoop3.org]
www.hadoop3.org: starting secondarynamenode, logging to /opt/app/hadoop-2.5.2/logs/hadoop-root-secondarynamenode-www.hadoop3.org.out
17/06/23 23:38:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
3. Check the running processes with jps
jps
4. Disable the firewall (otherwise the web UIs and daemon ports are unreachable from other machines)
service iptables stop
chkconfig iptables off    (optional: keep it disabled across reboots)
5. Visit http://www.hadoop1.org:50070
(the HDFS web UI)
(Remember to add the hostname mappings on Windows as well; the SwitchHosts tool makes this convenient.)
6. Basic tests of Hadoop HDFS
bin/hdfs dfs -mkdir -p /user/root/tmp
bin/hdfs dfs -put ./etc/hadoop/core-site.xml ./etc/hadoop/hdfs-site.xml /user/root/tmp/
bin/hdfs dfs -text /user/root/tmp/core*
7. Start YARN and the JobHistory server
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
[root@www hadoop-2.5.2]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/app/hadoop-2.5.2/logs/yarn-root-resourcemanager-www.hadoop2.org.out
www.hadoop3.org: starting nodemanager, logging to /opt/app/hadoop-2.5.2/logs/yarn-root-nodemanager-www.hadoop3.org.out
www.hadoop1.org: starting nodemanager, logging to /opt/app/hadoop-2.5.2/logs/yarn-root-nodemanager-www.hadoop1.org.out
www.hadoop2.org: starting nodemanager, logging to /opt/app/hadoop-2.5.2/logs/yarn-root-nodemanager-www.hadoop2.org.out
8. Check the processes with jps
jps
9. Visit http://www.hadoop2.org:8088/cluster
(the YARN web UI)
10. Test YARN and MapReduce
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /user/root/tmp/* /user/root/tmp-out
17/06/24 00:04:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/24 00:04:03 INFO client.RMProxy: Connecting to ResourceManager at www.hadoop2.org/192.168.108.136:8032
17/06/24 00:04:04 INFO input.FileInputFormat: Total input paths to process : 2
17/06/24 00:04:04 INFO mapreduce.JobSubmitter: number of splits:2
17/06/24 00:04:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498287508587_0001
17/06/24 00:04:05 INFO impl.YarnClientImpl: Submitted application application_1498287508587_0001
17/06/24 00:04:05 INFO mapreduce.Job: The url to track the job: http://www.hadoop2.org:8088/proxy/application_1498287508587_0001/
17/06/24 00:04:05 INFO mapreduce.Job: Running job: job_1498287508587_0001
17/06/24 00:04:19 INFO mapreduce.Job: Job job_1498287508587_0001 running in uber mode : false
17/06/24 00:04:19 INFO mapreduce.Job: map 0% reduce 0%
17/06/24 00:04:46 INFO mapreduce.Job: map 100% reduce 0%
17/06/24 00:05:04 INFO mapreduce.Job: map 100% reduce 100%
17/06/24 00:05:04 INFO mapreduce.Job: Job job_1498287508587_0001 completed successfully
17/06/24 00:05:04 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=2896
FILE: Number of bytes written=297217
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2671
HDFS: Number of bytes written=1344
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=48458
Total time spent by all reduces in occupied slots (ms)=14830
Total time spent by all map tasks (ms)=48458
Total time spent by all reduce tasks (ms)=14830
Total vcore-seconds taken by all map tasks=48458
Total vcore-seconds taken by all reduce tasks=14830
Total megabyte-seconds taken by all map tasks=49620992
Total megabyte-seconds taken by all reduce tasks=15185920
Map-Reduce Framework
Map input records=68
Map output records=235
Map output bytes=3071
Map output materialized bytes=2902
Input split bytes=240
Combine input records=235
Combine output records=180
Reduce input groups=98
Reduce shuffle bytes=2902
Reduce input records=180
Reduce output records=98
Spilled Records=360
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=665
CPU time spent (ms)=2820
Physical memory (bytes) snapshot=487112704
Virtual memory (bytes) snapshot=2516992000
Total committed heap usage (bytes)=257171456
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2431
File Output Format Counters
Bytes Written=1344
11. View the word-count results
bin/hdfs dfs -text /user/root/tmp-out/part*
"AS 2
"License"); 2
(the 2
--> 7
2.0 2
<!-- 8
<!--指定NameNode地址 1
</configuration> 2
</property> 5
<?xml 2
<?xml-stylesheet 2
<configuration> 2
<name>dfs.namenode.secondary.http-address</name> 1
<name>dfs.replication</name> 1
<name>fs.defaultFS</name> 1
<name>fs.trash.interval</name> 1
<name>hadoop.tmp.dir</name> 1
<property> 5
<value>/opt/app/hadoop-2.5.2/data/tmp</value> 1
<value>3</value> 1
<value>420</value> 1
<value>hdfs://www.hadoop1.org:8020</value> 1
<value>www.hadoop3.org:50090</value> 1
ANY 2
Apache 2
BASIS, 2
CONDITIONS 2
IS" 2
KIND, 2
LICENSE 2
License 6
License, 2
License. 4
Licensed 2
OF 2
OR 2
Put 2
See 4
Unless 2
Version 2
WARRANTIES 2
WITHOUT 2
You 2
a 2
accompanying 2
agreed 2
an 2
and 2
applicable 2
at 2
by 2
compliance 2
copy 2
distributed 4
either 2
encoding="UTF-8"?> 2
except 2
express 2
file 2
file. 4
for 2
governing 2
href="configuration.xsl"?> 2
http://www.apache.org/licenses/LICENSE-2.0 2
implied. 2
in 6
is 2
language 2
law 2
limitations 2
may 4
not 2
obtain 2
of 2
on 2
or 4
overrides 2
permissions 2
property 2
required 2
site-specific 2
software 2
specific 2
the 14
this 4
to 2
type="text/xsl" 2
under 6
use 2
version="1.0" 2
with 2
writing, 2
you 2
指定SecondaryNameNode 1
指定臨時數據目錄--> 1
指定保存的副本數 1
指定垃圾回收--> 1
這裏三臺機器所以保存3份 1
7. Cluster time synchronization
* Pick one machine as the time server
* All the other machines synchronize their clocks with it on a schedule
e.g., synchronize once every ten minutes
Use www.hadoop1.org as the time server.
Edit /etc/ntp.conf
Uncomment:
restrict 192.168.108.0 mask 255.255.255.0 nomodify notrap
Comment out:
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
Add:
server 127.127.1.0 #local clock
fudge 127.127.1.0 stratum 10
Edit /etc/sysconfig/ntpd
SYNC_HWCLOCK=yes
Start the ntpd service and enable it at boot:
service ntpd start
chkconfig ntpd on
On the other two servers:
crontab -e (must be run as root)
0-59/10 * * * * /usr/sbin/ntpdate www.hadoop1.org
/usr/sbin/ntpdate www.hadoop1.org (runs a manual one-off sync)
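The `0-59/10` minute field in the crontab entry above fires every ten minutes. A tiny sketch that expands which minutes a step value matches; `cron_minutes` is a throwaway helper for illustration only:

```shell
# Expand a "0-59/STEP" crontab minute field into the minutes it matches.
cron_minutes() {  # usage: cron_minutes <step>
  local m=0 out=""
  while [ "$m" -le 59 ]; do
    out="${out}${out:+,}${m}"
    m=$((m + $1))
  done
  echo "$out"
}
cron_minutes 10   # 0-59/10 matches minutes 0,10,20,30,40,50 of every hour
```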
###############################################################################################################
HA QJM architecture
Key points for configuring QJM:
* Shared edits: a JournalNode quorum stores the edit log shared by both NameNodes
* NameNode: one Active and one Standby
* Client: a failover proxy routes client requests to whichever NameNode is active
* Fencing: guarantees that only one NameNode is active at any given moment
IP:        192.168.108.135    192.168.108.136    192.168.108.137
Hostname:  www.hadoop1.org    www.hadoop2.org    www.hadoop3.org
Hardware:  2G RAM, 1 CPU      1G RAM, 1 CPU      1G RAM, 1 CPU
HDFS:
    NameNode (hadoop1, hadoop3)
    DataNode (all three)
    JournalNode (all three)
Configuration files
hdfs
hadoop-env.sh
Set the JAVA_HOME environment variable in this file:
export JAVA_HOME=/opt/app/jdk1.7.0_79
core-site.xml
<configuration>
<!-- The nameservice (logical cluster name) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<!-- Temporary data directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/ha/hadoop-2.5.2/data/tmp</value>
</property>
<!-- Trash retention interval, in minutes -->
<property>
<name>fs.trash.interval</name>
<value>420</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!-- Replication factor: three machines here, so keep 3 copies -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- nameservice-->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- nameservice namenodes-->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- NameNode rpc address-->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>www.hadoop1.org:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>www.hadoop3.org:8020</value>
</property>
<!-- NameNode web address-->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>www.hadoop1.org:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>www.hadoop3.org:50070</value>
</property>
<!-- share edits JournalNode address -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://www.hadoop1.org:8485;www.hadoop2.org:8485;www.hadoop3.org:8485/ns1</value>
</property>
<!-- journalnode edits dir -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/ha/hadoop-2.5.2/data/jn_data</value>
</property>
<!-- client proxy -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- namenode fence method ssh -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
</configuration>
slaves
# DataNode and NodeManager hosts, one per line (plain text file, not XML)
www.hadoop1.org
www.hadoop2.org
www.hadoop3.org
Startup
1. Start a JournalNode on each node
sbin/hadoop-daemon.sh start journalnode
2. Format the NameNode (run on one NameNode host only, e.g. www.hadoop1.org)
bin/hdfs namenode -format
3. Start that NameNode
sbin/hadoop-daemon.sh start namenode
4. Sync the edit log to the other NameNode (run on the standby NameNode host)
bin/hdfs namenode -bootstrapStandby
5. Start the other NameNode
sbin/hadoop-daemon.sh start namenode
6. Start the DataNodes (run on each node; sbin/hadoop-daemons.sh, with the s, starts them on every slaves host at once)
sbin/hadoop-daemon.sh start datanode
7. Transition nn1 to the active state
bin/hdfs haadmin -transitionToActive nn1
8. Check nn1's state
bin/hdfs haadmin -getServiceState nn1
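Step 8 can be run for both NameNodes at once; expect exactly one "active" and one "standby". A sketch; `nn_states` is a wrapper written for this note around `bin/hdfs haadmin -getServiceState`:

```shell
# Print the HA state of both NameNodes in the ns1 nameservice.
nn_states() {  # usage: nn_states [hdfs-command]
  local hdfs="${1:-bin/hdfs}"
  for nn in nn1 nn2; do
    echo "$nn: $($hdfs haadmin -getServiceState "$nn")"
  done
}
# e.g.  nn_states        # run from the hadoop-2.5.2 root on the cluster
```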
9. Configure automatic failover
Add to hdfs-site.xml:
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
Add to core-site.xml:
<property>
<name>ha.zookeeper.quorum</name>
<value>www.hadoop1.org:2181,www.hadoop2.org:2181,www.hadoop3.org:2181</value>
</property>
10. Restart HDFS with automatic failover enabled
sbin/stop-dfs.sh
bin/zkServer.sh start    (ZooKeeper's startup script; run it on each ZooKeeper node)
bin/hdfs zkfc -formatZK
sbin/start-dfs.sh (no need to run sbin/hadoop-daemon.sh start zkfc separately here; start-dfs.sh brings up the ZKFC daemons)
11. Verify failover
kill -9 (pid of the active NameNode)
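To find the pid to kill, the `jps` output can be filtered. A sketch; `nn_pid` is a helper written for this note, and after the kill the standby should be reported as active:

```shell
# Pid of the NameNode process on this host, if one is running.
nn_pid() {  # usage: nn_pid [jps-command]
  "${1:-jps}" | awk '$2 == "NameNode" {print $1}'
}
# On the active NameNode host:   kill -9 "$(nn_pid)"
# Then confirm the standby took over, e.g.:
#   bin/hdfs haadmin -getServiceState nn2
```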