1. Basic Environment
Before installing Hadoop on Linux, two pieces of software must be installed first:
1.1 Installation Notes
1. JDK 1.6 or later (this article installs JDK 1.7). The JDK bundled with Red Hat is generally not used; remove it and install the version you need.
2. SSH (Secure Shell). As a client, MobaXterm_Personal is recommended (full-featured and easy to use).
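The start-all.sh step later assumes the master node can log in to every node over SSH without a password. The original post does not show this setup; a minimal sketch, assuming the three hostnames redHat1/redHat2/redHat3 used throughout this article and a root deployment user:

```shell
# On redHat1: generate a key pair (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Distribute the public key to every node, including redHat1 itself
ssh-copy-id root@redHat1
ssh-copy-id root@redHat2
ssh-copy-id root@redHat3
```

After this, `ssh redHat2` from redHat1 should open a shell without prompting for a password.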
2. Host Configuration
My Hadoop cluster consists of three machines, so the hosts file on each machine needs to be adjusted. Edit /etc/hosts to map host names to IP addresses:
vim /etc/hosts
If you lack sufficient permissions, switch to the root user.
Add the same host entries to all three machines.
The server names can be changed to redHat1, redHat2 and redHat3 with the hostname command.
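The original post does not show the entries themselves. They would look like the following; 192.168.92.140 matches the redHat1 address used for the web UIs later in this article, while the other two addresses are assumptions for illustration:

```shell
# Append to /etc/hosts on all three machines (IPs are examples)
192.168.92.140 redHat1
192.168.92.141 redHat2
192.168.92.142 redHat3
```

Then, on each machine, set its own name, e.g. `hostname redHat1` on the first node (edit /etc/hostname or /etc/sysconfig/network as well to make the change survive a reboot).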
3. Installing and Configuring Hadoop
3.1 Creating the Directory Structure
For easier management, create directories on redHat1 for the HDFS NameNode, DataNode and temporary files, under the following paths:
/data/hdfs/name
/data/hdfs/data
/data/hdfs/tmp
Then copy these directories to the same locations on redHat2 and redHat3 with the scp command.
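The steps above can be sketched as follows (run on redHat1; the redHat2/redHat3 hostnames come from the /etc/hosts mapping configured earlier):

```shell
# Create the NameNode, DataNode and temporary directories
mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp

# Replicate the (empty) directory tree to the other two nodes
scp -r /data/hdfs redHat2:/data
scp -r /data/hdfs redHat3:/data
```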
3.2 Downloading
First, download Hadoop from the Apache website via one of the recommended mirrors. I chose version hadoop-2.7.1 and downloaded it to the /data directory on redHat1:
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Then extract hadoop-2.7.1.tar.gz into the /data directory:
tar -zxvf hadoop-2.7.1.tar.gz
3.3 Configuring Environment Variables
Back in the /data directory, configure the Hadoop environment variables:
vim /etc/profile
Add the Hadoop variables to /etc/profile.
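The original post does not show the lines added here; a minimal sketch, assuming Hadoop was extracted to /data/hadoop-2.7.1 as above:

```shell
# Append to /etc/profile
export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```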
To make the Hadoop environment variables take effect immediately, run:
source /etc/profile
Running the hadoop command should now print its usage help, which confirms the configuration took effect:
hadoop
3.4 Configuring Hadoop
Enter the hadoop-2.7.1 configuration directory:
cd /data/hadoop-2.7.1/etc/hadoop
Modify core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and the slaves file in turn.
3.4.1 Editing core-site.xml
vim core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://redHat1:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
Note: the value of hadoop.tmp.dir must match the directory created earlier. (fs.default.name is the deprecated name of fs.defaultFS; both still work in Hadoop 2.7.)
3.4.2 Editing hdfs-site.xml
vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>redHat1:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Note: the values of dfs.namenode.name.dir and dfs.datanode.data.dir must match the directories created earlier.
3.4.3 Editing mapred-site.xml
Copy the template to create the file:
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
3.4.4 Editing yarn-site.xml
vim yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>redHat1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>redHat1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>redHat1:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>redHat1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>redHat1:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <!-- Hadoop 2.x requires "mapreduce_shuffle" here; aux-service
         names may not contain dots, so "mapreduce.shuffle" fails
         at NodeManager startup -->
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
3.4.5 Editing the slaves file
vim /data/hadoop-2.7.1/etc/hadoop/slaves
Delete the existing localhost entry and replace it with the hostnames of the DataNode machines.
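The original post does not show the replacement lines. For this three-node layout with redHat1 as the NameNode, the slaves file would typically list the worker nodes (whether redHat1 also runs a DataNode is a deployment choice; add it here if so):

```shell
# Contents of /data/hadoop-2.7.1/etc/hadoop/slaves
redHat2
redHat3
```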
Finally, copy the entire hadoop-2.7.1 directory tree to the same location on redHat2 and redHat3 with scp:
scp -r /data/hadoop-2.7.1 redHat2:/data
scp -r /data/hadoop-2.7.1 redHat3:/data
4. Running Hadoop
First, format the NameNode (only needed on first setup; formatting erases any existing HDFS metadata):
hadoop namenode -format
Then start all of the daemons from the sbin directory:
cd /data/hadoop-2.7.1/sbin
sh ./start-all.sh
Check the cluster status:
/data/hadoop-2.7.1/bin/hdfs dfsadmin -report
Test YARN in the browser:
http://192.168.92.140:18088/cluster/cluster
Check HDFS in the browser:
http://192.168.92.140:50070/dfshealth.html#tab-overview
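A quick command-line check that the daemons came up is jps (part of the JDK), run on each node; the expected process lists below assume the master/worker split used in this article:

```shell
# On redHat1, expect: NameNode, SecondaryNameNode, ResourceManager
# On redHat2/redHat3, expect: DataNode, NodeManager
jps
```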
Key point: problems encountered while configuring and running Hadoop
1. JAVA_HOME not set
If an error about JAVA_HOME is reported at startup,
edit /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and add the JAVA_HOME path.
Write it as an absolute path; do not keep the original automatic form (export JAVA_HOME=${JAVA_HOME}), since the variable is not picked up when the daemons are launched over SSH.
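A sketch of the change in hadoop-env.sh; the JDK directory below is an example, so substitute wherever your JDK 1.7 is actually installed:

```shell
# In /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
# Replace:  export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.7.0_79   # example path
```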
2. FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-336454126-127.0.0.1-1419216478581 (storage id DS-445205871-127.0.0.1-50010-1419216613930) service to /192.168.149.128:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-445205871-127.0.0.1-50010-1419216613930, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-41993190-ade1-486c-8fe1-395c1d6f5739;nsid=1679060915;c=0)
Cause:
The data files under the local dfs.data.dir are inconsistent with what the NameNode expects, so the DataNode is rejected by the NameNode.
Solution:
1. Delete all files under the dfs.namenode.name.dir and dfs.datanode.data.dir directories.
2. Fix the hosts file:
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.149.128 localhost
3. Re-format the NameNode: bin/hadoop namenode -format
4. Restart the cluster.
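Under the directory layout used in this article, the fix can be sketched as follows. Note that clearing these directories and re-formatting is destructive: it wipes all existing HDFS data.

```shell
# 1. Clear the NameNode and DataNode directories (run on every node)
rm -rf /data/hdfs/name/* /data/hdfs/data/*

# 2. Check /etc/hosts as shown above, then re-format (on redHat1 only)
/data/hadoop-2.7.1/bin/hadoop namenode -format

# 3. Restart the cluster
/data/hadoop-2.7.1/sbin/start-all.sh
```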