Setting Up a Hadoop 2.7.1 Cluster on Linux (Using 3 Machines as an Example)

1. Basic Environment

Before installing Hadoop on Linux, two prerequisites need to be installed first:

1.1 Installation Notes

1. JDK 1.6 or later (this article installs JDK 1.7). The JDK bundled with Red Hat is generally not used; remove it and install the version you need.

2. SSH (Secure Shell). MobaXterm_Personal is recommended as the client (feature-rich and easy to use).
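Note that the start-all.sh step later in this article requires passwordless SSH from redHat1 to the other machines; a minimal sketch of the usual key setup, run on redHat1:

# generate a key pair (accept the defaults), then push the public key to each node
ssh-keygen -t rsa
ssh-copy-id redHat1   # redHat1 also SSHes to itself when starting daemons
ssh-copy-id redHat2
ssh-copy-id redHat3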

2. Host Configuration

Since the Hadoop cluster built here consists of three machines, the hosts file on each machine needs to be adjusted. Edit /etc/hosts to map each hostname to its IP address:

vim /etc/hosts

If you do not have sufficient permissions, switch to the root user.

Add the same host entries on all three machines.
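A minimal sketch of the entries (192.168.92.140 is the redHat1 address that appears later in this article; the other two addresses are assumptions, so substitute your own):

192.168.92.140 redHat1
192.168.92.141 redHat2
192.168.92.142 redHat3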

You can use the hostname command to set the server names to redHat1, redHat2, and redHat3, as shown below.
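For example, on the first machine (repeat with the matching name on the other two; note that hostname alone only lasts until reboot, so on Red Hat systems also set HOSTNAME in /etc/sysconfig/network, or use hostnamectl set-hostname where systemd is available):

hostname redHat1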

3. Hadoop Installation and Configuration

3.1 Creating the Directories

For ease of management, create directories on redHat1 for the HDFS NameNode, DataNode, and temporary files:

/data/hdfs/name

/data/hdfs/data

/data/hdfs/tmp

Then copy these directories to the same locations on redHat2 and redHat3 with scp, as sketched below.
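A sketch of the commands, run on redHat1 (assuming /data exists and is writable on all three machines):

mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp
scp -r /data/hdfs redHat2:/data
scp -r /data/hdfs redHat3:/data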

3.2 Downloading Hadoop

First, download Hadoop from the Apache site, choosing one of the recommended mirrors. I chose hadoop-2.7.1 and downloaded it to the /data directory on redHat1 with the following command:

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Then extract hadoop-2.7.1.tar.gz into the /data directory:

tar -zxvf hadoop-2.7.1.tar.gz

3.3 Configuring Environment Variables

Return to the /data directory and configure the Hadoop environment variables:

vim /etc/profile

Add the Hadoop variables to /etc/profile.
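A minimal sketch, assuming Hadoop was extracted to /data/hadoop-2.7.1 as above:

export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin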

To make the Hadoop environment variables take effect immediately, run:

source /etc/profile

Running the hadoop command now should print usage help, which means the configuration has taken effect.

hadoop

3.4 Hadoop Configuration

Enter the hadoop-2.7.1 configuration directory:

cd /data/hadoop-2.7.1/etc/hadoop

Modify core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the slaves file in turn.

3.4.1 Modifying core-site.xml

vim core-site.xml


<configuration>
 <property>
   <name>hadoop.tmp.dir</name>
   <value>file:/data/hdfs/tmp</value>
   <description>A base for other temporary directories.</description>
 </property>
 <property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
 </property>
 <property>
   <!-- fs.defaultFS is the current name; fs.default.name is its deprecated Hadoop 1.x alias -->
   <name>fs.defaultFS</name>
   <value>hdfs://redHat1:9000</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.hosts</name>
   <value>*</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.groups</name>
   <value>*</value>
 </property>
</configuration>

Note: the value of hadoop.tmp.dir must be the tmp directory created earlier.

3.4.2 Modifying hdfs-site.xml

vim hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->
 
 <!-- Put site-specific property overrides in this file. -->
 
<configuration>
 <property>
   <name>dfs.replication</name>
   <value>2</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/data/hdfs/name</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/data/hdfs/data</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>redHat1:9001</value>
 </property>
 <property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
 </property>
 <property>
   <!-- dfs.permissions.enabled is the current name for the deprecated dfs.permissions key -->
   <name>dfs.permissions.enabled</name>
   <value>false</value>
 </property>
</configuration>

Note: the values of dfs.namenode.name.dir and dfs.datanode.data.dir must be the name and data directories created earlier.

3.4.3 Modifying mapred-site.xml

Copy the template file to create mapred-site.xml:

cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>

3.4.4 Modifying yarn-site.xml

vim yarn-site.xml

<?xml version="1.0"?>
 <!--
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->
<configuration>

<!-- Site specific YARN configuration properties -->
 <property>
   <name>yarn.resourcemanager.address</name>
   <value>redHat1:18040</value>
 </property>
 <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>redHat1:18030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>redHat1:18088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>redHat1:18025</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>redHat1:18141</value>
 </property>
 <property>
   <!-- Hadoop 2.2+ requires mapreduce_shuffle; the older mapreduce.shuffle value is rejected -->
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>

3.4.5 Modifying /data/hadoop-2.7.1/etc/hadoop/slaves

vim /data/hadoop-2.7.1/etc/hadoop/slaves

Delete the original localhost entry and replace it with the worker hostnames, as sketched below.
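Assuming redHat2 and redHat3 serve as the worker (DataNode/NodeManager) nodes, which matches the dfs.replication value of 2 configured above, the slaves file would contain:

redHat2
redHat3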

Finally, copy the entire hadoop-2.7.1 directory tree to the same location on redHat2 and redHat3 with scp:

scp -r /data/hadoop-2.7.1 redHat2:/data

scp -r /data/hadoop-2.7.1 redHat3:/data
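The environment variables added to /etc/profile also need to exist on redHat2 and redHat3. One way to do that (assuming the same profile is appropriate on all three machines) is to copy the file over and then source it on each node:

scp /etc/profile redHat2:/etc/profile
scp /etc/profile redHat3:/etc/profile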

4. Running Hadoop

First, format the NameNode:

hadoop namenode -format

Then start all the daemons from the sbin directory:

cd /data/hadoop-2.7.1/sbin
sh ./start-all.sh
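To confirm the daemons came up, jps (shipped with the JDK) can be run on each node; with the layout above, redHat1 should show NameNode, SecondaryNameNode, and ResourceManager, while redHat2 and redHat3 should show DataNode and NodeManager:

jps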

Check the cluster status:

/data/hadoop-2.7.1/bin/hdfs dfsadmin -report

Test YARN in a browser (port 18088 matches yarn.resourcemanager.webapp.address above):

http://192.168.92.140:18088/cluster/cluster

Check HDFS through the web UI:

http://192.168.92.140:50070/dfshealth.html#tab-overview


Key points: problems encountered while configuring and running Hadoop

1. JAVA_HOME not set

If startup fails complaining that JAVA_HOME is not set, edit /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and add the JAVA_HOME path there.

Write it as an absolute path; do not rely on the default auto-detected form.
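A sketch of the change in hadoop-env.sh (the JDK path below is an assumption; point it at your actual installation):

# replace the default line
#   export JAVA_HOME=${JAVA_HOME}
# with an explicit absolute path:
export JAVA_HOME=/usr/java/jdk1.7.0_79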

2. FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-336454126-127.0.0.1-1419216478581 (storage id DS-445205871-127.0.0.1-50010-1419216613930) service to /192.168.149.128:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-445205871-127.0.0.1-50010-1419216613930, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-41993190-ade1-486c-8fe1-395c1d6f5739;nsid=1679060915;c=0)

Cause: the data files under the local dfs.data.dir are inconsistent with what the NameNode has recorded, so the DataNode is rejected by the NameNode.

Solution:

1. Delete all files under the dfs.namenode.name.dir and dfs.datanode.data.dir directories.

2. Fix the hosts file:

 cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.149.128 localhost

3. Reformat: bin/hadoop namenode -format

4. Restart Hadoop.
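Putting steps 1 and 3 together as commands (paths taken from the hdfs-site.xml above; double-check them before deleting anything):

rm -rf /data/hdfs/name/* /data/hdfs/data/*
/data/hadoop-2.7.1/bin/hadoop namenode -format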
