I. Environment List
OS: CentOS 6.5 64-bit
JDK: jdk1.7.0_71
Hadoop: community release 2.7.2, hadoop-2.7.2-src.tar.gz
Hostname | IP              | Roles                                        | User
master1  | 192.168.204.202 | NameNode; SecondaryNameNode; ResourceManager | hadoop
slave1   | 192.168.204.203 | DataNode; NodeManager                        | hadoop
slave2   | 192.168.204.204 | DataNode; NodeManager                        | hadoop
II. Operating System Preparation
1. Set the hostname:
vi /etc/sysconfig/network
2. Disable the firewall:
chkconfig iptables off
service iptables stop
3. Disable SELinux:
vi /etc/sysconfig/selinux
SELINUX=disabled
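A hedged, non-interactive alternative to the vi edit above is to flip the SELINUX value with sed. It is demonstrated on a temporary copy here so it is safe to try; on a real host you would point it at /etc/sysconfig/selinux instead (and reboot for the change to fully apply).

```shell
# Sketch: flip SELINUX to disabled with sed instead of editing by hand.
# Demonstrated on a temp file; replace "$cfg" with /etc/sysconfig/selinux
# on a real host (run as root).
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
grep '^SELINUX=' "$cfg"   # prints: SELINUX=disabled
rm -f "$cfg"
```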
[root@cloud001 Desktop]# hostname
[root@cloud001 Desktop]# ifconfig
[root@cloud001 Desktop]# service iptables status
[root@cloud001 Desktop]# sestatus
4. Install the JDK
Configure the environment variables:
[root@master1 hadoopsolf]# vim /etc/profile
JAVA_HOME=/usr/java/jdk1.7.0_71  (adjust to your actual install path)
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
[root@master1 hadoopsolf]# source /etc/profile
III. Preparing the Hadoop 2.x Build Environment
1. Download the latest source from http://apache.claz.org/hadoop/common/
2. Prepare the build environment
tar -zxvf hadoop-2.7.2-src.tar.gz  (this produces the hadoop-2.7.2-src directory)
Enter hadoop-2.7.2-src and read BUILDING.txt:
cd hadoop-2.7.2-src
vim BUILDING.txt
BUILDING.txt lists the libraries and tools the build requires.
3. JDK
Install the JDK, then add its environment variables to /etc/profile:
export JAVA_HOME=/usr/java/jdk1.7.0_71
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar
export JRE_HOME=/usr/java/jdk1.7.0_71
export PATH=$PATH:$JRE_HOME/bin
source /etc/profile
Run javac -version to verify.
4. Install the required libraries
yum -y install svn ncurses-devel gcc*
yum -y install lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel
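Before starting the long Maven build, it can save time to confirm the tools the yum lines above should have installed are actually on PATH. This is a hedged sketch, not something BUILDING.txt prescribes; it prints a line for each tool that is missing and nothing when all are present.

```shell
# Sanity-check sketch: report any build prerequisite missing from PATH.
for t in gcc g++ make cmake autoconf automake libtool; do
  command -v "$t" >/dev/null 2>&1 || echo "missing: $t"
done
```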
5. Install protobuf-2.5.0.tar.gz
tar zxvf protobuf-2.5.0.tar.gz, then run the following in order:
cd protobuf-2.5.0
./configure
make
make install
Check the version: [root@master1 protobuf-2.5.0]# protoc --version
6. Install Maven
Download: http://maven.apache.org/download.cgi
tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /hadoopsolf
Then add the environment variables to /etc/profile:
export MAVEN_HOME=/hadoopsolf/apache-maven-3.3.9
export MAVEN_OPTS="-Xms256m -Xmx512m"
export PATH=$PATH:$MAVEN_HOME/bin
source /etc/profile
Check the version: [root@master1 protobuf-2.5.0]# mvn -version
7. Install Ant
Download: http://ant.apache.org/bindownload.cgi
tar -zxvf apache-ant-1.9.6-bin.tar.gz -C /hadoopsolf
Then add the environment variables to /etc/profile:
export ANT_HOME=/hadoopsolf/apache-ant-1.9.6
export PATH=$ANT_HOME/bin:$PATH
source /etc/profile
Verify: [root@master1 protobuf-2.5.0]# ant -version
8. Install FindBugs
Download: http://findbugs.sourceforge.net/downloads.html
vim /etc/profile and append:
export FINDBUGS_HOME=/hadoopsolf/findbugs-3.0.1
export PATH=$PATH:$FINDBUGS_HOME/bin
source /etc/profile
Verify: [root@master1 protobuf-2.5.0]# findbugs -version
9. Build Hadoop 2.x
mvn clean package -Pdist,native -DskipTests -Dtar
or
mvn package -Pdist,native -DskipTests -Dtar
Keep the network connection stable; the build takes a long time.
IV. Installing and Configuring Hadoop 2.x
1. Set up passwordless SSH login
1.1 Configure the hosts file and hostname on every machine
vi /etc/hosts
192.168.204.202 master1
192.168.204.203 slave1
192.168.204.204 slave2

vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master1   (slave1 and slave2 accordingly)
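The hosts edit above can also be done in one shot with a here-document instead of vi. Sketched against a temporary file here for safety; on each real host you would replace "$hosts" with /etc/hosts and run as root.

```shell
# One-shot variant of the /etc/hosts edit, shown on a temp file.
hosts=$(mktemp)
cat >> "$hosts" <<'EOF'
192.168.204.202 master1
192.168.204.203 slave1
192.168.204.204 slave2
EOF
cat "$hosts"
rm -f "$hosts"
```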
1.2 Give each host a static IP and confirm the hosts can ping each other.
1.3 Set up passwordless login from master1 to the two slaves (slave1, slave2); follow the steps in order.
(1) First check that ssh is installed on every machine: [root@master1 Desktop]# rpm -qa | grep ssh
Already installed:
openssh-askpass-5.3p1-94.el6.x86_64
libssh2-1.4.2-1.el6.x86_64
openssh-5.3p1-94.el6.x86_64
openssh-server-5.3p1-94.el6.x86_64
openssh-clients-5.3p1-94.el6.x86_64
If anything is missing: yum install openssh-server openssh-clients
(2) On master1:
As the hadoop user, run ssh-keygen -t rsa and press Enter through every prompt.
[hadoop@master1 ~]$ cd /home/hadoop/.ssh/
[hadoop@master1 .ssh]$ ls
id_rsa  id_rsa.pub
[hadoop@master1 .ssh]$ cat id_rsa.pub >> authorized_keys
As root: [root@master1 Desktop]# chmod 600 /home/hadoop/.ssh/authorized_keys
Verify as the hadoop user:
[hadoop@master1 ~]$ ssh master1
Last login: Mon Feb 22 22:23:16 2016 from master1
[hadoop@master1 ~]$
On each slave, as the hadoop user: mkdir -p /home/hadoop/.ssh
Then, on master1, push the key as the hadoop user:
[hadoop@master1 ~]$ scp /home/hadoop/.ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/
[hadoop@master1 ~]$ scp /home/hadoop/.ssh/authorized_keys hadoop@slave2:/home/hadoop/.ssh/
On each slave, as root: chmod 600 /home/hadoop/.ssh/authorized_keys
Verify from master1 as the hadoop user:
[hadoop@master1 ~]$ ssh slave1
[hadoop@master1 ~]$ ssh slave2
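A wrong mode on authorized_keys is the most common reason the logins above still prompt for a password, which is why the chmod 600 steps matter. The helper below (check_perms is an illustrative name, not a Hadoop or OpenSSH tool) mirrors that check; it is demonstrated on a temporary file, but on a real host you would point it at /home/hadoop/.ssh/authorized_keys.

```shell
# Hypothetical permission check mirroring the chmod 600 steps above:
# warn when a key file is readable by group or others.
check_perms() {
  p=$(stat -c '%a' "$1")
  if [ "$p" = "600" ]; then echo "ok: $1"; else echo "bad perms ($p): $1"; fi
}
f=$(mktemp)
chmod 600 "$f"
check_perms "$f"
rm -f "$f"
```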
2. Install Hadoop 2.x
As the hadoop user on master1:
2.1 Unpack the compiled Hadoop 2.x build
(to a directory of your choice; here, /home/hadoop)
[hadoop@master1 ~]$ tar -zxvf /hadoopsolf/hadoop-2.7.2.tar.gz -C /home/hadoop
Configure the Hadoop environment variables in ~/.bash_profile:
vi /home/hadoop/.bash_profile

export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_PID_DIR=/var/hadoop/pids
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/java/jdk1.7.0_71
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

Note: as root, create /var/hadoop/pids and hand it to the hadoop user first:
mkdir -p /var/hadoop/pids
chown -R hadoop:hadoop /var/hadoop/pids
2.2 Create the Hadoop base directories
cd /home/hadoop/hadoop-2.7.2
$ mkdir -p dfs/name
$ mkdir -p dfs/data
$ mkdir -p tmp
$ cd etc/hadoop
2.3 Edit the Hadoop configuration files
The files to configure are core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, hadoop-env.sh, yarn-env.sh, and slaves, all under hadoop-2.7.2/etc/hadoop. The required settings are:
core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/hadoop/hadoop-2.7.2/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<!-- The RPC port that accepts client connections for filesystem metadata. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master1:9000</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131702</value>
</property>
hdfs-site.xml:
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master1:9001</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hadoop/hadoop-2.7.2/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hadoop/hadoop-2.7.2/dfs/data</value>
</property>
<!-- Keep 2 replicas of each block. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
mapred-site.xml (first: cp mapred-site.xml.template mapred-site.xml):
<!-- Run MapReduce on YARN. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master1:19888</value>
</property>
yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master1:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master1:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master1:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master1:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master1:8088</value>
</property>
vi slaves:
slave1
slave2
vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_71
vi yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_71
2.4 Copy to the slaves
Copy the whole directory from master1 to each slave:
scp -r /home/hadoop/hadoop-2.7.2 slave1:/home/hadoop
scp -r /home/hadoop/hadoop-2.7.2 slave2:/home/hadoop
3. Start Hadoop 2.x
On master1:
3.1 Format the HDFS filesystem
bin/hdfs namenode -format
3.2 Start the NameNode and DataNode daemons
sbin/start-dfs.sh  (starts the NameNode, SecondaryNameNode, and DataNodes)
If you see the following:
[hadoop@master1 sbin]$ ./start-dfs.sh
Starting namenodes on [master1]
master1: Error: JAVA_HOME is not set and could not be found.
slave2: Error: JAVA_HOME is not set and could not be found.
slave1: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [master1]
master1: Error: JAVA_HOME is not set and could not be found.
Fix: vi /home/hadoop/hadoop-2.7.2/libexec/hadoop-config.sh and set JAVA_HOME there explicitly (export JAVA_HOME=/usr/java/jdk1.7.0_71).
At this point master1 runs: NameNode, SecondaryNameNode
Each slave runs: DataNode
./sbin/start-yarn.sh  (starts the ResourceManager and NodeManagers)
After this, master1 runs: NameNode, SecondaryNameNode, ResourceManager
Each slave runs: DataNode, NodeManager
Check the processes:
[hadoop@master1 hadoop-2.7.2]$ jps
8176 Jps
4356 ResourceManager
6277 NameNode
6429 SecondaryNameNode
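The jps listing above can be checked mechanically. The helper below (check_daemons is an illustrative name, not a Hadoop tool) compares jps output against the daemons a node should be running; it is fed a literal copy of the master1 process list for demonstration, and on a real node you would pass "$(jps)" instead.

```shell
# Hypothetical helper: print a line for each expected daemon missing
# from the given jps output.
check_daemons() {
  out=$1; shift
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || echo "missing: $d"
  done
}
# Example against the master1 list above; DataNode is expected to be
# absent here because DataNodes run on the slaves, not on master1.
check_daemons "$(printf 'NameNode\nSecondaryNameNode\nResourceManager\n')" \
  NameNode SecondaryNameNode ResourceManager DataNode
# prints: missing: DataNode
```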
3.3 Basic status checks
Help:
[hadoop@master1 hadoop-2.7.2]$ ./bin/hdfs -help
[hadoop@master1 hadoop-2.7.2]$ ./bin/hdfs dfs -help
[hadoop@master1 bin]$ hdfs dfsadmin -help
Cluster status: ./bin/hdfs dfsadmin -report
File block layout: ./bin/hdfs fsck / -files -blocks
HDFS cluster status via the web console: http://master1:50070 (see hdfs-site.xml)
The ResourceManager runs on the master node; view YARN at http://master1:8088 (see yarn-site.xml)
NodeManagers run on the slave nodes; for example, slave1: http://slave1:8042/
JobHistory Server (start it first with mr-jobhistory-daemon.sh start historyserver): http://master1:19888/jobhistory
SecondaryNameNode web UI: http://master1:9001 (dfs.namenode.secondary.http-address)