1. Download and install the JDK
Download it from the official site, then move the archive to /opt/Java:
- guo@guo:~/Downloads$ mv ./hadoop-2.7.2.tar.gz /opt/Hadoop/
- mv: cannot create regular file '/opt/Hadoop/hadoop-2.7.2.tar.gz': Permission denied
- guo@guo:~/Downloads$ su root #sudo works too; I switched straight to root out of habit
- Password:
- root@guo:/home/guo/Downloads# mv ./hadoop-2.7.2.tar.gz /opt/Hadoop/
- root@guo:/home/guo/Downloads# mv ./jdk-8u73-linux-x64.tar.gz /opt/Java/
- root@guo:/opt# cd Java/
- root@guo:/opt/Java# ll
- total 177072
- drwxr-xr-x 2 root root 4096 Mar 14 15:54 ./
- drwxr-xr-x 4 root root 4096 Mar 14 15:51 ../
- -rw-rw-r-- 1 guo guo 181310701 Mar 14 15:47 jdk-8u73-linux-x64.tar.gz
- root@guo:/opt/Java# tar -zxvf jdk-8u73-linux-x64.tar.gz
Change the file owner (user:group):
- root@guo:/home/guo# chown -R guo:guo /opt/Java/jdk1.8.0_73/
- root@guo:/home/guo# cd /opt/Java/
- root@guo:/opt/Java# ll
- total 177076
- drwxr-xr-x 3 root root 4096 Mar 14 15:59 ./
- drwxr-xr-x 4 root root 4096 Mar 14 15:51 ../
- drwxr-xr-x 8 guo guo 4096 Jan 30 09:53 jdk1.8.0_73/
- -rw-rw-r-- 1 guo guo 181310701 Mar 14 15:47 jdk-8u73-linux-x64.tar.gz
Set the Java environment variables:
- sudo gedit /etc/profile
- #java
- export JAVA_HOME=/opt/Java/jdk1.8.0_73
- export JRE_HOME=/opt/Java/jdk1.8.0_73/jre
- export CLASSPATH=$JAVA_HOME/lib
- export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
- source /etc/profile
- guo@guo:~$ java -version
- java version "1.8.0_73"
- Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
- Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
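After `source /etc/profile`, the variables can be sanity-checked before relying on `java -version`. A small sketch, assuming the install path used above:

```shell
# Re-create the variables from /etc/profile (path assumed from the steps above).
export JAVA_HOME=/opt/Java/jdk1.8.0_73
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
# The last PATH entry should now be the JRE bin directory.
echo "JAVA_HOME=$JAVA_HOME"
echo "last PATH entry=${PATH##*:}"
```

If the last PATH entry is wrong, the shell will keep finding whatever `java` binary came first on the old PATH.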
Congratulations, step one is complete!
2. Configure passwordless SSH login
Note: the form is `ssh user@hostname`. If you run `ssh hostname` directly, it logs in as your current user, so the other machine must have an identical user account.
Machines in the cluster communicate over SSH (sending and reading data, e.g. between the namenode and the datanodes). Requiring an operator to type a password for every connection is impractical, which is why SSH passwordless login is needed.
Install openssh-server:
- guo@guo:~$ su root
- Password:
- root@guo:/home/guo# apt-get install openssh-server
- root@guo:/home/guo# ssh-keygen -t rsa
- root@guo:/home/guo# cd ~/.ssh
- root@guo:~/.ssh# ll
- total 16
- drwx------ 2 root root 4096 Mar 14 16:20 ./
- drwx------ 4 root root 4096 Mar 14 16:20 ../
- -rw------- 1 root root 1679 Mar 14 16:20 id_rsa
- -rw-r--r-- 1 root root 390 Mar 14 16:20 id_rsa.pub
- root@guo:~/.ssh# cp id_rsa.pub authorized_keys
- root@guo:~/.ssh# ll
- total 20
- drwx------ 2 root root 4096 Mar 14 16:22 ./
- drwx------ 4 root root 4096 Mar 14 16:20 ../
- -rw-r--r-- 1 root root 390 Mar 14 16:22 authorized_keys
- -rw------- 1 root root 1679 Mar 14 16:20 id_rsa
- -rw-r--r-- 1 root root 390 Mar 14 16:20 id_rsa.pub
Change the permissions on the public key file (this step is required):
- guo@guo:~/.ssh$ chmod 600 authorized_keys #600 = rw------- (owner read 4 + write 2, nothing for group/others)
- guo@guo:~/.ssh$ ll
- total 56
- drwx------ 2 guo guo 4096 Mar 15 18:41 ./
- drwx------ 20 guo guo 4096 Mar 15 17:56 ../
- -rw------- 1 guo guo 389 Mar 15 18:41 authorized_keys
- -rw------- 1 guo guo 1679 Mar 15 18:41 id_rsa
- -rw-r--r-- 1 guo guo 389 Mar 15 18:41 id_rsa.pub
- -rw-r--r-- 1 guo guo 444 Mar 15 18:37 known_hosts
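Mode 600 matters because sshd refuses keys in an authorized_keys file that other users can read or write. A self-contained way to confirm a file's octal mode (demonstrated on a throwaway temp file, not your real key):

```shell
# Create a throwaway file, apply the same mode as authorized_keys, and read it back.
tmpfile=$(mktemp)
chmod 600 "$tmpfile"
perms=$(stat -c '%a' "$tmpfile")   # octal permission bits of the file
echo "$perms"
rm -f "$tmpfile"
```

Running `stat -c '%a' ~/.ssh/authorized_keys` on the real file should likewise print 600.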
Then test whether it works:
- guo@guo:~/.ssh$ ssh guo
- Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
- * Documentation: https://help.ubuntu.com/
- 0 packages can be updated.
- 0 updates are security updates.
- Last login: Tue Mar 15 18:39:56 2016 from 127.0.0.1
- guo@guo:~$ exit
- logout
- Connection to guo closed.
- guo@guo:~/.ssh$
3. Hadoop standalone mode configuration
Download the latest Hadoop from the official site (http://apache.opencas.org/hadoop/common/stable/); at the time of writing the latest is 2.7.2. After downloading, move it to /opt/Hadoop:
- guo@guo:~/Downloads$ mv ./hadoop-2.7.2.tar.gz /opt/Hadoop/
- mv: cannot create regular file '/opt/Hadoop/hadoop-2.7.2.tar.gz': Permission denied
- guo@guo:~/Downloads$ su root
- Password:
- root@guo:/home/guo/Downloads# mv ./hadoop-2.7.2.tar.gz /opt/Hadoop/
- guo@guo:/opt/Hadoop$ sudo tar -zxf hadoop-2.7.2.tar.gz
- [sudo] password for guo:
Change the file owner (user:group):
- root@guo:/opt/Hadoop# chown -R guo:guo /opt/Hadoop/hadoop-2.7.2
- root@guo:/opt/Hadoop# ll
- total 224960
- drwxr-xr-x 4 root root 4096 Mar 14 18:14 ./
- drwxr-xr-x 4 root root 4096 Mar 14 15:51 ../
- drwxr-xr-x 11 guo guo 4096 Mar 14 21:16 hadoop-2.7.2/
Set the environment variables:
- guo@guo:/opt/Hadoop$ sudo gedit /etc/profile
- #hadoop
- export HADOOP_HOME=/opt/Hadoop/hadoop-2.7.2
- export PATH=$PATH:$HADOOP_HOME/sbin
- export PATH=$PATH:$HADOOP_HOME/bin
- guo@guo:/opt/Hadoop$ source /etc/profile
If the configuration was updated as root and the hadoop user cannot pick it up, switch to the hadoop user and make the same changes in its ~/.bash_profile.
Edit hadoop-env.sh under /opt/Hadoop/hadoop-2.7.2/etc/hadoop:
- guo@guo:/opt/Hadoop$ cd hadoop-2.7.2
- guo@guo:/opt/Hadoop/hadoop-2.7.2$ cd etc/hadoop/
- guo@guo:/opt/Hadoop/hadoop-2.7.2/etc/hadoop$ sudo gedit ./hadoop-env.sh
- export JAVA_HOME=${JAVA_HOME} #change this to the JDK path, as follows:
- export JAVA_HOME=/opt/Java/jdk1.8.0_73
- guo@guo:/opt/Hadoop/hadoop-2.7.2/etc/hadoop$ source ./hadoop-env.sh
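If you prefer a non-interactive edit over gedit, the placeholder line can be rewritten with sed. A sketch run against a stand-in temp file (the JDK path is the one assumed above):

```shell
# Simulate hadoop-env.sh with its default placeholder line, then rewrite it.
env_file=$(mktemp)
printf 'export JAVA_HOME=${JAVA_HOME}\n' > "$env_file"
# Replace the whole JAVA_HOME line with the hard-coded JDK path.
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/Java/jdk1.8.0_73|' "$env_file"
result=$(cat "$env_file")
echo "$result"
rm -f "$env_file"
```

On the real file the same sed command would target /opt/Hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh instead of the temp file.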
4. Hadoop pseudo-distributed mode configuration
Edit core-site.xml:
- guo@guo:/opt/Hadoop/hadoop-2.7.2/etc/hadoop$ sudo gedit ./core-site.xml
- [sudo] password for guo:
Add the following to the file:
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/opt/Hadoop/hadoop-2.7.2/tmp</value>
- </property>
- </configuration>
Edit hdfs-site.xml the same way and add:
- <configuration>
- <property><!-- set the replication factor to 1; the default is 3 -->
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
Make a copy of mapred-site.xml.template named mapred-site.xml:
- sudo cp mapred-site.xml.template mapred-site.xml
and add to it:
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- </configuration>
Edit yarn-site.xml:
- <configuration>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- </configuration>
Format the NameNode (only needed the first time):
- guo@guo:/opt/Hadoop/hadoop-2.7.2# hdfs namenode -format
Start HDFS:
- guo@guo:/opt/Hadoop/hadoop-2.7.2# start-dfs.sh
Start YARN:
- guo@guo:/opt/Hadoop/hadoop-2.7.2# start-yarn.sh
Check whether everything started successfully:
- guo@guo:/opt/Hadoop/hadoop-2.7.2# jps
- 10144 NodeManager
- 9668 SecondaryNameNode
- 9833 ResourceManager
- 9469 DataNode
- 10285 Jps
- 9311 NameNode
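A healthy pseudo-distributed node shows all five daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself). A sketch of checking that programmatically, run here against a copy of the output above rather than a live `jps`:

```shell
# Stand-in for `jps` output (taken from the listing above).
jps_output="10144 NodeManager
9668 SecondaryNameNode
9833 ResourceManager
9469 DataNode
9311 NameNode"
missing=""
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  # -w matches whole words, so "NameNode" does not falsely match "SecondaryNameNode".
  echo "$jps_output" | grep -qw "$daemon" || missing="$missing $daemon"
done
echo "missing daemons:${missing:- none}"
```

On a live system you would set `jps_output=$(jps)` instead of the stand-in string.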
If instead you see the following:
- guo@guo:/opt/Hadoop/hadoop-2.7.2$ jps
- The program 'jps' can be found in the following packages:
- * openjdk-7-jdk (you must enable the main component)
- * openjdk-6-jdk (you must enable the universe component)
- * openjdk-8-jdk (you must enable the universe component)
- Try: sudo apt-get install <selected package>
then run:
- root@guo:/etc# update-alternatives --install /usr/bin/jps jps /opt/Java/jdk1.8.0_73/bin/jps 1
- update-alternatives: using /opt/Java/jdk1.8.0_73/bin/jps to provide /usr/bin/jps (jps) in auto mode
View the ResourceManager at http://localhost:8088
View the NameNode at http://localhost:50070
Congratulations, pseudo-distributed mode is now configured!
5. Hadoop JobHistory configuration
The Hadoop JobHistory server records information about completed MapReduce jobs and stores it in a configured HDFS directory. It is not started by default; after configuring it you must start the service manually.
Add the following to mapred-site.xml:
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop000:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop000:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/history/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/history/done_intermediate</value>
</property>
Start the history server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Stop the history server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
Once the history server is up, you can visit its web UI in a browser at hadoop000:19888.
Two directories are created on HDFS:
hadoop fs -ls /history
drwxrwx--- - spark supergroup 0 2014-10-11 15:11 /history/done
drwxrwxrwt - spark supergroup 0 2014-10-11 15:16 /history/done_intermediate
mapreduce.jobhistory.done-dir (/history/done): directory where history files are managed by the MR JobHistory Server (completed jobs).
mapreduce.jobhistory.intermediate-done-dir (/history/done_intermediate): directory where history files are written by MapReduce jobs (jobs still running).
Test:
Query the city table through Hive, then observe the HDFS directory layout and hadoop000:19888.
hive> select id, name from city;
Observing the HDFS directories:
1) Job history records are stored in directories named by year/month/day (e.g. /history/done/2014/10/11/000000);
2) each job leaves two records with different suffixes: .jhist and .xml
hadoop fs -ls /history/done/2014/10/11/000000
-rwxrwx--- 1 spark supergroup 22572 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist
-rwxrwx--- 1 spark supergroup 160149 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml
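The dated path in the listing follows that year/month/day layout. A sketch of how such a path is composed (the /history/done root comes from the configuration above; the 000000 serial subdirectory is the one from the listing):

```shell
# Build the done-dir path for a given job date.
done_root=/history/done
job_date=$(date -d '2014-10-11' +%Y/%m/%d)   # GNU date; formats to 2014/10/11
history_path="$done_root/$job_date/000000"
echo "$history_path"
```

Substituting today's date reproduces the directory the history server will write to for jobs finishing today.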
Observe the web UI at hadoop000:19888.
6. Adjusting the firewall
I installed CentOS 7 in a VirtualBox VM and ran a web service inside it, but the service could not be reached from the host or from the outside network, even though the VM's network adapter was configured in bridged mode.
iptables is included in most Linux distributions.
Run:
# service iptables start
View the iptables rule set
Below is what iptables looks like before any rules are defined:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
# iptables -I OUTPUT -o eth0 -p tcp --sport 3306 -j ACCEPT #e.g. allow outbound packets from source port 3306
# iptables -I OUTPUT -o eth0 -p tcp --sport 3306 -j DROP #or drop them instead (-I inserts at the top, so the later rule wins)
# /etc/rc.d/init.d/iptables save #persist the rules across reboots
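For the original goal of reaching the VM's web service and the Hadoop web UIs from the host, the relevant ports must be accepted on the INPUT chain. A hedged sketch, assuming iptables is in use and the web service listens on port 80 (note that CentOS 7 actually ships with firewalld by default, so you may need `firewall-cmd` or to disable firewalld first):

```shell
# Accept inbound connections to the web service and the Hadoop web UIs.
iptables -I INPUT -p tcp --dport 80    -j ACCEPT   # the VM's web service (assumed port)
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT   # NameNode web UI
iptables -I INPUT -p tcp --dport 8088  -j ACCEPT   # ResourceManager web UI
service iptables save                              # persist the rules across reboots
```

These commands require root and are shown as a configuration fragment only; adapt the port list to whatever service you are exposing.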