Hadoop Installation and Configuration
I have been learning Hadoop development for a while now, from the initial introduction to the development stage I am at today. Below are the problems I ran into while learning Hadoop; I hope they are helpful to you:
Hadoop installation and configuration comes in three modes: standalone, pseudo-distributed, and cluster.
1. First, download a recent Hadoop release from the official mirror (http://mirror.bit.edu.cn/apache/hadoop/common/)
2. Extract it: tar -zxvf hadoop-x.y.z.tar.gz (your Hadoop version) -C /home/chianyu (your target directory)
3. Set up Hadoop in single-node (pseudo-distributed) mode. Official guide: http://hadoop.apache.org/common/docs/r1.0.3/single_node_setup.html
Notes:
Use the same Hadoop installation path on every machine in the cluster; use the same system username on every machine to simplify passwordless SSH login; give each host a distinct hostname so the hosts can communicate with one another.
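For the distinct-hostname requirement, one common approach is to map each machine's IP address to a name in /etc/hosts on every node. A minimal sketch, assuming the two IPs used in the cluster section of this guide; the hostnames master and slave1 are illustrative:

```shell
# Demo: write the mapping to a temp file; on a real cluster these lines
# would be appended to /etc/hosts on every machine (requires root).
# 192.168.1.118/129 are the master/slave IPs used later in this guide;
# the hostnames themselves are made up for illustration.
cat > /tmp/hosts-demo <<'EOF'
192.168.1.118   master
192.168.1.129   slave1
EOF
cat /tmp/hosts-demo
```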
Prerequisites:
sudo apt-get install ssh
sudo apt-get install rsync
ssh localhost
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa   # generate a key pair to enable passwordless access
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh localhost   # should now log in without a password
Next, edit the configuration files:
1. /hadoop-1.0.3/conf/core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/chianyu/hadoopdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
2. /hadoop-1.0.3/conf/hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
3. /hadoop-1.0.3/conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
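Note that each of the property snippets above must sit inside a <configuration> root element. As a sketch, the full core-site.xml can even be generated from the shell (written to a demo directory here; a real install would target the hadoop-1.0.3/conf directory instead):

```shell
# Sketch: a complete core-site.xml with the two properties above wrapped
# in the required <configuration> root element. Written to ./conf-demo so
# it can be tried anywhere; point CONF_DIR at your real conf directory.
CONF_DIR=./conf-demo
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/chianyu/hadoopdata</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```

The same pattern applies to hdfs-site.xml and mapred-site.xml.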
Once configuration is done, Hadoop can be started. Before starting, set the environment variables: run sudo vim /etc/profile in a terminal and add:
export JAVA_HOME=/home/chianyu/jdk1.6.0_33
export HADOOP_HOME=/home/chianyu/hadoop-1.0.3
export PATH=$HADOOP_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Then run source /etc/profile to make the changes take effect.
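One step worth calling out before the first start: on a fresh install, HDFS must be formatted once with hadoop namenode -format (the Hadoop 1.x command). Re-running it later wipes HDFS metadata, so do it exactly once. A guarded sketch:

```shell
# Format HDFS exactly once before the first start-all.sh.
# The guard only checks that the hadoop launcher is on PATH; it does not
# protect an already-formatted cluster from being wiped, so run with care.
if command -v hadoop >/dev/null 2>&1; then
    hadoop namenode -format
else
    echo "hadoop not on PATH - run 'source /etc/profile' first"
fi
```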
Next, start Hadoop with start-all.sh. After startup the terminal shows output like this:
starting namenode, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-namenode-chenxiaobian-Vostro-260s.out
localhost: starting datanode, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-datanode-chenxiaobian-Vostro-260s.out
localhost: starting secondarynamenode, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-secondarynamenode-chenxiaobian-Vostro-260s.out
starting jobtracker, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-jobtracker-chenxiaobian-Vostro-260s.out
localhost: starting tasktracker, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-tasktracker-chenxiaobian-Vostro-260s.out
Logs and server status:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Use the jps command to check Hadoop's five daemon processes (the listing also includes jps itself):
6753 JobTracker
6679 SecondaryNameNode
6983 Jps
6499 DataNode
6333 NameNode
6921 TaskTracker
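The listing above shows the five daemons plus jps itself. As a sketch, the presence of all five can be checked automatically; the demo below runs against the sample listing so it works without a live cluster (on a real machine, use jps_out=$(jps) instead):

```shell
# Check that each expected Hadoop 1.x daemon appears in the jps listing.
jps_out="6753 JobTracker
6679 SecondaryNameNode
6499 DataNode
6333 NameNode
6921 TaskTracker"            # on a live machine: jps_out=$(jps)
missing=""
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # anchor the match so NameNode does not also match SecondaryNameNode
    echo "$jps_out" | grep -q " ${d}\$" || missing="$missing $d"
done
[ -z "$missing" ] && echo "all daemons up" || echo "missing:$missing"
```

With the sample listing, the check prints "all daemons up".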
This shows the pseudo-distributed setup succeeded.
4. Hadoop cluster configuration:
NameNode configuration (master): 192.168.1.118
1. /hadoop-1.0.3/conf/core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/chianyu/hadoopdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.118:9000</value> <!-- use the IP address directly -->
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>excludes</value> <!-- allows DataNode nodes to be added and removed dynamically -->
</property>
2. /hadoop-1.0.3/conf/hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/chianyu/hdfs/nameDir</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/chianyu/hdfs/dataDir</value>
</property>
3. /hadoop-1.0.3/conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.118:9001</value> <!-- use the IP address directly -->
</property>
4. /hadoop-1.0.3/conf/masters:
192.168.1.118
5. /hadoop-1.0.3/conf/slaves:
192.168.1.129
DataNode configuration (slave): 192.168.1.129
Simply copy files 1, 2, and 3 from the NameNode; files 4 and 5 are not needed on the slave.
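Copying files 1-3 to the slave can be scripted over the passwordless SSH set up earlier. A sketch of the scp commands, using the username and slave IP from this walkthrough (printed as a dry run; remove the echo to actually copy):

```shell
# Dry run: print the scp commands that would push the NameNode's three
# config files to the slave. Remove `echo` to copy for real (relies on
# the passwordless SSH configured earlier).
SLAVE=chianyu@192.168.1.129
CONF=/home/chianyu/hadoop-1.0.3/conf
for f in core-site.xml hdfs-site.xml mapred-site.xml; do
    echo scp "$CONF/$f" "$SLAVE:$CONF/"
done
```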
Logs and server status:
NameNode - http://host:50070/
JobTracker - http://host:50030/
Use the jps command to check Hadoop's five daemon processes (the last column notes which machine each runs on):
2051 NameNode master
2238 DataNode slave
2429 SecondaryNameNode master
2514 JobTracker master
2707 TaskTracker slave