Hadoop 2.7.2 Distributed Cluster Setup and Ecosystem Configuration

This article covers only installation and configuration of the cluster environment; usage of the individual components is not explained in depth (see other references).
The cluster is not configured for HA; see other references, or my upcoming articles.
1. Version compatibility


Hadoop itself is the then-current stable release, 2.7.2.


Hive

27 June 2015 : release 1.2.1 available
This release works with Hadoop 1.x.y, 2.x.y
There was also a 2.0.0 release at the time, but I just picked 1.2.1 and saw no reason to change it later. Other notes on Hive:
Hive's Java and Hadoop requirements:
Java 1.7
Note:  Hive versions 1.2 onward require Java 1.7 or newer. Hive versions 0.14 to 1.1 work with Java 1.6 as well. Users are strongly advised to start moving to Java 1.8 (see HIVE-8607).  
Hadoop 2.x (preferred), 1.x (not supported by Hive 2.0.0 onward).
Hive versions up to 0.13 also supported Hadoop 0.20.x, 0.23.x.
Hive metastore database (MySQL) requirements:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
Database        Minimum Supported Version    Name for Parameter Values
MS SQL Server   2008 R2                      mssql
MySQL           5.6.17                       mysql
Oracle          11g                          oracle
Postgres        9.1.13                       postgres
Hive/HBase storage integration requirements:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
Hive 1.x will remain compatible with HBase 0.98.x and lower versions. Hive 2.x will be compatible with HBase 1.x and higher.


HBase

The 1.1.x series is the current stable release line; it supersedes 1.0.x, 0.98.x and 0.94.x (the 1.0.x, 0.98.x and 0.94.x lines are still seeing a monthly cadence of bug-fix releases for those who are not easily able to update). Note that 0.96 was EOL'd September 1st, 2014.
So 1.1.x is the stable line; why use HBase 1.2.1 then? See the official book at http://hbase.apache.org/book.html#java, item 4 "Basic Prerequisites", which tables HBase's supported Java and Hadoop versions: Hadoop 2.7.x is not supported by HBase 1.1.x and requires HBase 1.2.x, hence HBase 1.2.1.


ZooKeeper

ZooKeeper has the widest compatibility of the lot, so I picked the then-stable 3.4.8.


Pig

6 June, 2015: release 0.15.0 available
This release works with Hadoop 0.23.X, 1.X and 2.X


Sqoop

Since we are on Hadoop 2.x, the only usable artifact is sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

Cluster node layout

IPs: 192.168.2.3, 192.168.2.10, 192.168.2.11

Hostnames: hadoop, hadoop1, hadoop2

hadoop: NameNode, SecondaryNameNode, ResourceManager, HMaster

hadoop1: DataNode, NodeManager, HRegionServer

hadoop2: DataNode, NodeManager, HRegionServer


With the versions settled, let's build the cluster.

2. Versions, paths, and environment variables

/usr/local/maven/maven-3.3.9
/usr/local/ant/apache-ant-1.9.7
/usr/local/java/jdk1.7.0_80
/usr/local/mysql (5.6 or later)


/etc/profile
export MAVEN_HOME=/usr/local/maven/maven-3.3.9
export ANT_HOME=/usr/local/ant/apache-ant-1.9.7
export JAVA_HOME=/usr/local/java/jdk1.7.0_80
export PATH=$PATH:$JAVA_HOME/bin:$ANT_HOME/bin:/usr/local/mysql/bin:$MAVEN_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar:$ANT_HOME/lib/ant-launcher.jar:$ANT_HOME/lib/*.jar
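
After editing, reload the profile and sanity-check the tools (exact version strings will vary with your builds):

source /etc/profile
java -version    # should report 1.7.0_80
mvn -version
ant -version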

Installation paths

/opt/modules/hadoop-2.7.2
/opt/modules/hive-1.2.1
/opt/modules/hbase-1.2.1
/opt/modules/zookeeper-3.4.8
/opt/modules/sqoop-1.4.6


~/.bashrc or ~/.bash_profile
export HADOOP_HOME=/opt/modules/hadoop-2.7.2
export HIVE_HOME=/opt/modules/hive-1.2.1
export SQOOP_HOME=/opt/modules/sqoop-1.4.6
export HBASE_HOME=/opt/modules/hbase-1.2.1
export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.8
export PIG_HOME=/opt/modules/pig-0.15.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin:$PIG_HOME/bin
export CLASSPATH=$CLASSPATH:$PIG_HOME/pig-0.15.0-core-h2.jar
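
Once the tarballs are unpacked under /opt/modules, reload the shell profile and check that the binaries resolve:

source ~/.bashrc
hadoop version   # should report 2.7.2
hbase version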

Groundwork
The cluster node hostnames are hadoop, hadoop1, and hadoop2; the VMs use bridged networking.
Set a static IP: vi /etc/sysconfig/network-scripts/ifcfg-eth0
Change the hostname (note that `sudo echo ... > file` runs the redirect as your own user, not root, so edit the file or pipe through tee):
CentOS:
sudo vi /etc/sysconfig/network    # set HOSTNAME=hadoop1
Ubuntu:
echo hadoop | sudo tee /etc/hostname
Bind the IPs to hostnames:
vim /etc/hosts
192.168.2.3     hadoop
192.168.2.10    hadoop1
192.168.2.11    hadoop2
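
A quick check that name resolution works, run from each node:

ping -c 1 hadoop && ping -c 1 hadoop1 && ping -c 1 hadoop2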


Disable the firewall and SELinux
service iptables status
service iptables stop
vim /etc/sysconfig/selinux
SELINUX=disabled
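
service iptables stop only lasts until the next reboot, and the SELinux change only takes effect after one; on CentOS 6 both can be made to stick right away:

chkconfig iptables off   # keep iptables disabled across reboots
setenforce 0             # switch SELinux to permissive now, without rebooting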


Passwordless SSH
ssh-keygen -t rsa
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub ${hostname}
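
Run this on every node; a small loop pushes the public key to all three hosts (this assumes the cluster runs as a user named hadoop, matching the key path above):

for h in hadoop hadoop1 hadoop2; do
  ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub $h
done
ssh hadoop1 hostname   # should print hadoop1 without a password prompt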


Hadoop configuration


/opt/modules/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
export HADOOP_PREFIX=/opt/modules/hadoop-2.7.2
export JAVA_HOME=/usr/local/java/jdk1.7.0_80


/opt/modules/hadoop-2.7.2/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.7.2/data</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>
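
A quick way to confirm Hadoop picks this file up (after the tarball is unpacked and the PATH from ~/.bashrc is active):

hdfs getconf -confKey fs.defaultFS   # should print hdfs://hadoop:9000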



/opt/modules/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>33554432</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/modules/hadoop-2.7.2/data/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/modules/hadoop-2.7.2/data/dfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>



/opt/modules/hadoop-2.7.2/etc/hadoop/slaves
hadoop1
hadoop2



/opt/modules/hadoop-2.7.2/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
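
With log aggregation enabled, the container logs of finished applications are collected into HDFS and can be read back with the yarn CLI (the application ID below is just a placeholder):

yarn logs -applicationId application_1465000000000_0001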




/opt/modules/hadoop-2.7.2/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.tracker</name>
        <value>hdfs://hadoop:9001</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop:19888</value>
    </property>
</configuration>
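
Note that start-dfs.sh/start-yarn.sh do not start the JobHistory server; launch it separately on the master. (The mapreduce.job.tracker entry above appears to be an MR1 leftover and should be ignored once mapreduce.framework.name is yarn.)

mr-jobhistory-daemon.sh start historyserver   # web UI then answers at hadoop:19888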

Create the data directories

/opt/modules/hadoop-2.7.2/data/dfs/name
/opt/modules/hadoop-2.7.2/data/dfs/data
/opt/modules/hadoop-2.7.2/data/dfs/namesecondary
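
One way to create them and initialize HDFS (run the mkdir on every node; the format runs once, on the master only):

mkdir -p /opt/modules/hadoop-2.7.2/data/dfs/{name,data,namesecondary}
hdfs namenode -format    # once, on hadoop, before the first start-dfs.sh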



Hive
/opt/modules/hive-1.2.1/conf/hive-env.sh
export JAVA_HOME=/usr/local/java/jdk1.7.0_80
export HADOOP_HOME=/opt/modules/hadoop-2.7.2
export HIVE_CONF_DIR=/opt/modules/hive-1.2.1/conf



/opt/modules/hive-1.2.1/conf/hive-site.xml
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.2.3:3306/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <property>
        <name>hive.hwi.listen.host</name>
        <value>0.0.0.0</value>
    </property>
    <property>
        <name>hive.hwi.listen.port</name>
        <value>9999</value>
    </property>
    <property>
        <name>hive.hwi.war.file</name>
        <value>lib/hive-hwi-1.2.1.war</value>
    </property>

    <property>
        <name>hive.querylog.location</name>
        <value>/opt/modules/hive/logs</value>
    </property>
    <property>
        <name>hive.aux.jars.path</name>
        <value>file:///opt/modules/hive-1.2.1/lib/hive-hbase-handler-1.2.1.jar,file:///opt/modules/hive-1.2.1/lib/guava-14.0.1.jar,file:///opt/modules/hive-1.2.1/lib/hbase-common-1.2.1.jar,file:///opt/modules/hive-1.2.1/lib/zookeeper-3.4.8.jar</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop:2181,hadoop1:2181,hadoop2:2181</value>
    </property>
</configuration>


A few notes on the three blocks above. The first configures MySQL as the metastore: install MySQL 5.6 or later and grant the account login and object privileges; for building and configuring MySQL from source on Linux, see my other article.
The second configures the Hive Web Interface (hive-hwi): you have to build hive-hwi-1.2.1.war and add a few required jars; see my other article.
The third integrates Hive storage with HBase: you have to rebuild hive-hbase-handler-1.2.1.jar, have HBase and ZooKeeper configured, and copy over the necessary jars; details are in another of my articles.
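
A minimal metastore bootstrap that matches the JDBC settings above (database name, user, and password come straight from hive-site.xml; the connector jar version is only an example):

# Hive needs the MySQL JDBC driver on its classpath
cp mysql-connector-java-5.1.38.jar /opt/modules/hive-1.2.1/lib/   # version is illustrative
mysql -u root -proot <<'SQL'
CREATE DATABASE IF NOT EXISTS hive;
GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%' IDENTIFIED BY 'root';
FLUSH PRIVILEGES;
SQL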


ZooKeeper

/opt/modules/zookeeper-3.4.8/conf/zoo.cfg
Besides the server list, zoo.cfg needs at least dataDir (which is where the myid file below lives) and clientPort:

dataDir=/opt/modules/zookeeper-3.4.8/data
clientPort=2181
server.1=192.168.2.3:2888:3888
server.2=192.168.2.10:2888:3888
server.3=192.168.2.11:2888:3888


/opt/modules/zookeeper-3.4.8/data/myid
1
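
Each node's myid must match its server.N entry in zoo.cfg; after setting it, start and check the ensemble on all three nodes:

echo 1 > /opt/modules/zookeeper-3.4.8/data/myid   # on 192.168.2.3  (server.1)
echo 2 > /opt/modules/zookeeper-3.4.8/data/myid   # on 192.168.2.10 (server.2)
echo 3 > /opt/modules/zookeeper-3.4.8/data/myid   # on 192.168.2.11 (server.3)
zkServer.sh start
zkServer.sh status   # one node should report "leader", the other two "follower"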


HBase
/opt/modules/hbase-1.2.1/conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
export JAVA_HOME=/usr/local/java/jdk1.7.0_80
export HBASE_CLASSPATH=/opt/modules/hadoop-2.7.2/etc/hadoop



/opt/modules/hbase-1.2.1/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop:9000/user/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop,hadoop1,hadoop2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/modules/hbase-1.2.1/data</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>90000</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/opt/modules/hbase-1.2.1/data/tmp</value>
  </property>
</configuration>


/opt/modules/hbase-1.2.1/conf/regionservers
hadoop1
hadoop2
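
With HDFS and the ZooKeeper ensemble already running, HBase can be started from the master and given a quick check (the expected counts follow the node layout at the top):

start-hbase.sh
echo status | hbase shell    # expect 1 active master and 2 region servers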



Sqoop
/opt/modules/sqoop-1.4.6/conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/modules/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/modules/hadoop-2.7.2
export HBASE_HOME=/opt/modules/hbase-1.2.1
export HIVE_HOME=/opt/modules/hive-1.2.1
export ZOOCFGDIR=/opt/modules/zookeeper-3.4.8/conf

In /opt/modules/sqoop-1.4.6/bin/configure-sqoop, comment out the following lines:
## Moved to be a runtime check in sqoop.
# if [ ! -d "${HCAT_HOME}" ]; then
#   echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#   echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
# fi

# if [ ! -d "${ACCUMULO_HOME}" ]; then
#   echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#   echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
# fi

# Add HCatalog to dependency list
# if [ -e "${HCAT_HOME}/bin/hcat" ]; then
#   TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
#   if [ -z "${HIVE_CONF_DIR}" ]; then
#     TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
#   fi
#   SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
# fi

# Add Accumulo to dependency list
# if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
#   for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
#     SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#   done
#   for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
#     SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#   done
# fi
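
As a smoke test, once the MySQL JDBC driver jar (the same one used for Hive) is also copied into $SQOOP_HOME/lib, something like this should list the databases on the metastore host:

sqoop version
sqoop list-databases --connect jdbc:mysql://192.168.2.3:3306/ --username root --password root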

Once configuration is complete, copy the hadoop, hbase, and zookeeper directories to the other nodes, and set each node's myid under ZooKeeper's data directory.

Copy them with scp:

scp -r /opt/modules/hadoop-2.7.2 hadoop1:/opt/modules/

Repeat likewise for the rest. Keep the file paths identical on every node, since the copied configuration files all reference the same paths.

After that, everything can be started and verified.
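
One workable startup order (a sketch; run from the master except where noted, with daemons matching the node layout at the top):

zkServer.sh start                              # run on each of the three nodes first
start-dfs.sh                                   # on hadoop
start-yarn.sh                                  # on hadoop
mr-jobhistory-daemon.sh start historyserver    # on hadoop
start-hbase.sh                                 # on hadoop
jps
# expected on hadoop:           NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer, HMaster, QuorumPeerMain
# expected on hadoop1/hadoop2:  DataNode, NodeManager, HRegionServer, QuorumPeerMain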
