Prepare three virtual machines, hadoop01, hadoop02 and hadoop03, all running CentOS 7.
1. Set the hostname
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
NETWORKING_IPV6=no
PEERNTP=no
vim /etc/hostname
hadoop01
2. Hostname mapping
vim /etc/hosts (edit on all 3 machines)
192.168.133.xxx hadoop01
192.168.133.xxx hadoop02
192.168.133.xxx hadoop03
Also edit the C:\Windows\System32\drivers\etc\hosts file (so that the local Windows machine can later reach the cluster services by hostname)
192.168.133.xxx hadoop01
192.168.133.xxx hadoop02
192.168.133.xxx hadoop03
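The mappings must be identical on every node, and a placeholder that was never filled in is easy to miss. A small illustrative sketch (not part of the original steps; the sample address 192.168.133.101 is hypothetical) that format-checks hosts lines before they are appended:

```shell
#!/bin/sh
# Sketch: sanity-check /etc/hosts-style lines before appending them on the
# nodes. valid_entry succeeds only when a line looks like "IPv4 hostname".
valid_entry() {
  echo "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}[[:space:]]+[A-Za-z0-9.-]+$'
}

for line in "192.168.133.101 hadoop01" "192.168.133.xxx hadoop02"; do
  if valid_entry "$line"; then
    echo "OK:  $line"
  else
    echo "BAD: $line (placeholder not filled in?)"
  fi
done
```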
3. Configure a static IP (all 3 machines)
vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="bfaae4ba-2275-4a2c-85db-c94585096a42"
DEVICE="ens33"
ONBOOT="yes"
IPADDR="192.168.133.xxx"
NETMASK="255.255.255.0"
GATEWAY="192.168.133.x"
DNS1="192.168.133.x"
service network restart
4. Disable the firewall (and disable it at boot)
Check status: systemctl status firewalld
Start: systemctl start firewalld.service
Restart: systemctl restart firewalld.service
Stop: systemctl stop firewalld.service
Disable at boot: systemctl disable firewalld.service
5. Disable SELinux
vim /etc/sysconfig/selinux
Set SELINUX=disabled
6. Passwordless SSH access
ssh-keygen -t rsa (on the master node, press Enter through all prompts)
ssh-copy-id hadoop01 (enter the password when prompted)
ssh-copy-id hadoop02 (enter the password when prompted)
ssh-copy-id hadoop03 (enter the password when prompted)
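The three ssh-copy-id calls can be collapsed into one loop. A hedged sketch: by default it only prints the commands (DRY_RUN=1), and a real run still prompts for each node's password:

```shell
#!/bin/sh
# Sketch: distribute the public key to every node in one loop.
# DRY_RUN=1 (default) only prints what would run; set DRY_RUN=0 to
# actually invoke ssh-copy-id (you are prompted for each password).
copy_keys() {
  for host in hadoop01 hadoop02 hadoop03; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "would run: ssh-copy-id $host"
    else
      ssh-copy-id "$host"
    fi
  done
}
copy_keys
```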
7. Raise the Linux open-file limits
Inspection commands:
ulimit -a ## show all limits
ulimit -n ## max simultaneously open files
ulimit -u ## max simultaneous user processes
To change the limits:
vim /etc/security/limits.conf (append the following)
* soft nofile 32768
* hard nofile 1048576
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
vim /etc/security/limits.d/90-nproc.conf (append the following; on CentOS 7 this file typically ships as 20-nproc.conf)
* soft nproc 65536
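One property of these entries worth checking is that no soft limit exceeds its hard limit, since a soft value above the hard value cannot take effect. A small self-contained sketch that validates lines in the limits.conf format:

```shell
#!/bin/sh
# Sketch: consistency check for limits.conf-style lines -- a soft limit
# must not exceed its hard limit. "unlimited" values are skipped.
check_limits() {
  # reads "domain type item value" lines on stdin; exits non-zero on error
  awk '
    $4 == "unlimited" { next }
    $2 == "soft" { soft[$3] = $4 }
    $2 == "hard" { hard[$3] = $4 }
    END {
      for (item in soft)
        if (item in hard && soft[item] + 0 > hard[item] + 0) {
          printf "soft %s (%s) exceeds hard (%s)\n", item, soft[item], hard[item]
          bad = 1
        }
      exit bad
    }'
}

check_limits <<'EOF' && echo "limits look consistent"
* soft nofile 32768
* hard nofile 1048576
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
EOF
```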
8. Clock synchronization
Pick one machine as the time server: hadoop01
On hadoop01:
Edit the ntpd configuration:
vim /etc/ntp.conf (append the following)
server 127.127.0.1
fudge 127.127.0.1 stratum 8
Start the ntpd service:
service ntpd restart
systemctl enable ntpd.service ## start at boot
Create the sync script:
vim /opt/date_sync.sh
service ntpd stop
/usr/sbin/ntpdate -u hadoop01
service ntpd start
Make it executable:
chmod u+x /opt/date_sync.sh
Run the script:
cd /opt
./date_sync.sh
Copy it to the other machines:
scp date_sync.sh hadoop02:/opt
scp date_sync.sh hadoop03:/opt
Set up the cron job (on all machines)
crontab -e
0-59/5 * * * * /opt/date_sync.sh
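The minute field 0-59/5 means "from minute 0 through 59, in steps of 5", so the script fires twelve times per hour. A tiny sketch that expands the field into the concrete minutes:

```shell
#!/bin/sh
# Sketch: expand a cron minute field of the form "0-59/STEP" into the
# concrete minutes at which /opt/date_sync.sh will run.
cron_minutes() {
  step=$1
  m=0
  while [ "$m" -le 59 ]; do
    printf '%d ' "$m"
    m=$((m + step))
  done
  echo
}
cron_minutes 5
```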
9. Reboot the machines
10. Preparation for the cluster build (all machines)
Under /opt create two directories: softwares and modules
softwares holds all the package archives
modules holds the extracted files
11. Install the JDK (all machines)
Remove any JDK already on the system, then install the required version.
List the installed JDKs:
rpm -qa | grep java ## list java-related packages
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
tzdata-java-2013g-1.el6.noarch
Uninstall the JDKs:
rpm -e --nodeps python-javapackages-3.4.1-11.el7.noarch java-1.8.0-openjdk-1.8.0.161-2.b14.el7.x86_64
javassist-3.16.1-10.el7.noarch javamail-1.4.6-8.el7.noarch java-1.8.0-openjdk-headless-1.8.0.161-2.b14.el7.x86_64
tzdata-java-2018c-1.el7.noarch javapackages-tools-3.4.1-11.el7.noarch
Install the JDK:
cd /opt/softwares/
rpm -ivh jdk-8u11-linux-x64.rpm
Configure the JAVA_HOME environment variable:
vim /etc/profile
export PATH=$PATH:/usr/java/jdk1.8.0_11/bin (without this, the jps command is unavailable)
source /etc/profile
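Appending to PATH in /etc/profile is not idempotent: every re-source adds the JDK bin directory again. A small sketch of a guard that adds a directory only once, shown as pure string logic on a copy of PATH:

```shell
#!/bin/sh
# Sketch: add a directory to a PATH-like string only if it is not already
# present, so re-sourcing the profile does not grow PATH.
path_add() {
  # usage: path_add CURRENT_PATH DIR  -> prints the resulting PATH
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *)        printf '%s\n' "$1:$2" ;;
  esac
}
path_add "/usr/bin:/bin" "/usr/java/jdk1.8.0_11/bin"
```

In /etc/profile this would be written as PATH=$(path_add "$PATH" /usr/java/jdk1.8.0_11/bin).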
12. Install the ZooKeeper-3.4.10 cluster
Extract:
tar -zxvf /opt/softwares/zookeeper-3.4.10.tar.gz -C /opt/modules/
cd /opt/modules/zookeeper-3.4.10/conf
cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg:
vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/modules/zookeeper-3.4.10/data
# the port at which the clients will connect
clientPort=2181
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
Set the myid
Create a file named myid under /opt/modules/zookeeper-3.4.10/data:
touch myid
Create the file on Linux itself; writing it in Notepad++ can easily introduce encoding problems.
Put the number matching this host's server entry into the file, e.g. 1
Distribute to the other nodes: copy the configured directory to the other machines
scp -r /opt/modules/zookeeper-3.4.10/ root@hadoop02:/opt/modules/
scp -r /opt/modules/zookeeper-3.4.10/ root@hadoop03:/opt/modules/
Then change the myid contents to 2 and 3 on hadoop02 and hadoop03 respectively.
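Once passwordless ssh is in place, the per-node myid values can also be written in one loop from the admin node. A sketch; DRY_RUN=1 (the default) only prints the commands:

```shell
#!/bin/sh
# Sketch: write each node's myid (1, 2, 3 matching the server.N lines in
# zoo.cfg) over ssh. DRY_RUN=1 (default) only prints the commands.
DATA_DIR=/opt/modules/zookeeper-3.4.10/data
write_myids() {
  id=1
  for host in hadoop01 hadoop02 hadoop03; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "would run: ssh $host \"mkdir -p $DATA_DIR && echo $id > $DATA_DIR/myid\""
    else
      ssh "$host" "mkdir -p $DATA_DIR && echo $id > $DATA_DIR/myid"
    fi
    id=$((id + 1))
  done
}
write_myids
```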
Configure the environment variables (in /etc/profile):
export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Redirect the log output to a dedicated directory:
In zkEnv.sh change:
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
ZOO_LOG_DIR="/opt/modules/zookeeper-3.4.10/log"
fi
if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
fi
In log4j.properties change:
zookeeper.root.logger=INFO,ROLLINGFILE
Common commands:
(1) Start ZooKeeper
zkServer.sh start
(2) Check the status
zkServer.sh status
(3) Stop ZooKeeper
zkServer.sh stop
If startup fails with java.net.NoRouteToHostException: No route to host,
the firewall is usually still enabled.
13. Build the Hadoop HA cluster
Extract:
tar -zxvf /opt/softwares/hadoop-2.7.7.tar.gz -C /opt/modules/
Edit hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_11
Edit core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.7.7/data/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
Edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>hadoop01:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>hadoop01:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>hadoop02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>hadoop02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/ns1</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/modules/hadoop-2.7.7/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/modules/hadoop-2.7.7/data/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/modules/hadoop-2.7.7/data/tmp/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
</configuration>
Edit mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
Edit yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop02</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop01:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop01:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop01:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop01:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>hadoop01:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>hadoop01:23142</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop02:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop02:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop02:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop02:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>hadoop02:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>hadoop02:23142</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/modules/hadoop-2.7.7/yarn</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/modules/hadoop-2.7.7/logs</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
<description>Optional setting. The default value is /yarn-leader-election</description>
</property>
</configuration>
Note: the XML configuration files must not contain Chinese characters.
Edit slaves:
hadoop01
hadoop02
hadoop03
Copy the configured Hadoop directory to the other nodes:
scp -r /opt/modules/hadoop-2.7.7/ root@hadoop02:/opt/modules/hadoop-2.7.7
scp -r /opt/modules/hadoop-2.7.7/ root@hadoop03:/opt/modules/hadoop-2.7.7
Start and test the cluster
Start the ZooKeeper ensemble
On hadoop01, hadoop02 and hadoop03 run
zkServer.sh start to start ZooKeeper,
then check the status with zkServer.sh status
(one leader and two followers means ZooKeeper started correctly)
Format the HDFS state in ZooKeeper
On hadoop01 (only one ZooKeeper node needs to run this): hdfs zkfc -formatZK
Start the JournalNode cluster
On every journalnode host run: hadoop-daemon.sh start journalnode
Format and start the first NameNode
On hadoop01:
Format this node's namenode data: hdfs namenode -format
Initialize the journalnode data (an HA-specific step): hdfs namenode -initializeSharedEdits
Start this node's namenode service: hadoop-daemon.sh start namenode
Format and start the second NameNode
On hadoop02:
hadoop01 is already formatted; sync the contents of its data directory to hadoop02:
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
Start all DataNodes
On every datanode run: hadoop-daemon.sh start datanode
Start the ZooKeeperFailoverControllers
On each namenode host run: hadoop-daemon.sh start zkfc
Check the NameNode state in the web UI
Open http://hadoop01:50070 and http://hadoop02:50070 (use the IPs if the Windows hosts file was not configured)
One should be active and the other standby.
Start YARN
On hadoop01 run: start-yarn.sh
Start the standby ResourceManager
On hadoop02 start the resourcemanager: yarn-daemon.sh start resourcemanager
Check the ResourceManager state in the web UI
Open http://hadoop01:23188 and http://hadoop02:23188
One is active and the other standby. The active node serves the page normally; the standby automatically redirects to the active node's web address.
http://resourcemanager_ipaddress:23188
The whole Hadoop cluster can also be started with start-all.sh
and stopped with stop-all.sh
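The first-time HA startup order above matters (ZooKeeper before zkfc formatting, journalnodes before the namenode format). As a sketch, the whole sequence in one script; DRY_RUN=1 (the default) prints each step instead of running it, and the comments note which host each step belongs on:

```shell
#!/bin/sh
# Sketch of the first-time HA startup sequence. DRY_RUN=1 (default) only
# prints the steps; a real run must execute each step on the host(s) named
# in the comment, not all on one machine.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "step: $*"; else "$@"; fi
}
run zkServer.sh start                        # on all three nodes
run hdfs zkfc -formatZK                      # once, on hadoop01
run hadoop-daemon.sh start journalnode       # on every journalnode host
run hdfs namenode -format                    # once, on hadoop01
run hdfs namenode -initializeSharedEdits     # once, on hadoop01
run hadoop-daemon.sh start namenode          # on hadoop01
run hdfs namenode -bootstrapStandby          # once, on hadoop02
run hadoop-daemon.sh start namenode          # on hadoop02
run hadoop-daemon.sh start datanode          # on every datanode
run hadoop-daemon.sh start zkfc              # on both namenode hosts
run start-yarn.sh                            # on hadoop01
run yarn-daemon.sh start resourcemanager     # on hadoop02
```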
14. Install MySQL (on hadoop01)
MySQL is chosen as the data store (the default is PostgreSQL).
MySQL is installed by compiling from source:
http://dev.mysql.com/doc/refman/5.6/en/linux-installation.html
MySQL installation:
Machine chosen for MySQL: hadoop01
(in a real environment this is a high-spec machine with redundant data disks)
Upload the files needed for the MySQL install.
Build from source (takes roughly 15-20 minutes).
Extract:
cd /opt/modules
tar -zxvf /opt/softwares/mysql-5.6.26.tar.gz
Install the required build tools:
yum -y install gcc gcc-c++ gdb cmake ncurses-devel bison bison-devel
Configure the build:
cd /opt/modules/mysql-5.6.26/
Run the following command:
cmake \
-DCMAKE_INSTALL_PREFIX=/usr/local/mysql \
-DMYSQL_DATADIR=/usr/local/mysql/data \
-DSYSCONFDIR=/etc \
-DWITH_INNOBASE_STORAGE_ENGINE=1 \
-DWITH_PARTITION_STORAGE_ENGINE=1 \
-DMYSQL_UNIX_ADDR=/tmp/mysql.sock \
-DMYSQL_TCP_PORT=3306 \
-DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8_general_ci
Parameter meanings:
CMAKE_INSTALL_PREFIX: where the MySQL server is installed, i.e. its final location
MYSQL_DATADIR: the MySQL data directory (some log files are stored here too)
MYSQL_TCP_PORT: the port number
DEFAULT_CHARSET/DEFAULT_COLLATION: the character set
Compile:
make ## takes about 20 minutes
make install
MySQL configuration:
Mainly configure MySQL to start at boot, plus a few common options.
Add the mysql group and user:
groupadd mysql
useradd -r -g mysql mysql
id mysql ## verify
Initialize MySQL:
cd /usr/local/mysql/scripts/
./mysql_install_db --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --user=mysql
To manage MySQL with the service command:
cp /opt/modules/mysql-5.6.26/support-files/mysql.server /etc/init.d/mysql
Start MySQL at boot:
chkconfig mysql on
Make the file contents consistent with my.cnf under the MySQL root directory.
Start the service:
service mysql start (if this reports that /etc/init.d/mysql lacks permission, add execute permission to it)
Configure the environment variables:
For convenience, add the MySQL commands to PATH:
vim /etc/profile
export MYSQL_HOME=/usr/local/mysql
export PATH=$PATH:$MYSQL_HOME/bin
source /etc/profile
Set the password:
mysql
mysql> set password=password("123456"); ## set the password
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges; ## reload
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
mysql -uroot -p123456 (login test)
Connecting from Navicat on Windows:
mysql -u root -p123456
mysql>GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '<root connection password>' WITH GRANT OPTION;
If the connection still fails, use the commands below to investigate:
Check the listening ports:
netstat -ntpl
Check the firewall state, to see whether packets to port 3306 are being dropped:
iptables -vnL
Flush the rules in the firewall chains:
iptables -F
15. Set up Hive
Extract:
tar -zxvf /opt/softwares/apache-hive-2.3.0-bin.tar.gz -C /opt/modules/
Rename the directory:
cd /opt/modules/
mv apache-hive-2.3.0-bin hive-2.3.0
Configure the environment variables:
In /etc/profile add
export HIVE_HOME=/opt/modules/hive-2.3.0
export PATH=$PATH:$HIVE_HOME/bin
Edit the hive-env.sh file:
Copy hive-env.sh.template to hive-env.sh and add the following to it:
JAVA_HOME=/usr/java/jdk1.8.0_11
HADOOP_HOME=/opt/modules/hadoop-2.7.7
HIVE_HOME=/opt/modules/hive-2.3.0
export HIVE_CONF_DIR=$HIVE_HOME/conf
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp
-Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"
Edit the hive-site.xml file:
Copy hive-default.xml.template to hive-site.xml, delete all of its contents, and add the following:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop01:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateColumns</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/opt/modules/hive-2.3.0/tmp/resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/opt/modules/hive-2.3.0/tmp/HiveJobsLog</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/opt/modules/hive-2.3.0/tmp/ResourcesLog</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/opt/modules/hive-2.3.0/tmp/HiveRunLog</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/opt/modules/hive-2.3.0/tmp/OpertitionLog</value>
<description>Top level directory where operation tmp are stored if logging functionality is enabled</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop01</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
</property>
<property>
<name>hive.server2.webui.host</name>
<value>hadoop01</value>
</property>
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>755</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>false</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.driver.extraJavaOptions</name>
<value>-XX:PermSize=128M -XX:MaxPermSize=512M</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
<description>Whether to print the names of the columns in query output.</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
</configuration>
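The most error-prone value above is the metastore connection URL. A throwaway sketch that assembles it from its parts (host, port, database name), matching the javax.jdo.option.ConnectionURL value in the config:

```shell
#!/bin/sh
# Sketch: build the JDBC URL used by javax.jdo.option.ConnectionURL from
# its parts; createDatabaseIfNotExist lets MySQL create the metastore
# database on first use.
metastore_url() {
  # usage: metastore_url HOST PORT DBNAME
  printf 'jdbc:mysql://%s:%s/%s?createDatabaseIfNotExist=true\n' "$1" "$2" "$3"
}
metastore_url hadoop01 3306 metastore
```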
Copy the JDBC driver
Put the MySQL JDBC jar into $HIVE_HOME/lib:
cp /opt/softwares/mysql-connector-java-5.1.26-bin.jar /opt/modules/hive-2.3.0/lib/
Copy the jline jar
Copy jline-2.12.jar from $HIVE_HOME/lib into $HADOOP_HOME/share/hadoop/yarn/lib, and delete the old jline jar from that directory.
Copy tools.jar
Copy tools.jar from $JAVA_HOME/lib into $HIVE_HOME/lib.
Initialize Hive
Use either MySQL or Derby as the metastore database.
Note: first check MySQL for leftover Hive metadata; delete it if present.
schematool -dbType mysql -initSchema ## MySQL as the metastore
Here mysql selects MySQL as the database that stores the Hive metadata;
to use Derby as the metastore instead, run
schematool -dbType derby -initSchema ## Derby as the metastore
This runs the hive-schema-2.3.0.mysql.sql script to create the tables in the configured metastore database.
Start the metastore service:
The metastore service must be started before running Hive, otherwise Hive errors out:
./hive --service metastore
Then open another terminal window and start the Hive process there.
Problems encountered while setting up Hive:
message:Version information not found in metastore
Fix: set "hive.metastore.schema.verification" to false in conf/hive-site.xml
Access denied for user 'root'@'hadoop01' (using password: YES)
grant all privileges on *.* to root@hadoop01 identified by '123456';
flush privileges;
16. Set up the HBase cluster
Extract:
tar -zxvf /opt/softwares/hbase-2.0.0-bin.tar.gz -C /opt/modules/
Configure the environment variables:
In /etc/profile add
export HBASE_HOME=/opt/modules/hbase-2.0.0
export PATH=$PATH:$HBASE_HOME/bin
Configure hbase-env.sh
Enable the JAVA_HOME setting:
export JAVA_HOME=/usr/java/jdk1.8.0_11
Disable HBase's bundled ZooKeeper and use the external ensemble:
export HBASE_MANAGES_ZK=false
Configure hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ns1/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop01,hadoop02,hadoop03</value>
</property>
<property>
<name>hbase.temp.dir</name>
<value>/opt/modules/hbase-2.0.0/tmp</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/modules/hbase-2.0.0/tmp/zookeeper</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
</configuration>
Configure regionservers
hadoop01
hadoop02
hadoop03
Configure backup-masters
(in the conf directory)
vim backup-masters
hadoop02
Copy Hadoop's hdfs-site.xml into HBase's conf directory
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/
Copy the directory to all regionserver machines
scp -r /opt/modules/hbase-2.0.0/ root@hadoop02:/opt/modules/
scp -r /opt/modules/hbase-2.0.0/ root@hadoop03:/opt/modules/
Start/stop commands:
Start HBase: start-hbase.sh
Stop HBase: stop-hbase.sh
Web UI: http://192.168.133.160:60010/master-status#userTables
17. Set up the Spark cluster
Extract:
tar -zxvf /opt/softwares/spark-2.4.0-bin-hadoop2.7.tgz -C /opt/modules
Edit spark-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_11
HADOOP_CONF_DIR=/opt/modules/hadoop-2.7.7/etc/hadoop
SPARK_LOCAL_IP=hadoop01 (this node's hostname)
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_HOST=192.168.133.160
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_CORES=3
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1G
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop01/spark/job/history"
Edit spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop01/spark/job/history
Edit slaves
hadoop01
hadoop02
hadoop03
Distribute to the other nodes
scp -r /opt/modules/spark-2.4.0-bin-hadoop2.7/ root@hadoop02:/opt/modules/
scp -r /opt/modules/spark-2.4.0-bin-hadoop2.7/ root@hadoop03:/opt/modules/
Configure SPARK_HOME (in /etc/profile)
export SPARK_HOME=/opt/modules/spark-2.4.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Create the directory in HDFS:
/spark/job/history
Set permissions on /spark/job/history:
hdfs dfs -chmod -R 755 /spark (grant access)
Start the master: start-master.sh
Start a slave: start-slave.sh spark://192.168.133.xxx:7077
Start the shell: spark-shell
Test: run-example SparkPi and check for output like: Pi is roughly 3.14374
Web UI: http://192.168.133.160:8080/
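With three worker hosts, each advertising SPARK_WORKER_CORES cores and SPARK_WORKER_MEMORY of RAM, the standalone cluster's total capacity is just the product. A trivial arithmetic sketch (using 3 cores and 1G per worker, the effective values from spark-env.sh above):

```shell
#!/bin/sh
# Sketch: total resources a standalone Spark cluster advertises, given the
# per-worker settings from spark-env.sh (here: 3 hosts, 3 cores and
# 1G memory per worker).
cluster_capacity() {
  hosts=$1; cores=$2; mem_g=$3
  echo "total cores: $((hosts * cores)), total memory: $((hosts * mem_g))G"
}
cluster_capacity 3 3 1
```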