Hadoop + HBase Cluster Setup


1. Environment Preparation

Note: this cluster is built on CentOS 7.5, with Hadoop version 3.1.1.

1.1 Cluster Layout

The cluster consists of three machines:

Hostname    IP           Roles
hadoop01    10.0.0.10    DataNode, NodeManager, NameNode
hadoop02    10.0.0.11    DataNode, NodeManager, ResourceManager, SecondaryNameNode
hadoop03    10.0.0.12    DataNode, NodeManager

1.2 Machine Configuration

[clsn@hadoop01 /home/clsn]
$cat  /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

[clsn@hadoop01 /home/clsn]
$uname  -r
3.10.0-862.el7.x86_64

[clsn@hadoop01 /home/clsn]
$sestatus
SELinux status:                 disabled

[clsn@hadoop01 /home/clsn]
$systemctl  status  firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

[clsn@hadoop01 /home/clsn]
$id clsn
uid=1000(clsn) gid=1000(clsn) groups=1000(clsn)

[clsn@hadoop01 /home/clsn]
$cat  /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.10   hadoop01
10.0.0.11   hadoop02
10.0.0.12   hadoop03

Note: every process in this cluster is started by the clsn user.
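
The same three host entries must exist in /etc/hosts on every node. A minimal sketch of appending them idempotently, demonstrated here against a scratch file rather than the real /etc/hosts:

```shell
# Append the cluster's host entries only if the IP is not already present.
# Demo target is a temp file; on the nodes it would be /etc/hosts.
hosts_file=$(mktemp)
for entry in "10.0.0.10   hadoop01" "10.0.0.11   hadoop02" "10.0.0.12   hadoop03"; do
    ip=${entry%% *}                      # strip everything after the first space
    grep -q "^$ip" "$hosts_file" || echo "$entry" >> "$hosts_file"
done
wc -l < "$hosts_file"
```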

1.3 Passwordless SSH Setup

ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub  127.0.0.1
scp -rp ~/.ssh hadoop02:/home/clsn
scp -rp ~/.ssh hadoop03:/home/clsn

1.4 Install the JDK

Perform this on all three machines:

tar xf jdk-8u191-linux-x64.tar.gz -C  /usr/local/
ln -s /usr/local/jdk1.8.0_191 /usr/local/jdk
sed -i.ori '$a export JAVA_HOME=/usr/local/jdk\nexport PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH\nexport CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar' /etc/profile
. /etc/profile
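
The sed one-liner above appends the three export lines to /etc/profile and keeps a .ori backup. A small sketch of the same idiom against a scratch file, to make its effect visible:

```shell
# '$a' appends text after the last line; -i.ori edits in place and keeps
# a backup copy with the .ori suffix (GNU sed).
profile=$(mktemp)
echo '# original content' > "$profile"
sed -i.ori '$a export JAVA_HOME=/usr/local/jdk\nexport PATH=$JAVA_HOME/bin:$PATH' "$profile"
cat "$profile"
```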

2. Install Hadoop

2.1 Download the Package (Binary)

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz

2.2 Install

tar xf hadoop-3.1.1.tar.gz -C /usr/local/
ln -s /usr/local/hadoop-3.1.1  /usr/local/hadoop
sudo  chown  -R clsn.clsn /usr/local/hadoop-3.1.1/

3. Modify the Hadoop Configuration

All configuration files are located under /usr/local/hadoop/etc/hadoop.

3.1 hadoop-env.sh

Prepend  . /etc/profile  to the file so that the Hadoop scripts pick up JAVA_HOME:

[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ head hadoop-env.sh
.  /etc/profile
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at

3.2 core-site.xml

[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- RPC address of the NameNode (the HDFS master) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <!-- Base directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/tmp</value>
    </property>
</configuration>
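
A quick way to sanity-check a value after editing (a hypothetical sed-based helper; it assumes the <value> line directly follows its <name> line, as in the files in this section — xmllint --xpath would be sturdier where available):

```shell
# Extract a property value from a Hadoop *-site.xml style file.
get_prop() {
    sed -n "/<name>$1<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}" "$2"
}

# Demo against a scratch copy of the relevant part of core-site.xml.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
</configuration>
EOF
get_prop fs.defaultFS "$conf"
```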

3.3 hdfs-site.xml

[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- HTTP address of the NameNode web UI -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:50070</value>
    </property>

    <!-- HTTP address of the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop02:50090</value>
    </property>

    <!-- Where the NameNode stores its metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/name</value>
    </property>

    <!-- Number of HDFS block replicas -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Where the DataNode stores its blocks -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/datanode</value>
    </property>

    <!-- Disable HDFS permission checking (convenient for a test cluster) -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

</configuration>

3.4 mapred-site.xml

[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Tell the MapReduce framework to run on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
        /usr/local/hadoop/etc/hadoop,
        /usr/local/hadoop/share/hadoop/common/*,
        /usr/local/hadoop/share/hadoop/common/lib/*,
        /usr/local/hadoop/share/hadoop/hdfs/*,
        /usr/local/hadoop/share/hadoop/hdfs/lib/*,
        /usr/local/hadoop/share/hadoop/mapreduce/*,
        /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
        /usr/local/hadoop/share/hadoop/yarn/*,
        /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>

</configuration>

3.5 yarn-site.xml

[clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
$ cat yarn-site.xml
<?xml version="1.0"?>

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop02</value>
    </property>

    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>

    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>

    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>

    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>

</configuration>

3.6 masters & slaves

echo 'hadoop02' >> /usr/local/hadoop/etc/hadoop/masters
echo 'hadoop03
hadoop01'  >> /usr/local/hadoop/etc/hadoop/slaves

Note: Hadoop 3.x reads etc/hadoop/workers instead of the old slaves file, and no longer reads masters (the SecondaryNameNode location comes from hdfs-site.xml). If worker daemons fail to start on the other hosts, write the same host list to /usr/local/hadoop/etc/hadoop/workers as well.

3.7 Start-Script Changes

The start scripts are all located under /usr/local/hadoop/sbin:
(1) Add the following to start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=clsn
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=clsn
HDFS_SECONDARYNAMENODE_USER=clsn

(2) Add the following to start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=clsn
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=clsn

Note: Hadoop 3.x warns that HADOOP_SECURE_DN_USER is deprecated in favor of HDFS_DATANODE_SECURE_USER; on a cluster that does not run secure DataNodes it can simply be omitted.

4. Pre-Start Preparation

4.1 Create the Data Directories

mkdir -p /data/tmp
mkdir -p /data/name
mkdir -p /data/datanode
chown -R clsn.clsn /data 

Create these on every machine in the cluster, or copy the directory tree to the other nodes:

for i in hadoop02 hadoop03
    do 
        sudo scp -rp /data $i:/
done 
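
The three mkdir calls can also be collapsed with bash brace expansion (a local demo under a temp root; on the cluster the root is /data):

```shell
# bash brace expansion creates all three directories in one command
root=$(mktemp -d)
mkdir -p "$root"/{tmp,name,datanode}
ls "$root"
```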

4.2 Copy the Hadoop Installation to the Other Machines

for i in hadoop02 hadoop03
    do 
        sudo scp -rp  /usr/local/hadoop-3.1.1 $i:/usr/local/
done 

4.3 Start the Hadoop Cluster

(1) Format the NameNode before the first start:

/usr/local/hadoop/bin/hdfs namenode -format 

(2) Start the cluster:

cd /usr/local/hadoop/sbin 
./start-all.sh

5. Verifying the Cluster

(1) Use jps to check that the roles running on each node match the plan:

[clsn@hadoop01 /home/clsn]
$ pssh  -ih  cluster  "`which jps`"
[1] 11:30:31 [SUCCESS] hadoop03
7947 DataNode
8875 Jps
8383 NodeManager
[2] 11:30:31 [SUCCESS] hadoop01
20193 DataNode
20665 NodeManager
21017 NameNode
22206 Jps
[3] 11:30:31 [SUCCESS] hadoop02
8896 DataNode
9427 NodeManager
10883 Jps
9304 ResourceManager
10367 SecondaryNameNode

(2) Browse to http://hadoop02:8088/cluster/nodes.
This is the ResourceManager web UI; it should show the cluster's three Active Nodes.


(3) Browse to http://hadoop01:50070/dfshealth.html#tab-datanode.
This is the NameNode web UI.


6. HBase Configuration


6.1 Deploy the HBase Package

cd /opt/
wget  http://mirrors.tuna.tsinghua.edu.cn/apache/hbase/1.4.9/hbase-1.4.9-bin.tar.gz
tar xf  hbase-1.4.9-bin.tar.gz -C  /usr/local/
ln -s /usr/local/hbase-1.4.9 /usr/local/hbase

6.2 Modify the Configuration Files

6.2.1 hbase-env.sh

# add this line so the HBase scripts pick up JAVA_HOME
. /etc/profile

6.2.2 hbase-site.xml

[clsn@hadoop01 /usr/local/hbase/conf]
$ cat  hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
    <name>hbase.rootdir</name>
    <!-- Directory where HBase stores its data -->
    <value>hdfs://hadoop01:9000/hbase/hbase_db</value>
    <!-- The port must match Hadoop's fs.defaultFS port -->
</property>
<property>
    <name>hbase.cluster.distributed</name> 
    <!-- Run in fully distributed mode -->
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <!-- Nodes that run the ZooKeeper service; an odd number is recommended -->
    <value>hadoop01,hadoop02,hadoop03</value>
</property>
<property>
    <!-- Where ZooKeeper keeps its configuration, logs, etc.; the directory must already exist -->
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/hbase/zookeeper</value>
</property>
<property>
    <!-- HBase master web UI port -->
    <name>hbase.master.info.port</name>
    <value>16610</value>
</property>
</configuration>

Note:

ZooKeeper has the following property:
as long as more than half of the servers in the ensemble are healthy, the ensemble as a whole stays available.
With 2 servers, losing 1 leaves only 1, which is not a majority, so a 2-server ensemble tolerates 0 failures.
Likewise, with 3 servers, losing 1 leaves 2 healthy ones, which is a majority, so the tolerance of 3 servers is 1.
Listing a few more: 2->0; 3->1; 4->1; 5->2; 6->2. The pattern: ensembles of 2n and 2n-1 servers both tolerate n-1 failures, so the extra even-numbered server buys nothing, and an odd number of ZooKeeper servers is preferred.
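
The pattern can be checked mechanically: a quorum needs floor(n/2)+1 live servers, so an n-server ensemble tolerates n - (floor(n/2)+1) failures:

```shell
# failures an n-server ZooKeeper ensemble can survive while keeping a quorum
tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 4 5 6; do
    echo "$n -> $(tolerance $n)"
done
```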

6.2.3 regionservers

[clsn@hadoop01 /usr/local/hbase/conf]
$ cat regionservers
hadoop01
hadoop02
hadoop03

6.2.4 Distribute HBase to the Other Nodes

for i in hadoop02 hadoop03
    do 
        sudo scp -rp  /usr/local/hbase-1.4.9 $i:/usr/local/
done 

6.3 Start the HBase Cluster

6.3.1 Start HBase

[clsn@hadoop01 /usr/local/hbase/bin]
$ sudo  ./start-hbase.sh
hadoop03: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop03.out
hadoop02: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop02.out
hadoop01: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop01.out
running master, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-master-hadoop01.out
hadoop02: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop02.out
hadoop03: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop03.out
hadoop01: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop01.out

Browse to http://hadoop01:16610/master-status to check the HBase status.


6.3.2 Start the HBase Shell

[clsn@hadoop01 /usr/local/hbase/bin]
$ ./hbase shell  # start the HBase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.4.9/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec  5 11:54:10 PST 2018

hbase(main):001:0> create 'clsn','cf'   # create table 'clsn' with one column family 'cf'
0 row(s) in 7.8790 seconds

=> Hbase::Table - clsn    

hbase(main):003:0> list    # list all HBase tables
TABLE
clsn
1 row(s) in 0.0860 seconds

=> ["clsn"]
hbase(main):004:0> put 'clsn','1000000000','cf:name','clsn'    # put a record into 'clsn': rowkey 1000000000, column cf:name
0 row(s) in 0.3390 seconds

hbase(main):005:0> put 'clsn','1000000000','cf:sex','male'    # put a record into 'clsn': rowkey 1000000000, column cf:sex
0 row(s) in 0.0300 seconds

hbase(main):006:0> put 'clsn','1000000000','cf:age','24'     # put a record into 'clsn': rowkey 1000000000, column cf:age
0 row(s) in 0.0290 seconds

hbase(main):007:0> count  'clsn'     
1 row(s) in 0.2100 seconds

=> 1
hbase(main):008:0>  get 'clsn','cf'    # 'cf' is not a rowkey here, so nothing comes back
COLUMN                        CELL
0 row(s) in 0.1050 seconds

hbase(main):009:0> get 'clsn','1000000000'      # fetch the row
COLUMN                        CELL
 cf:age                       timestamp=1545710530665, value=24
 cf:name                      timestamp=1545710495871, value=clsn
 cf:sex                       timestamp=1545710509333, value=male
1 row(s) in 0.0830 seconds

hbase(main):010:0> list
TABLE
clsn
1 row(s) in 0.0240 seconds

=> ["clsn"]
hbase(main):011:0> drop  clsn    # error: the table name must be quoted
NameError: undefined local variable or method `clsn' for #<Object:0x6f731759>

hbase(main):012:0> drop  'clsn'

ERROR: Table clsn is enabled. Disable it first.

Here is some help for this command:
Drop the named table. Table must first be disabled:
  hbase> drop 't1'
  hbase> drop 'ns1:t1'


hbase(main):013:0> list
TABLE
clsn
1 row(s) in 0.0330 seconds

=> ["clsn"]

hbase(main):015:0> disable 'clsn'
0 row(s) in 2.4710 seconds

hbase(main):016:0> list
TABLE
clsn
1 row(s) in 0.0210 seconds

=> ["clsn"]

7. References

https://hadoop.apache.org/releases.html
https://my.oschina.net/orrin/blog/1816023
https://www.yiibai.com/hadoop/
http://blog.fens.me/hadoop-family-roadmap/
http://www.cnblogs.com/Springmoon-venn/p/9054006.html
https://github.com/googlehosts/hosts
http://abloz.com/hbase/book.html
