This post is part of my Hadoop environment-setup series. Tencent Cloud and Baidu Cloud both offer ready-made environments you can use directly, but building one yourself will certainly teach you more.
Contents
(1) Preparing the software environment
A working Hadoop environment, i.e. one in which Hadoop already runs. See my previous post:
A detailed guide to installing and configuring Hadoop 3.1.2 in standalone, pseudo-distributed and fully distributed modes: https://blog.csdn.net/caojianhua2018/article/details/99174958
The HBase installation package, which can be downloaded from http://mirror.bit.edu.cn/apache/hadoop/common/
(2) Installing and configuring HBase
1. Extract the installation package:
[hadoop@master ~]$ tar -zxvf hbase-2.2.0-bin.tar.gz
2. Set the environment variables. As before, edit /etc/profile as root:
[root@master ~]# vi /etc/profile
#setting for hbase
export HBASE_HOME=/home/hadoop/hbase-2.2.0
export PATH=$HBASE_HOME/bin:$PATH
export PATH=$HBASE_HOME/lib:$PATH
After saving, run source /etc/profile to make the changes take effect.
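To confirm the new PATH entry is in effect, a quick sanity check can be run in the current shell (the paths below simply mirror the profile lines above):

```shell
# Re-create the profile settings in the current shell and verify them.
export HBASE_HOME=/home/hadoop/hbase-2.2.0
export PATH="$HBASE_HOME/bin:$PATH"

# On a machine where HBase is actually installed, `which hbase` should now
# resolve to $HBASE_HOME/bin/hbase; here we only check the PATH entry itself.
echo "$PATH" | grep -o "$HBASE_HOME/bin" | head -n1
```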
3. Edit the configuration files. Go to the conf folder under the HBase installation directory; the files to modify are mainly hbase-env.sh and hbase-site.xml.
[hadoop@master]$ cd hbase-2.2.0/conf
[hadoop@master]$ vi hbase-env.sh
# The java implementation to use. Java 1.8+ required.
export JAVA_HOME=/home/hadoop/jdk1.8.0_11
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/home/hadoop/hadoop-3.1.2/conf
Next, edit hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
<description>The list of servers in the ZooKeeper quorum.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>16010</value>
</property>
</configuration>
4. Start HBase by running ./start-hbase.sh from the bin directory:
[hadoop@master bin]$ ./start-hbase.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
localhost: running zookeeper, logging to /home/hadoop/hbase-2.2.0/bin/../logs/hbase-hadoop-zookeeper-master.out
running master, logging to /home/hadoop/hbase-2.2.0/bin/../logs/hbase-hadoop-master-master.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
: running regionserver, logging to /home/hadoop/hbase-2.2.0/bin/../logs/hbase-hadoop-regionserver-master.out
Check the processes:
[hadoop@master bin]$ jps
10595 NodeManager
19224 HMaster
10473 ResourceManager
10090 DataNode
19642 Jps
9947 NameNode
19371 HRegionServer
19167 HQuorumPeer
HMaster, HRegionServer and HQuorumPeer are all HBase processes, which shows that HBase has started normally.
5. You can also check the status from the web UI. Since hbase-site.xml sets the master info port to 16010, open the server's IP address and that port in an external browser:
(3) Testing HBase
HBase is a typical NoSQL database. Like Redis, MongoDB and similar stores, it imposes no strict data model. A relational database has a fully specified schema satisfying the normal forms and stores and reads data row by row; HBase instead stores and reads by column. Each cell is addressed by its row key, its column (column family plus qualifier) and a timestamp-based version number, so a single cell can hold several versions of a value. HBase is built specifically for distributed storage of big data, so unless you genuinely have data at a scale where HBase can play to its strengths, using it for ordinary data volumes is rather wasteful.
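The versioned, column-addressed cell model described above can be illustrated with a small Python sketch. This is a toy in-memory analogy, not HBase client code; all names here are made up for illustration:

```python
from collections import defaultdict

class ToyTable:
    """Toy model of HBase's logical layout: each cell is addressed by
    (row key, "family:qualifier") and keeps multiple timestamped versions."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), newest kept at index 0
        self.cells = defaultdict(list)

    def put(self, row, column, value, ts):
        # Like HBase's put: writes one cell, adding a new version.
        self.cells[(row, column)].insert(0, (ts, value))

    def get(self, row, column):
        # Like HBase's get with VERSIONS => 1: the newest version wins.
        versions = self.cells[(row, column)]
        return versions[0][1] if versions else None

t = ToyTable()
t.put("1001", "user:", "caojianhua", ts=1)
t.put("1001", "user:", "caojh", ts=2)
print(t.get("1001", "user:"))  # -> caojh (the latest version)
```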
1. From the current user's home directory, run the hbase shell command to enter the HBase shell:
[hadoop@master ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
Took 0.0016 seconds
hbase(main):001:0> exit
Once the hbase(main):001:0> prompt appears, you can type commands after it.
Type help to see what is available:
hbase(main):003:0> help
HBase Shell, version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: processlist, status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
2. Create a namespace. In HBase the namespace takes the place of a database name; you can think of it as grouping the tables of one business domain or project. Accordingly, create one with create_namespace, list the existing ones with list_namespace, and delete one with drop_namespace.
hbase(main):004:0> list_namespace
NAMESPACE
default
hbase
stuinfo
3 row(s)
Took 0.0254 seconds
hbase(main):005:0> create_namespace 'sinaWeiboData'
Took 0.4083 seconds
hbase(main):006:0> list_namespace
NAMESPACE
default
hbase
sinaWeiboData
stuinfo
4 row(s)
Took 0.0186 seconds
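For completeness, a namespace is removed with drop_namespace; note that HBase will only drop a namespace once it contains no tables. A sketch, using a hypothetical empty namespace 'demo_ns' (run inside hbase shell):

```
hbase(main):007:0> drop_namespace 'demo_ns'
```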
3. Table operations within a namespace. Once a namespace such as sinaWeiboData exists, you can add tables to it. Because HBase is column-oriented, what you declare when creating a table are its column families.
Use the form create 'namespace:table', 'family1', 'family2'. After creation, you can inspect the table structure with describe:
hbase(main):010:0> create 'sinaWeiboData:logs','user','record'
Created table sinaWeiboData:logs
Took 2.4328 seconds
=> Hbase::Table - sinaWeiboData:logs
hbase(main):011:0> list
TABLE
sinaWeiboData:logs
user
2 row(s)
Took 0.0075 seconds
=> ["sinaWeiboData:logs", "user"]
hbase(main):013:0> describe 'sinaWeiboData:logs'
Table sinaWeiboData:logs is ENABLED
sinaWeiboData:logs
COLUMN FAMILIES DESCRIPTION
{NAME => 'record', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WR
ITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_W
RITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'tru
e', BLOCKSIZE => '65536'}
{NAME => 'user', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRIT
E => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRI
TE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true'
, BLOCKSIZE => '65536'}
2 row(s)
QUOTAS
0 row(s)
Took 0.3043 seconds
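The describe output shows VERSIONS => '1' for both column families, i.e. only the latest version of each cell is kept. To retain the multiple versions mentioned earlier, a family can be altered; keeping 3 versions is just an example (run inside hbase shell):

```
hbase(main):014:0> alter 'sinaWeiboData:logs', NAME => 'user', VERSIONS => 3
```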
4. Insert records. Data is added in HBase with the put command. Note that a single put writes exactly one cell: one column of one row of one table. Inserting data directly from the shell is therefore very inefficient; in real applications data is normally written programmatically.
For example, first put a value into column family user of row 1001 of the logs table, then put one into column family record of the same row:
hbase(main):002:0> put 'sinaWeiboData:logs','1001','user','caojianhua'
Took 0.2171 seconds
hbase(main):003:0> put 'sinaWeiboData:logs','1001','record','visiting all the news and be focused by 333 fans'
Took 0.0106 seconds
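As noted above, bulk writes normally go through a client API rather than the shell. One option is HBase's REST gateway (started with hbase rest start; its default port 8080 is an assumption of this sketch). Below is a standard-library-only Python sketch that builds the JSON body for the same single-cell put shown above; the request-sending part is shown only as a comment since it needs a running gateway:

```python
import base64
import json

def b64(s: str) -> str:
    """HBase's REST gateway expects row keys, columns and values base64-encoded."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")

def encode_put(row_key: str, family: str, qualifier: str, value: str) -> str:
    """Build the JSON body for a single-cell put, mirroring the shell command
    put 'sinaWeiboData:logs','1001','user','caojianhua'."""
    body = {
        "Row": [
            {
                "key": b64(row_key),
                "Cell": [
                    {"column": b64(f"{family}:{qualifier}"), "$": b64(value)},
                ],
            }
        ]
    }
    return json.dumps(body)

# Sending the request (not executed here; requires the REST gateway):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8080/sinaWeiboData:logs/1001",
#       data=encode_put("1001", "user", "", "caojianhua").encode(),
#       headers={"Content-Type": "application/json"},
#       method="PUT",
#   )
#   urllib.request.urlopen(req)
```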
5. Read records with the get command. The format is: get 'table', 'row key', optionally followed by a column.
hbase(main):004:0> get 'sinaWeiboData:logs','1001'
COLUMN CELL
record: timestamp=1581038283677, value=visiting all the news and be focused by 333 fans
user: timestamp=1581038245344, value=caojianhua
1 row(s)
Took 0.0481 seconds
You can also fetch data with scan, although this becomes slow once the table holds a lot of data:
hbase(main):005:0> scan 'sinaWeiboData:logs'
ROW COLUMN+CELL
1001 column=record:, timestamp=1581038283677, value=visiting all the news and be focused by 333 fans
1001 column=user:, timestamp=1581038245344, value=caojianhua
1 row(s)
Took 0.0463 seconds
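To keep scans cheap on larger tables, scan accepts options that bound the work, for example limiting the columns returned and the number of rows (run inside hbase shell):

```
hbase(main):006:0> scan 'sinaWeiboData:logs', {COLUMNS => ['user'], LIMIT => 10}
```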