29 HBase & Hive & HDFS (好程序)

If the cluster is not running in high-availability (HA) mode, none of this is needed.

Combining MapReduce with HBase
TableMapper
TableReducer
TableMapReduceUtil

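Error: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.Scan

Solutions:
1. Temporarily add the HBase dependency jars to Hadoop's classpath (this only lasts for the current shell session):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hbase-1.2.1/lib/*

2. Bundle all dependencies into the job jar. Note that the jar will grow to about 200 MB.
3. Append the export command above to the end of hadoop-env.sh, then restart the cluster.
4. Copy every jar under $HBASE_HOME/lib into $HADOOP_HOME/lib. This easily causes jar conflicts and is not recommended.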

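Combining HBase with Hive
Goals of the integration:
Table data in HBase is visible from Hive.
Table data in Hive is visible from HBase.

Integration steps:
1. Create a table in Hive that HBase can see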

create table if not exists hbase2hive(
uid int,
uname string,
uage int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with 
serdeproperties(
"hbase.columns.mapping"=":key,cf1:name,cf1:age"
)
tblproperties("hbase.table.name"="h2h")
;

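FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
Fix:
Rebuild the hive-hbase jar so that it matches the installed HBase version, then restart Hive.

Loading data into the Hive table:
load data cannot load data into this table. Use this form instead:
insert into ... select ...;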

2. The table already exists in HBase and already contains data

create EXTERNAL table if not exists hbase2hive2(
uid string,
uname string,
uage int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with 
serdeproperties(
"hbase.columns.mapping"=":key,base_info:name,base_info:age"
)
tblproperties("hbase.table.name"="ns1:t_userinfo")
;

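Notes:
When mapping HBase columns, either map the rowkey explicitly as :key or leave it out entirely; anything else makes the column counts mismatch.
If the table already exists in HBase, the Hive table must be created with the external keyword.
If the corresponding HBase table is dropped, Hive can no longer query any data.
The number and types of columns should preferably be the same in HBase and Hive. The mapping follows the order of the fields, not their names.
HBase, Hive, MySQL and others can also exchange data through third-party tools (藍燈, shell scripts, Phoenix).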

Coprocessors:
observer
endpoint

Example:
create 'ns1:t_guanzhu','cf1','cf2'
create 't_fensi','cf1'

Load the coprocessor onto the table:
alter 'ns1:t_guanzhu',METHOD => 'table_att','coprocessor'=>'hdfs://gp1923/demo/gp1923demo-1.0-SNAPSHOT.jar|qfedu.com.bigdata.hbaseObServer.InverIndexCoprocessor|1001|'
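
The notes do not show what InverIndexCoprocessor does. The sketch below assumes it is an observer that maintains an inverted index: every "user follows target" write to ns1:t_guanzhu also produces a "target has fan user" row in t_fensi. It is plain Java with in-memory maps standing in for the two tables, not the HBase coprocessor API, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {
    // rowkey -> column qualifiers; each map stands in for one HBase table.
    private final Map<String, Set<String>> following = new TreeMap<>(); // role of ns1:t_guanzhu
    private final Map<String, Set<String>> fans = new TreeMap<>();      // role of t_fensi

    // Client write: row = user, qualifier = the user being followed.
    void follow(String user, String target) {
        following.computeIfAbsent(user, k -> new TreeSet<>()).add(target);
        // On a real cluster the RegionServer would invoke the observer hook;
        // here it is called directly to keep the sketch self-contained.
        afterPut(user, target);
    }

    // What the assumed observer does after the put: write the inverted row.
    private void afterPut(String user, String target) {
        fans.computeIfAbsent(target, k -> new TreeSet<>()).add(user);
    }

    Set<String> followingOf(String user) {
        return following.getOrDefault(user, Collections.emptySet());
    }

    Set<String> fansOf(String user) {
        return fans.getOrDefault(user, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch s = new InvertedIndexSketch();
        s.follow("alice", "carol");
        s.follow("bob", "carol");
        System.out.println("carol's fans: " + s.fansOf("carol"));
    }
}
```

The point of the design is that the client only writes to one table, and the fans lookup becomes a single-row read instead of a full scan of the following table.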

Things to watch in HBase
MemStore flush threshold:
Property setting:

<property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>   <!-- 128 MB -->
    <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes.  Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.</description>
  </property>

HRegion split threshold:

<property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>   <!-- 10 GB -->
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.</description>
  </property>

Number of RegionServer handler threads:

<property>
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
    <description>Count of RPC Listener instances spun up on RegionServers.
    Same property is used by the Master for count of master handlers.</description>
  </property>

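Client-side optimization
1. Turn off auto flush
HTable ht = (HTable) table;
ht.setAutoFlush(false,true);

2. Write data in batches wherever possible (List<Put>, List<Delete>)

3. Skip the write-ahead log (WAL) only with great care, since writes not yet flushed are lost if the RegionServer crashes:
put.setDurability(Durability.SKIP_WAL);   // put is a Put; durability is set per mutation, not on HTable

4. Keep data in the cache where possible
hc.setInMemory(true);   // hc is the column family descriptor (HColumnDescriptor)

5. Avoid having many column families; use at most 2.
When HBase flushes one column family's MemStore, the other column families in the same region are flushed along with it.

6. Keep the rowkey as short as possible. The maximum is 64 KB.
7. Close every object that should be closed,
for example: admin, table, connection, resultScanner, etc.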

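Rowkey design (by use case, four principles):
Length principle
Hashing principle
Ordering principle
Uniqueness principle

Mobile carrier data:
Calls
Internet usage
SMS
....

Query one user's call detail records for the current month:
How to design the rowkey:
phonenum_type_year_month_day_timestamp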
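
A minimal sketch of how such a rowkey could be assembled, using only the Java standard library. The class name, the "call" type value, the UTC time zone and the zero-padding widths are assumptions made for illustration; the notes only give the field order.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

class CallRecordRowKey {

    // Builds phonenum_type_year_month_day_timestamp. Month and day are zero-padded
    // so that the lexicographic order of the keys matches chronological order.
    static String build(String phoneNum, String type, long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return String.format(Locale.ROOT, "%s_%s_%04d_%02d_%02d_%013d",
                phoneNum, type, t.getYear(), t.getMonthValue(), t.getDayOfMonth(), epochMillis);
    }

    // Prefix shared by all rows of one user, one record type and one month.
    static String monthPrefix(String phoneNum, String type, int year, int month) {
        return String.format(Locale.ROOT, "%s_%s_%04d_%02d_", phoneNum, type, year, month);
    }

    public static void main(String[] args) {
        long ts = 1558224000000L; // 2019-05-19T00:00:00Z
        System.out.println(build("13800000000", "call", ts));
        System.out.println(monthPrefix("13800000000", "call", 2019, 5));
    }
}
```

Because the phone number, type, year and month come first, all of one user's call records for a month share one prefix, so the monthly query turns into a prefix scan over a contiguous key range.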

Solving the data hotspot problem:
1. Hashing
2. Salting
3. Reversing
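
The three techniques above can be sketched as key transformations in plain Java. This is an illustration under assumptions, not a fixed recipe: the class name, the choice of MD5, the four-hex-character prefix, the bucket count and the deterministic salt (derived from the key so that readers can recompute it) are all choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

class HotspotKeys {

    // Hashing: prefix the key with the first two bytes of its MD5 digest, as hex.
    static String hashed(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return String.format(Locale.ROOT, "%02x%02x_%s", d[0] & 0xff, d[1] & 0xff, key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    // Salting: prefix the key with a bucket number in [0, buckets).
    // The bucket is derived from the key, so a reader can recompute the prefix.
    static String salted(String key, int buckets) {
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d_%s", bucket, key);
    }

    // Reversing: move the fast-changing tail of the key to the front.
    static String reversed(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    public static void main(String[] args) {
        String phone = "13812345678";
        System.out.println(hashed(phone));
        System.out.println(salted(phone, 16));
        System.out.println(reversed(phone));
    }
}
```

All three spread sequential keys across regions at the cost of range scans over the original key order, which is why they are usually applied only to the leading part of the rowkey.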

Query efficiency:
Secondary index

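Preview:
MapReduce combined with HBase (summary)
Hive combined with HBase

Secondary index
Coprocessors
Rowkey design
Optimization

flume
http://flume.apache.org/

Flume architecture

HBase summary
hbase shell
create alter drop
put get scan delete

tools

java api:
admin
table

Rowkey design (must understand)
Hotspot problem
Performance problem
Secondary index
Coprocessors
Wide tables, tall tables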
