Querying data in HBase (1)

I. Querying from the shell

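HBase queries are quite simple. Data is read with either get or scan, and there are no multi-table joins to worry about. Complex queries have to go through Hive: create a corresponding external table, then let Hive turn SQL statements into MapReduce jobs automatically.
This simplicity, however, can make it awkward to get exactly what you need, and the query style is quite different from SQL.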

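HBase provides many filters that can match on the row key, the column, or the value. Matching can use substring, binary, prefix, or regular-expression comparison, and conditions can be combined with AND, OR, and so on. With filters, then, most requirements can still be met and the right results found.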
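To picture how AND/OR combinations behave, here is a minimal in-memory model in Python. It is not the HBase API: a cell is simplified to a (row, qualifier, value) tuple, and the hypothetical helpers `must_pass_all` / `must_pass_one` mimic FilterList's MUST_PASS_ALL and MUST_PASS_ONE operators, which the shell's AND and OR correspond to.

```python
from typing import Callable, List, Tuple

# Simplified cell: (row key, qualifier in family "info", value).
# Real HBase cells also carry family, timestamp and type.
Cell = Tuple[str, str, bytes]
Predicate = Callable[[Cell], bool]


def must_pass_all(*preds: Predicate) -> Predicate:
    # Models FilterList MUST_PASS_ALL (shell "AND"): every filter must accept the cell
    return lambda cell: all(p(cell) for p in preds)


def must_pass_one(*preds: Predicate) -> Predicate:
    # Models FilterList MUST_PASS_ONE (shell "OR"): one accepting filter is enough
    return lambda cell: any(p(cell) for p in preds)


def apply_filter(cells: List[Cell], pred: Predicate) -> List[Cell]:
    # Keep the cells the combined filter accepts, in stored order
    return [c for c in cells if pred(c)]


def qualifier_has_login(cell: Cell) -> bool:
    return "login" in cell[1]


def value_starts_with_love(cell: Cell) -> bool:
    return cell[2].startswith(b"love")


# Sample cells taken from the example transcripts later in this post
cells: List[Cell] = [
    ("0000000001", "loginid", b"jjm168131013"),
    ("0000000001", "userid", b"168131013"),
    ("0000000002", "loginid", b"loveswh"),
    ("0000000002", "nick", b"loveswh08"),
]

both = apply_filter(cells, must_pass_all(qualifier_has_login, value_starts_with_love))
either = apply_filter(cells, must_pass_one(qualifier_has_login, value_starts_with_love))
print(both)         # [('0000000002', 'loginid', b'loveswh')]
print(len(either))  # 3
```

Real HBase distinguishes row-level filters from cell-level ones, so this flat per-cell view is only an approximation.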

1.1 Filter types

The Chinese translation of the official HBase reference guide (http://abloz.com/hbase/book.html) describes the filters, which fall into five types:

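  1. Structural filters: filters that contain a set of other filters. Includes: FilterList
  2. Column value filters: filter on the value of each column, roughly like = and LIKE in a SQL query. Includes:
    SingleColumnValueFilter
    Comparators, including:
    RegexStringComparator: compares the value against a regular expression
    SubstringComparator: checks whether a substring occurs in the value; case-insensitive.
    BinaryPrefixComparator: binary prefix comparison
    BinaryComparator: binary comparison
  3. KeyValue metadata filters: filter on columns. Includes:
    FamilyFilter: filters on column families. In general, selecting column families in the Scan is better than doing it in a filter.
    QualifierFilter: filters on the column name (the qualifier).
    ColumnPrefixFilter: filters on a column-name (qualifier) prefix.
    MultipleColumnPrefixFilter: behaves like ColumnPrefixFilter but accepts several prefixes.
    ColumnRangeFilter: allows efficient intra-row scanning over a range of columns.
  4. RowKey: filters on the row key. For selecting rows, setting startRow/stopRow on the Scan is generally considered better, but RowFilter can also be used.
  5. Utility: for example FirstKeyOnlyFilter, used for counting rows.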
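The four comparators differ mainly in what counts as a match. The sketch below is a hypothetical Python model, not the HBase classes; in the shell's filter strings they appear as the prefixes 'substring:', 'regexstring:', 'binaryprefix:' and 'binary:'. It assumes RegexStringComparator behaves like Java's Matcher.find(), i.e. the pattern may match anywhere in the value.

```python
import re

# Hypothetical model of the four comparators listed above. Each function
# answers "does this stored value match?"; values are raw bytes, as in HBase.


def substring_cmp(value: bytes, sub: str) -> bool:
    # SubstringComparator: case-insensitive "contains" on the value decoded as UTF-8
    return sub.lower() in value.decode("utf-8").lower()


def regex_cmp(value: bytes, pattern: str) -> bool:
    # RegexStringComparator: assumed find()-style search anywhere in the value
    return re.search(pattern, value.decode("utf-8")) is not None


def binary_prefix_cmp(value: bytes, prefix: bytes) -> bool:
    # BinaryPrefixComparator: byte-wise "starts with"
    return value.startswith(prefix)


def binary_cmp(value: bytes, expected: bytes) -> bool:
    # BinaryComparator with the "=" operator: byte-for-byte equality
    return value == expected


nick = b"zwy99"
print(substring_cmp(nick, "WY99"))      # True (case-insensitive)
print(regex_cmp(nick, "99"))            # True (found anywhere)
print(binary_prefix_cmp(nick, b"zwy"))  # True
print(binary_cmp(nick, b"zwy"))         # False (not the whole value)
```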

II. Examples

 

1. FirstKeyOnlyFilter: a convenient filter for counting rows

hbase(main):002:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>'info',FILTER=>"(FirstKeyOnlyFilter())"}
 0000000001                       column=info:loginid, timestamp=1343625459713, value=jjm168131013
 0000000002                       column=info:loginid, timestamp=1343625459713, value=loveswh
...
21 row(s) in 0.5480 seconds
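Because FirstKeyOnlyFilter returns only the first cell of each row, the number of cells returned equals the number of rows. A minimal Python model of that behaviour (hypothetical, not the HBase implementation), using cells from these transcripts; the HBase shell also offers a dedicated `count` command for this task.

```python
from itertools import groupby
from typing import List, Tuple

# (row key, column, value); HBase returns cells sorted by row key, then column
Cell = Tuple[str, str, bytes]


def first_key_only(cells: List[Cell]) -> List[Cell]:
    # Keep only the first cell of every row, as FirstKeyOnlyFilter does
    return [next(group) for _, group in groupby(cells, key=lambda c: c[0])]


cells: List[Cell] = [
    ("0000000001", "info:loginid", b"jjm168131013"),
    ("0000000001", "info:userid", b"168131013"),
    ("0000000002", "info:loginid", b"loveswh"),
    ("0000000002", "info:nick", b"loveswh08"),
    ("0000000002", "info:userid", b"100898152"),
]

firsts = first_key_only(cells)
print(firsts)       # one info:loginid cell per row
print(len(firsts))  # 2 rows
```

info:loginid comes first in each row because qualifiers are stored in sorted order (loginid < nick < score < userid), which matches the transcript.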

2. Filtering by a substring of the column name

hbase(main):006:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:id'))"}
ROW COLUMN+CELL
0000000001 column=info:loginid, timestamp=1343625459713, value=jjm168131013
0000000001 column=info:userid, timestamp=1343625459713, value=168131013
0000000002 column=info:loginid, timestamp=1343625459713, value=loveswh
0000000002 column=info:userid, timestamp=1343625459713, value=100898152

hbase(main):005:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:loginid'],FILTER=>"(QualifierFilter(=,'substring:id'))"}
ROW COLUMN+CELL
0000000001 column=info:loginid, timestamp=1343625459713, value=jjm168131013
0000000002 column=info:loginid, timestamp=1343625459713, value=loveswh

hbase(main):007:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nid'))"}
ROW COLUMN+CELL
0000000001 column=info:loginid, timestamp=1343625459713, value=jjm168131013
0000000002 column=info:loginid, timestamp=1343625459713, value=loveswh

hbase(main):008:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nick'))"}
ROW COLUMN+CELL
0000000001 column=info:nick, timestamp=1343625459713, value=\xE5\xAE\xB6\xE6\x9C\x89\xE8\x99\x8E\xE5\xAE\x9D
0000000002 column=info:nick, timestamp=1343625459713, value=loveswh08
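The three substring queries above differ only in which qualifiers contain the given text. A hypothetical Python model of QualifierFilter(=,'substring:...') over the qualifiers seen in the info family:

```python
from typing import List

# Qualifiers of the info family, as seen in the transcripts
qualifiers: List[str] = ["loginid", "nick", "score", "userid"]


def qualifier_substring(quals: List[str], sub: str) -> List[str]:
    # Keep qualifiers that contain `sub`, case-insensitively
    # (SubstringComparator lower-cases both sides)
    return [q for q in quals if sub.lower() in q.lower()]


print(qualifier_substring(qualifiers, "id"))    # ['loginid', 'userid']
print(qualifier_substring(qualifiers, "nid"))   # ['loginid']
print(qualifier_substring(qualifiers, "nick"))  # ['nick']
```

Restricting COLUMNS to info:loginid, as in the second command, simply narrows the input list before the filter runs, which is why only loginid comes back there.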

3. Filtering on values

3.1 Regular-expression filter
hbase(main):004:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>'info',FILTER=>"(SingleColumnValueFilter('info','nick',=,'regexstring:.*99',true,true))"}
ROW                               COLUMN+CELL
 0000000009                       column=info:loginid, timestamp=1343625459713, value=zgh1968
 0000000009                       column=info:nick, timestamp=1343625459713, value=zwy99
 0000000009                       column=info:score, timestamp=1343625459713, value=5
 0000000009                       column=info:userid, timestamp=1343625459713, value=100366262
1 row(s) in 0.2520 seconds
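In SingleColumnValueFilter('info','nick',=,'regexstring:.*99',true,true) the fifth argument is filterIfMissing and the sixth is latestVersionOnly; a matching row is returned with all its columns. The Python sketch below models that logic under the same find()-style regex assumption as before; the row 0000000099 is made up purely to show the missing-column case.

```python
import re
from typing import Dict, List

# row key -> {qualifier: latest value} for family "info"; only the newest
# version is kept, mirroring latestVersionOnly=true
Rows = Dict[str, Dict[str, bytes]]


def single_column_value_filter(rows: Rows, qualifier: str, pattern: str,
                               filter_if_missing: bool = True) -> List[str]:
    # Return keys of rows whose `qualifier` value matches `pattern`
    # (searched anywhere in the value). Matching rows come back whole.
    kept = []
    for key in sorted(rows):
        row = rows[key]
        if qualifier not in row:
            # filterIfMissing=true drops rows without the column;
            # false lets them through unchecked
            if not filter_if_missing:
                kept.append(key)
            continue
        if re.search(pattern, row[qualifier].decode("utf-8")):
            kept.append(key)
    return kept


rows: Rows = {
    "0000000002": {"loginid": b"loveswh", "nick": b"loveswh08"},
    "0000000009": {"loginid": b"zgh1968", "nick": b"zwy99", "score": b"5"},
    "0000000099": {"loginid": b"example"},  # hypothetical row with no nick
}

print(single_column_value_filter(rows, "nick", ".*99"))         # ['0000000009']
print(single_column_value_filter(rows, "nick", ".*99", False))  # ['0000000009', '0000000099']
```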

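3.2 Substring
This example calls the Java filter classes directly from the shell, so they must be imported first: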
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

hbase(main):028:0> scan 'toplist_ware_ios_1001_201231',{COLUMNS =>'info:nick', FILTER=>SingleColumnValueFilter.new(Bytes.toBytes('info'),Bytes.toBytes('nick'),CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new('8888'))}
ROW COLUMN+CELL
0000000002 column=info:nick, timestamp=1343625446556, value=\xE7\x81\x8F????\xE3\x81\x8A??8888
1 row(s) in 0.0330 seconds

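3.3 Binary
Substring and similar comparators did not work with multi-byte (e.g. Chinese) text here, so binary comparison is used instead: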
hbase(main):010:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nick') AND ValueFilter(=,'binary:7789\xE6\xB4\x81') )"}
ROW COLUMN+CELL
0000000016 column=info:nick, timestamp=1343625459713, value=7789\xE6\xB4\x81
1 row(s) in 0.1710 seconds
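The binary value has to be typed as escaped UTF-8 bytes. Because the filter string is in double quotes, the shell (JRuby) turns each \xHH escape into the raw byte. The hypothetical helper below builds such a literal from ordinary text; it assumes the text contains no quotes or backslashes, which it does not escape.

```python
def to_shell_binary_literal(text: str) -> str:
    # Hypothetical helper: encode text as UTF-8 and write every non-ASCII byte
    # as \xHH, the form used inside the double-quoted filter strings above.
    # Quotes and backslashes are NOT escaped (assumed absent from the text).
    out = []
    for b in text.encode("utf-8"):
        out.append(chr(b) if b < 0x80 else "\\x%02X" % b)
    return "".join(out)


value = b"7789\xe6\xb4\x81".decode("utf-8")  # the nick stored in row 0000000016
print(to_shell_binary_literal(value))         # 7789\xE6\xB4\x81
```

The printed text can be pasted after 'binary:' in the shell filter string.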

4. Combining a column-name substring with a binary value comparison

hbase(main):012:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>['info:'],FILTER=>"(QualifierFilter(=,'substring:nick') AND ValueFilter(=,'binary:7789\xE6\xB4\x81') )"}
ROW COLUMN+CELL
0000000016 column=info:nick, timestamp=1343625459713, value=7789\xE6\xB4\x81
1 row(s) in 0.0120 seconds
hbase(main):014:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>"info:",FILTER=>"(PrefixFilter('000000002')) AND (QualifierFilter(=,'substring:nick'))"}
ROW COLUMN+CELL
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
 0000000021 column=info:nick, timestamp=1343625459713, value=\xE5\xB0\x8F\xE7\xBD\x97\xE6\x95\x99\xE7\xBB\x831
2 row(s) in 0.0440 seconds
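In the last command PrefixFilter judges the row key while QualifierFilter judges each cell's column name; joined with AND, a cell is returned only if both accept it. A hypothetical Python model of that interplay, using rows from the transcripts (not the HBase API):

```python
from typing import List, Tuple

Cell = Tuple[str, str, bytes]  # (row key, qualifier in family info, value)


def prefix_and_qualifier(cells: List[Cell], prefix: str, sub: str) -> List[Cell]:
    # Row-level test (PrefixFilter) AND cell-level test (QualifierFilter substring)
    return [c for c in cells
            if c[0].startswith(prefix) and sub.lower() in c[1].lower()]


cells: List[Cell] = [
    ("0000000016", "nick", b"7789\xe6\xb4\x81"),
    ("0000000020", "loginid", b"jjm169212318"),
    ("0000000020", "nick", b"Denny_feng"),
    ("0000000021", "loginid", b"jjm169371841"),
    ("0000000021", "nick", b"\xe5\xb0\x8f\xe7\xbd\x97\xe6\x95\x99\xe7\xbb\x831"),
]

result = prefix_and_qualifier(cells, "000000002", "nick")
for row, qual, _ in result:
    print(row, "info:" + qual)
# 0000000020 info:nick
# 0000000021 info:nick
```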

5. Row queries

 

hbase(main):005:0> get 'toplist_ware_ios_1009_201231','0000000009'
COLUMN CELL
 info:loginid timestamp=1343625459713, value=zgh1968
 info:nick timestamp=1343625459713, value=zwy99
 info:score timestamp=1343625459713, value=5
 info:userid timestamp=1343625459713, value=100366262
4 row(s) in 0.1000 seconds
hbase(main):006:0> get 'toplist_ware_ios_1009_201231','0000000009','info:nick'
COLUMN CELL
 info:nick timestamp=1343625459713, value=zwy99
1 row(s) in 0.0100 seconds
hbase(main):009:0> scan 'toplist_ware_ios_1009_201231',FILTER=>"PrefixFilter('000000002')"
ROW COLUMN+CELL
 0000000020 column=info:loginid, timestamp=1343625459713, value=jjm169212318
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
 0000000020 column=info:score, timestamp=1343625459713, value=1
 0000000020 column=info:userid, timestamp=1343625459713, value=169212318
 0000000021 column=info:loginid, timestamp=1343625459713, value=jjm169371841
 0000000021 column=info:nick, timestamp=1343625459713, value=\xE5\xB0\x8F\xE7\xBD\x97\xE6\x95\x99\xE7\xBB\x831
 0000000021 column=info:score, timestamp=1343625459713, value=1
 0000000021 column=info:userid, timestamp=1343625459713, value=169371841
2 row(s) in 0.0180 seconds
hbase(main):010:0> scan 'toplist_ware_ios_1009_201231',FILTER=>"PrefixFilter('000000002')",LIMIT=>1
ROW COLUMN+CELL
 0000000020 column=info:loginid, timestamp=1343625459713, value=jjm169212318
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
 0000000020 column=info:score, timestamp=1343625459713, value=1
 0000000020 column=info:userid, timestamp=1343625459713, value=169212318
1 row(s) in 0.0170 seconds
hbase(main):011:0> scan 'toplist_ware_ios_1009_201231',{COLUMNS=>"info:nick",FILTER=>"PrefixFilter('000000002')",LIMIT=>1}
ROW COLUMN+CELL
 0000000020 column=info:nick, timestamp=1343625459713, value=Denny_feng
1 row(s) in 0.0160 seconds
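As noted in section 1.1, a row range set with startRow/stopRow is usually preferred to a row filter, because it tells HBase where to start and stop instead of testing each row. A prefix can be turned into a range: start at the prefix and stop at the smallest key above every key with that prefix. In the shell that would be something like {STARTROW => '000000002', STOPROW => '000000003'}. The Python sketch below (hypothetical helpers, not the HBase API) computes the stop row and models a LIMIT-capped range scan over sorted keys.

```python
from bisect import bisect_left
from typing import List, Optional


def prefix_stop_row(prefix: bytes) -> bytes:
    # Increment the last byte that is not 0xFF and drop anything after it.
    # b"" means "no upper bound" (scan to the end of the table).
    b = bytearray(prefix)
    while b:
        if b[-1] < 0xFF:
            b[-1] += 1
            return bytes(b)
        b.pop()
    return b""


def range_scan(sorted_keys: List[bytes], start: bytes, stop: bytes,
               limit: Optional[int] = None) -> List[bytes]:
    # Return keys in [start, stop), in byte order as HBase stores rows;
    # `limit` caps the number of rows, like the shell's LIMIT.
    i = bisect_left(sorted_keys, start)
    out: List[bytes] = []
    while i < len(sorted_keys) and (not stop or sorted_keys[i] < stop):
        if limit is not None and len(out) == limit:
            break
        out.append(sorted_keys[i])
        i += 1
    return out


keys = [b"%010d" % n for n in range(1, 22)]  # 0000000001 .. 0000000021
prefix = b"000000002"
stop = prefix_stop_row(prefix)
print(stop)                                     # b'000000003'
print(range_scan(keys, prefix, stop))           # [b'0000000020', b'0000000021']
print(range_scan(keys, prefix, stop, limit=1))  # [b'0000000020']
```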

 
Query records whose MPID and gameID both match given values:

hbase(main):014:0> scan 'award_1211',{FILTER=>"(PrefixFilter('2012-11-26')) AND (SingleColumnValueFilter('info','MPID',=,'regexstring:8639',true,true)) AND (SingleColumnValueFilter('info','gameID',=,'regexstring:1001',true,true))",LIMIT=>2}
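One caveat, assuming RegexStringComparator searches anywhere in the value as modelled earlier: 'regexstring:8639' also accepts values such as 18639, so it is really a "contains" test. For strict equality, 'binary:8639' is the tighter choice. The Python sketch below compares the two; the row-key suffixes and values are hypothetical, since the post shows only the 2012-11-26 prefix and no output.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rows of award_1211: row key -> {qualifier: value} in family info
rows: Dict[str, Dict[str, bytes]] = {
    "2012-11-26_a": {"MPID": b"8639", "gameID": b"1001"},
    "2012-11-26_b": {"MPID": b"18639", "gameID": b"1001"},
    "2012-11-26_c": {"MPID": b"8639", "gameID": b"1002"},
    "2012-11-27_a": {"MPID": b"8639", "gameID": b"1001"},
}


def matches(value: bytes, comparator: str) -> bool:
    # "regexstring:" searches anywhere (assumed find() semantics);
    # "binary:" requires byte-for-byte equality
    kind, _, arg = comparator.partition(":")
    if kind == "regexstring":
        return re.search(arg, value.decode("utf-8")) is not None
    if kind == "binary":
        return value == arg.encode("utf-8")
    raise ValueError("unsupported comparator: " + kind)


def query(rows: Dict[str, Dict[str, bytes]], prefix: str, mpid_cmp: str,
          game_cmp: str, limit: Optional[int] = None) -> List[str]:
    # PrefixFilter AND two SingleColumnValueFilters (filterIfMissing=true)
    out: List[str] = []
    for key in sorted(rows):
        row = rows[key]
        if (key.startswith(prefix)
                and "MPID" in row and matches(row["MPID"], mpid_cmp)
                and "gameID" in row and matches(row["gameID"], game_cmp)):
            out.append(key)
            if limit is not None and len(out) == limit:
                break
    return out


print(query(rows, "2012-11-26", "regexstring:8639", "regexstring:1001", limit=2))
# ['2012-11-26_a', '2012-11-26_b']
print(query(rows, "2012-11-26", "binary:8639", "binary:1001", limit=2))
# ['2012-11-26_a']
```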