HBase Scan and Get Usage


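1. Help output for get

As the usage below shows, get takes a table name and a rowkey, optionally followed by columns, a timestamp, a time range, a version count, or a filter. What get cannot do is return all the data in a table; for that you need the other command, scan.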

hbase(main):214:0> help 'get'
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1','r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbase> get 't1','r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}

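2. Help output for scan

scan is far more flexible: it can scan an entire table, or return only the data that matches the conditions you specify. The conditions are expressed with FILTER, which the following sections demonstrate one case at a time.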

hbase(main):221:0> help 'scan'
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS, CACHE

If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.

The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.

Some examples:

  hbase> scan 'hbase:meta'
  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
  hbase> scan 't1', {REVERSED => true}
  hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
    (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
  hbase> scan 't1', {FILTER =>
    org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
For setting the Operation Attributes 
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default.  Example:

  hbase> scan 't1', {RAW => true, VERSIONS => 10}

Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column.  A user can define a FORMATTER by adding it to the column name in
the scan specification.  The FORMATTER can be stipulated: 

 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers: 
  hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] } 

Note that you can specify a FORMATTER by column only (cf:qualifier).  You cannot
specify a FORMATTER for all columns of a column family.

Scan can also be used directly from a table, by first getting a reference to a
table, like such:

  hbase> t = get_table 't'
  hbase> t.scan

Note in the above situation, you can still provide all the filtering, columns,
options, etc as described above.

3. Fetching the data for a given rowkey with get and with scan

1. The get statement for a rowkey versus the scan statement for the same rowkey
=================================================================================================================
【get】
hbase(main):011:0> get 'liupeng:employee','1001'
COLUMN                                  CELL
 contect:mail                           timestamp=1522202414649, value=[email protected]
 contect:phone                          timestamp=1522202430196, value=15962459503
 group:number                           timestamp=1522202455929, value=1
 info:age                               timestamp=1522202371257, value=34
 info:name                              timestamp=1522202364156, value=liupeng

【Scan】
hbase(main):010:0> scan 'liupeng:employee',FILTER=>"PrefixFilter('1001')"
ROW                                     COLUMN+CELL
 1001                                   column=contect:mail, timestamp=1522202414649, value=[email protected]
 1001                                   column=contect:phone, timestamp=1522202430196, value=15962459503
 1001                                   column=group:number, timestamp=1522202455929, value=1
 1001                                   column=info:age, timestamp=1522202371257, value=34
 1001                                   column=info:name, timestamp=1522202364156, value=liupeng
1 row(s) in 0.0590 seconds

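Summary: the two commands present the same data differently. scan's output includes the rowkey itself in the ROW column, whereas get's output does not show the rowkey. get also lists COLUMN and CELL separately, while scan merges them into a single COLUMN+CELL field.
     Also, the PrefixFilter keyword in scan's FILTER filters on the rowkey. It matches every rowkey that starts with the given prefix, not only an exact match.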
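
To make the PrefixFilter point concrete, here is a minimal Ruby sketch that models the table as an in-memory hash. It does not call HBase. The names TABLE, model_get and model_prefix_scan are hypothetical, and row 10010 is invented purely to show that one prefix can match more than one rowkey.

```ruby
# Hypothetical in-memory stand-in for the table: rowkey => { column => value }.
TABLE = {
  '1001'  => { 'info:name' => 'liupeng' },
  '10010' => { 'info:name' => 'example' },  # invented row, not in the real table
  '1002'  => { 'info:name' => 'Jack_Ma' }
}

# Models get: an exact rowkey lookup that returns only that row's columns.
def model_get(table, rowkey)
  table.fetch(rowkey, {})
end

# Models scan with PrefixFilter: every row whose key starts with the prefix.
def model_prefix_scan(table, prefix)
  table.select { |rowkey, _cells| rowkey.start_with?(prefix) }
end

p model_get(TABLE, '1001')
p model_prefix_scan(TABLE, '1001').keys
```

In this model the prefix scan returns both 1001 and 10010. If you need exactly one row, get (or a scan bounded with STARTROW and STOPROW) is the safer choice.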


4. How get and scan differ when fetching a single record
《Similarities》

hbase(main):229:0> get "liupeng:employee",'1001','info:phone'
COLUMN                          CELL                                                                                     
 info:phone                     timestamp=1527914569028, value=15962459503                                               
1 row(s) in 0.0320 seconds

hbase(main):230:0> scan "liupeng:employee",FILTER=>"PrefixFilter('1001')AND ValueFilter(=,'substring:159')"
ROW                             COLUMN+CELL                                                                              
 1001                           column=info:phone, timestamp=1527914569028, value=15962459503                            
1 row(s) in 0.1010 seconds

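《Differences》
##Note: both commands above retrieve the cell in row 1001 whose value contains '159', but they get there in entirely different ways. get finds it by naming the exact column, 'info:phone'.
scan pins down the rowkey with PrefixFilter, then uses AND to add a ValueFilter that substring-matches 159. If you do not already know that the column is info:phone,
scan's fuzzy matching is clearly the more practical approach.

Moreover, a scan query like the one below cannot be expressed with get at all. For example: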
=================================================================================================================

hbase(main):026:0> scan 'liupeng:employee',FILTER=>"ValueFilter(=,'substring:159')"
ROW                                     COLUMN+CELL
 1001                                   column=contect:phone, timestamp=1522202430196, value=15962459503
 1002                                   column=contect:phone, timestamp=1522202527866, value=15977634464

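##Explanation: the query above uses a substring match to return every cell whose value contains 159, whichever row it is in. get, as the syntax below shows, can only query columns once a rowkey has been specified, which means
the rowkey design has to be thought through carefully in advance. From this point of view, I personally find scan more convenient than get for retrieving data.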
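
The behaviour of ValueFilter with a substring comparator can be sketched in plain Ruby. This is only a model over a few cells copied from the outputs in this post, not HBase code; CELLS and model_value_substring are hypothetical names. HBase's SubstringComparator is documented as case-insensitive, which the sketch imitates with downcase.

```ruby
# Sample cells as [rowkey, column, value], copied from the outputs in this post.
CELLS = [
  ['1001', 'contect:phone', '15962459503'],
  ['1001', 'info:name',     'liupeng'],
  ['1002', 'contect:phone', '15977634464'],
  ['1002', 'info:name',     'Jack_Ma']
]

# Models ValueFilter(=, 'substring:...'): keep every cell whose value
# contains the text, regardless of which row or column it belongs to.
def model_value_substring(cells, text)
  cells.select { |_row, _column, value| value.downcase.include?(text.downcase) }
end

model_value_substring(CELLS, '159').each { |cell| p cell }
```

No rowkey appears anywhere in the query, which is exactly what get cannot express.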


  hbase> t.get 'r1'
  hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> t.get 'r1', {COLUMN => 'c1'}
  hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> t.get 'r1', 'c1'
  hbase> t.get 'r1', 'c1', 'c2'
  hbase> t.get 'r1', ['c1', 'c2']

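5. scan can find values directly, without a rowkey being specified. More precisely, you tell it which column to look in and it returns the values stored there for every row. get cannot do this.

    ColumnPrefixFilter('qualifier prefix')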

hbase(main):038:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('name')"
ROW                                     COLUMN+CELL
 1001                                   column=info:name, timestamp=1522202364156, value=liupeng
 1002                                   column=info:name, timestamp=1522202474669, value=Jack_Ma
 1003                                   column=info:name, timestamp=1522202561029, value=kevin_shi
3 row(s) in 0.0210 seconds

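##Note: ColumnPrefixFilter selects the columns whose qualifier starts with the given prefix. Here it matches the qualifier name, which lives in the info column family.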
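
A small Ruby model may help show that the prefix is compared with the qualifier, not with the column family. As before, this is an in-memory sketch with hypothetical names (CELLS, model_column_prefix), not a call into HBase.

```ruby
# Sample cells as [rowkey, 'family:qualifier', value].
CELLS = [
  ['1001', 'info:name',     'liupeng'],
  ['1001', 'info:age',      '34'],
  ['1001', 'contect:phone', '15962459503'],
  ['1002', 'info:name',     'Jack_Ma']
]

# Models ColumnPrefixFilter: the prefix is tested against the qualifier only,
# i.e. the part after the first ':' in 'family:qualifier'.
def model_column_prefix(cells, prefix)
  cells.select do |_row, column, _value|
    qualifier = column.split(':', 2).last
    qualifier.start_with?(prefix)
  end
end

p model_column_prefix(CELLS, 'name')
p model_column_prefix(CELLS, 'info')  # family names are not matched
```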

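6.  scan is convenient because it can search on any combination of rowkey, column and value, and conditions can be combined with AND and OR to narrow the result down to the data you want.
The statement below uses AND and OR together. In HBase's filter language AND binds more tightly than OR, so it is evaluated as (ColumnPrefixFilter AND ValueFilter) OR ValueFilter; add parentheses if you want a different grouping. The same operator may also appear more than once in one statement, as the AND ... AND example in the scan help output shows.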

hbase(main):060:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('ph')AND ValueFilter(=,'substring:15962')OR ValueFilter(=,'substring:186')"
ROW                                                  COLUMN+CELL
 1001                                                column=contect:phone, timestamp=1522202430196, value=15962459503
 1003                                                column=contect:phone, timestamp=1522202605976, value=18665851263
2 row(s) in 0.0170 seconds
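
Because AND and OR are mixed in that statement, it is worth seeing how the grouping changes the result. The Ruby sketch below treats each filter as a per-cell predicate, which is a simplification of how HBase combines filters. All names are hypothetical, and row 9999 is invented so that the two groupings give different answers.

```ruby
# Sample cells as [rowkey, 'family:qualifier', value].
CELLS = [
  ['1001', 'contect:phone', '15962459503'],
  ['1002', 'contect:phone', '15977634464'],
  ['1003', 'contect:phone', '18665851263'],
  ['1003', 'info:name',     'kevin_shi'],
  ['9999', 'info:address',  'room 186']  # invented cell, not in the real table
]

# Per-cell stand-ins for ColumnPrefixFilter and ValueFilter(=, 'substring:...').
def col_prefix?(cell, prefix)
  cell[1].split(':', 2).last.start_with?(prefix)
end

def value_has?(cell, text)
  cell[2].include?(text)
end

# AND before OR: (ColumnPrefixFilter AND ValueFilter) OR ValueFilter.
def default_grouping(cells)
  cells.select { |c| (col_prefix?(c, 'ph') && value_has?(c, '15962')) || value_has?(c, '186') }
end

# Explicit parentheses: ColumnPrefixFilter AND (ValueFilter OR ValueFilter).
def explicit_grouping(cells)
  cells.select { |c| col_prefix?(c, 'ph') && (value_has?(c, '15962') || value_has?(c, '186')) }
end

p default_grouping(CELLS).map(&:first)
p explicit_grouping(CELLS).map(&:first)
```

With the default grouping the invented address cell slips through, because the trailing OR condition is not restricted to columns starting with ph. If the intent is "phone numbers containing 15962 or 186", the parenthesised form is the one to write.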

7.  Using SingleColumnValueFilter to test one column's value and list every column and value of the rows that match

hbase(main):242:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:30')"}
ROW                             COLUMN+CELL                                                                              
 1005                           column=contect:mail, timestamp=1528420218800, value=[email protected]
 1005                           column=info:age, timestamp=1528439967493, value=30                                       
 1005                           column=info:name, timestamp=1528420218800, value=zhangsan                                
 1008                           column=contect:mail, timestamp=1528681786126, value=[email protected]
 1008                           column=info:age, timestamp=1528681786126, value=30                                       
 1008                           column=info:name, timestamp=1528681786126, value=kevin                                   
2 row(s) in 0.0110 seconds
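
Unlike ValueFilter, which keeps or drops individual cells, SingleColumnValueFilter decides per row: it tests one column and, if the test passes, returns the whole row. The Ruby sketch below models that, including the documented default that rows lacking the tested column are still returned unless filterIfMissing is set. ROWS, model_scvf and row 9999 are hypothetical; the mail column is left out of the sample data.

```ruby
# Hypothetical in-memory table: rowkey => { column => value }.
ROWS = {
  '1001' => { 'info:age' => '34', 'info:name' => 'liupeng' },
  '1005' => { 'info:age' => '30', 'info:name' => 'zhangsan' },
  '1008' => { 'info:age' => '30', 'info:name' => 'kevin' },
  '9999' => { 'info:name' => 'no_age' }  # invented row without info:age
}

# Models SingleColumnValueFilter: the block tests the value of one column,
# and a passing row is returned with all of its columns.
def model_scvf(rows, column, filter_if_missing: false)
  rows.select do |_rowkey, cells|
    if cells.key?(column)
      yield cells[column]
    else
      !filter_if_missing  # default: rows without the column pass through
    end
  end
end

p model_scvf(ROWS, 'info:age') { |v| v.include?('30') }.keys
p model_scvf(ROWS, 'info:age', filter_if_missing: true) { |v| v.include?('30') }.keys
```

In this model the first call also returns the invented row 9999. If rows without the column should be excluded on a real table, that is an option of the filter itself and has to be enabled explicitly.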

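8.  SingleColumnValueFilter also supports regular-expression matching through the regexstring comparator, so a fuzzy query can find the matching rowkeys along with all of their columns and values.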

hbase(main):244:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','name',=,'regexstring:liu')"}
ROW                             COLUMN+CELL                                                                              
 1001                           column=contect:mail, timestamp=1527231141046, value=[email protected]
 1001                           column=info:address, timestamp=1527753987327, value=shanghai                             
 1001                           column=info:age, timestamp=1527231097033, value=34                                       
 1001                           column=info:name, timestamp=1527231081262, value=liupeng                                 
 1001                           column=info:phone, timestamp=1527914569028, value=15962459503                            
 1004                           column=contect:mail, timestamp=1527473497956, value=[email protected]
 1004                           column=info:address, timestamp=1527755135174, value=shenzhen                             
 1004                           column=info:age, timestamp=1527473477124, value=40                                       
 1004                           column=info:name, timestamp=1527415665182, value=liuqiangdong                            
2 row(s) in 0.0080 seconds
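
The pattern 'liu' matched both liupeng and liuqiangdong because the regex comparator looks for the pattern anywhere in the value rather than requiring the whole value to match. A rough Ruby illustration follows. NAMES and regex_rows are hypothetical names, and Ruby's regex dialect differs from the Java one HBase uses, so treat this as an analogy only.

```ruby
# info:name values per rowkey, copied from the outputs in this post.
NAMES = {
  '1001' => 'liupeng',
  '1002' => 'Jack_Ma',
  '1003' => 'kevin_shi',
  '1004' => 'liuqiangdong'
}

# Models an unanchored regex test on a single column: a row matches when
# the pattern is found anywhere in the value.
def regex_rows(names, pattern)
  re = Regexp.new(pattern)
  names.select { |_rowkey, name| name =~ re }.keys
end

p regex_rows(NAMES, 'liu')
p regex_rows(NAMES, '^kevin')  # anchor explicitly to match only at the start
```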
