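Using HBase coprocessors (version 0.92 and later) to quickly return the total number of query results

HBase 0.92 introduced coprocessors, which come in two kinds: endpoints and observers.

An observer acts like a hook. Observers are divided into three types according to the component the hook runs in:

RegionObserver: handles data manipulation events and is tightly bound to a table's regions

MasterObserver: handles cluster-level events, i.e. administrative operations and data definition language (DDL) operations

WALObserver: handles write-ahead log (WAL) processing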
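To make the hook idea concrete, here is a minimal sketch in plain Java. It is not the HBase API: `PutObserver`, `ToyRegion` and `AuditObserver` are hypothetical names made up for illustration, and the boolean veto is a simplification. It only shows the general shape of an observer, with code that runs before and after an operation on a region.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hook interface, NOT the real HBase RegionObserver.
interface PutObserver {
    // Called before a write is applied; returning false vetoes the write.
    boolean prePut(String row, String value);

    // Called after a write has been applied.
    void postPut(String row, String value);
}

// Hypothetical stand-in for a region: an in-memory map plus registered hooks.
class ToyRegion {
    private final Map<String, String> store = new HashMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void register(PutObserver observer) {
        observers.add(observer);
    }

    // Runs every pre-hook, applies the write unless one vetoes it, then runs the post-hooks.
    boolean put(String row, String value) {
        for (PutObserver o : observers) {
            if (!o.prePut(row, value)) {
                return false;
            }
        }
        store.put(row, value);
        for (PutObserver o : observers) {
            o.postPut(row, value);
        }
        return true;
    }

    String get(String row) {
        return store.get(row);
    }
}

// Example observer: rejects empty values and records every row that was written.
class AuditObserver implements PutObserver {
    final List<String> log = new ArrayList<>();

    public boolean prePut(String row, String value) {
        return !value.isEmpty();
    }

    public void postPut(String row, String value) {
        log.add(row);
    }
}
```

With an `AuditObserver` registered, a put with an empty value is rejected by the pre-hook and never reaches the store, while an accepted put shows up in the observer's log.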

An endpoint, in contrast, can be thought of as a stored procedure in a relational database, and users can define their own.
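The stored-procedure analogy can be sketched in plain Java as follows. This is a toy model under stated assumptions, not HBase code: `ToyCountRegion` and `ToyAggregationClient` are hypothetical names, and a `Predicate` stands in for the scan and its filter. The point it illustrates is that each region counts its own matching rows next to the data, so only one number per region travels back to the client, which adds them up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for one region: it holds a slice of the table's row keys.
class ToyCountRegion {
    private final List<String> rows;

    ToyCountRegion(List<String> rows) {
        this.rows = new ArrayList<>(rows);
    }

    // "Endpoint" logic: runs where the data lives and returns only a count.
    long rowCount(Predicate<String> filter) {
        long n = 0;
        for (String row : rows) {
            if (filter.test(row)) {
                n++;
            }
        }
        return n;
    }
}

// Hypothetical client: asks every region for its partial count and sums the results.
class ToyAggregationClient {
    static long rowCount(List<ToyCountRegion> regions, Predicate<String> filter) {
        long total = 0;
        for (ToyCountRegion region : regions) {
            total += region.rowCount(filter);
        }
        return total;
    }
}
```

Compared with iterating over a scanner on the client, no row data crosses the network in this model, which is why counting through an endpoint can be much faster.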

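Back to the point: how do you configure and use a coprocessor?

This post only covers using an endpoint to quickly return the total number of results that match a condition.

1. Configuration

Add one property to $HBASE_HOME/conf/hbase-site.xml. I am using version 0.94, whose bundled implementation is AggregateImplementation: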

<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>

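If this property was not configured before, HBase must be restarted after adding it for the change to take effect.

2. Client usage. Straight to the code.

For scan, simply pass the same Scan you use to fetch the query results.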

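    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;

    /**
     * Returns the total number of rows that match the given scan.
     * conf, tableName, columnFamily, etimeQualifier and LOG are fields of the enclosing class.
     * @author wanglongyf2 2013-1-11 10:29:15 AM
     * @param scan the scan describing the query (row range, filter)
     * @return the number of matching rows, or 0 if the aggregation call fails
     */
    private long getTotalNumber(Scan scan) {
        AggregationClient aggregationClient = new AggregationClient(conf);
        long rowCount = 0;
        try {
            // Required (or use addFamily() instead); otherwise the call fails with an exception mentioning ci ****
            scan.addColumn(columnFamily, etimeQualifier);
            rowCount = aggregationClient.rowCount(tableName, null, scan);
        } catch (Throwable e) {
            // Log the cause together with the message instead of printing the stack trace separately.
            LOG.fatal("getTotalNumber wrong. ", e);
        }
        return rowCount;
    }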

To verify that this total matches the actual number of results, see the key code below:

    scan.setStartRow(startRow);
    scan.setStopRow(stopRow);
    Filter filter = new SingleColumnValueFilter(columnFamily, qualifier,
                                                CompareOp.GREATER, Bytes.toBytes(startTime));
    scan.setFilter(filter);
    // Count through the coprocessor first.
    long number = getTotalNumber(scan);
    // Then count by iterating over the same scan on the client side.
    ResultScanner scanner = table.getScanner(scan);
    try {
        Result res = scanner.next();
        while (res != null) {
            numberOfResults++;
            res = scanner.next();
        }
    } finally {
        // Release the scanner's server-side resources.
        scanner.close();
    }
    if (numberOfResults != number) {
        LOG.fatal(String.format("use aggregation %d and scanner %d gets inconsistent result. ",
                                number, numberOfResults));
    }
