HBase Filters Explained, with Usage Examples

Original link: https://blog.csdn.net/lr131425/article/details/72676254

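Parameter basics
Two parameter types appear in many of the Filter classes, so they are introduced together here:
(1) Comparison operator: CompareFilter.CompareOp
The comparison operator defines the comparison relation and takes one of the following values: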

    EQUAL                 equal to
    GREATER               greater than
    GREATER_OR_EQUAL      greater than or equal to
    LESS                  less than
    LESS_OR_EQUAL         less than or equal to
    NOT_EQUAL             not equal to


(2) Comparator: ByteArrayComparable
Comparators make different kinds of matching possible. The following subclasses are available:

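    BinaryComparator            compares against the complete byte array
    BinaryPrefixComparator      compares against a prefix of the byte array
    BitComparator               performs a bitwise comparison (AND, OR, XOR)
    NullComparator              checks whether the value is null
    RegexStringComparator       matches against a regular expression
    SubstringComparator         matches a substring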
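
To make the two parameter types concrete, here is a minimal plain-Java sketch of how an operator and a comparator combine into a match decision. It only models the behaviour and is not HBase code: the class ComparatorSketch and all of its methods are hypothetical names, the sketch assumes the user-facing reading "cell value OP reference value", and it assumes that substring matching ignores case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Plain-Java model of operator + comparator matching. Hypothetical names, not the HBase API. */
final class ComparatorSketch {
    enum Op { EQUAL, NOT_EQUAL, GREATER, GREATER_OR_EQUAL, LESS, LESS_OR_EQUAL }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Turns a comparison result (value compared with reference) into a yes/no answer. */
    static boolean satisfies(Op op, int cmp) {
        switch (op) {
            case EQUAL:            return cmp == 0;
            case NOT_EQUAL:        return cmp != 0;
            case GREATER:          return cmp > 0;
            case GREATER_OR_EQUAL: return cmp >= 0;
            case LESS:             return cmp < 0;
            case LESS_OR_EQUAL:    return cmp <= 0;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    /** Models BinaryComparator: the complete arrays are compared. */
    static boolean binaryMatch(Op op, byte[] value, byte[] reference) {
        return satisfies(op, compareBytes(value, reference));
    }

    /** Models BinaryPrefixComparator: only the first prefix.length bytes of the value take part. */
    static boolean prefixMatch(Op op, byte[] value, byte[] prefix) {
        byte[] head = Arrays.copyOf(value, Math.min(value.length, prefix.length));
        return satisfies(op, compareBytes(head, prefix));
    }

    /** Models SubstringComparator with EQUAL: a "contains" test, assumed to ignore case. */
    static boolean substringMatch(String value, String substr) {
        return value.toLowerCase(Locale.ROOT).contains(substr.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(binaryMatch(Op.EQUAL, utf8("name"), utf8("name")));
        System.out.println(binaryMatch(Op.EQUAL, utf8("name2"), utf8("name")));
        System.out.println(prefixMatch(Op.EQUAL, utf8("name2"), utf8("name")));
        System.out.println(substringMatch("zhangsan_1495527850824", "san_14"));
    }
}
```

The split between operator and comparator is what lets the same six operators be reused with every comparator: the comparator decides what is compared, the operator decides which outcomes count as a match.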

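1. FilterList
A FilterList represents a chain of filters. It holds a group of filters to be applied to the target data set, combined either with "and" (FilterList.Operator.MUST_PASS_ALL) or with "or" (FilterList.Operator.MUST_PASS_ONE).
The example from the official documentation, combining two filters with "or":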

// cf and column are byte[] values for the column family and qualifier, defined elsewhere
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);   // a row only needs to pass one filter in the group
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(cf, column, CompareOp.EQUAL, Bytes.toBytes("my value"));
list.addFilter(filter1);   // FilterList exposes addFilter(Filter), not add(...)
SingleColumnValueFilter filter2 = new SingleColumnValueFilter(cf, column, CompareOp.EQUAL, Bytes.toBytes("my other value"));
list.addFilter(filter2);
Scan scan = new Scan();
scan.setFilter(list);
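
The difference between the two operators can be sketched with ordinary Java predicates. This is only a model of the "and"/"or" combination described above, not how HBase evaluates a FilterList internally; FilterListSketch and combine are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model of MUST_PASS_ALL ("and") and MUST_PASS_ONE ("or"). Not the HBase API. */
final class FilterListSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Combines several predicates into one, according to the operator. */
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        if (op == Operator.MUST_PASS_ALL) {
            return item -> filters.stream().allMatch(f -> f.test(item));
        }
        return item -> filters.stream().anyMatch(f -> f.test(item));
    }

    public static void main(String[] args) {
        Predicate<String> isMyValue = v -> v.equals("my value");
        Predicate<String> isMyOtherValue = v -> v.equals("my other value");
        List<Predicate<String>> filters = Arrays.asList(isMyValue, isMyOtherValue);

        Predicate<String> or = combine(Operator.MUST_PASS_ONE, filters);
        Predicate<String> and = combine(Operator.MUST_PASS_ALL, filters);

        // "my other value" passes one of the two filters, but not both.
        System.out.println("MUST_PASS_ONE: " + or.test("my other value"));
        System.out.println("MUST_PASS_ALL: " + and.test("my other value"));
    }
}
```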

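2. Column value filter: SingleColumnValueFilter
SingleColumnValueFilter tests a column value for equality (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or a one-sided range (e.g., CompareOp.GREATER).
Constructors:
(1) The value to compare against is a byte array
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, byte[] value)
(2) The value to compare against is a comparator (comparators are introduced at the start of this article)
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, ByteArrayComparable comparator)
Note: the column value decides whether the whole row is returned, so the unit of filtering is the row, not the column. Rows that lack the column are handled according to filter.setFilterIfMissing(...): when true, a row without the column is not returned; when false (the default), a row without the column is returned with all of its columns.
The test table user holds the rows listed as the original data in sections 6 and 7 below.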

Java test code:

Table table = connection.getTable(TableName.valueOf("user"));
SingleColumnValueFilter scvf = new SingleColumnValueFilter(Bytes.toBytes("account"), Bytes.toBytes("name"),
        CompareOp.EQUAL, Bytes.toBytes("zhangsan"));
// Defaults to false, in which case rows without this column are also returned;
// with true, only rows where account:name equals zhangsan are returned
scvf.setFilterIfMissing(true);
Scan scan = new Scan();
scan.setFilter(scvf);
ResultScanner resultScanner = table.getScanner(scan);
for (Result result : resultScanner) {
    String row = Bytes.toString(result.getRow());
    for (Cell cell : result.listCells()) {
        String family1 = Bytes.toString(CellUtil.cloneFamily(cell));
        String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
        String value = Bytes.toString(CellUtil.cloneValue(cell));
        System.out.println("[row:" + row + "],[family:" + family1 + "],[qualifier:" + qualifier + "]"
                + ",[value:" + value + "],[time:" + cell.getTimestamp() + "]");
    }
}
resultScanner.close();   // release the scanner once the results have been read

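With setFilterIfMissing(true), only rows whose account:name value matches are returned. Because the result is row-based, the country column is returned as well, since it shares the same rowkey.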

[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

 

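With setFilterIfMissing(false), rows whose account:name value matches are returned, rows that have no account:name column are returned as well, and rows whose account:name value does not match are not returned.

In the output below, row zhangsan_1495527850824 is the one that matches on the column value. The other rows have no account:name column and are returned too, while the row with name=lisi is absent because it does not match.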

[row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
[row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
[row:lisi_1495527850114],[family:address],[qualifier:city],[value:黃埔],[time:1495556648017]
[row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
[row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
[row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
[row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
[row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
[row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
[row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
[row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
[row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]
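
The row-level decision seen in the two outputs above can be modelled in a few lines of plain Java. This is a sketch of the observed behaviour under the assumption of an EQUAL comparison, not HBase's implementation; MissingColumnSketch and includeRow are hypothetical names, and a row is represented simply as a map from "family:qualifier" to value.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of how setFilterIfMissing changes the row-level decision. Not the HBase API. */
final class MissingColumnSketch {

    /**
     * Decides whether a whole row is returned.
     * If the tested column is absent, the outcome depends only on filterIfMissing;
     * otherwise the row is returned when the column value equals the expected value.
     */
    static boolean includeRow(Map<String, String> row, String column,
                              String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> matching = new HashMap<>();
        matching.put("account:name", "zhangsan");
        matching.put("account:country", "china");

        Map<String, String> nonMatching = new HashMap<>();
        nonMatching.put("account:name", "lisi");

        Map<String, String> withoutColumn = new HashMap<>();
        withoutColumn.put("info:age", "21");

        for (boolean flag : new boolean[] {true, false}) {
            System.out.println("filterIfMissing=" + flag
                    + " matching=" + includeRow(matching, "account:name", "zhangsan", flag)
                    + " nonMatching=" + includeRow(nonMatching, "account:name", "zhangsan", flag)
                    + " withoutColumn=" + includeRow(withoutColumn, "account:name", "zhangsan", flag));
        }
    }
}
```

Only the third case changes with the flag: a matching row is always returned and a non-matching row never is.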

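3. Key-value metadata
Because HBase stores its data internally as key-value pairs, key-value metadata filters evaluate whether the keys of a row (ColumnFamily:Qualifier) exist, rather than looking at the values.

3.1. FamilyFilter: filtering by column family
Constructor:
FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)
Code: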

public static ResultScanner getDataFamilyFilter(String tableName, String family) throws IOException {
    Table table = connection.getTable(TableName.valueOf(tableName));   // "user" in this test
    // If the table has no column family with this name, the result is empty
    FamilyFilter ff = new FamilyFilter(CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes(family)));   // "account" in this test
    // Other comparators that can be used here:
    //   new BinaryPrefixComparator(value)   // matches a byte-array prefix
    //   new RegexStringComparator(expr)     // matches a regular expression
    //   new SubstringComparator(substr)     // matches a substring
    Scan scan = new Scan();
    // scan.addFamily(Bytes.toBytes(family)) achieves the same result
    scan.setFilter(ff);
    return table.getScanner(scan);
}

Test result: everything returned belongs to the account column family

[row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
[row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
[row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]


3.2. QualifierFilter: filtering by qualifier (column)

Constructor:

QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)

Table table = connection.getTable(TableName.valueOf("user"));
QualifierFilter ff = new QualifierFilter(
        CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("name")));
// Other comparators that can be used here:
//   new BinaryPrefixComparator(value)   // matches a byte-array prefix
//   new RegexStringComparator(expr)     // matches a regular expression
//   new SubstringComparator(substr)     // matches a substring
Scan scan = new Scan();
// When the column family is known, scan.addColumn(family, qualifier) restricts the scan to one column in a similar way
scan.setFilter(ff);
ResultScanner resultScanner = table.getScanner(scan);

Test result: only the name column is returned

[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

3.3. ColumnPrefixFilter: filtering by column name (qualifier) prefix (QualifierFilter can achieve the same thing)

Constructor:

ColumnPrefixFilter(byte[] prefix)

Table table = connection.getTable(TableName.valueOf("user"));
ColumnPrefixFilter ff = new ColumnPrefixFilter(Bytes.toBytes("name"));
Scan scan = new Scan();
// A QualifierFilter with new BinaryPrefixComparator(...) achieves the same result
scan.setFilter(ff);
ResultScanner resultScanner = table.getScanner(scan);

Result:

[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

3.4. MultipleColumnPrefixFilter: filtering by several column name (qualifier) prefixes

MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter, but accepts several prefixes

Table table = connection.getTable(TableName.valueOf("user"));
byte[][] prefixes = new byte[][] {Bytes.toBytes("name"), Bytes.toBytes("age")};
// Returns, for every row, the columns whose qualifier starts with name or age
MultipleColumnPrefixFilter ff = new MultipleColumnPrefixFilter(prefixes);

Scan scan = new Scan();
scan.setFilter(ff);
ResultScanner rs = table.getScanner(scan);

Result:

[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
[row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
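
Prefix selection over qualifiers can be sketched in plain Java as follows. The qualifier list is taken from the test table shown in this article; ColumnPrefixSketch and selectByPrefix are hypothetical names, and the sketch only models which qualifiers survive, not how HBase seeks through the columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of selecting columns by one or more qualifier prefixes. Not the HBase API. */
final class ColumnPrefixSketch {

    /** Keeps every qualifier that starts with at least one of the given prefixes. */
    static List<String> selectByPrefix(List<String> qualifiers, String... prefixes) {
        List<String> kept = new ArrayList<>();
        for (String qualifier : qualifiers) {
            for (String prefix : prefixes) {
                if (qualifier.startsWith(prefix)) {
                    kept.add(qualifier);
                    break;   // one matching prefix is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> qualifiers = Arrays.asList("idcard", "name", "password", "city",
                "province", "age", "sex", "id", "passport", "country");

        // One prefix, as with ColumnPrefixFilter.
        System.out.println(selectByPrefix(qualifiers, "pass"));
        // Several prefixes, as with MultipleColumnPrefixFilter.
        System.out.println(selectByPrefix(qualifiers, "name", "age"));
    }
}
```

A prefix such as "pass" selects both password and passport, which is the practical difference from an exact qualifier match.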

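3.5. ColumnRangeFilter: filtering by column range
Constructor:
ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)
Parameters:

    minColumn - lower bound of the column range; if null, there is no lower bound;
    minColumnInclusive - whether the range includes minColumn;
    maxColumn - upper bound of the column range; if null, there is no upper bound;
    maxColumnInclusive - whether the range includes maxColumn.

Code: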

Table table = connection.getTable(TableName.valueOf("user"));
byte[] startColumn = Bytes.toBytes("a");
byte[] endColumn = Bytes.toBytes("d");
// Returns the columns whose qualifier sorts between "a" and "d" (both inclusive) in byte order
ColumnRangeFilter ff = new ColumnRangeFilter(startColumn, true, endColumn, true);
Scan scan = new Scan();
scan.setFilter(ff);
ResultScanner rs = table.getScanner(scan);

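Result: every column whose qualifier sorts between a and d is returned. Qualifiers are compared as byte arrays, so the inclusive upper bound d admits only the qualifier d itself; a longer qualifier that starts with d sorts after d and falls outside the range.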

[row:lisi_1495527850114],[family:address],[qualifier:city],[value:黃埔],[time:1495556648017]
[row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
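
The range check can be sketched in plain Java to show why city, age and country fall inside the range from a to d while a qualifier such as idcard does not. ColumnRangeSketch and inRange are hypothetical names; the sketch assumes byte-wise lexicographic ordering and treats a null bound as "no bound", following the parameter description above.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java model of a qualifier range check with optional, inclusive or exclusive bounds. */
final class ColumnRangeSketch {

    /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Returns true when the qualifier lies inside the range; a null bound means "unbounded". */
    static boolean inRange(String qualifier, String min, boolean minInclusive,
                           String max, boolean maxInclusive) {
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        if (min != null) {
            int cmp = compareBytes(q, min.getBytes(StandardCharsets.UTF_8));
            if (cmp < 0 || (cmp == 0 && !minInclusive)) {
                return false;
            }
        }
        if (max != null) {
            int cmp = compareBytes(q, max.getBytes(StandardCharsets.UTF_8));
            if (cmp > 0 || (cmp == 0 && !maxInclusive)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "date" is a made-up qualifier used to show a name that starts with d but sorts after it.
        String[] qualifiers = {"age", "city", "country", "d", "date", "idcard", "name"};
        for (String qualifier : qualifiers) {
            System.out.println(qualifier + " -> " + inRange(qualifier, "a", true, "d", true));
        }
    }
}
```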

4. RowKey
When looking up a range of rows by row key, Scan's startRow and stopRow are more efficient. However, startRow and stopRow can only match on the beginning of the row key, not on characters in the middle:
        byte[] startRow = Bytes.toBytes("azha");
        byte[] stopRow = Bytes.toBytes("dddf");
        Scan scan = new Scan(startRow, stopRow);
For more complex filtering on the row key, use RowFilter:
Constructor:
RowFilter(CompareFilter.CompareOp rowCompareOp, ByteArrayComparable rowComparator)

Code:

Table table = connection.getTable(TableName.valueOf("user"));
RowFilter rf = new RowFilter(CompareOp.EQUAL,
        new SubstringComparator("zhangsan"));
// Other comparators that can be used here:
//   new BinaryPrefixComparator(value)   // matches a byte-array prefix
//   new RegexStringComparator(expr)     // matches a regular expression
//   new SubstringComparator(substr)     // matches a substring
Scan scan = new Scan();
scan.setFilter(rf);
ResultScanner rs = table.getScanner(scan);

Result:

[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
[row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
[row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
[row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
[row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]
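
The contrast between a start/stop row range and a substring match on the row key can be sketched in plain Java. RowKeySketch and its two methods are hypothetical names; the sketch assumes an inclusive start row and an exclusive stop row, and it uses String ordering, which agrees with byte ordering for the ASCII keys used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model contrasting a row-key range with a substring match. Not the HBase API. */
final class RowKeySketch {

    /** Models a start/stop row scan: keeps keys in [startRow, stopRow). */
    static List<String> rangeScan(List<String> keys, String startRow, String stopRow) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (key.compareTo(startRow) >= 0 && key.compareTo(stopRow) < 0) {
                kept.add(key);
            }
        }
        return kept;
    }

    /** Models a row filter with a substring comparator: the text may appear anywhere in the key. */
    static List<String> substringScan(List<String> keys, String substr) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (key.contains(substr)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "lisi_1495527849910", "lisi_1495527850081", "wangwu_1495595824517",
                "zhangsan_1495527850759", "zhangsan_1495527850824");

        // A range can only express "keys that begin with l".
        System.out.println(rangeScan(keys, "l", "m"));
        // A substring match can look at the timestamp part in the middle of the key.
        System.out.println(substringScan(keys, "1495527850"));
    }
}
```

In HBase itself the range variant lets the server skip rows outside the range, whereas a row filter still has to examine each row key, which is why the text above recommends startRow and stopRow whenever a key prefix is enough.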


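5. PageFilter
Specifies a page size in rows and returns a result set of that many rows.
Note that this filter cannot guarantee that the number of rows returned is less than or equal to the page size. Filters are applied separately on each region server, so the only guarantee is that no single region returns more rows than the page size.
Constructor:
PageFilter(long pageSize)
Code: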

Table table = connection.getTable(TableName.valueOf("user"));
PageFilter pf = new PageFilter(2L);
Scan scan = new Scan();
scan.setFilter(pf);
// setStartRow is the HBase 1.x call; HBase 2.x offers withStartRow for the same purpose
scan.setStartRow(Bytes.toBytes("zhangsan_"));
ResultScanner rs = table.getScanner(scan);

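Result: four cells are printed, but they belong to only two rows (zhangsan_1495527850759 and zhangsan_1495527850824), which matches the page size of 2. PageFilter counts rows, not cells.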

[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
[row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
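
The per-region caveat described above can be illustrated with a small plain-Java model. It assumes, as the text states, that each region applies the page limit on its own; PageFilterSketch, scanWithPageFilter and firstPage are hypothetical names, and the row keys r1 to r6 are made-up data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a page limit that is enforced separately in each region. Not the HBase API. */
final class PageFilterSketch {

    /** Every region returns at most pageSize rows; the results are concatenated. */
    static List<String> scanWithPageFilter(List<List<String>> regions, int pageSize) {
        List<String> rows = new ArrayList<>();
        for (List<String> region : regions) {
            rows.addAll(region.subList(0, Math.min(pageSize, region.size())));
        }
        return rows;
    }

    /** A client-side cap on top of the per-region limit yields a page of at most pageSize rows. */
    static List<String> firstPage(List<List<String>> regions, int pageSize) {
        List<String> rows = scanWithPageFilter(regions, pageSize);
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r4", "r5", "r6"));

        // Two regions, page size 2: the combined result holds four rows.
        System.out.println(scanWithPageFilter(regions, 2));
        // Stopping on the client after pageSize rows gives the intended page.
        System.out.println(firstPage(regions, 2));
    }
}
```

In practice this means the client should stop reading from the scanner once it has collected a full page, rather than relying on the filter alone.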

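6. SkipFilter
Filters on every column of a row: if any one column fails the condition, the whole row is filtered out.
For example, if all the columns in a row hold the weights of different items, then in a real scenario every value must be greater than zero, and we want to drop every row that contains a column whose value is 0.
In that case ValueFilter and SkipFilter are combined to achieve this:
scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes(0)))));
Constructor:
SkipFilter(Filter filter)
Code: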

Table table = connection.getTable(TableName.valueOf("user"));
// Drops every row that contains at least one cell whose value equals "zhangsan"
SkipFilter sf = new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL,
        new BinaryComparator(Bytes.toBytes("zhangsan"))));
Scan scan = new Scan();
scan.setFilter(sf);
ResultScanner rs = table.getScanner(scan);

Result:

[row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
[row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
[row:lisi_1495527850114],[family:address],[qualifier:city],[value:黃埔],[time:1495556648017]
[row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
[row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
[row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
[row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
[row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
[row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
[row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
[row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
[row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]

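Compared with the original data below, the row with rowkey zhangsan_1495527850824, which contains the name column with value zhangsan, has been filtered out of the result above: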

[row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230]
[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi]
[row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236]
[row:lisi_1495527850114],[family:address],[qualifier:city],[value:黃埔]
[row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai]
[row:lisi_1495527850144],[family:info],[qualifier:age],[value:21]
[row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女]
[row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002]
[row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009]
[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646]
[row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan]
[row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100]
[row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男]
[row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001]
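
The weight example from the start of this section can be sketched in plain Java to show how wrapping a value test changes its scope from single cells to whole rows. SkipFilterSketch and its methods are hypothetical names, and a row is modelled simply as a list of integer weights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model of a per-cell test versus the same test wrapped in a "skip the row" rule. */
final class SkipFilterSketch {

    /** Per-cell filtering: cells that fail the test are dropped, the rest of the row survives. */
    static <C> List<C> valueFilterOnly(List<C> cells, Predicate<C> test) {
        List<C> kept = new ArrayList<>();
        for (C cell : cells) {
            if (test.test(cell)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    /** Skip semantics: the row is kept only if every cell passes the wrapped test. */
    static <C> boolean keepRow(List<C> cells, Predicate<C> wrapped) {
        for (C cell : cells) {
            if (!wrapped.test(cell)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Predicate<Integer> nonZero = weight -> weight != 0;
        List<Integer> allPositive = Arrays.asList(3, 5, 2);
        List<Integer> containsZero = Arrays.asList(3, 0, 2);

        System.out.println(valueFilterOnly(containsZero, nonZero));   // the zero cell alone is dropped
        System.out.println(keepRow(allPositive, nonZero));            // the whole row is kept
        System.out.println(keepRow(containsZero, nonZero));           // the whole row is skipped
    }
}
```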

7. FirstKeyOnlyFilter

This filter returns only the first cell of each row, which makes it useful for counting rows efficiently.

Constructor:

public FirstKeyOnlyFilter()


Code:

Table table = connection.getTable(TableName.valueOf("user"));
FirstKeyOnlyFilter fkof = new FirstKeyOnlyFilter();
Scan scan = new Scan();
scan.setFilter(fkof);
ResultScanner rs = table.getScanner(scan);

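Result: the effect is hard to see from the output alone, so it is compared with the original data further down. Each row contributes only its first cell.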

[row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
[row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
[row:lisi_1495527850114],[family:address],[qualifier:city],[value:黃埔],[time:1495556648017]
[row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
[row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
[row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
[row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
[row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
[row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
[row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
[row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]

Original data for comparison:

[row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230]
[row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi]
[row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236]
[row:lisi_1495527850114],[family:address],[qualifier:city],[value:黃埔]
[row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai]
[row:lisi_1495527850144],[family:info],[qualifier:age],[value:21]
[row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女]
[row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002]
[row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009]
[row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646]
[row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898]
[row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china]
[row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan]
[row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong]
[row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100]
[row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男]
[row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001]

The comparison makes it clear: when several cells share a rowkey, only the first cell of that row is returned.
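
Row counting with this kind of filter can be sketched in plain Java as follows. The cells are taken from the original data above and are assumed to arrive sorted by row key, as a scan returns them; FirstKeyOnlySketch, firstCells and countRows are hypothetical names, and each cell is modelled as a two-element array of row key and qualifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of keeping only the first cell of each row, then counting rows. */
final class FirstKeyOnlySketch {

    /** Each cell is {rowKey, qualifier}; the input must be sorted by row key. */
    static List<String[]> firstCells(List<String[]> cells) {
        List<String[]> kept = new ArrayList<>();
        String previousRow = null;
        for (String[] cell : cells) {
            if (!cell[0].equals(previousRow)) {
                kept.add(cell);          // first cell of a new row
                previousRow = cell[0];
            }
        }
        return kept;
    }

    /** One surviving cell per row means the row count is simply the number of kept cells. */
    static int countRows(List<String[]> cells) {
        return firstCells(cells).size();
    }

    public static void main(String[] args) {
        List<String[]> cells = Arrays.asList(
                new String[] {"zhangsan_1495527850759", "idcard"},
                new String[] {"zhangsan_1495527850759", "passport"},
                new String[] {"zhangsan_1495527850824", "country"},
                new String[] {"zhangsan_1495527850824", "name"},
                new String[] {"zhangsan_1495527850951", "province"});

        for (String[] cell : firstCells(cells)) {
            System.out.println(cell[0] + " " + cell[1]);
        }
        System.out.println("rows: " + countRows(cells));
    }
}
```

Counting is cheaper this way because only one cell per row has to be carried through the result, however many columns the row has.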
 
