Splitting an HRegion with an HBase Coprocessor

[quote]Please credit the source when reposting. Article link: http://blackwing.iteye.com/blog/1788647[/quote]


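Previously I modified the TableInputFormatBase class to split each HRegion on the client side, so that a single region could be read by several map tasks at the same time. Original post: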
[url]http://blackwing.iteye.com/admin/blogs/1763961[/url]

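That approach, however, pulls the data back to the client to compute the splits, which is slow. This time I use a coprocessor endpoint instead: the InputSplits are computed on the server and only the result is returned to the client.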

For a detailed introduction to HBase coprocessors, see:
[url]https://blogs.apache.org/hbase/entry/coprocessor_introduction[/url]

Developing a coprocessor is quite straightforward.
1. Define the SplitRowProtocol interface, which extends CoprocessorProtocol
public interface SplitRowProtocol extends CoprocessorProtocol {

    public List<InputSplit> getSplitRow(byte[] splitStart, byte[] splitEnd, byte[] tableName,
            String regionLocation, int mappersPerSplit) throws IOException;
}


Declare the methods and parameters you need here.

2. Implement the interface defined above and extend BaseEndpointCoprocessor
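import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.mapreduce.InputSplit;

public class SplitRowEndPoint extends BaseEndpointCoprocessor implements
        SplitRowProtocol {

    @Override
    public List<InputSplit> getSplitRow(byte[] splitStart, byte[] splitEnd,
            byte[] tableName, String regionLocation, int mappersPerSplit) throws IOException {
        RegionCoprocessorEnvironment environment = (RegionCoprocessorEnvironment) getEnvironment();
        List<InputSplit> splits = new ArrayList<InputSplit>();
        HRegion region = environment.getRegion();

        // A non-null result means HBase reports a split point for this region;
        // in that case no InputSplits are returned.
        byte[] splitRow = region.checkSplit();

        if (null != splitRow)
            return splits;

        try {
            HTableInterface table = environment.getTable(tableName);
            // Only the first KeyValue of each row is needed to learn the row keys.
            Scan scan = new Scan();
            scan.setFilter(new FirstKeyOnlyFilter());
            scan.setStartRow(splitStart);
            scan.setStopRow(splitEnd);
            scan.setBatch(300);

            InternalScanner scanner = region.getScanner(scan);

            // Row keys are kept as byte[] so binary keys survive unchanged.
            List<byte[]> rows = new ArrayList<byte[]>();

            try {
                List<KeyValue> curVals = new ArrayList<KeyValue>();
                boolean hasMore = false;
                do {
                    curVals.clear();
                    hasMore = scanner.next(curVals);
                    // The list can be empty (for example when the range holds no rows).
                    if (!curVals.isEmpty())
                        rows.add(curVals.get(0).getRow());
                } while (hasMore);
            } finally {
                scanner.close();
            }

            if (rows.isEmpty())
                return splits;

            // Never create more splits than there are rows.
            int parts = Math.max(1, Math.min(mappersPerSplit, rows.size()));
            int splitSize = rows.size() / parts;

            for (int j = 0; j < parts; j++) {
                byte[] start = rows.get(j * splitSize);
                // A TableSplit's end row becomes the scan's stop row, which is exclusive,
                // so each split ends where the next one starts and the last ends at splitEnd.
                byte[] end = (j == parts - 1) ? splitEnd : rows.get((j + 1) * splitSize);
                splits.add(new TableSplit(table.getTableName(), start, end, regionLocation));
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
        return splits;
    }

}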
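The range arithmetic in the endpoint can be tried out without a cluster. The following is a minimal sketch, not the endpoint itself: RowRangePartitioner and Range are hypothetical names, and plain strings stand in for row keys. It assumes the end of each range is exclusive, as a scan's stop row is, so each range ends at the first key of the next one and the last range ends at the region's end key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of HBase): cuts sorted row keys into half-open ranges. */
final class RowRangePartitioner {

    /** A [start, end) pair; end is exclusive, like a scan's stop row. */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Splits sortedRows into at most `parts` ranges of roughly equal row count.
     * regionEnd closes the last range so that the final rows are not dropped.
     */
    static List<Range> partition(List<String> sortedRows, String regionEnd, int parts) {
        List<Range> ranges = new ArrayList<Range>();
        if (sortedRows.isEmpty() || parts <= 0) {
            return ranges;
        }
        // Never create more ranges than there are rows.
        int n = Math.min(parts, sortedRows.size());
        int size = sortedRows.size() / n;
        for (int j = 0; j < n; j++) {
            String start = sortedRows.get(j * size);
            String end = (j == n - 1) ? regionEnd : sortedRows.get((j + 1) * size);
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        for (Range r : partition(rows, "h", 3)) {
            System.out.println("[" + r.start + ", " + r.end + ")");
        }
    }
}
```

With seven rows and three parts, the remainder row ends up in the last range, which is the same behaviour as giving the last mapper everything up to the region end.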


3. Load the coprocessor for the tables that need it
There are three ways to load a coprocessor.

1) Through the configuration file, by adding the following to hbase-site.xml:
<property>
<name>hbase.coprocessor.region.classes</name>
<value>com.blackwing.util.hbase.coprocessor.SplitRowEndPoint</value>
</property>


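The drawback of this method is that HBase has to be restarted. A coprocessor configured this way is also loaded for the regions of every table, not just one.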

2) Through the hbase shell
The coprocessor is set with alter and table_att. The table must be disabled before running the command:
alter 'user_video_pref_t2',METHOD=>'table_att','coprocessor'=>'hdfs://myhadoop:8020/user/hadoop/coprocessor.jar|com.blackwing.util.hbase.coprocessor.SplitRowEndPoint|1073741823'


Then enable the table and describe it; the coprocessor now appears among the table's attributes.
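The attribute value passed to alter has three fields separated by "|": the jar path, the coprocessor class, and the priority. As a minimal sketch, the string can be assembled as below. CoprocessorSpec is a hypothetical helper, not part of HBase, and its two constants are assumed to mirror Coprocessor.PRIORITY_USER and Coprocessor.PRIORITY_SYSTEM. The 1073741823 in the shell command equals Integer.MAX_VALUE / 2.

```java
/** Hypothetical helper (not part of HBase) for the "path|class|priority" attribute value. */
final class CoprocessorSpec {

    // Assumed to mirror Coprocessor.PRIORITY_USER and Coprocessor.PRIORITY_SYSTEM.
    static final int PRIORITY_USER = Integer.MAX_VALUE / 2;
    static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;

    /** Joins the three fields with '|' in the order the table attribute expects. */
    static String build(String jarPath, String className, int priority) {
        return jarPath + "|" + className + "|" + priority;
    }

    public static void main(String[] args) {
        System.out.println(build(
                "hdfs://myhadoop:8020/user/hadoop/coprocessor.jar",
                "com.blackwing.util.hbase.coprocessor.SplitRowEndPoint",
                PRIORITY_USER));
    }
}
```

The same three-field string is what the Java loading code stores under the COPROCESSOR$1 key.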

3) Dynamic loading from Java
Dynamic loading sets a table's coprocessor from a Java program. The advantage is that HBase does not need to be restarted.



// conf, propertyFileName and LOG belong to the enclosing class (not shown).
HBaseAdmin admin;
String[] truncatedTableInfo;
HTableDescriptor desc;
// Load the property file before reading any value from it.
conf.addResource(propertyFileName);
// Comma-separated value: index 0 is the table name, index 1 the column family.
truncatedTableInfo = conf.getStrings("user_video_pref");

try {
    admin = new HBaseAdmin(conf);
    desc = new HTableDescriptor(truncatedTableInfo[0]);
    desc.setValue("VERSIONS", "1");

    HColumnDescriptor coldef = new HColumnDescriptor(truncatedTableInfo[1]);
    desc.addFamily(coldef);

    int priority = 0;
    // "USER".equals(...) avoids a NullPointerException when the property is missing.
    if ("USER".equals(conf.get("coprocessor.pref.priority")))
        priority = Coprocessor.PRIORITY_USER;
    else
        priority = Coprocessor.PRIORITY_SYSTEM;

    // 2013-2-2: attach the coprocessor as "<jar path>|<class name>|<priority>"
    desc.setValue("COPROCESSOR$1", conf.get("coprocessor.pref.path") + "|"
            + conf.get("coprocessor.pref.class")
            + "|" + priority);

    try {
        if (admin.isTableAvailable(truncatedTableInfo[0])) {
            // Drop the existing table
            admin.disableTable(truncatedTableInfo[0]);
            admin.deleteTable(truncatedTableInfo[0]);

            if (admin.isTableAvailable(truncatedTableInfo[0]))
                LOG.info("truncate table : user_video_pref fail !");
            // Recreate the table with the coprocessor attached
            admin.createTable(desc);
        }
        if (admin.isTableAvailable(truncatedTableInfo[0]))
            LOG.info("create table : user_video_pref done !");
    } catch (IOException e) {
        e.printStackTrace();
    }
} catch (MasterNotRunningException e) {
    e.printStackTrace();
} catch (ZooKeeperConnectionException e) {
    e.printStackTrace();
}


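The three methods above only attach the coprocessor to a table. HBase does not check whether the jar exists at the given path or whether the class is correct, so to confirm that the coprocessor was really loaded you need to do one of the following:
1) In the hbase shell, run status 'detailed' and check whether the attributes of the table's regions include: coprocessors=[SplitRowEndPoint]

Or:
2) In the HBase web UI on port 60010, find the table that just had the coprocessor added, click through to its region server, and check whether the table's "Metrics" column contains: coprocessors=[SplitRowEndPoint]

If the attribute is present, the coprocessor was loaded successfully and the client can now call it remotely.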

The client can call a coprocessor in two ways, as follows:
public <T extends CoprocessorProtocol> T coprocessorProxy(Class<T> protocol, Row row);

public <T extends CoprocessorProtocol, R> void coprocessorExec(
Class<T> protocol, List<? extends Row> rows,
BatchCall<T,R> callable, BatchCallback<R> callback);

public <T extends CoprocessorProtocol, R> void coprocessorExec(
Class<T> protocol, RowRange range,
BatchCall<T,R> callable, BatchCallback<R> callback);

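The first is the coprocessorProxy method and the second is the coprocessorExec method. The difference between the two is illustrated here: [img]https://blogs.apache.org/hbase/mediaresource/71e2816c-c109-475a-9d64-bc6b74e61443[/img]
In short, coprocessorProxy talks to the single region that holds the given row, while coprocessorExec invokes the endpoint on all regions in the range in parallel, which makes it more efficient.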
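To illustrate why the parallel style is faster, here is a minimal sketch of the fan-out pattern in plain Java. It is not how HBase implements coprocessorExec: ParallelRegionCalls, RegionCall and execAll are hypothetical names, and region names are plain strings. Every region gets its own task, all tasks run concurrently, and the results are gathered into a map keyed by region.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of a parallel per-region call; not HBase code. */
final class ParallelRegionCalls {

    /** Stand-in for an endpoint invocation against one region. */
    interface RegionCall<R> {
        R call(String regionName);
    }

    /** Runs the call once per region, concurrently, and returns results keyed by region. */
    static <R> Map<String, R> execAll(List<String> regions, RegionCall<R> call) {
        Map<String, R> results = new TreeMap<String, R>();
        if (regions.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            // Submit everything first so the calls overlap in time.
            Map<String, Future<R>> pending = new LinkedHashMap<String, Future<R>>();
            for (String region : regions) {
                Callable<R> task = () -> call.call(region);
                pending.put(region, pool.submit(task));
            }
            // Then wait for each result.
            for (Map.Entry<String, Future<R>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> sizes =
                execAll(Arrays.asList("region-a", "region-b"), name -> name.length());
        System.out.println(sizes);
    }
}
```

A proxy-style call would instead be a single call.call(region) for the one region that owns a given row, so covering a whole table means visiting the regions one after another.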

The calling code is as follows:
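import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.hadoop.mapreduce.InputSplit;

    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.addResource("FilePath.xml");
        String tableName = "user_video_pref_t2";
        try {
            HTable table = new HTable(conf, tableName.getBytes());

            // Start keys and end keys of every region of the table
            Pair<byte[][], byte[][]> keys = table.getStartEndKeys();

            for (int i = 0; i < keys.getFirst().length; i++) {
                // Assumption: the original referenced undefined variables f and e here;
                // the region's own start and end keys are used in their place.
                byte[] startKey = keys.getFirst()[i];
                byte[] endKey = keys.getSecond()[i];
                String regionLocation = table.getRegionLocation(startKey,
                        true).getHostname();
                Batch.Call<SplitRowProtocol, List<InputSplit>> call = Batch.forMethod(
                        SplitRowProtocol.class, "getSplitRow", startKey, endKey,
                        tableName.getBytes(), regionLocation, 1);
                Map<byte[], List<InputSplit>> results = table
                        .coprocessorExec(SplitRowProtocol.class,
                                startKey, endKey, call);
                // 2013-2-4: collect all InputSplits that were returned
                for (List<InputSplit> list : results.values()) {
                    System.out.println("total input splits : " + list.size());
                }
            }
            table.close();
        } catch (Throwable e) {
            e.printStackTrace();
        }

    }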


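The other coprocessor mode, the observer, is similar to a trigger in a traditional database: it responds to a specific action. For example, the preGet method fires before a client get operation is executed. The details are skipped here.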