A previous post modified TableInputFormatBase so that the client splits each HRegion further, letting a single region be read by several maps in parallel; see:
[url]http://blackwing.iteye.com/admin/blogs/1763961[/url]
That approach pulls the data back to the client to do the work, which is slow. Here it is reworked as a coprocessor endpoint: the InputSplits are computed directly on the server side and only the result is returned to the client.
For an introduction to HBase coprocessors, see:
[url]https://blogs.apache.org/hbase/entry/coprocessor_introduction[/url]
Developing a coprocessor is quite direct and simple.
1. Define a protocol interface that extends CoprocessorProtocol
public interface SplitRowProtocol extends CoprocessorProtocol {
    public List<InputSplit> getSplitRow(byte[] splitStart, byte[] splitEnd,
            byte[] tableName, String regionLocation, int mappersPerSplit)
            throws IOException;
}
Declare whatever methods and parameters you need here.
2. Implement the interface just defined and extend BaseEndpointCoprocessor
public class SplitRowEndPoint extends BaseEndpointCoprocessor implements
        SplitRowProtocol {

    @Override
    public List<InputSplit> getSplitRow(byte[] splitStart, byte[] splitEnd,
            byte[] tableName, String regionLocation, int mappersPerSplit)
            throws IOException {
        RegionCoprocessorEnvironment environment =
                (RegionCoprocessorEnvironment) getEnvironment();
        List<InputSplit> splits = new ArrayList<InputSplit>();
        HRegion region = environment.getRegion();
        // checkSplit() returns a non-null split point when the region is
        // already due to be split by HBase itself; skip such regions here.
        byte[] splitRow = region.checkSplit();
        if (null != splitRow)
            return splits;
        try {
            HTableInterface table = environment.getTable(tableName);
            Scan scan = new Scan();
            scan.setFilter(new FirstKeyOnlyFilter()); // row keys only
            scan.setStartRow(splitStart);
            scan.setStopRow(splitEnd);
            scan.setBatch(300);
            InternalScanner scanner = region.getScanner(scan);
            List<String> rows = new ArrayList<String>();
            try {
                List<KeyValue> curVals = new ArrayList<KeyValue>();
                boolean hasMore = false;
                do {
                    curVals.clear();
                    hasMore = scanner.next(curVals);
                    if (!curVals.isEmpty())
                        rows.add(Bytes.toString(curVals.get(0).getRow()));
                } while (hasMore);
            } finally {
                scanner.close();
            }
            // cut the collected row keys into mappersPerSplit even ranges
            int splitSize = rows.size() / mappersPerSplit;
            for (int j = 0; j < mappersPerSplit; j++) {
                TableSplit tableSplit;
                if (j == mappersPerSplit - 1)
                    // last range is stretched to the region's last row
                    tableSplit = new TableSplit(table.getTableName(),
                            rows.get(j * splitSize).getBytes(),
                            rows.get(rows.size() - 1).getBytes(),
                            regionLocation);
                else
                    tableSplit = new TableSplit(table.getTableName(),
                            rows.get(j * splitSize).getBytes(),
                            rows.get(j * splitSize + splitSize - 1).getBytes(),
                            regionLocation);
                splits.add(tableSplit);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return splits;
    }
}
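The heart of getSplitRow is the even partitioning of the scanned row keys into mappersPerSplit contiguous ranges. Stripped of the HBase classes, that logic can be sketched and sanity-checked standalone (PartitionSketch, partitionRows, and the String[]-pair range representation are illustrative only, not part of the endpoint):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // Cut a sorted list of row keys into n [start, end] ranges, mirroring
    // the loop in getSplitRow: range j starts at j * splitSize, and the
    // last range is stretched to cover any remainder rows.
    static List<String[]> partitionRows(List<String> rows, int n) {
        List<String[]> ranges = new ArrayList<String[]>();
        int splitSize = rows.size() / n;
        for (int j = 0; j < n; j++) {
            String start = rows.get(j * splitSize);
            String end = (j == n - 1)
                    ? rows.get(rows.size() - 1)                // last: to the end
                    : rows.get(j * splitSize + splitSize - 1); // else: end of slice
            ranges.add(new String[] { start, end });
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        for (int i = 0; i < 10; i++)
            rows.add("row" + i);
        for (String[] r : partitionRows(rows, 3))
            System.out.println(r[0] + " .. " + r[1]);
        // prints:
        // row0 .. row2
        // row3 .. row5
        // row6 .. row9
    }
}
```

Note that 10 rows over 3 mappers gives splitSize 3, so the last range absorbs the remainder (row6 .. row9) — the same behavior as the endpoint above.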
3. Load the coprocessor for the tables that need it
There are three ways to load a coprocessor.
1) Through the configuration file, by adding to hbase-site.xml:
<property>
<name>hbase.coprocessor.region.classes</name>
<value>com.blackwing.util.hbase.coprocessor.SplitRowEndPoint</value>
</property>
The drawback of this method is that HBase has to be restarted.
2) Setting the coprocessor through the hbase shell
This is done with alter and table_att; the table must be disabled beforehand:
alter 'user_video_pref_t2',METHOD=>'table_att','coprocessor'=>'hdfs://myhadoop:8020/user/hadoop/coprocessor.jar|com.blackwing.util.hbase.coprocessor.SplitRowEndPoint|1073741823'
Then enable the table; running describe on it will show that the coprocessor has been attached.
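Both the shell command above and the COPROCESSOR$1 table attribute used in the dynamic-loading code encode the coprocessor as a single pipe-delimited string: jar path, fully qualified class name, then priority. A tiny sketch makes the format explicit (buildCoprocessorSpec is an illustrative helper, not an HBase API):

```java
public class CoprocessorSpec {
    // Assemble the pipe-delimited value HBase parses when loading a table
    // coprocessor: "hdfs path to jar|fully qualified class|priority".
    static String buildCoprocessorSpec(String jarPath, String className,
            int priority) {
        return jarPath + "|" + className + "|" + priority;
    }

    public static void main(String[] args) {
        System.out.println(buildCoprocessorSpec(
                "hdfs://myhadoop:8020/user/hadoop/coprocessor.jar",
                "com.blackwing.util.hbase.coprocessor.SplitRowEndPoint",
                1073741823));
        // prints the same value passed to 'coprocessor' in the alter command
    }
}
```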
3) Dynamic loading from Java
Dynamic loading sets a table's coprocessor from a Java program; the advantage, of course, is that no HBase restart is needed.
HBaseAdmin admin;
String[] truncatedTableInfo;
HTableDescriptor desc;
conf.addResource(propertyFileName);
truncatedTableInfo = conf.getStrings("user_video_pref");
try {
    admin = new HBaseAdmin(conf);
    desc = new HTableDescriptor(truncatedTableInfo[0]);
    desc.setValue("VERSIONS", "1");
    HColumnDescriptor coldef = new HColumnDescriptor(truncatedTableInfo[1]);
    desc.addFamily(coldef);
    int priority = 0;
    if (conf.get("coprocessor.pref.priority").equals("USER"))
        priority = Coprocessor.PRIORITY_USER;
    else
        priority = Coprocessor.PRIORITY_SYSTEM;
    // 2013-2-2 attach the coprocessor: "jar path|class name|priority"
    desc.setValue("COPROCESSOR$1", conf.get("coprocessor.pref.path") + "|"
            + conf.get("coprocessor.pref.class") + "|" + priority);
    try {
        if (admin.isTableAvailable(truncatedTableInfo[0])) {
            // truncate: drop the old table
            admin.disableTable(truncatedTableInfo[0]);
            admin.deleteTable(truncatedTableInfo[0]);
            if (admin.isTableAvailable(truncatedTableInfo[0]))
                LOG.info("truncate table : user_video_pref fail !");
            // recreate it with the coprocessor attribute
            admin.createTable(desc);
        }
        if (admin.isTableAvailable(truncatedTableInfo[0]))
            LOG.info("create table : user_video_pref done !");
    } catch (IOException e) {
        e.printStackTrace();
    }
} catch (MasterNotRunningException e) {
    e.printStackTrace();
} catch (ZooKeeperConnectionException e) {
    e.printStackTrace();
}
All three methods merely attach the coprocessor to a table. HBase does not verify that the jar actually exists at the given path or that the class is valid, so to confirm the coprocessor really loaded you need to either:
1) In the hbase shell, run status 'detailed' and check that the table's attributes contain: coprocessors=[SplitRowEndPoint]
or:
2) In HBase's web UI on port 60010, open the table you just altered, click through to its region server, and check the table's "Metrics" column for: coprocessors=[SplitRowEndPoint]
If the attribute is there, the coprocessor was added successfully, and the endpoint can now be invoked remotely from the client.
There are two ways for the client to invoke a coprocessor:
public <T extends CoprocessorProtocol> T coprocessorProxy(
        Class<T> protocol, Row row);

public <T extends CoprocessorProtocol, R> void coprocessorExec(
        Class<T> protocol, List<? extends Row> rows,
        BatchCall<T, R> callable, BatchCallback<R> callback);

public <T extends CoprocessorProtocol, R> void coprocessorExec(
        Class<T> protocol, RowRange range,
        BatchCall<T, R> callable, BatchCallback<R> callback);
The first is the coprocessorProxy method, the second the coprocessorExec methods. The difference between them is illustrated here: [img]https://blogs.apache.org/hbase/mediaresource/71e2816c-c109-475a-9d64-bc6b74e61443[/img]
In short, the Exec methods run in parallel across the target regions, so they are more efficient.
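To make that difference concrete: coprocessorProxy binds to the single region containing one row and executes synchronously, while coprocessorExec fans the same call out to every region in a range and gathers partial results through a callback. The fan-out pattern can be simulated in plain Java (no real HBase here; the regions list, the call function, and the result map are stand-ins for the actual Batch.Call/BatchCallback API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FanOutSketch {
    // Exec-style pattern: one task per region, each result delivered to a
    // shared "callback" map; the caller blocks until all regions are done.
    static Map<String, Integer> execOnRegions(List<String> regions,
            final java.util.function.Function<String, Integer> call)
            throws InterruptedException {
        final Map<String, Integer> results =
                new ConcurrentHashMap<String, Integer>();
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        for (final String region : regions) {
            pool.execute(new Runnable() {
                public void run() {
                    // callback: record this region's partial result
                    results.put(region, call.apply(region));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> regions = Arrays.asList("region-a", "region-b", "region-c");
        // each simulated "endpoint" just reports its region name length
        Map<String, Integer> out = execOnRegions(regions, r -> r.length());
        System.out.println(out.size()); // prints: 3
    }
}
```

A proxy-style call, by contrast, would be a single synchronous invocation against one region, with no pool and no callback, which is why Exec wins when a table spans many regions.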
The concrete client call looks like this:
public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.addResource("FilePath.xml");
    String tableName = "user_video_pref_t2";
    try {
        HTable table = new HTable(conf, tableName.getBytes());
        Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
        for (int i = 0; i < keys.getFirst().length; i++) {
            byte[] startKey = keys.getFirst()[i];
            byte[] endKey = keys.getSecond()[i];
            String regionLocation = table.getRegionLocation(startKey, true)
                    .getHostname();
            Batch.Call call = Batch.forMethod(SplitRowProtocol.class,
                    "getSplitRow", startKey, endKey, tableName.getBytes(),
                    regionLocation, 1);
            Map<byte[], List<InputSplit>> results = table.coprocessorExec(
                    SplitRowProtocol.class, startKey, endKey, call);
            // 2013-2-4 collect the InputSplits returned by each region
            for (List<InputSplit> list : results.values()) {
                System.out.println("total input splits : " + list.size());
            }
        }
    } catch (Throwable e) {
        e.printStackTrace();
    }
}
The other coprocessor mode, the observer, resembles a trigger in a traditional database: it reacts to a specific operation. For example, preGet runs on the region server before a client get is executed. The details are beyond the scope of this post.