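1. Introduction
BufferedMutator is used to write data to a single HBase table asynchronously and in batches, much like HTable. An instance is obtained from a Connection.
MapReduce jobs are a good use case for BufferedMutator. A MapReduce job benefits from batching, but it has no natural point at which to flush. BufferedMutator accepts the puts from the job and submits them in batches based on a heuristic, such as the accumulated size of the puts. Because the batches are submitted asynchronously, the M/R logic is not blocked while a batch is being written. In a MapReduce batch job, each thread has its own BufferedMutator. A single BufferedMutator can also be used efficiently in a high-volume online system to batch puts into an HBase table.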
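The buffering behaviour described above can be illustrated without HBase. The following is a minimal sketch, assuming a hypothetical class `SizeBoundedBuffer` and a caller-supplied `sink`; neither is part of the HBase API. Unlike BufferedMutator, this sketch hands each batch to the sink synchronously, so it only shows when a batch is cut, not how it is sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the buffering idea; not HBase's BufferedMutatorImpl.
final class SizeBoundedBuffer implements AutoCloseable {
    private final long maxBufferedBytes;
    private final Consumer<List<byte[]>> sink; // receives one batch per flush
    private List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;

    SizeBoundedBuffer(long maxBufferedBytes, Consumer<List<byte[]>> sink) {
        this.maxBufferedBytes = maxBufferedBytes;
        this.sink = sink;
    }

    // Buffers one record and flushes once the accumulated size exceeds the limit.
    synchronized void mutate(byte[] record) {
        pending.add(record);
        pendingBytes += record.length;
        if (pendingBytes > maxBufferedBytes) {
            flush();
        }
    }

    // Hands everything buffered so far to the sink as a single batch.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<byte[]> batch = pending;
        pending = new ArrayList<>();
        pendingBytes = 0;
        sink.accept(batch);
    }

    // Closing flushes whatever is still buffered.
    @Override
    public synchronized void close() {
        flush();
    }
}
```

The caller only ever calls `mutate`; batches are cut by the size threshold, by an explicit `flush`, or by `close`.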
2. BufferedMutator usage examples
Two batch-write scenarios are covered here.
2.1 A one-off batch write to a single table
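import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;

// TABLE (a TableName), FAMILY (a byte[]) and LOG are assumed to be defined elsewhere.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "zookeeperHost");
// Called asynchronously when a batch still fails after all retries.
final BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
  @Override
  public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator) {
    for (int i = 0; i < e.getNumExceptions(); i++) {
      LOG.error("Failed to send put " + Bytes.toStringBinary(e.getRow(i).getRow()) + ".");
    }
  }
};
BufferedMutatorParams params = new BufferedMutatorParams(TABLE)
    .listener(listener)
    .writeBufferSize(123123L);
// try-with-resources closes the mutator first, then the connection, even on failure.
try (Connection conn = ConnectionFactory.createConnection(conf);
     BufferedMutator mutator = conn.getBufferedMutator(params)) {
  Put p = new Put(Bytes.toBytes("someRow"));
  p.addColumn(FAMILY, Bytes.toBytes("someQualifier"), Bytes.toBytes("some value"));
  mutator.mutate(p);
  // close() flushes any mutations that are still buffered.
} catch (IOException e) {
  LOG.error("Batch write failed", e);
}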
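2.2 Repeated batch writes to multiple tables
A Map can be used to hold one BufferedMutator per table. The example uses the thread-safe ConcurrentHashMap; in a single-threaded scenario it can be replaced with a HashMap for slightly better performance.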
private static final Map<String, BufferedMutator> tableConnectionMgr = new ConcurrentHashMap<>();
// A Connection is heavyweight and thread-safe, so a single one is shared by all mutators.
// Assumes the field `connection` was created once via ConnectionFactory.createConnection(config).
private BufferedMutator getTableConnection(String tableName) throws IOException {
  try {
    // computeIfAbsent creates at most one mutator per table, even under concurrent calls.
    return tableConnectionMgr.computeIfAbsent(tableName, name -> {
      try {
        BufferedMutator mutator = connection.getBufferedMutator(TableName.valueOf(name));
        log.info("hbase table: {} connection established!", name);
        return mutator;
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
  } catch (UncheckedIOException e) {
    throw e.getCause();
  }
}
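The caching pattern in section 2.2 can be checked in isolation. The sketch below assumes a hypothetical `FakeMutator` standing in for a per-table BufferedMutator and a hypothetical `MutatorCache` wrapper; it only illustrates that `ConcurrentHashMap.computeIfAbsent` creates one instance per key, which a separate `get` followed by `put` does not guarantee under concurrency.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a per-table BufferedMutator; hypothetical, not an HBase class.
final class FakeMutator {
    final String table;

    FakeMutator(String table) {
        this.table = table;
    }
}

final class MutatorCache {
    private final Map<String, FakeMutator> cache = new ConcurrentHashMap<>();
    // Counts how many times the (expensive) factory actually ran.
    final AtomicInteger created = new AtomicInteger();

    FakeMutator get(String table) {
        // The mapping function runs at most once per absent key, atomically.
        return cache.computeIfAbsent(table, t -> {
            created.incrementAndGet();
            return new FakeMutator(t);
        });
    }
}
```

Calling `get("a")` twice returns the same object, and `created` only grows when a new table name is seen.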
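3 Source code walkthrough
3.1 Main classes
BufferedMutatorParams
The parameters needed to instantiate a BufferedMutator.
The main ones are TableName (the table name), writeBufferSize (the size of the write buffer), maxKeyValueSize (the maximum size of a key-value), ExecutorService (the thread pool used for execution) and ExceptionListener (which receives exceptions raised by the BufferedMutator).
BufferedMutatorImpl
Used to interact with an HBase table. It is similar to HTable but intended for batched, asynchronous puts. Instances are obtained from HConnectionImplementation through the following method: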
public BufferedMutator getBufferedMutator(BufferedMutatorParams params) {
if (params.getTableName() == null) {
throw new IllegalArgumentException("TableName cannot be null.");
}
if (params.getPool() == null) {
params.pool(HTable.getDefaultExecutor(getConfiguration()));
}
if (params.getWriteBufferSize() == BufferedMutatorParams.UNSET) {
params.writeBufferSize(connectionConfig.getWriteBufferSize());
}
if (params.getMaxKeyValueSize() == BufferedMutatorParams.UNSET) {
params.maxKeyValueSize(connectionConfig.getMaxKeyValueSize());
}
return new BufferedMutatorImpl(this, rpcCallerFactory, rpcControllerFactory, params);
}
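AsyncProcess
AsyncProcess works with a thread pool: each operation is wrapped in a Runnable and handed to the pool for execution. Submission is asynchronous until the number of in-flight tasks reaches the configured maximum.
HConnectionImplementation
A connection to a cluster. It knows how to find the master, locate regions on the cluster, keep a cache of region locations, and recalibrate that cache when locations change.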
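The AsyncProcess description above (asynchronous submission up to a maximum number of in-flight tasks) follows a common pattern that can be sketched with the standard library. The class `BoundedSubmitter` and its parameters are hypothetical and are not how HBase implements it; the sketch assumes that hitting the limit makes the caller wait.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; this is not HBase's AsyncProcess.
final class BoundedSubmitter {
    private final ExecutorService pool;
    private final Semaphore permits;
    final AtomicInteger finished = new AtomicInteger();

    BoundedSubmitter(int threads, int maxInFlight) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxInFlight);
    }

    // Returns immediately while fewer than maxInFlight tasks are pending;
    // otherwise blocks until a task finishes and frees a permit.
    void submit(Runnable work) {
        permits.acquireUninterruptibly();
        try {
            pool.submit(() -> {
                try {
                    work.run();
                } finally {
                    finished.incrementAndGet();
                    permits.release();
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // submission was rejected, so give the permit back
            throw e;
        }
    }

    // Stops accepting work and waits for everything already submitted.
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The semaphore is what turns an unbounded asynchronous queue into back-pressure on the producer: a fast writer cannot pile up more than `maxInFlight` pending batches.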
3.2 Walking through the source
3.2.1 Building a BufferedMutator
- First, build an HBaseConfiguration:
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "zookeeperHost");
- Next, build the BufferedMutatorParams:
final BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
  @Override
  public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator) {
    for (int i = 0; i < e.getNumExceptions(); i++) {
      LOG.info("Failed to send put " + e.getRow(i) + ".");
    }
  }
};
BufferedMutatorParams params = new BufferedMutatorParams(TABLE)
    .listener(listener);
params.writeBufferSize(123);
- Then create the Connection:
Connection conn = ConnectionFactory.createConnection(conf);
- Finally, obtain the BufferedMutator:
BufferedMutator mutator = conn.getBufferedMutator(params);
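3.2.2 Sending data
- Build a Put or a List<Put>.
- Call BufferedMutator.mutate.
- Flush to HBase.
Data is flushed to HBase in one of three ways:
(1) BufferedMutator.flush is called explicitly.
(2) BufferedMutator.close is called once sending is finished.
(3) The amount of buffered data exceeds the configured write buffer size:
while (undealtMutationCount.get() != 0 && currentWriteBufferSize.get() > writeBufferSize) {
  backgroundFlushCommits(false);
}
All three paths end up calling backgroundFlushCommits.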
- The RPC path
The entry point is backgroundFlushCommits. Here ap is an instance of AsyncProcess.
ap.submit(tableName, taker, true, null, false);
First, a HashMap keyed by server is built. It maps each server to the actions, grouped by region, that must be sent to that server.
// Maps each server to the actions that must be sent to it
Map<ServerName, MultiAction<Row>> actionsByServer =
    new HashMap<ServerName, MultiAction<Row>>();
The region locations for the row are looked up; the result covers all replicas of the region.
RegionLocations locs = connection.locateRegion(
    tableName, r.getRow(), true, true, RegionReplicaUtil.DEFAULT_REPLICA_ID);
The default location is then taken, so for each region only the location with the default replica id is used.
loc = locs.getDefaultRegionLocation();
The row operation is turned into an Action and added to actionsByServer.
// Wrap the row operation in an Action
Action<Row> action = new Action<Row>(r, ++posInList);
setNonce(ng, r, action);
retainedActions.add(action);
// TODO: replica-get is not supported on this path
byte[] regionName = loc.getRegionInfo().getRegionName();
addAction(loc.getServerName(), regionName, action, actionsByServer, nonceGroup);
it.remove();
Next, AsyncProcess.submitMultiActions calls sendMultiAction on AsyncRequestFutureImpl:
AsyncRequestFutureImpl.sendMultiAction(actionsByServer, 1, null, false);
Internally, it takes the MultiAction for each server and builds Runnables from it:
for (Map.Entry<ServerName, MultiAction<Row>> e : actionsByServer.entrySet()) {
  ServerName server = e.getKey();
  MultiAction<Row> multiAction = e.getValue();
  Collection<? extends Runnable> runnables =
      getNewMultiActionRunnable(server, multiAction, numAttempt);
  // make sure we correctly count the number of runnables before we try to reuse the send
  // thread, in case we had to split the request into different runnables because of backoff
  if (runnables.size() > actionsRemaining) {
    actionsRemaining = runnables.size();
  }
  // ...
It then iterates over the Runnables. The last one may run on the calling thread when reuseThread is set; the others are submitted to the thread pool:
for (Runnable runnable : runnables) {
  if ((--actionsRemaining == 0) && reuseThread
      && numAttempt % HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER != 0) {
    runnable.run();
  } else {
    try {
      pool.submit(runnable);
      // ...
- Building the Runnable and its run method
The Runnables are built in getNewMultiActionRunnable:
List<Runnable> toReturn = new ArrayList<Runnable>(actions.size());
for (DelayingRunner runner : actions.values()) {
  incTaskCounters(runner.getActions().getRegions(), server);
  String traceText = "AsyncProcess.sendMultiAction";
  Runnable runnable =
      createSingleServerRequest(runner.getActions(), numAttempt, server, callsInProgress);
  // use a delay runner only if we need to sleep for some time
  if (runner.getSleepTime() > 0) {
    runner.setRunner(runnable);
    traceText = "AsyncProcess.clientBackoff.sendMultiAction";
    runnable = runner;
    if (connection.getConnectionMetrics() != null) {
      connection.getConnectionMetrics().incrDelayRunners();
      connection.getConnectionMetrics().updateDelayInterval(runner.getSleepTime());
    }
  } else {
    if (connection.getConnectionMetrics() != null) {
      connection.getConnectionMetrics().incrNormalRunners();
    }
  }
  runnable = Trace.wrap(traceText, runnable);
  toReturn.add(runnable);
  // ...
Next, look at the run method of SingleServerRequestRunnable:
// setup the callable based on the actions, if we don't have one already from the request
if (callable == null) {
  callable = createCallable(server, tableName, multiAction);
}
RpcRetryingCaller<MultiResponse> caller = createCaller(callable, rpcTimeout);
try {
  if (callsInProgress != null) {
    callsInProgress.add(callable);
  }
  res = caller.callWithoutRetries(callable, operationTimeout);
  // ...
RpcRetryingCaller then invokes the call method of MultiServerCallable, which builds the request and issues the RPC. From here control passes to the server side, that is, the multi method of RSRpcServices.
responseProto = getStub().multi(controller, requestProto);
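The client-side flow in section 3.2.2 (group the buffered operations by server, then issue one request per server on a thread pool) can be sketched with plain collections. Everything in the block is hypothetical: `ServerBatcher`, the `locate` function standing in for region lookup, and the `received` map standing in for the multi RPC. It is a simplified model under those assumptions, not the HBase implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of the client-side flow; not HBase code.
final class ServerBatcher {
    // Step 1: group rows by the server that hosts them.
    static Map<String, List<String>> groupByServer(List<String> rows,
                                                   Function<String, String> locate) {
        Map<String, List<String>> byServer = new HashMap<>();
        for (String row : rows) {
            byServer.computeIfAbsent(locate.apply(row), s -> new ArrayList<>()).add(row);
        }
        return byServer;
    }

    // Step 2: run one task per server on a pool and report what each server received.
    static Map<String, List<String>> send(Map<String, List<String>> byServer, int threads) {
        Map<String, List<String>> received = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : byServer.entrySet()) {
                // Stand-in for the multi RPC to one region server.
                futures.add(pool.submit(() -> {
                    received.put(e.getKey(), e.getValue());
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every per-server request
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("batch send failed", ex);
        } finally {
            pool.shutdown();
        }
        return received;
    }
}
```

Grouping first means the number of requests scales with the number of servers rather than the number of rows, which is where most of the batching benefit comes from.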
3.2.3 Handling on the HRegionServer side
RSRpcServices is the server-side entry point. The implementation relevant to this article is RSRpcServices.multi.
if (request.hasCondition()) {
Condition condition = request.getCondition();
byte[] row = condition.getRow().toByteArray();
byte[] family = condition.getFamily().toByteArray();
byte[] qualifier = condition.getQualifier().toByteArray();
CompareOp compareOp = CompareOp.valueOf(condition.getCompareType().name());
ByteArrayComparable comparator =
ProtobufUtil.toComparator(condition.getComparator());
processed = checkAndRowMutate(region, regionAction.getActionList(),
cellScanner, row, family, qualifier, compareOp,
comparator, regionActionResultBuilder);
} else {
mutateRows(region, regionAction.getActionList(), cellScanner,
regionActionResultBuilder);
processed = Boolean.TRUE;
}
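Depending on whether the request carries a condition, execution enters checkAndRowMutate or mutateRows.
Each action is handled according to its mutation type, and then the mutation is actually applied: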
MutationType type = action.getMutation().getMutateType();
if (rm == null) {
rm = new RowMutations(action.getMutation().getRow().toByteArray());
}
switch (type) {
case PUT:
rm.add(ProtobufUtil.toPut(action.getMutation(), cellScanner));
break;
case DELETE:
rm.add(ProtobufUtil.toDelete(action.getMutation(), cellScanner));
break;
default:
throw new DoNotRetryIOException("Atomic put and/or delete only, not " + type.name());
}
// To unify the response format with doNonAtomicRegionMutation and read through client's
// AsyncProcess we have to add an empty result instance per operation
resultOrExceptionOrBuilder.clear();
resultOrExceptionOrBuilder.setIndex(i++);
builder.addResultOrException(
resultOrExceptionOrBuilder.build());
}
region.mutateRow(rm);
HRegion.mutateRow in turn leads to HRegion.mutateRowsWithLocks:
public void mutateRowsWithLocks(Collection<Mutation> mutations,
Collection<byte[]> rowsToLock) throws IOException {
mutateRowsWithLocks(mutations, rowsToLock, HConstants.NO_NONCE, HConstants.NO_NONCE);
}
public void mutateRowsWithLocks(Collection<Mutation> mutations,
Collection<byte[]> rowsToLock, long nonceGroup, long nonce) throws IOException {
MultiRowMutationProcessor proc = new MultiRowMutationProcessor(mutations, rowsToLock);
processRowsWithLocks(proc, -1, nonceGroup, nonce);
}
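The remaining details of the processing are left for the reader to explore; the comments in the source are clear and well organized.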
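4. Summary
The HBase Java client API offers three ways to implement writes:
- HTablePool
For the source code, see HBase: The Definitive Guide.
- HConnection
With this approach you have to implement a thread pool yourself.
Connection conn = ConnectionFactory.createConnection(conf);
TableName tabName = TableName.valueOf("tableName");
Table table = conn.getTable(tabName);
- BufferedMutator
This is the recommended approach for put operations.
It provides batched, asynchronous puts.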
5. References
- https://cloud.tencent.com/developer/article/1032502
- HBase: The Definitive Guide