Source Code Reading (1): HBase Client Source Code

Source Code Reading (1): HBase Client Source Code http://aperise.iteye.com/blog/2372350
Source Code Reading (2): hbase-examples BufferedMutator Example http://aperise.iteye.com/blog/2372505
Source Code Reading (3): hbase-examples MultiThreadedClientExample http://aperise.iteye.com/blog/2372534

1. Using the HBase client

    1.1 Add the HBase client jar to your Maven project

		<!-- hbase -->
		<dependency>
			<groupId>org.apache.hbase</groupId>
			<artifactId>hbase-client</artifactId>
			<version>1.2.1</version>
		</dependency>

 

    1.2 Recommended ways to create the HBase client

    Recommended usage, option 1:

Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.client.write.buffer", "2097152");
configuration.set("hbase.zookeeper.quorum","192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");
//the default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
Connection connection = ConnectionFactory.createConnection(configuration);
//the default Table implementation is org.apache.hadoop.hbase.client.HTable
Table table = connection.getTable(TableName.valueOf("tableName"));

//3177 is not arbitrary: it is 2 * hbase.client.write.buffer / put.heapSize()
int bestBatchPutSize = 3177;

try {
  // Use the table as needed, for a single operation and a single thread
  // construct List<Put> putLists
  List<Put> putLists = new ArrayList<Put>();
  for(int count=0;count<100000;count++){
    String rowkey = "row" + count; // example row key
    Put put = new Put(rowkey.getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
    put.setDurability(Durability.SKIP_WAL);
    putLists.add(put);

    if(putLists.size()==bestBatchPutSize){
      //reached the best batch size, submit right away
      table.put(putLists);
      putLists.clear();
    }
  }
  //one last submit for whatever remains unsubmitted
  table.put(putLists);
} finally {
  table.close();
  connection.close();
}
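
    The 3177 above need not be hard-coded. A minimal sketch of deriving it at runtime from a representative Put (same three-column rows as in the example; the sample row contents are illustrative):

//derive the batch size as 2 * hbase.client.write.buffer / put.heapSize()
long writeBuffer = configuration.getLong("hbase.client.write.buffer", 2097152);
Put sample = new Put("sampleRow".getBytes());
sample.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
sample.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
sample.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
//e.g. 2 * 2097152 / 1320 = 3177
int bestBatchPutSize = (int) (2 * writeBuffer / sample.heapSize());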

    Recommended usage, option 2:

Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.client.write.buffer", "2097152");
configuration.set("hbase.zookeeper.quorum","192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");

BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"));

//3177 is not arbitrary: it is 2 * hbase.client.write.buffer / put.heapSize()
int bestBatchPutSize = 3177;

//this uses the JDK 1.7 try-with-resources feature: try (resources implementing java.io.Closeable) {...} catch (Exception e) {...}
//it acts like a finally block, calling close() on every resource, i.e. conn.close() and mutator.close()
try(
  //the default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
  Connection conn = ConnectionFactory.createConnection(configuration);
  //the default BufferedMutator implementation is org.apache.hadoop.hbase.client.BufferedMutatorImpl
  BufferedMutator mutator = conn.getBufferedMutator(params);
){
  List<Put> putLists = new ArrayList<Put>();
  for(int count=0;count<100000;count++){
    String rowkey = "row" + count; // example row key
    Put put = new Put(rowkey.getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
    put.setDurability(Durability.SKIP_WAL);
    putLists.add(put);

    if(putLists.size()==bestBatchPutSize){
      //reached the best batch size, submit right away
      mutator.mutate(putLists);
      mutator.flush();
      putLists.clear();
    }
  }
  //one last submit for whatever remains unsubmitted
  mutator.mutate(putLists);
  mutator.flush();
}catch(IOException e) {
  LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
}
    A comparison of the two approaches:

Table.put(List<Put>): essentially a wrapper around BufferedMutator.mutate(List<Put>) plus an autoFlush flag. It first calls BufferedMutator.mutate(List<Put>), which submits in chunks sized by hbase.client.write.buffer (default 2 MB); and because autoFlush defaults to true, every call ends with a flush.

BufferedMutator.mutate(List<Put>): computes the heap size of the given list; whenever the buffered size exceeds hbase.client.write.buffer (default 2 MB) it submits, otherwise it keeps buffering, and whatever is still buffered is submitted when the mutator is flushed or closed before the table shuts down.
 

    1.3 Deprecated HBase client usage

     Deprecated approach 1: construct an HTable directly via HTable(Configuration conf, final String tableName)

Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.client.write.buffer", "2097152");
configuration.set("hbase.zookeeper.quorum","192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");
Table table = new HTable(configuration, "tableName");

//3177 is not arbitrary: it is 2 * hbase.client.write.buffer / put.heapSize()
int bestBatchPutSize = 3177;

try {
  // Use the table as needed, for a single operation and a single thread
  // construct List<Put> putLists
  List<Put> putLists = new ArrayList<Put>();
  for(int count=0;count<100000;count++){
    String rowkey = "row" + count; // example row key
    Put put = new Put(rowkey.getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
    put.setDurability(Durability.SKIP_WAL);
    putLists.add(put);

    if(putLists.size()==bestBatchPutSize){
      //reached the best batch size, submit right away
      table.put(putLists);
      putLists.clear();
    }
  }
  //one last submit for whatever remains unsubmitted
  table.put(putLists);
} finally {
  //an HTable built this way manages its own connection; close() releases it
  table.close();
}

        Deprecated approach 2: obtain an HTableInterface via HConnectionManager.createConnection(Configuration conf)

Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.client.write.buffer", "2097152");
configuration.set("hbase.zookeeper.quorum","192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");
HConnection connection = HConnectionManager.createConnection(configuration);
HTableInterface table = connection.getTable(TableName.valueOf("tableName"));

//3177 is not arbitrary: it is 2 * hbase.client.write.buffer / put.heapSize()
int bestBatchPutSize = 3177;

try {
  // Use the table as needed, for a single operation and a single thread
  // construct List<Put> putLists
  List<Put> putLists = new ArrayList<Put>();
  for(int count=0;count<100000;count++){
    String rowkey = "row" + count; // example row key
    Put put = new Put(rowkey.getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
    put.addImmutable("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
    put.setDurability(Durability.SKIP_WAL);
    putLists.add(put);

    if(putLists.size()==bestBatchPutSize){
      //reached the best batch size, submit right away
      table.put(putLists);
      putLists.clear();
    }
  }
  //one last submit for whatever remains unsubmitted
  table.put(putLists);
} finally {
  table.close();
  connection.close();
}

 

2. Reading the HBase client source

    As noted earlier, the recommended way to use the HBase client is:

Connection connection = ConnectionFactory.createConnection(configuration);  
Table table = connection.getTable(TableName.valueOf("tableName"));  

    The source walk-through starts from these two lines; first, ConnectionFactory.createConnection(configuration).

 

    2.1 ConnectionFactory.createConnection(Configuration conf)

    First, the source of createConnection(Configuration conf):

  public static Connection createConnection(Configuration conf) throws IOException {
    return createConnection(conf, null, null);
  }

    It takes the Configuration we built and delegates to ConnectionFactory.createConnection(Configuration conf, ExecutorService pool, User user), whose source is:

  public static Connection createConnection(Configuration conf, ExecutorService pool, User user)
  throws IOException {
    //user is passed in as null above, so this block does run and resolves the current user
    if (user == null) {
      UserProvider provider = UserProvider.instantiate(conf);
      user = provider.getCurrent();
    }

    return createConnection(conf, false, pool, user);
  }

    This in turn calls ConnectionFactory.createConnection(final Configuration conf, final boolean managed, final ExecutorService pool, final User user):

static Connection createConnection(final Configuration conf, final boolean managed, final ExecutorService pool, final User user)
  throws IOException {
    //HBASE_CLIENT_CONNECTION_IMPL = "hbase.client.connection.impl"
    //hbase.client.connection.impl lets users supply their own Connection implementation class
    //HBase already ships an implementation, so normally nothing is configured here and the default
    //ConnectionManager.HConnectionImplementation.class.getName() is used
    String className = conf.get(HConnection.HBASE_CLIENT_CONNECTION_IMPL,ConnectionManager.HConnectionImplementation.class.getName());
    Class<?> clazz = null;
    try {
      clazz = Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
    try {
      // Default HCM#HCI is not accessible; make it so before invoking.
      //this invokes the constructor HConnectionImplementation(Configuration conf, boolean managed, ExecutorService pool, User user)
      Constructor<?> constructor = clazz.getDeclaredConstructor(Configuration.class, boolean.class, ExecutorService.class, User.class);
      constructor.setAccessible(true);
      return (Connection) constructor.newInstance(conf, managed, pool, user);
    } catch (Exception e) {
      throw new IOException(e);
    }
  }
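
    As the comments above note, hbase.client.connection.impl exists so that users can plug in their own Connection implementation, which the factory constructs reflectively through the (Configuration, boolean, ExecutorService, User) constructor. A hypothetical sketch (com.example.MyConnection is an illustrative name, not a real class):

//hypothetical: route the factory to a custom Connection implementation
configuration.set("hbase.client.connection.impl", "com.example.MyConnection");
//the factory will now reflectively instantiate com.example.MyConnection
Connection connection = ConnectionFactory.createConnection(configuration);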

    By default the code above instantiates ConnectionManager.HConnectionImplementation and returns it as the Connection. Next, the constructor HConnectionImplementation(Configuration conf, boolean managed, ExecutorService pool, User user):

HConnectionImplementation(Configuration conf, boolean managed, ExecutorService pool, User user) throws IOException {
      //this line deserves the closest attention
      this(conf);
      //here this.user = null
      this.user = user;
      //here this.batchPool = null
      this.batchPool = pool;
      //here this.managed = false
      this.managed = managed;
      //setupRegistry() reads hbase.client.registry.impl for a user-supplied registry class; if none is configured it creates a ZooKeeperRegistry. This class is very important: all client interaction with ZooKeeper goes through it
      this.registry = setupRegistry();
      //by default the ZooKeeperRegistry reads the cluster's clusterId from ZooKeeper
      retrieveClusterId();

       //if hbase.rpc.client.impl is not configured, a RpcClientImpl is created and assigned to this.rpcClient
      this.rpcClient = RpcClientFactory.createClient(this.conf, this.clusterId, this.metrics);
      this.rpcControllerFactory = RpcControllerFactory.instantiate(conf);

      // Do we publish the status?
      //if hbase.status.published is not configured, shouldListen defaults to false
      boolean shouldListen = conf.getBoolean(HConstants.STATUS_PUBLISHED, HConstants.STATUS_PUBLISHED_DEFAULT);
          
      //if hbase.status.listener.class is not configured, listenerClass defaults to MulticastListener
      Class<? extends ClusterStatusListener.Listener> listenerClass = conf.getClass(ClusterStatusListener.STATUS_LISTENER_CLASS, ClusterStatusListener.DEFAULT_STATUS_LISTENER_CLASS, ClusterStatusListener.Listener.class);
      if (shouldListen) {
        if (listenerClass == null) {
          LOG.warn(HConstants.STATUS_PUBLISHED + " is true, but " + ClusterStatusListener.STATUS_LISTENER_CLASS + " is not set - not listening status");
        } else {
          //a cluster status listener watches server-side events; when a region server dies, rpcClient.cancelConnections closes its connections
          clusterStatusListener = new ClusterStatusListener(
              new ClusterStatusListener.DeadServerHandler() {
                @Override
                public void newDead(ServerName sn) {
                  clearCaches(sn);
                  rpcClient.cancelConnections(sn);
                }
              }, conf, listenerClass);
        }
      }
    }

    The main line to focus on above is this(conf); the other is setupRegistry(), which defaults to org.apache.hadoop.hbase.client.ZooKeeperRegistry and is analyzed further below. The rest of the code is straightforward and annotated above. Continuing with this(conf):

protected HConnectionImplementation(Configuration conf) {
      //store the caller's Configuration in this.conf
      this.conf = conf;
      //HConnectionImplementation builds its own configuration object, this.connectionConfig, on top of the caller's Configuration
      this.connectionConfig = new ConnectionConfiguration(conf);
      this.closed = false;
      //if the caller's Configuration does not set hbase.client.pause, default this.pause=100
      this.pause = conf.getLong(HConstants.HBASE_CLIENT_PAUSE, HConstants.DEFAULT_HBASE_CLIENT_PAUSE);
      //if the caller's Configuration does not set hbase.meta.replicas.use, default this.useMetaReplicas=false
      this.useMetaReplicas = conf.getBoolean(HConstants.USE_META_REPLICAS, HConstants.DEFAULT_USE_META_REPLICAS);
      //taken from this.connectionConfig; if the caller's Configuration does not set hbase.client.retries.number, default this.numTries=31
      this.numTries = connectionConfig.getRetriesNumber();
      //if the caller's Configuration does not set hbase.rpc.timeout, default this.rpcTimeout=60000 ms
      this.rpcTimeout = conf.getInt(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT);
      if (conf.getBoolean(CLIENT_NONCES_ENABLED_KEY, true)) {
        synchronized (nonceGeneratorCreateLock) {
          if (ConnectionManager.nonceGenerator == null) {
            ConnectionManager.nonceGenerator = new PerClientRandomNonceGenerator();
          }
          this.nonceGenerator = ConnectionManager.nonceGenerator;
        }
      } else {
        this.nonceGenerator = new NoNonceGenerator();
      }
      //tracks per-region-server statistics
      stats = ServerStatisticTracker.create(conf);
      //the HBase client's asynchronous operation handler
      this.asyncProcess = createAsyncProcess(this.conf);
      this.interceptor = (new RetryingCallerInterceptorFactory(conf)).build();
      this.rpcCallerFactory = RpcRetryingCallerFactory.instantiate(conf, interceptor, this.stats);
      this.backoffPolicy = ClientBackoffPolicyFactory.create(conf);
      if (conf.getBoolean(CLIENT_SIDE_METRICS_ENABLED_KEY, false)) {
        this.metrics = new MetricsConnection(this);
      } else {
        this.metrics = null;
      }
      
      this.hostnamesCanChange = conf.getBoolean(RESOLVE_HOSTNAME_ON_FAIL_KEY, true);
      this.metaCache = new MetaCache(this.metrics);
    }

    The important point above: although the caller passes in a Configuration, HConnectionImplementation does not use it directly. The caller's Configuration typically sets only a few values and leaves the rest unspecified, so HConnectionImplementation builds its own configuration object: it starts from the built-in defaults and overlays whatever the caller did set, using the defaults for everything else. Here is this.connectionConfig = new ConnectionConfiguration(conf):

ConnectionConfiguration(Configuration conf) {
    //if the caller's Configuration does not set hbase.client.write.buffer, default this.writeBufferSize=2097152
    this.writeBufferSize = conf.getLong(WRITE_BUFFER_SIZE_KEY, WRITE_BUFFER_SIZE_DEFAULT);
    
    //if the caller's Configuration does not set hbase.client.meta.operation.timeout, default this.metaOperationTimeout=1200000
    this.metaOperationTimeout = conf.getInt(HConstants.HBASE_CLIENT_META_OPERATION_TIMEOUT, HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT);

    //if the caller's Configuration does not set hbase.client.operation.timeout, default this.operationTimeout=1200000
    this.operationTimeout = conf.getInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT);

    //if the caller's Configuration does not set hbase.client.scanner.caching, default this.scannerCaching=Integer.MAX_VALUE
    this.scannerCaching = conf.getInt(HConstants.HBASE_CLIENT_SCANNER_CACHING, HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING);

    //if the caller's Configuration does not set hbase.client.scanner.max.result.size, default this.scannerMaxResultSize=2*1024*1024
    this.scannerMaxResultSize = conf.getLong(HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY, HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE);

    //if the caller's Configuration does not set hbase.client.primaryCallTimeout.get, default this.primaryCallTimeoutMicroSecond=10000
    this.primaryCallTimeoutMicroSecond = conf.getInt("hbase.client.primaryCallTimeout.get", 10000); // 10000ms

    //if the caller's Configuration does not set hbase.client.replicaCallTimeout.scan, default this.replicaCallTimeoutMicroSecondScan=1000000
    this.replicaCallTimeoutMicroSecondScan = conf.getInt("hbase.client.replicaCallTimeout.scan", 1000000); // 1000000ms

    //if the caller's Configuration does not set hbase.client.retries.number, default this.retries=31
    this.retries = conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER);

    //if the caller's Configuration does not set hbase.client.keyvalue.maxsize, default this.maxKeyValueSize=-1
    this.maxKeyValueSize = conf.getInt(MAX_KEYVALUE_SIZE_KEY, MAX_KEYVALUE_SIZE_DEFAULT);
  }

    上面的代碼主要是初始化HConnectionImplementation自己的Configuration類型屬性this.connectionConfig,默認客戶端不設置屬性值,這裏創建的this.connectionConfig就使用默認值,這裏將hbase客戶端默認值抽取如下:

  • hbase.client.write.buffer               default 2097152 bytes, i.e. 2MB
  • hbase.client.meta.operation.timeout     default 1200000 ms
  • hbase.client.operation.timeout          default 1200000 ms
  • hbase.client.scanner.caching            default Integer.MAX_VALUE
  • hbase.client.scanner.max.result.size    default 2MB
  • hbase.client.primaryCallTimeout.get     default 10000 (microseconds, per the field name)
  • hbase.client.replicaCallTimeout.scan    default 1000000 (microseconds, per the field name)
  • hbase.client.retries.number             default 31 attempts
  • hbase.client.keyvalue.maxsize           default -1, unlimited
  • hbase.client.ipc.pool.type
  • hbase.client.ipc.pool.size
  • hbase.client.pause                      100
  • hbase.client.max.total.tasks            100
  • hbase.client.max.perserver.tasks        2
  • hbase.client.max.perregion.tasks        1
  • hbase.client.instance.id
  • hbase.client.scanner.timeout.period     60000
  • hbase.client.rpc.codec
  • hbase.regionserver.lease.period         superseded by hbase.client.scanner.timeout.period, 60000
  • hbase.client.fast.fail.mode.enabled     FALSE
  • hbase.client.fastfail.threshold         60000
  • hbase.client.fast.fail.cleanup.duration 600000
  • hbase.client.fast.fail.interceptor.impl
  • hbase.client.backpressure.enabled       false
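
    A minimal sketch of overriding a few of these defaults before creating the connection (the chosen values are illustrative only):

Configuration configuration = HBaseConfiguration.create();
//illustrative overrides; anything not set keeps the defaults listed above
configuration.set("hbase.client.write.buffer", "4194304");    //4MB client write buffer instead of 2MB
configuration.set("hbase.client.retries.number", "10");       //retry 10 times instead of 31
configuration.set("hbase.client.scanner.caching", "1000");    //rows fetched per scanner RPC
Connection connection = ConnectionFactory.createConnection(configuration);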

 

    2.2 ZooKeeperRegistry, the class that talks to ZooKeeper

    The analysis above showed that only the values the caller explicitly sets take effect on the client, with everything else falling back to defaults. The other crucial piece is the ZooKeeper interaction class just mentioned, org.apache.hadoop.hbase.client.ZooKeeperRegistry:

package org.apache.hadoop.hbase.client;

import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.RegionLocations;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.zookeeper.MetaTableLocator;
import org.apache.hadoop.hbase.zookeeper.ZKClusterId;
import org.apache.hadoop.hbase.zookeeper.ZKTableStateClientSideReader;
import org.apache.hadoop.hbase.zookeeper.ZKUtil;
import org.apache.zookeeper.KeeperException;

/**
 * A cluster registry that stores to zookeeper.
 */
class ZooKeeperRegistry implements Registry {
  private static final Log LOG = LogFactory.getLog(ZooKeeperRegistry.class);
  // the HBase connection; assigned in init()
  ConnectionManager.HConnectionImplementation hci;

  @Override
  public void init(Connection connection) {
    if (!(connection instanceof ConnectionManager.HConnectionImplementation)) {
      throw new RuntimeException("This registry depends on HConnectionImplementation");
    }
    //store the HBase connection
    this.hci = (ConnectionManager.HConnectionImplementation)connection;
  }

  @Override
  public RegionLocations getMetaRegionLocation() throws IOException {
    //obtain the ZooKeeperKeepAliveConnection used to talk to ZooKeeper (the quorum address comes from the connection's Configuration)
    ZooKeeperKeepAliveConnection zkw = hci.getKeepAliveZooKeeperWatcher();

    try {
      if (LOG.isTraceEnabled()) {
        LOG.trace("Looking up meta region location in ZK," + " connection=" + this);
      }
      //read from ZooKeeper the server(s) hosting the hbase:meta region, blocking until they are available
      List<ServerName> servers = new MetaTableLocator().blockUntilAvailable(zkw, hci.rpcTimeout, hci.getConfiguration());
      if (LOG.isTraceEnabled()) {
        if (servers == null) {
          LOG.trace("Looked up meta region location, connection=" + this + "; servers = null");
        } else {
          StringBuilder str = new StringBuilder();
          for (ServerName s : servers) {
            str.append(s.toString());
            str.append(" ");
          }
          LOG.trace("Looked up meta region location, connection=" + this + "; servers = " + str.toString());
        }
      }
      if (servers == null) return null;
      
      //assemble the HRegionLocation array and return it as RegionLocations
      HRegionLocation[] locs = new HRegionLocation[servers.size()];
      int i = 0;
      for (ServerName server : servers) {
        HRegionInfo h = RegionReplicaUtil.getRegionInfoForReplica(HRegionInfo.FIRST_META_REGIONINFO, i);
        if (server == null) locs[i++] = null;
        else locs[i++] = new HRegionLocation(h, server, 0);
      }
      return new RegionLocations(locs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } finally {
      zkw.close();
    }
  }

  private String clusterId = null;

  @Override
  public String getClusterId() {
    if (this.clusterId != null) return this.clusterId;
    // No synchronized here, worse case we will retrieve it twice, that's
    //  not an issue.
    ZooKeeperKeepAliveConnection zkw = null;
    try {
      zkw = hci.getKeepAliveZooKeeperWatcher();
      this.clusterId = ZKClusterId.readClusterIdZNode(zkw);
      if (this.clusterId == null) {
        LOG.info("ClusterId read in ZooKeeper is null");
      }
    } catch (KeeperException e) {
      LOG.warn("Can't retrieve clusterId from Zookeeper", e);
    } catch (IOException e) {
      LOG.warn("Can't retrieve clusterId from Zookeeper", e);
    } finally {
      if (zkw != null) zkw.close();
    }
    return this.clusterId;
  }

  @Override
  public boolean isTableOnlineState(TableName tableName, boolean enabled)
  throws IOException {
    ZooKeeperKeepAliveConnection zkw = hci.getKeepAliveZooKeeperWatcher();
    try {
      if (enabled) {
        return ZKTableStateClientSideReader.isEnabledTable(zkw, tableName);
      }
      return ZKTableStateClientSideReader.isDisabledTable(zkw, tableName);
    } catch (KeeperException e) {
      throw new IOException("Enable/Disable failed", e);
    } catch (InterruptedException e) {
      throw new InterruptedIOException();
    } finally {
       zkw.close();
    }
  }

  @Override
  public int getCurrentNrHRS() throws IOException {
    ZooKeeperKeepAliveConnection zkw = hci.getKeepAliveZooKeeperWatcher();
    try {
      // We go to zk rather than to master to get count of regions to avoid
      // HTable having a Master dependency.  See HBase-2828
      return ZKUtil.getNumberOfChildren(zkw, zkw.rsZNode);
    } catch (KeeperException ke) {
      throw new IOException("Unexpected ZooKeeper exception", ke);
    } finally {
        zkw.close();
    }
  }
}

    This class matters a great deal: every interaction with ZooKeeper goes through it.

 

    2.3 HConnectionImplementation.getTable(TableName tableName)

    As noted earlier, the recommended way to use the HBase client is:

Connection connection = ConnectionFactory.createConnection(configuration);  
Table table = connection.getTable(TableName.valueOf("tableName"));  

    Section 2.1 established that the default connection implementation is HConnectionImplementation, so we continue with HConnectionImplementation.getTable(TableName tableName):

    public HTableInterface getTable(TableName tableName) throws IOException {
      return getTable(tableName, getBatchPool());
    }

   Next, HConnectionImplementation.getTable(TableName tableName, ExecutorService pool):

    public HTableInterface getTable(TableName tableName, ExecutorService pool) throws IOException {
      //managed defaults to false
      if (managed) {
        throw new NeedUnmanagedConnectionException();
      }
      return new HTable(tableName, this, connectionConfig, rpcCallerFactory, rpcControllerFactory, pool);
    }

    Then the HTable constructor HTable(TableName tableName, final ClusterConnection connection, final ConnectionConfiguration tableConfig, final RpcRetryingCallerFactory rpcCallerFactory, final RpcControllerFactory rpcControllerFactory, final ExecutorService pool):

public HTable(TableName tableName, final ClusterConnection connection, final ConnectionConfiguration tableConfig, final RpcRetryingCallerFactory rpcCallerFactory, final RpcControllerFactory rpcControllerFactory, final ExecutorService pool) throws IOException {
    if (connection == null || connection.isClosed()) {
      throw new IllegalArgumentException("Connection is null or closed.");
    }
    //the HBase table name
    this.tableName = tableName;
    //do not close the connection on close(). This matters: by default table.close() will NOT close the connection that created the table; see table.close() below
    this.cleanupConnectionOnClose = false;
    //this.connection is the connection created by HConnectionImplementation
    this.connection = connection;
    //the caller's Configuration, obtained via HConnectionImplementation
    this.configuration = connection.getConfiguration();
    //the ConnectionConfiguration that HConnectionImplementation built on top of the caller's Configuration
    this.connConfiguration = tableConfig;
    //the pool handed over by HConnectionImplementation, whose default is this.batchPool = getThreadPool(conf.getInt("hbase.hconnection.threads.max", 256), ...)
    this.pool = pool;
    if (pool == null) {
      this.pool = getDefaultExecutor(this.configuration);
      this.cleanupPoolOnClose = true;
    } else {
      //HConnectionImplementation already initialized this.batchPool, so a pool was passed in and cleanupPoolOnClose=false: by default the pool is not shut down
      this.cleanupPoolOnClose = false;
    }

    this.rpcCallerFactory = rpcCallerFactory;
    this.rpcControllerFactory = rpcControllerFactory;

    //examined in detail below; initializes HTable's fields from the caller's Configuration
    this.finishSetup();
  }

    Two properties in the annotated code deserve attention. First, cleanupConnectionOnClose defaults to false: table.close() closes only the table, while the connection behind it stays open. Second, cleanupPoolOnClose: although we passed no thread pool, HConnectionImplementation hands over its own (this.batchPool = getThreadPool(conf.getInt("hbase.hconnection.threads.max", 256), ...)), so cleanupPoolOnClose is set to false and table.close() does not shut the pool down either. Now for this.finishSetup() at the end of the constructor:

private void finishSetup() throws IOException {
    //if HTable's connConfiguration is null, build a new one from the caller's Configuration
    if (connConfiguration == null) {
      connConfiguration = new ConnectionConfiguration(configuration);
    }

    //HTable field setup
    this.operationTimeout = tableName.isSystemTable() ? connConfiguration.getMetaOperationTimeout() : connConfiguration.getOperationTimeout();
    this.scannerCaching = connConfiguration.getScannerCaching();
    this.scannerMaxResultSize = connConfiguration.getScannerMaxResultSize();
    if (this.rpcCallerFactory == null) {
      this.rpcCallerFactory = connection.getNewRpcRetryingCallerFactory(configuration);
    }
    if (this.rpcControllerFactory == null) {
      this.rpcControllerFactory = RpcControllerFactory.instantiate(configuration);
    }

    // puts need to track errors globally due to how the APIs currently work.
    //the HBase asynchronous operation handler
    multiAp = this.connection.getAsyncProcess();

    this.closed = false;
    //helper class for this table's region lookups
    this.locator = new HRegionLocator(tableName, connection);
  }
    Given the above, table.close() is worth a look:
public void close() throws IOException {
    //if already closed, return immediately
    if (this.closed) {
      return;
    }
    //one last flush before closing
    flushCommits();
    //cleanupPoolOnClose=false by default from the HTable constructor, so the thread pool is not shut down here
    if (cleanupPoolOnClose) {
      this.pool.shutdown();
      try {
        boolean terminated = false;
        do {
          // wait until the pool has terminated
          terminated = this.pool.awaitTermination(60, TimeUnit.SECONDS);
        } while (!terminated);
      } catch (InterruptedException e) {
        this.pool.shutdownNow();
        LOG.warn("waitForTermination interrupted");
      }
    }
    //cleanupConnectionOnClose=false by default from the HTable constructor, so the table's underlying connection is not closed here
    if (cleanupConnectionOnClose) {
      if (this.connection != null) {
        this.connection.close();
      }
    }
    this.closed = true;
  }
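
    Because cleanupConnectionOnClose defaults to false, table.close() leaves the Connection open and the caller owns its lifecycle. A minimal sketch of managing both with try-with-resources:

try (Connection connection = ConnectionFactory.createConnection(configuration);
     Table table = connection.getTable(TableName.valueOf("tableName"))) {
  // use the table; on exit the table closes first, then the connection
} catch (IOException e) {
  LOG.info("exception while using table", e);
}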

 

    2.4 HTable.put(final List<Put> puts)

    By now, the following code:

Connection connection = ConnectionFactory.createConnection(configuration);  
Table table = connection.getTable(TableName.valueOf("tableName"));

    has created the connection (default implementation org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation) and the table (default implementation org.apache.hadoop.hbase.client.HTable). Next comes the client's batch submit method, HTable.put(final List<Put> puts):

  public void put(final List<Put> puts) throws IOException {
    //batch-submits whenever the configured buffer threshold is reached
    getBufferedMutator().mutate(puts);
    //autoFlush defaults to true, so a final submit happens regardless of how much data is left
    if (autoFlush) {
      flushCommits();
    }
  }

    First, HTable.getBufferedMutator():

  BufferedMutator getBufferedMutator() throws IOException {
    if (mutator == null) {
      //uses the pool from HConnectionImplementation, whose default is this.batchPool = getThreadPool(conf.getInt("hbase.hconnection.threads.max", 256), ...)
      //the buffer is sized by hbase.client.write.buffer, default 2MB
      this.mutator = (BufferedMutatorImpl) connection.getBufferedMutator(
          new BufferedMutatorParams(tableName)
              .pool(pool)
              .writeBufferSize(connConfiguration.getWriteBufferSize())
              .maxKeyValueSize(connConfiguration.getMaxKeyValueSize())
      );
    }
    return mutator;
  }

    The code above lazily constructs and returns a BufferedMutatorImpl. Next, its mutate(List<? extends Mutation> ms) method:

public void mutate(List<? extends Mutation> ms) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    //if the BufferedMutator is already closed, fail immediately
    if (closed) {
      throw new IllegalStateException("Cannot put when the BufferedMutator is closed.");
    }

    //first accumulate the heap size of the submitted mutations into toAddSize
    long toAddSize = 0;
    for (Mutation m : ms) {
      if (m instanceof Put) {
        validatePut((Put) m);
      }
      toAddSize += m.heapSize();
    }

    // This behavior is highly non-intuitive... it does not protect us against
    // 94-incompatible behavior, which is a timing issue because hasError, the below code
    // and setter of hasError are not synchronized. Perhaps it should be removed.
    if (ap.hasError()) {
      //add toAddSize to the mutator's running buffer size
      currentWriteBufferSize.addAndGet(toAddSize);
      //append the submitted mutations to the writeAsyncBuffer cache; they stay there until submitted
      writeAsyncBuffer.addAll(ms);
      //an error was already seen: flush synchronously before continuing
      backgroundFlushCommits(true);
    } else {
      //add toAddSize to the mutator's running buffer size
      currentWriteBufferSize.addAndGet(toAddSize);
      //append the submitted mutations to the writeAsyncBuffer cache; they stay there until submitted
      writeAsyncBuffer.addAll(ms);
    }

    // Now try and queue what needs to be queued.
    // if the buffered size now exceeds hbase.client.write.buffer (default 2MB), call backgroundFlushCommits right away
    // if it does not, do nothing and return
    while (currentWriteBufferSize.get() > writeBufferSize) {
      backgroundFlushCommits(false);
    }
  }

    The code keeps a running total of the heap size of the submitted List<Put>: once it exceeds hbase.client.write.buffer, backgroundFlushCommits(false) is called right away, otherwise nothing happens; if an error was seen, backgroundFlushCommits(true) runs once. So BufferedMutatorImpl.backgroundFlushCommits(boolean synchronous) deserves a close look:

private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    LinkedList<Mutation> buffer = new LinkedList<>();
    // Keep track of the size so that this thread doesn't spin forever
    long dequeuedSize = 0;

    try {
      //walk the submitted mutations (Put implements Mutation)
      Mutation m;
      //while (writeBufferSize <= 0 || dequeuedSize < writeBufferSize * 2 || synchronous) and writeAsyncBuffer still holds Mutations,
      //keep dequeuing: accumulate each Mutation's heap size into dequeuedSize
      //and decrement currentWriteBufferSize by the same amount
      while ((writeBufferSize <= 0 || dequeuedSize < (writeBufferSize * 2) || synchronous) && (m = writeAsyncBuffer.poll()) != null) {
        buffer.add(m);
        long size = m.heapSize();
        dequeuedSize += size;
        currentWriteBufferSize.addAndGet(-size);
      }

      //for backgroundFlushCommits(false) with nothing dequeued, return without doing anything
      if (!synchronous && dequeuedSize == 0) {
        return;
      }

      //for backgroundFlushCommits(false) this branch runs and does not wait for the results
      if (!synchronous) {
        //submit without waiting for the result
        ap.submit(tableName, buffer, true, null, false);
        if (ap.hasError()) {
          LOG.debug(tableName + ": One or more of the operations have failed -"
              + " waiting for all operation in progress to finish (successfully or not)");
        }
      }
      //for backgroundFlushCommits(true) this branch runs and waits for the results
      if (synchronous || ap.hasError()) {
        while (!buffer.isEmpty()) {
          ap.submit(tableName, buffer, true, null, false);
        }
        //wait for the results
        RetriesExhaustedWithDetailsException error = ap.waitForAllPreviousOpsAndReset(null);
        if (error != null) {
          if (listener == null) {
            throw error;
          } else {
            this.listener.onException(error, this);
          }
        }
      }
    } finally {
      //anything left over goes back into the buffer for the caller's final submit
      for (Mutation mut : buffer) {
        long size = mut.heapSize();
        currentWriteBufferSize.addAndGet(size);
        dequeuedSize -= size;
        writeAsyncBuffer.add(mut);
      }
    }
  }

    ap.submit(tableName, buffer, true, null, false) submits directly without waiting for the result; it delegates to AsyncProcess.submit(ExecutorService pool, TableName tableName, List<? extends Row> rows, boolean atLeastOne, Batch.Callback<CResult> callback, boolean needResults), whose source is:

  public <CResult> AsyncRequestFuture submit(TableName tableName, List<? extends Row> rows,
      boolean atLeastOne, Batch.Callback<CResult> callback, boolean needResults)
      throws InterruptedIOException {
    return submit(null, tableName, rows, atLeastOne, callback, needResults);
  }
public <CResult> AsyncRequestFuture submit(ExecutorService pool, TableName tableName, List<? extends Row> rows, boolean atLeastOne, Batch.Callback<CResult> callback, boolean needResults) throws InterruptedIOException {
    //if there is nothing to submit, return NO_REQS_RESULT
    if (rows.isEmpty()) {
      return NO_REQS_RESULT;
    }

    Map<ServerName, MultiAction<Row>> actionsByServer = new HashMap<ServerName, MultiAction<Row>>();
    //size retainedActions by the number of submitted rows
    List<Action<Row>> retainedActions = new ArrayList<Action<Row>>(rows.size());

    NonceGenerator ng = this.connection.getNonceGenerator();
    long nonceGroup = ng.getNonceGroup(); // Currently, nonce group is per entire client.

    // Location errors that happen before we decide what requests to take.
    List<Exception> locationErrors = null;
    List<Integer> locationErrorRows = null;
    //keep looping while retainedActions is still empty (when atLeastOne is requested and no location error occurred)
    do {
      // Wait until there is at least one slot for a new task.
      // at most maxTotalConcurrentTasks=100 concurrent tasks by default; wait if they are all busy
      waitForMaximumCurrentTasks(maxTotalConcurrentTasks - 1);

      // Remember the previous decisions about regions or region servers we put in the
      //  final multi.
      // record which regions and region servers this submission touches
      Map<HRegionInfo, Boolean> regionIncluded = new HashMap<HRegionInfo, Boolean>();
      Map<ServerName, Boolean> serverIncluded = new HashMap<ServerName, Boolean>();

      int posInList = -1;
      Iterator<? extends Row> it = rows.iterator();
      while (it.hasNext()) {
        //each Row here is actually a Put, since Put implements Row
        Row r = it.next();
        //loc will hold the location metadata of the region this Put belongs to
        HRegionLocation loc;
        try {
          if (r == null) {
            throw new IllegalArgumentException("#" + id + ", row cannot be null");
          }
          // Make sure we get 0-s replica.
          //look up all replica locations of the Put's region; the first call finds nothing cached, goes through ZooKeeper and hbase:meta, and caches the result for subsequent calls
          RegionLocations locs = connection.locateRegion(
              tableName, r.getRow(), true, true, RegionReplicaUtil.DEFAULT_REPLICA_ID);
          if (locs == null || locs.isEmpty() || locs.getDefaultRegionLocation() == null) {
            throw new IOException("#" + id + ", no location found, aborting submit for"
                + " tableName=" + tableName + " rowkey=" + Bytes.toStringBinary(r.getRow()));
          }
          //take the first (default) replica from the region's location array
          loc = locs.getDefaultRegionLocation();
        } catch (IOException ex) {
          locationErrors = new ArrayList<Exception>();
          locationErrorRows = new ArrayList<Integer>();
          LOG.error("Failed to get region location ", ex);
          // This action failed before creating ars. Retain it, but do not add to submit list.
          // We will then add it to ars in an already-failed state.
          retainedActions.add(new Action<Row>(r, ++posInList));
          locationErrors.add(ex);
          locationErrorRows.add(posInList);
          it.remove();
          break; // Backward compat: we stop considering actions on location error.
        }

        //check per-region/per-server task limits to decide whether this action can be sent now; wait if everything is busy
        if (canTakeOperation(loc, regionIncluded, serverIncluded)) {
          Action<Row> action = new Action<Row>(r, ++posInList);
          setNonce(ng, r, action);
          retainedActions.add(action);
          // TODO: replica-get is not supported on this path
          byte[] regionName = loc.getRegionInfo().getRegionName();
          //group actions going to the same region server; at this point we only resolve which region and region server each action targets, the actual submit happens after the loop
          addAction(loc.getServerName(), regionName, action, actionsByServer, nonceGroup);
          it.remove();
        }
      }
    } while (retainedActions.isEmpty() && atLeastOne && (locationErrors == null));

    if (retainedActions.isEmpty()) return NO_REQS_RESULT;

    // every action now knows its region and region server: submit them in batch
    return submitMultiActions(tableName, retainedActions, nonceGroup, callback, null, needResults, locationErrors, locationErrorRows, actionsByServer, pool);
  }

    The code above resolves, for each Put in the submitted List<Put>, which region and which region server it belongs to, and then submits in batch. Note hbase.client.max.total.tasks (default 100, the client's maximum number of concurrent tasks): if more than 100 of these lookups/submissions are in flight, the client waits. Back to the original batch submit entry point:

  public void put(final List<Put> puts) throws IOException {
    //batch-submits whenever the configured buffer threshold is reached
    getBufferedMutator().mutate(puts);
    //autoFlush defaults to true, so a final submit happens regardless of how much data is left
    if (autoFlush) {
      flushCommits();
    }
  }

    Summing up the analysis, the client handles the submitted List<Put> differently depending on its heap size:

  • heap size of List<Put> < hbase.client.write.buffer: getBufferedMutator().mutate(puts) returns without submitting anything, and flushCommits() does the submit
  • hbase.client.write.buffer < heap size of List<Put> < 2 * hbase.client.write.buffer: getBufferedMutator().mutate(puts) runs backgroundFlushCommits(false), then flushCommits() follows
  • 2 * hbase.client.write.buffer < heap size of List<Put>: getBufferedMutator().mutate(puts) runs backgroundFlushCommits(false), keeps the excess unsubmitted, then flushCommits() follows

    Finally, since HTable's autoFlush defaults to true, whatever remains is submitted to the server in one last round regardless of its size: flushCommits() calls getBufferedMutator().flush(), which calls BufferedMutatorImpl.backgroundFlushCommits(true), which in turn calls ap.submit(tableName, buffer, true, null, false) and waits on ap.waitForAllPreviousOpsAndReset(null) for the results. That completes the source walk-through of the client's batch submit path.
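
    Since backgroundFlushCommits(true) hands any RetriesExhaustedWithDetailsException to a listener when one is registered, callers who use a BufferedMutator directly can install one through BufferedMutatorParams. A minimal sketch (the logging is illustrative):

BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"))
    .listener(new BufferedMutator.ExceptionListener() {
      @Override
      public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator)
          throws RetriesExhaustedWithDetailsException {
        //log each failed mutation's row; rethrow instead if the caller must fail
        for (int i = 0; i < e.getNumExceptions(); i++) {
          LOG.warn("failed to send put for row " + Bytes.toStringBinary(e.getRow(i).getRow()));
        }
      }
    });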

    

    2.5 HConnectionImplementation.locateRegionInMeta

    The HTable.put(final List<Put> puts) analysis holds one more important detail: inside org.apache.hadoop.hbase.client.AsyncProcess's method public <CResult> AsyncRequestFuture submit(TableName tableName, List<? extends Row> rows, boolean atLeastOne, Batch.Callback<CResult> callback, boolean needResults) sits this line:

          // look up the region locations of our table
          RegionLocations locs = connection.locateRegion(tableName,r.getRow(), true, true, RegionReplicaUtil.DEFAULT_REPLICA_ID);

    It is really a call into org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation's method public RegionLocations locateRegion(final TableName tableName, final byte [] row, boolean useCache, boolean retry, int replicaId), which loads the region information of our table; annotated below:

public RegionLocations locateRegion(final TableName tableName, final byte [] row, boolean useCache, boolean retry, int replicaId) throws IOException {
      //if this connection is already closed, throw
      if (this.closed) throw new IOException(toString() + " closed");
      //if the table name is null or empty, throw
      if (tableName== null || tableName.getName().length == 0) {
        throw new IllegalArgumentException("table name cannot be null or zero length");
      }
      //TableName.META_TABLE_NAME = hbase:meta (namespace hbase, table meta)
      //we passed in our own table name, not hbase:meta, so this branch is not taken
      if (tableName.equals(TableName.META_TABLE_NAME)) {
        return locateMeta(tableName, useCache, replicaId);
      } else {
        // this branch is taken:
        // look up our table's region information in the hbase:meta catalog table, by the given table name and row key
        return locateRegionInMeta(tableName, row, useCache, retry, replicaId);
      }
    }

    On to locateRegionInMeta(tableName, row, useCache, retry, replicaId), annotated below:

    /*
      * Looks up our table's region information in the hbase:meta catalog table, by the given table name and row key.
      */
    private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, boolean useCache, boolean retry, int replicaId) throws IOException {
      // useCache=true was passed in, so this branch is taken
      if (useCache) {
      //entered, but the first time around the cache has nothing for our table
        RegionLocations locations = getCachedLocation(tableName, row);
        if (locations != null && locations.getRegionLocation(replicaId) != null) {
          return locations;
        }
      }

      //we are about to query the hbase:meta catalog table, so build the probe row key
      // metaKey = tableName + "," + the caller's row key + ",99999999999999"
      byte[] metaKey = HRegionInfo.createRegionName(tableName, row, HConstants.NINES, false);

      //build the scan over the hbase:meta catalog table
      Scan s = new Scan();
      s.setReversed(true);
      s.setStartRow(metaKey);
      s.setSmall(true);
      s.setCaching(1);
      if (this.useMetaReplicas) {
        s.setConsistency(Consistency.TIMELINE);
      }

      //numTries defaults to 31: if hbase:meta cannot be read, keep retrying up to 31 times
      int localNumRetries = (retry ? numTries : 1);

      for (int tries = 0; true; tries++) {
        if (tries >= localNumRetries) {
          throw new NoServerForRegionException("Unable to find region for " + Bytes.toStringBinary(row) + " in " + tableName + " after " + localNumRetries + " tries.");
        }
        if (useCache) { //entered since useCache=true, but the first time around the cache still has nothing
          RegionLocations locations = getCachedLocation(tableName, row);
          if (locations != null && locations.getRegionLocation(replicaId) != null) {
            return locations;
          }
        } else {
          // If we are not supposed to be using the cache, delete any existing cached location
          // so it won't interfere.
          metaCache.clearCache(tableName, row);
        }

        
        // the cache had nothing, so read the region information from the hbase:meta catalog table
        try {
          Result regionInfoRow = null;
          ReversedClientScanner rcs = null;
          try {
            //important: the scan built above runs against TableName.META_TABLE_NAME, i.e. hbase:meta
            rcs = new ClientSmallReversedScanner(conf, s, TableName.META_TABLE_NAME, this, rpcCallerFactory, rpcControllerFactory, getMetaLookupPool(), 0);
            //this yields regionInfoRow, the row of hbase:meta describing our table's region
            regionInfoRow = rcs.next();
          } finally {
            if (rcs != null) {
              rcs.close();
            }
          }

          if (regionInfoRow == null) {
            throw new TableNotFoundException(tableName);
          }

          // convert regionInfoRow into the RegionLocations we need
          RegionLocations locations = MetaTableAccessor.getRegionLocations(regionInfoRow);
          if (locations == null || locations.getRegionLocation(replicaId) == null) {
            throw new IOException("HRegionInfo was null in " + tableName + ", row=" + regionInfoRow);
          }
          
          //we now have our table's HRegionLocation; sanity-check it in case the server died, the region split, or we got the wrong one
          HRegionInfo regionInfo = locations.getRegionLocation(replicaId).getRegionInfo();
          if (regionInfo == null) {
            throw new IOException("HRegionInfo was null or empty in " + TableName.META_TABLE_NAME + ", row=" + regionInfoRow);
          }
          if (!regionInfo.getTable().equals(tableName)) {
            throw new TableNotFoundException( "Table '" + tableName + "' was not found, got: " + regionInfo.getTable() + ".");
          }
          if (regionInfo.isSplit()) {
            throw new RegionOfflineException("the only available region for" + " the required row is a split parent," + " the daughters should be online soon: " + regionInfo.getRegionNameAsString());
          }
          if (regionInfo.isOffline()) {
            throw new RegionOfflineException("the region is offline, could" + " be caused by a disable table call: " + regionInfo.getRegionNameAsString());
          }
          ServerName serverName = locations.getRegionLocation(replicaId).getServerName();
          if (serverName == null) {
            throw new NoServerForRegionException("No server address listed " + "in " + TableName.META_TABLE_NAME + " for region " + regionInfo.getRegionNameAsString() + " containing row " + Bytes.toStringBinary(row));
          }
          if (isDeadServer(serverName)){
            throw new RegionServerStoppedException("hbase:meta says the region "+ regionInfo.getRegionNameAsString()+" is managed by the server " + serverName + ", but it is dead.");
          }
          
          // all checks passed; cache the result so the next lookup is fast
          cacheLocation(tableName, locations);
          // return the region locations
          return locations;
        } catch (TableNotFoundException e) {
          // if we got this error, probably means the table just plain doesn't
          // exist. rethrow the error immediately. this should always be coming
          // from the HTable constructor.
          throw e;
        } catch (IOException e) {
          ExceptionUtil.rethrowIfInterrupt(e);

          if (e instanceof RemoteException) {
            e = ((RemoteException)e).unwrapRemoteException();
          }
          if (tries < localNumRetries - 1) {
            if (LOG.isDebugEnabled()) {
              LOG.debug("locateRegionInMeta parentTable=" + TableName.META_TABLE_NAME + ", metaLocation=" + ", attempt=" + tries + " of " + localNumRetries + " failed; retrying after sleep of " + ConnectionUtils.getPauseTime(this.pause, tries) + " because: " + e.getMessage());
            }
          } else {
            throw e;
          }
          // Only relocate the parent region if necessary
          if(!(e instanceof RegionOfflineException || e instanceof NoServerForRegionException)) {
            relocateRegion(TableName.META_TABLE_NAME, metaKey, replicaId);
          }
        }
        //not found: sleep for a backoff interval and, while under 31 attempts, loop and try again; beyond 31 an exception is thrown
        try{
          Thread.sleep(ConnectionUtils.getPauseTime(this.pause, tries));
        } catch (InterruptedException e) {
          throw new InterruptedIOException("Giving up trying to location region in " + "meta: thread is interrupted.");
        }
      }
    }

    The above shows how org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation first loads the information of the table we need: HBase keeps a catalog table hbase:meta (hbase is the namespace, meta the table name) that stores the metadata of every user-created table. The first lookup goes to hbase:meta and the result is cached; from then on, the table's region information comes straight from the cache.
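
    The same cached lookup is reachable through the public API as well; a minimal sketch of checking where a row lives (table name and row key are illustrative):

try (RegionLocator locator = connection.getRegionLocator(TableName.valueOf("tableName"))) {
  //the first call scans hbase:meta; later calls are served from the client-side MetaCache
  HRegionLocation location = locator.getRegionLocation("someRowKey".getBytes());
  System.out.println("region=" + location.getRegionInfo().getRegionNameAsString()
      + " server=" + location.getServerName());
}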

 

3. HBase client optimization lessons learned from the source

  • Raise hbase.client.write.buffer (default 2 MB) in the client configuration to enlarge the client-side submit buffer;
  • Submit in batches: the best List<Put> size = 2 * hbase.client.write.buffer / size of one Put, where the Put size comes from put.heapSize(). With hbase.client.write.buffer=2097152 and put.heapSize()=1320, the best batch size = 2*2097152/1320 = 3177;
  • Write from multiple client threads concurrently (see the sketch below)
  • Run the client on a capable machine, otherwise throughput suffers
  • If losing the WAL is acceptable, disable it (Durability.SKIP_WAL) for a further speedup
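
    A minimal multi-threaded sketch, sharing one Connection and one BufferedMutator (documented as thread-safe) across writer threads; the thread count and row layout are illustrative:

ExecutorService workers = Executors.newFixedThreadPool(8); //illustrative thread count
try (Connection conn = ConnectionFactory.createConnection(configuration);
     BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("tableName"))) {
  for (int t = 0; t < 8; t++) {
    final int id = t;
    workers.submit(new Runnable() {
      @Override
      public void run() {
        try {
          for (int i = 0; i < 10000; i++) {
            Put put = new Put(("row-" + id + "-" + i).getBytes());
            put.addImmutable("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
            mutator.mutate(put); //buffered; flushed whenever the write buffer fills
          }
        } catch (IOException e) {
          LOG.warn("write failed", e);
        }
      }
    });
  }
  workers.shutdown();
  workers.awaitTermination(10, TimeUnit.MINUTES);
} catch (IOException | InterruptedException e) {
  LOG.warn("exception while writing", e);
}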

4. HBase performance research: write speed test notes



 

 
