[Light Chaser Series] HikariCP Source Code Analysis: Thoughts on Failure Detection, fail fast & allowPoolSuspension


Abstract: Original source https://mp.weixin.qq.com/s/awLR5hZC_Bbv2znJT218bQ by 「渣渣王子」. Reposting is welcome as long as this abstract is kept. Thanks!

  • Simulating a database outage
  • allowPoolSuspension
  • References

Due to time constraints, most of this article is based on https://segmentfault.com/a/1190000013136251 , with some additional notes and thoughts of my own.

Simulating a database outage

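First, what connectionTimeout means: it is the maximum time a caller waits for the pool to return a connection, not a general-purpose timeout. In particular, it does not bound SQL execution; for that, plain JDBC can use Statement.setQueryTimeout, and Spring can use @Transactional(timeout=10).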

connectionTimeout This property controls the maximum number of milliseconds that a client (that's you) will wait for a connection from the pool. If this time is exceeded without a connection becoming available, a SQLException will be thrown. Lowest acceptable connection timeout is 250 ms. Default: 30000 (30 seconds)

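When there is no idle connection and the pool is full, so no new connection can be created, Hikari blocks for connectionTimeout and, if no connection becomes available, throws SQLTransientConnectionException.

When there are idle connections, Hikari keeps taking the next idle connection and validating it within connectionTimeout; if validation fails it moves on to the next one, until the timeout expires and SQLTransientConnectionException is thrown. (In other words, the first getConnection call after the database goes down validates the idle connections one by one within connectionTimeout and then times out; subsequent calls simply block for the full connectionTimeout before throwing.)

If the microservice has a connection health check and you catch this exception, the health check will keep logging errors.

If connectionTimeout is set too high, business threads can easily get stuck for a long time when the database goes down.

With these conclusions in mind, let's walk through the source. First, getConnection: roughly, if the poolEntry returned by borrow is null, the loop exits and an exception is thrown that includes the elapsed time, for example: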

java.sql.SQLTransientConnectionException: communications-link-failure-db - Connection is not available, request timed out after 447794ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:666)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:182)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:147)

   /**
    * Get a connection from the pool, or timeout after the specified number of milliseconds.
    *
    * @param hardTimeout the maximum time to wait for a connection from the pool
    * @return a java.sql.Connection instance
    * @throws SQLException thrown if a timeout occurs trying to obtain a connection
    */
   public Connection getConnection(final long hardTimeout) throws SQLException
   {
      suspendResumeLock.acquire();
      final long startTime = currentTime();
      try {
         long timeout = hardTimeout;
         do {
            PoolEntry poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
            if (poolEntry == null) {
               break; // We timed out... break and throw exception
            }
            final long now = currentTime();
            if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > ALIVE_BYPASS_WINDOW_MS && !isConnectionAlive(poolEntry.connection))) {
               closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE);
               timeout = hardTimeout - elapsedMillis(startTime);
            }
            else {
               metricsTracker.recordBorrowStats(poolEntry, startTime);
               return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry), now);
            }
         } while (timeout > 0L);
         metricsTracker.recordBorrowTimeoutStats(startTime);
         throw createTimeoutException(startTime);
      }
      catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new SQLException(poolName + " - Interrupted during connection acquisition", e);
      }
      finally {
         suspendResumeLock.release();
      }
   }
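To make the retry loop concrete, here is a minimal sketch of the same control flow, assuming a plain LinkedBlockingQueue stands in for ConcurrentBag and a boolean `alive` flag stands in for isConnectionAlive(). FakeConnection and BorrowLoopSketch are hypothetical names, and the sketch always validates (the real code skips the check for connections used within ALIVE_BYPASS_WINDOW_MS). It shows why, when every idle connection is dead, each one is validated and closed in turn and the caller still waits out the rest of the timeout.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for PoolEntry; "alive" replaces isConnectionAlive().
class FakeConnection {
    final String id;
    final boolean alive;
    boolean closed;

    FakeConnection(String id, boolean alive) {
        this.id = id;
        this.alive = alive;
    }
}

class BorrowLoopSketch {
    final BlockingQueue<FakeConnection> idle = new LinkedBlockingQueue<>();
    int closedCount;

    // Same shape as getConnection(hardTimeout): borrow with the remaining budget,
    // close dead entries, recompute the budget, give up once it is used up.
    FakeConnection getConnection(long hardTimeoutMs) {
        final long start = System.nanoTime();
        long timeout = hardTimeoutMs;
        try {
            do {
                FakeConnection c = idle.poll(timeout, TimeUnit.MILLISECONDS);
                if (c == null) {
                    break; // timed out waiting for a connection
                }
                if (c.alive) {
                    return c;
                }
                c.closed = true; // like closeConnection(poolEntry, DEAD_CONNECTION_MESSAGE)
                closedCount++;
                timeout = hardTimeoutMs - TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            } while (timeout > 0L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null; // the real pool throws SQLTransientConnectionException instead
    }
}
```

With three dead connections and a 200 ms budget, all three are closed almost immediately and the call then blocks for roughly the remaining 200 ms before giving up, which is the behavior described above for the first call after an outage.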

Now let's focus on borrow. As its Javadoc says, "The method will borrow a BagEntry from the bag, blocking for the specified timeout if none are available." The line final T bagEntry = handoffQueue.poll(timeout, NANOSECONDS); is where the time goes when the database is down:

   /**
    * The method will borrow a BagEntry from the bag, blocking for the
    * specified timeout if none are available.
    *
    * @param timeout how long to wait before giving up, in units of unit
    * @param timeUnit a <code>TimeUnit</code> determining how to interpret the timeout parameter
    * @return a borrowed instance from the bag or null if a timeout occurs
    * @throws InterruptedException if interrupted while waiting
    */
   public T borrow(long timeout, final TimeUnit timeUnit) throws InterruptedException
   {
      // Try the thread-local list first
      final List<Object> list = threadList.get();
      for (int i = list.size() - 1; i >= 0; i--) {
         final Object entry = list.remove(i);
         @SuppressWarnings("unchecked")
         final T bagEntry = weakThreadLocals ? ((WeakReference<T>) entry).get() : (T) entry;
         if (bagEntry != null && bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
            return bagEntry;
         }
      }
      // Otherwise, scan the shared list ... then poll the handoff queue
      final int waiting = waiters.incrementAndGet();
      try {
         for (T bagEntry : sharedList) {
            if (bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
               // If we may have stolen another waiter's connection, request another bag add.
               if (waiting > 1) {
                  listener.addBagItem(waiting - 1);
               }
               return bagEntry;
            }
         }
         listener.addBagItem(waiting);
         timeout = timeUnit.toNanos(timeout);
         do {
            final long start = currentTime();
            final T bagEntry = handoffQueue.poll(timeout, NANOSECONDS);
            if (bagEntry == null || bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
               return bagEntry;
            }
            timeout -= elapsedNanos(start);
         } while (timeout > 10_000);
         return null;
      }
      finally {
         waiters.decrementAndGet();
      }
   }
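The three lookup steps in borrow (this thread's own list, a CAS scan of the shared list, then waiting on the handoff queue) can be sketched as below. MiniEntry and MiniBag are hypothetical classes; the sketch leaves out weak thread-local references and the listener.addBagItem callback, and its requite method is a simplified assumption about how an entry is handed back, not HikariCP's exact code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical entry with a CAS-guarded state, like a bag entry's NOT_IN_USE/IN_USE.
class MiniEntry {
    static final int NOT_IN_USE = 0, IN_USE = 1;
    final AtomicInteger state = new AtomicInteger(NOT_IN_USE);
    final String name;
    MiniEntry(String name) { this.name = name; }
    boolean claim() { return state.compareAndSet(NOT_IN_USE, IN_USE); }
}

class MiniBag {
    final CopyOnWriteArrayList<MiniEntry> sharedList = new CopyOnWriteArrayList<>();
    final ThreadLocal<List<MiniEntry>> threadList = ThreadLocal.withInitial(ArrayList::new);
    final SynchronousQueue<MiniEntry> handoffQueue = new SynchronousQueue<>(true);
    final AtomicInteger waiters = new AtomicInteger();

    MiniEntry borrow(long timeoutMs) {
        // 1. Entries this thread returned earlier, most recent first.
        List<MiniEntry> list = threadList.get();
        for (int i = list.size() - 1; i >= 0; i--) {
            MiniEntry e = list.remove(i);
            if (e.claim()) return e;
        }
        waiters.incrementAndGet();
        try {
            // 2. Try to claim any free entry in the shared list.
            for (MiniEntry e : sharedList) {
                if (e.claim()) return e;
            }
            // 3. Wait for another thread to hand an entry over; null means timeout.
            long timeout = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            do {
                long start = System.nanoTime();
                MiniEntry e = handoffQueue.poll(timeout, TimeUnit.NANOSECONDS);
                if (e == null || e.claim()) return e;
                timeout -= System.nanoTime() - start;
            } while (timeout > 10_000);
            return null;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            waiters.decrementAndGet();
        }
    }

    // Simplified return path: free the entry, try to hand it to a waiter, else keep it locally.
    void requite(MiniEntry e) {
        e.state.set(MiniEntry.NOT_IN_USE);
        while (waiters.get() > 0) {
            if (e.state.get() != MiniEntry.NOT_IN_USE || handoffQueue.offer(e)) {
                return; // already claimed, or a waiting borrower took it
            }
            Thread.yield();
        }
        threadList.get().add(e);
    }
}
```

When nothing is free, step 3 is the only place a caller can block, which is why a dead database shows up as time spent in handoffQueue.poll.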

The handoffQueue in borrow is a SynchronousQueue from java.util.concurrent (JUC); its timed poll is:

    /**
     * Retrieves and removes the head of this queue, waiting
     * if necessary up to the specified wait time, for another thread
     * to insert it.
     *
     * @return the head of this queue, or {@code null} if the
     *         specified waiting time elapses before an element is present
     * @throws InterruptedException {@inheritDoc}
     */
    public E poll(long timeout, TimeUnit unit) throws InterruptedException {
        E e = transferer.transfer(null, true, unit.toNanos(timeout));
        if (e != null || !Thread.interrupted())
            return e;
        throw new InterruptedException();
    }

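When poll times out it returns null, so borrow returns a null poolEntry, and getConnection breaks out of its loop and throws the timeout exception.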
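The null-on-timeout behavior comes straight from SynchronousQueue. A small JDK-only demonstration (HandoffDemo and the "conn-1" payload are made up for illustration): a timed poll returns the element only if another thread hands one over, and returns null once the wait expires. Because the queue has no capacity, offer without a waiting consumer simply fails.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class HandoffDemo {
    // Returns what poll() received, or null if nobody handed anything over within timeoutMs.
    static String pollWithOptionalProducer(boolean startProducer, long timeoutMs) {
        SynchronousQueue<String> handoff = new SynchronousQueue<>(true);
        if (startProducer) {
            Thread producer = new Thread(() -> {
                try {
                    handoff.put("conn-1"); // blocks until a consumer takes it
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producer.setDaemon(true);
            producer.start();
        }
        try {
            return handoff.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // No capacity: offer() fails unless a consumer is already waiting.
    static boolean offerWithoutConsumer() {
        return new SynchronousQueue<String>().offer("conn-1");
    }
}
```

While the database is down, no PoolEntryCreator ever succeeds, so nothing is handed over and every waiter ends up on the null path.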

HikariPool also has an inner class called PoolEntryCreator:

   /**
    * Creating and adding poolEntries (connections) to the pool.
    */
   private final class PoolEntryCreator implements Callable<Boolean>
   {
      private final String loggingPrefix;
      PoolEntryCreator(String loggingPrefix)
      {
         this.loggingPrefix = loggingPrefix;
      }
      @Override
      public Boolean call() throws Exception
      {
         long sleepBackoff = 250L;
         while (poolState == POOL_NORMAL && shouldCreateAnotherConnection()) {
            final PoolEntry poolEntry = createPoolEntry();
            if (poolEntry != null) {
               connectionBag.add(poolEntry);
               LOGGER.debug("{} - Added connection {}", poolName, poolEntry.connection);
               if (loggingPrefix != null) {
                  logPoolState(loggingPrefix);
               }
               return Boolean.TRUE;
            }
            // failed to get connection from db, sleep and retry
            quietlySleep(sleepBackoff);
            sleepBackoff = Math.min(SECONDS.toMillis(10), Math.min(connectionTimeout, (long) (sleepBackoff * 1.5)));
         }
         // Pool is suspended or shutdown or at max size
         return Boolean.FALSE;
      }
      /**
       * We only create connections if we need another idle connection or have threads still waiting
       * for a new connection.  Otherwise we bail out of the request to create.
       *
       * @return true if we should create a connection, false if the need has disappeared
       */
      private boolean shouldCreateAnotherConnection() {
         return getTotalConnections() < config.getMaximumPoolSize() &&
            (connectionBag.getWaitingThreadCount() > 0 || getIdleConnections() < config.getMinimumIdle());
      }
   }

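shouldCreateAnotherConnection decides whether another connection should be added: only while the pool is below maximumPoolSize and either some thread is waiting or idle connections are below minimumIdle. If createPoolEntry fails, call() sleeps and retries, growing the sleep by 1.5x and capping it at the smaller of 10 seconds and connectionTimeout.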
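The retry schedule and the creation condition are plain arithmetic, so they can be reproduced outside the pool. CreatorBackoffSketch is a hypothetical helper that copies the update rule and the condition from PoolEntryCreator, with the pool counters passed in as arguments instead of read from the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

class CreatorBackoffSketch {
    // Sleeps between failed attempts: start at 250 ms, grow 1.5x, cap at min(10 s, connectionTimeout).
    static List<Long> backoffSchedule(long connectionTimeoutMs, int attempts) {
        List<Long> sleeps = new ArrayList<>();
        long sleepBackoff = 250L;
        for (int i = 0; i < attempts; i++) {
            sleeps.add(sleepBackoff);
            sleepBackoff = Math.min(TimeUnit.SECONDS.toMillis(10),
                    Math.min(connectionTimeoutMs, (long) (sleepBackoff * 1.5)));
        }
        return sleeps;
    }

    // Same condition as shouldCreateAnotherConnection().
    static boolean shouldCreateAnotherConnection(int total, int idle, int waitingThreads,
                                                 int maximumPoolSize, int minimumIdle) {
        return total < maximumPoolSize && (waitingThreads > 0 || idle < minimumIdle);
    }
}
```

With the default connectionTimeout of 30 s, the sleeps are 250, 375, 562, 843, 1264, 1896, 2844, 4266, 6399, 9598 ms and then stay at 10000 ms; a smaller connectionTimeout lowers the cap accordingly.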

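When HikariPool is created, it instantiates two PoolEntryCreator objects, POOL_ENTRY_CREATOR and POST_FILL_POOL_ENTRY_CREATOR. They are Callable tasks that run asynchronously on addConnectionExecutor and differ only in their logging prefix: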

   private final PoolEntryCreator POOL_ENTRY_CREATOR = new PoolEntryCreator(null /*logging prefix*/);
   private final PoolEntryCreator POST_FILL_POOL_ENTRY_CREATOR = new PoolEntryCreator("After adding ");

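POOL_ENTRY_CREATOR is submitted to private final ThreadPoolExecutor addConnectionExecutor from two places. The first is fillPool, which fills the pool from the current number of idle connections (as perceived when it runs) up to minimumIdle, the minimum number of idle connections HikariCP tries to keep in the pool; if idle connections drop below this value and the total number of connections is below maximumPoolSize, HikariCP makes a best effort to add connections quickly and efficiently. While the database is down, adding new connections also fails with Connection refused errors.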

   /**
    * Fill pool up from current idle connections (as they are perceived at the point of execution) to minimumIdle connections.
    */
   private synchronized void fillPool()
   {
      final int connectionsToAdd = Math.min(config.getMaximumPoolSize() - getTotalConnections(), config.getMinimumIdle() - getIdleConnections())
                                   - addConnectionQueue.size();
      for (int i = 0; i < connectionsToAdd; i++) {
         addConnectionExecutor.submit((i < connectionsToAdd - 1) ? POOL_ENTRY_CREATOR : POST_FILL_POOL_ENTRY_CREATOR);
      }
   }

The second is addBagItem, which borrow calls before it starts waiting:

   /** {@inheritDoc} */
   @Override
   public void addBagItem(final int waiting)
   {
      final boolean shouldAdd = waiting - addConnectionQueue.size() >= 0; // Yes, >= is intentional.
      if (shouldAdd) {
         addConnectionExecutor.submit(POOL_ENTRY_CREATOR);
      }
   }
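Both call sites boil down to a little integer arithmetic. PoolTopUpSketch is a hypothetical helper that reproduces the formulas of fillPool and addBagItem with the pool counters passed in as plain ints, returning creator names instead of submitting tasks.

```java
import java.util.ArrayList;
import java.util.List;

class PoolTopUpSketch {
    // fillPool(): how many creator tasks to submit, and which creator each one uses.
    static List<String> fillPoolTasks(int maximumPoolSize, int minimumIdle,
                                      int totalConnections, int idleConnections, int queuedAdds) {
        int connectionsToAdd = Math.min(maximumPoolSize - totalConnections,
                minimumIdle - idleConnections) - queuedAdds;
        List<String> tasks = new ArrayList<>();
        for (int i = 0; i < connectionsToAdd; i++) {
            // Only the last task uses the creator that logs "After adding " plus the pool state.
            tasks.add(i < connectionsToAdd - 1 ? "POOL_ENTRY_CREATOR" : "POST_FILL_POOL_ENTRY_CREATOR");
        }
        return tasks; // empty when connectionsToAdd <= 0
    }

    // addBagItem(): submit one more creator unless more are already queued than there are waiters.
    static boolean addBagItemShouldAdd(int waiting, int queuedAdds) {
        return waiting - queuedAdds >= 0; // ">=" as in the original
    }
}
```

For example, with maximumPoolSize = 10, minimumIdle = 5, 2 connections in total, 2 of them idle and nothing queued, fillPool submits min(8, 3) = 3 tasks. Because of the ">=", addBagItem still submits a creator when the number of waiters equals the number of creations already queued.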

Finally, two more properties: idleTimeout and minimumIdle.

idleTimeout This property controls the maximum amount of time that a connection is allowed to sit idle in the pool. This setting only applies when minimumIdle is defined to be less than maximumPoolSize. Idle connections will not be retired once the pool reaches minimumIdle connections. Whether a connection is retired as idle or not is subject to a maximum variation of +30 seconds, and average variation of +15 seconds. A connection will never be retired as idle before this timeout. A value of 0 means that idle connections are never removed from the pool. The minimum allowed value is 10000ms (10 seconds). Default: 600000 (10 minutes)

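The default is 600000 ms (10 minutes). If idleTimeout + 1 second > maxLifetime and maxLifetime > 0, idleTimeout is reset to 0; if idleTimeout != 0 and is less than 10 seconds, it is reset to 10 seconds. idleTimeout = 0 means idle connections are never removed from the pool.

This setting only takes effect when minimumIdle is less than maximumPoolSize: a connection is removed once the number of idle connections exceeds minimumIdle and it has been idle longer than idleTimeout.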

minimumIdle This property controls the minimum number of idle connections that HikariCP tries to maintain in the pool. If the idle connections dip below this value and total connections in the pool are less than maximumPoolSize, HikariCP will make a best effort to add additional connections quickly and efficiently. However, for maximum performance and responsiveness to spike demands, we recommend not setting this value and instead allowing HikariCP to act as a fixed size connection pool. Default: same as maximumPoolSize

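It controls the minimum number of idle connections in the pool. When idle connections fall below minimumIdle and the total number of connections is less than maximumPoolSize, HikariCP tries its best to add new connections. For performance, the recommendation is to leave it unset and let HikariCP act as a fixed-size pool; by default minimumIdle equals maximumPoolSize. If minimumIdle < 0 or minimumIdle > maximumPoolSize, it is reset to maximumPoolSize, which defaults to 10.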
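The normalization rules above can be written as pure functions. PoolConfigRulesSketch is a hypothetical helper covering the minimumIdle fallback, the idleTimeout/maxLifetime rule and the "only when minimumIdle < maximumPoolSize" condition; the sub-10-second idleTimeout adjustment is left out because its exact target value may differ between HikariCP versions.

```java
import java.util.concurrent.TimeUnit;

class PoolConfigRulesSketch {
    // minimumIdle outside [0, maximumPoolSize] falls back to maximumPoolSize (a fixed-size pool).
    static int normalizeMinimumIdle(int minimumIdle, int maximumPoolSize) {
        return (minimumIdle < 0 || minimumIdle > maximumPoolSize) ? maximumPoolSize : minimumIdle;
    }

    // idleTimeout within 1 s of (or above) a positive maxLifetime is disabled (set to 0).
    static long normalizeIdleTimeout(long idleTimeoutMs, long maxLifetimeMs) {
        if (idleTimeoutMs + TimeUnit.SECONDS.toMillis(1) > maxLifetimeMs && maxLifetimeMs > 0) {
            return 0L;
        }
        return idleTimeoutMs;
    }

    // idleTimeout only matters when the pool is allowed to shrink below its maximum.
    static boolean idleTimeoutApplies(long idleTimeoutMs, int minimumIdle, int maximumPoolSize) {
        return idleTimeoutMs > 0 && minimumIdle < maximumPoolSize;
    }
}
```

With the defaults (idleTimeout 10 minutes, minimumIdle equal to maximumPoolSize), idleTimeoutApplies returns false, which is the fixed-size behavior the HikariCP documentation recommends.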

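Hikari starts a HouseKeeper scheduled task, initialized in the HikariPool constructor. By default it first runs 100 ms after initialization and then runs again HOUSEKEEPING_PERIOD_MS (30 seconds) after each run completes. Its job is to remove connections that have been idle longer than idleTimeout.

It first checks whether the clock has moved backwards; if so, it immediately marks connections as evicted. Then, only when idleTimeout > 0 and the configured minimumIdle < maximumPoolSize, it processes timed-out idle connections: it collects the connections in STATE_NOT_IN_USE, and if there are more of them than minimumIdle, it iterates over them, removes from connectionBag each one that has been idle for at least idleTimeout, closes it if the removal succeeded, and decrements toRemove. After removing idle connections, it calls fillPool to try to top idle connections back up to minimumIdle.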
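One housekeeping pass over idle connections can be sketched as a pure function. HouseKeeperSketch and IdleEntry are hypothetical; timestamps are passed in explicitly, the sketch returns the ids it would retire instead of reserving and closing them, and it uses a strict "longer than idleTimeout" comparison as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class HouseKeeperSketch {
    // Hypothetical idle entry: just an id and the last-accessed timestamp.
    static final class IdleEntry {
        final String id;
        final long lastAccessedMs;
        IdleEntry(String id, long lastAccessedMs) {
            this.id = id;
            this.lastAccessedMs = lastAccessedMs;
        }
    }

    // Returns the ids that one pass would retire, never going below minimumIdle.
    static List<String> retireIdle(List<IdleEntry> notInUse, long nowMs, long idleTimeoutMs,
                                   int minimumIdle, int maximumPoolSize) {
        List<String> retired = new ArrayList<>();
        if (idleTimeoutMs <= 0 || minimumIdle >= maximumPoolSize) {
            return retired; // idle retirement is disabled, e.g. for a fixed-size pool
        }
        int toRemove = notInUse.size() - minimumIdle;
        for (IdleEntry e : notInUse) {
            if (toRemove > 0 && nowMs - e.lastAccessedMs > idleTimeoutMs) {
                retired.add(e.id); // the real pool reserves the entry in the bag, then closes it
                toRemove--;
            }
        }
        return retired; // the real pass then calls fillPool() to top back up to minimumIdle
    }
}
```

With minimumIdle = 2 and five idle connections, at most three are retired in a pass, in iteration order, even if more than three have exceeded idleTimeout.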

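Hikari handles connection leaks with a separate delayed task scheduled on each getConnection call, whereas idle connections are cleaned up by the periodic HouseKeeper task. The HouseKeeper interval is controlled by the com.zaxxer.hikari.housekeeping.periodMs JVM system property and defaults to 30 seconds.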
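The contrast between a per-borrow delayed task and a periodic task can be sketched with a ScheduledExecutorService. SchedulingSketch and LeakTask are hypothetical; only the property name and the 30-second default come from the text above, and reading it with Long.getLong (that is, as a system property) is how I understand HikariPool does it. The leak task is scheduled when a connection is borrowed and cancelled when it is returned, so it only fires for connections held longer than the threshold.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class SchedulingSketch {
    // Housekeeping period as a JVM system property with a 30 s default.
    static long housekeepingPeriodMs() {
        return Long.getLong("com.zaxxer.hikari.housekeeping.periodMs", TimeUnit.SECONDS.toMillis(30));
    }

    // Hypothetical leak tracker: one delayed task per borrow, cancelled on return.
    static final class LeakTask {
        final AtomicBoolean reported = new AtomicBoolean();
        private final ScheduledFuture<?> future;

        LeakTask(ScheduledExecutorService scheduler, long thresholdMs) {
            future = scheduler.schedule(() -> reported.set(true), thresholdMs, TimeUnit.MILLISECONDS);
        }

        void cancel() { future.cancel(false); }
    }

    // Returns true if a leak would be reported for a connection held for holdMs.
    static boolean leakReported(long thresholdMs, long holdMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            LeakTask task = new LeakTask(scheduler, thresholdMs); // on getConnection()
            Thread.sleep(holdMs);                                  // the caller uses the connection
            task.cancel();                                         // on Connection.close()
            return task.reported.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

A periodic job such as the HouseKeeper would instead be registered once with scheduleWithFixedDelay, which matches the "run again after each run completes" timing described above.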

allowPoolSuspension

This parameter controls whether the pool may be suspended. Once the pool is suspended, every getConnection call blocks.

The author explains it like this: https://github.com/brettwooldridge/HikariCP/issues/1060

All of the suspend use cases I have heard have centered around a pattern of:

  • Suspend the pool.
  • Alter the pool configuration, or alter DNS configuration (to point to a new master).
  • Soft-evict existing connections.
  • Resume the pool.

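I tested this. While the pool is suspended, getConnection indeed does not time out and the pending SQL is held back; after the existing connections are soft-evicted, the requests wait until the pool is resumed and then run, so users do not lose data. In real production, however, it is hard to do this without affecting the business: even if the SQL eventually runs, the business request may already have timed out.

Fault injection is something middleware teams ought to do, and this feature can be used to build a Chaos Monkey that simulates database connection failures. While monitoring, though, I found that the hikaricp_pending_threads metric did not rise and the MBean's threadAwaitingConnections did not change either. This is consistent with the code above: getConnection blocks in suspendResumeLock.acquire() before connectionBag.borrow() increments waiters, and as far as I can tell both metrics report that waiter count. So future fault drills need not be that complicated and may be better consolidated inside the middleware, provided the middleware builds its own enhancements around this parameter, such as simulated exceptions or extra monitoring metrics. Also, keeping the pool suspended for a long time risks hanging the microservice.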
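A minimal sketch of why suspended callers are invisible to the waiter-based metrics, assuming a gate modeled on the idea behind suspendResumeLock (every getConnection takes one permit, suspending takes them all). SuspendGateSketch, MAX_PERMITS = 10_000 and waitersInBorrow are hypothetical; the real lock waits without a timeout, which is why getConnection does not time out while suspended, and tryAcquire with a timeout is used here only so the demo terminates.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SuspendGateSketch {
    static final int MAX_PERMITS = 10_000;
    private final Semaphore permits = new Semaphore(MAX_PERMITS, true);
    // Stand-in for the bag's waiter count: incremented only once a caller reaches borrow().
    final AtomicInteger waitersInBorrow = new AtomicInteger();

    void suspend() { permits.acquireUninterruptibly(MAX_PERMITS); } // waits for in-flight callers
    void resume()  { permits.release(MAX_PERMITS); }

    // Returns true if the caller got past the gate within waitMs.
    boolean tryGetConnection(long waitMs) {
        try {
            if (!permits.tryAcquire(waitMs, TimeUnit.MILLISECONDS)) {
                return false; // still parked at the gate; waitersInBorrow never moved
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            waitersInBorrow.incrementAndGet(); // only here would pending-thread metrics notice
            return true;                       // borrow() from the bag would happen here
        } finally {
            waitersInBorrow.decrementAndGet();
            permits.release();
        }
    }
}
```

This also maps onto the author's suggested sequence: suspend, change the configuration or DNS and soft-evict connections, then resume, at which point the parked callers proceed.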

For details, see [Light Chaser Series] HikariCP Source Code Analysis: allowPoolSuspension.

References

https://segmentfault.com/u/codecraft/articles?page=4
