Notes on a Hive 1.2.1 ORC transactional table that could not submit compaction tasks

Under normal circumstances, Hive finds the tables that need compaction through the findPotentialCompactions method of CompactionTxnHandler. As shown below, it scans COMPLETED_TXN_COMPONENTS for tables/partitions with committed transactions, and TXNS together with TXN_COMPONENTS for tables/partitions with aborted transactions.

/**
   * This will look through the completed_txn_components table and look for partitions or tables
   * that may be ready for compaction.  Also, look through txns and txn_components tables for
   * aborted transactions that we should add to the list.
   * @param maxAborted Maximum number of aborted queries to allow before marking this as a
   *                   potential compaction.
   * @return list of CompactionInfo structs.  These will not have id, type,
   * or runAs set since these are only potential compactions not actual ones.
   */
  public Set<CompactionInfo> findPotentialCompactions(int maxAborted) throws MetaException {
    Connection dbConn = null;
    Set<CompactionInfo> response = new HashSet<CompactionInfo>();
    Statement stmt = null;
    try {
      try {
        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
        stmt = dbConn.createStatement();
        // Check for completed transactions
        String s = "select distinct ctc_database, ctc_table, " +
          "ctc_partition from COMPLETED_TXN_COMPONENTS";
        LOG.debug("Going to execute query <" + s + ">");
        ResultSet rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          response.add(info);
        }

        // Check for aborted txns
        s = "select tc_database, tc_table, tc_partition " +
          "from TXNS, TXN_COMPONENTS " +
          "where txn_id = tc_txnid and txn_state = '" + TXN_ABORTED + "' " +
          "group by tc_database, tc_table, tc_partition " +
          "having count(*) > " + maxAborted;

        LOG.debug("Going to execute query <" + s + ">");
        rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          info.tooManyAborts = true;
          response.add(info);
        }

        LOG.debug("Going to rollback");
        dbConn.rollback();
      } catch (SQLException e) {
        LOG.error("Unable to connect to transaction database " + e.getMessage());
        checkRetryable(dbConn, e, "findPotentialCompactions(maxAborted:" + maxAborted + ")");
      } finally {
        closeDbConn(dbConn);
        closeStmt(stmt);
      }
      return response;
    }
    catch (RetryException e) {
      return findPotentialCompactions(maxAborted);
    }
  }
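
For context, the Initiator background thread is the consumer of this set. Below is a minimal sketch of how it might be driven; the scheduling loop and error handling are omitted, the helper name is made up, and the exact ConfVars constant (HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD, i.e. hive.compactor.abortedtxn.threshold) and call site are assumptions about the 1.2.1 compactor code rather than a quote from it:

import java.util.Set;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
import org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler;

public class PotentialCompactionsSketch {
  // Hypothetical helper: print every table/partition the compactor would consider.
  static void listPotentials(HiveConf conf, CompactionTxnHandler txnHandler) throws Exception {
    // hive.compactor.abortedtxn.threshold: how many aborted txns on a single
    // table/partition are needed before they alone justify a compaction.
    int abortedThreshold = HiveConf.getIntVar(conf,
        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
    Set<CompactionInfo> potentials = txnHandler.findPotentialCompactions(abortedThreshold);
    for (CompactionInfo ci : potentials) {
      System.out.println(ci.dbname + "." + ci.tableName
          + (ci.partName == null ? "" : "/" + ci.partName)
          + (ci.tooManyAborts ? "  (too many aborted txns)" : ""));
    }
  }
}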

There is also a method that builds the valid transaction list used by the compactor; it carries the minimum transaction id (txn-id) that is still in the open state:

 /**
   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
   * compact the files, and thus treats only open transactions as invalid.
   * @param txns txn list from the metastore
   * @return a valid txn list.
   */
  public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
    long highWater = txns.getTxn_high_water_mark();
    long minOpenTxn = Long.MAX_VALUE;
    long[] exceptions = new long[txns.getOpen_txnsSize()];
    int i = 0;
    for (TxnInfo txn : txns.getOpen_txns()) {
      if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
      exceptions[i++] = txn.getId();
    }
    return new ValidCompactorTxnList(exceptions, minOpenTxn, highWater);
  }

Then, before the Initiator submits a compaction task, it checks whether the table or partition actually qualifies for compaction and decides the compaction type (major or minor).

The call chain eventually reaches a key method, determineCompactionType, whose decision is driven by two configurable parameters.

This parameter controls when a minor compaction is triggered:

HIVE_COMPACTOR_DELTA_NUM_THRESHOLD("hive.compactor.delta.num.threshold", 10,
    "Number of delta directories in a table or partition that will trigger a minor\n" +
    "compaction."),

This parameter controls when a major compaction is triggered:

HIVE_COMPACTOR_DELTA_PCT_THRESHOLD("hive.compactor.delta.pct.threshold", 0.1f,
    "Percentage (fractional) size of the delta files relative to the base that will trigger\n" +
    "a major compaction. (1.0 = 100%, so the default 0.1 = 10%.)"),

With those thresholds in hand, determineCompactionType decides which type of compaction, if any, to request:

private CompactionType determineCompactionType(CompactionInfo ci, ValidTxnList txns,
                                                 StorageDescriptor sd)
      throws IOException, InterruptedException {
    boolean noBase = false;
    Path location = new Path(sd.getLocation());
    FileSystem fs = location.getFileSystem(conf);
    AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns);
    Path base = dir.getBaseDirectory();
    long baseSize = 0;
    FileStatus stat = null;
    if (base != null) {
      stat = fs.getFileStatus(base);
      if (!stat.isDir()) {
        LOG.error("Was assuming base " + base.toString() + " is directory, but it's a file!");
        return null;
      }
      baseSize = sumDirSize(fs, base);
    }

    List<FileStatus> originals = dir.getOriginalFiles();
    for (FileStatus origStat : originals) {
      baseSize += origStat.getLen();
    }

    long deltaSize = 0;
    List<AcidUtils.ParsedDelta> deltas = dir.getCurrentDirectories();
    for (AcidUtils.ParsedDelta delta : deltas) {
      stat = fs.getFileStatus(delta.getPath());
      if (!stat.isDir()) {
        LOG.error("Was assuming delta " + delta.getPath().toString() + " is a directory, " +
            "but it's a file!");
        return null;
      }
      deltaSize += sumDirSize(fs, delta.getPath());
    }

    if (baseSize == 0 && deltaSize > 0) {
      noBase = true;
    } else {
      float deltaPctThreshold = HiveConf.getFloatVar(conf,
          HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD);
      boolean bigEnough =   (float)deltaSize/(float)baseSize > deltaPctThreshold;
      if (LOG.isDebugEnabled()) {
        StringBuffer msg = new StringBuffer("delta size: ");
        msg.append(deltaSize);
        msg.append(" base size: ");
        msg.append(baseSize);
        msg.append(" threshold: ");
        msg.append(deltaPctThreshold);
        msg.append(" will major compact: ");
        msg.append(bigEnough);
        LOG.debug(msg);
      }
      if (bigEnough) return CompactionType.MAJOR;
    }

    int deltaNumThreshold = HiveConf.getIntVar(conf,
        HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD);
    boolean enough = deltas.size() > deltaNumThreshold;
    if (enough) {
      LOG.debug("Found " + deltas.size() + " delta files, threshold is " + deltaNumThreshold +
          (enough ? "" : "not") + " and no base, requesting " + (noBase ? "major" : "minor") +
          " compaction");
      // If there's no base file, do a major compaction
      return noBase ? CompactionType.MAJOR : CompactionType.MINOR;
    }
    return null;
  }
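
To make the two thresholds concrete, here is a small self-contained sketch of the same decision logic with the default values (0.1 and 10) hard-coded; the class and helper names are invented for illustration, and it returns strings instead of CompactionType:

public class CompactionTypeSketch {
  // Mirrors the decision above with the default thresholds hard-coded.
  static String decide(long baseSize, long deltaSize, int deltaDirCount) {
    float deltaPctThreshold = 0.1f;   // hive.compactor.delta.pct.threshold
    int deltaNumThreshold = 10;       // hive.compactor.delta.num.threshold
    boolean noBase = baseSize == 0 && deltaSize > 0;
    if (!noBase && (float) deltaSize / (float) baseSize > deltaPctThreshold) {
      return "MAJOR";                 // deltas are big enough relative to the base
    }
    if (deltaDirCount > deltaNumThreshold) {
      return noBase ? "MAJOR" : "MINOR";
    }
    return null;                      // nothing to compact yet
  }

  public static void main(String[] args) {
    // 1 GB base with 150 MB of deltas: 150/1024 ≈ 14.6% > 10%  -> MAJOR
    System.out.println(decide(1024L << 20, 150L << 20, 3));
    // 1 GB base with 50 MB of deltas spread over 12 delta directories: 12 > 10 -> MINOR
    System.out.println(decide(1024L << 20, 50L << 20, 12));
  }
}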

There is a catch, however: the directories scanned must fall within the valid transaction range. Recall the valid txn list obtained above; the call

AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns);

checks whether the transactions each directory belongs to have ended normally. This ultimately goes through the isTxnRangeValid method of ValidCompactorTxnList, shown below:

  @Override
  public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
    if (highWatermark < minTxnId) {
      return RangeResponse.NONE;
    } else if (minOpenTxn < 0) {
      return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    } else {
      return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    }
  }
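
Here is a minimal illustration of why a single stale open transaction blocks compaction of every newer delta. The numbers are invented, but the constructor and method are the ones quoted above:

import org.apache.hadoop.hive.common.ValidCompactorTxnList;
import org.apache.hadoop.hive.common.ValidTxnList.RangeResponse;

public class StaleOpenTxnSketch {
  public static void main(String[] args) {
    // Txn 100 died abnormally but is still recorded as 'o' (open) in TXNS,
    // so createValidCompactTxnList() picks it up as minOpenTxn.
    long[] exceptions = {100L};
    ValidCompactorTxnList txns = new ValidCompactorTxnList(exceptions, 100L, 200L);

    // A delta directory written by txns 150..160: minOpenTxn (100) is not
    // greater than maxTxnId (160), so the whole range is reported as NONE
    // and the delta is never eligible for compaction.
    RangeResponse r = txns.isTxnRangeValid(150L, 160L);
    System.out.println(r);   // NONE
  }
}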

Therefore, when the TXNS table contains transactions that ended abnormally (for example, the Thrift metastore process crashed) but whose state is still left as 'o' (open), compaction requests cannot be submitted for the affected tables. Normally such leftover 'o' transactions time out and are switched to 'a' (aborted) when the next transaction is submitted; from then on they fall under hive.compactor.abortedtxn.threshold, which triggers a compaction once the number of aborted transactions on a table reaches that threshold.

Strangely, though, in our environment there are always a few of these abnormal 'o' transactions that never get changed to 'a'. This blocks the compaction process, the tables accumulate too many small files, and read/write performance suffers. We have not yet tracked down the root cause; the only workaround so far is to manually delete the stale, expired 'o'-state transactions from the TXNS table so that compaction can proceed.
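
For reference, the cleanup we run looks roughly like the JDBC sketch below, executed directly against the metastore database. This is illustrative only: the TXNS column names (TXN_STATE, TXN_LAST_HEARTBEAT) follow the standard metastore transaction schema but should be verified against your own metastore, the cutoff is a placeholder, and the metastore should be backed up before deleting anything.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class StaleTxnCleanupSketch {
  // Illustrative only: remove transactions stuck in the open ('o') state whose
  // last heartbeat is older than the given cutoff (epoch milliseconds).
  static int deleteStaleOpenTxns(String metastoreJdbcUrl, String user, String pass,
                                 long cutoffMillis) throws Exception {
    try (Connection c = DriverManager.getConnection(metastoreJdbcUrl, user, pass);
         PreparedStatement ps = c.prepareStatement(
             "DELETE FROM TXNS WHERE TXN_STATE = 'o' AND TXN_LAST_HEARTBEAT < ?")) {
      ps.setLong(1, cutoffMillis);
      return ps.executeUpdate();
    }
  }
}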
