Notes on a Hive 1.2.1 ORC transactional table that could not submit compaction tasks

Under normal circumstances, Hive finds the tables that need compaction through the findPotentialCompactions method of CompactionTxnHandler. As shown below, it scans COMPLETED_TXN_COMPONENTS for tables/partitions with committed transactions, and TXNS together with TXN_COMPONENTS for tables/partitions with aborted transactions.

/**
   * This will look through the completed_txn_components table and look for partitions or tables
   * that may be ready for compaction.  Also, look through txns and txn_components tables for
   * aborted transactions that we should add to the list.
   * @param maxAborted Maximum number of aborted queries to allow before marking this as a
   *                   potential compaction.
   * @return list of CompactionInfo structs.  These will not have id, type,
   * or runAs set since these are only potential compactions not actual ones.
   */
  public Set<CompactionInfo> findPotentialCompactions(int maxAborted) throws MetaException {
    Connection dbConn = null;
    Set<CompactionInfo> response = new HashSet<CompactionInfo>();
    Statement stmt = null;
    try {
      try {
        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
        stmt = dbConn.createStatement();
        // Check for completed transactions
        String s = "select distinct ctc_database, ctc_table, " +
          "ctc_partition from COMPLETED_TXN_COMPONENTS";
        LOG.debug("Going to execute query <" + s + ">");
        ResultSet rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          response.add(info);
        }

        // Check for aborted txns
        s = "select tc_database, tc_table, tc_partition " +
          "from TXNS, TXN_COMPONENTS " +
          "where txn_id = tc_txnid and txn_state = '" + TXN_ABORTED + "' " +
          "group by tc_database, tc_table, tc_partition " +
          "having count(*) > " + maxAborted;

        LOG.debug("Going to execute query <" + s + ">");
        rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          info.tooManyAborts = true;
          response.add(info);
        }

        LOG.debug("Going to rollback");
        dbConn.rollback();
      } catch (SQLException e) {
        LOG.error("Unable to connect to transaction database " + e.getMessage());
        checkRetryable(dbConn, e, "findPotentialCompactions(maxAborted:" + maxAborted + ")");
      } finally {
        closeDbConn(dbConn);
        closeStmt(stmt);
      }
      return response;
    }
    catch (RetryException e) {
      return findPotentialCompactions(maxAborted);
    }
  }
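
For context, the Initiator background thread is the consumer of this set. Below is a minimal sketch of how it might be driven; the scheduling loop and error handling are omitted, the helper name is made up, and the exact ConfVars constant (HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD, i.e. hive.compactor.abortedtxn.threshold) and call site are assumptions about the 1.2.1 compactor code rather than a quote from it:

import java.util.Set;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
import org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler;

public class PotentialCompactionsSketch {
  // Hypothetical helper: print every table/partition the compactor would consider.
  static void listPotentials(HiveConf conf, CompactionTxnHandler txnHandler) throws Exception {
    // hive.compactor.abortedtxn.threshold: how many aborted txns on a single
    // table/partition are needed before they alone justify a compaction.
    int abortedThreshold = HiveConf.getIntVar(conf,
        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
    Set<CompactionInfo> potentials = txnHandler.findPotentialCompactions(abortedThreshold);
    for (CompactionInfo ci : potentials) {
      System.out.println(ci.dbname + "." + ci.tableName
          + (ci.partName == null ? "" : "/" + ci.partName)
          + (ci.tooManyAborts ? "  (too many aborted txns)" : ""));
    }
  }
}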

There is also a method that builds the valid transaction list used by the compactor; it carries the minimum transaction id (txn-id) that is still in the open state:

 /**
   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
   * compact the files, and thus treats only open transactions as invalid.
   * @param txns txn list from the metastore
   * @return a valid txn list.
   */
  public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
    long highWater = txns.getTxn_high_water_mark();
    long minOpenTxn = Long.MAX_VALUE;
    long[] exceptions = new long[txns.getOpen_txnsSize()];
    int i = 0;
    for (TxnInfo txn : txns.getOpen_txns()) {
      if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
      exceptions[i++] = txn.getId();
    }
    return new ValidCompactorTxnList(exceptions, minOpenTxn, highWater);
  }

Then, before the Initiator submits a compaction task, it checks whether the table or partition actually qualifies for compaction and decides the compaction type (major or minor).

The call chain eventually reaches a key method, determineCompactionType, whose decision is driven by two configurable parameters.

This parameter controls when a minor compaction is triggered:

HIVE_COMPACTOR_DELTA_NUM_THRESHOLD("hive.compactor.delta.num.threshold", 10,
    "Number of delta directories in a table or partition that will trigger a minor\n" +
    "compaction."),

This parameter controls when a major compaction is triggered:

HIVE_COMPACTOR_DELTA_PCT_THRESHOLD("hive.compactor.delta.pct.threshold", 0.1f,
    "Percentage (fractional) size of the delta files relative to the base that will trigger\n" +
    "a major compaction. (1.0 = 100%, so the default 0.1 = 10%.)"),

With those thresholds in hand, determineCompactionType decides which type of compaction, if any, to request:

private CompactionType determineCompactionType(CompactionInfo ci, ValidTxnList txns,
                                                 StorageDescriptor sd)
      throws IOException, InterruptedException {
    boolean noBase = false;
    Path location = new Path(sd.getLocation());
    FileSystem fs = location.getFileSystem(conf);
    AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns);
    Path base = dir.getBaseDirectory();
    long baseSize = 0;
    FileStatus stat = null;
    if (base != null) {
      stat = fs.getFileStatus(base);
      if (!stat.isDir()) {
        LOG.error("Was assuming base " + base.toString() + " is directory, but it's a file!");
        return null;
      }
      baseSize = sumDirSize(fs, base);
    }

    List<FileStatus> originals = dir.getOriginalFiles();
    for (FileStatus origStat : originals) {
      baseSize += origStat.getLen();
    }

    long deltaSize = 0;
    List<AcidUtils.ParsedDelta> deltas = dir.getCurrentDirectories();
    for (AcidUtils.ParsedDelta delta : deltas) {
      stat = fs.getFileStatus(delta.getPath());
      if (!stat.isDir()) {
        LOG.error("Was assuming delta " + delta.getPath().toString() + " is a directory, " +
            "but it's a file!");
        return null;
      }
      deltaSize += sumDirSize(fs, delta.getPath());
    }

    if (baseSize == 0 && deltaSize > 0) {
      noBase = true;
    } else {
      float deltaPctThreshold = HiveConf.getFloatVar(conf,
          HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD);
      boolean bigEnough =   (float)deltaSize/(float)baseSize > deltaPctThreshold;
      if (LOG.isDebugEnabled()) {
        StringBuffer msg = new StringBuffer("delta size: ");
        msg.append(deltaSize);
        msg.append(" base size: ");
        msg.append(baseSize);
        msg.append(" threshold: ");
        msg.append(deltaPctThreshold);
        msg.append(" will major compact: ");
        msg.append(bigEnough);
        LOG.debug(msg);
      }
      if (bigEnough) return CompactionType.MAJOR;
    }

    int deltaNumThreshold = HiveConf.getIntVar(conf,
        HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD);
    boolean enough = deltas.size() > deltaNumThreshold;
    if (enough) {
      LOG.debug("Found " + deltas.size() + " delta files, threshold is " + deltaNumThreshold +
          (enough ? "" : "not") + " and no base, requesting " + (noBase ? "major" : "minor") +
          " compaction");
      // If there's no base file, do a major compaction
      return noBase ? CompactionType.MAJOR : CompactionType.MINOR;
    }
    return null;
  }
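
To make the two thresholds concrete, here is a small self-contained sketch of the same decision logic with the default values (0.1 and 10) hard-coded; the class and helper names are invented for illustration, and it returns strings instead of CompactionType:

public class CompactionTypeSketch {
  // Mirrors the decision above with the default thresholds hard-coded.
  static String decide(long baseSize, long deltaSize, int deltaDirCount) {
    float deltaPctThreshold = 0.1f;   // hive.compactor.delta.pct.threshold
    int deltaNumThreshold = 10;       // hive.compactor.delta.num.threshold
    boolean noBase = baseSize == 0 && deltaSize > 0;
    if (!noBase && (float) deltaSize / (float) baseSize > deltaPctThreshold) {
      return "MAJOR";                 // deltas are big enough relative to the base
    }
    if (deltaDirCount > deltaNumThreshold) {
      return noBase ? "MAJOR" : "MINOR";
    }
    return null;                      // nothing to compact yet
  }

  public static void main(String[] args) {
    // 1 GB base with 150 MB of deltas: 150/1024 ≈ 14.6% > 10%  -> MAJOR
    System.out.println(decide(1024L << 20, 150L << 20, 3));
    // 1 GB base with 50 MB of deltas spread over 12 delta directories: 12 > 10 -> MINOR
    System.out.println(decide(1024L << 20, 50L << 20, 12));
  }
}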

There is a catch, however: the directories scanned must fall within the valid transaction range. Recall the valid txn list obtained above; the call

AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns);

checks whether the transactions each directory belongs to have ended normally. This ultimately goes through the isTxnRangeValid method of ValidCompactorTxnList, shown below:

  @Override
  public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
    if (highWatermark < minTxnId) {
      return RangeResponse.NONE;
    } else if (minOpenTxn < 0) {
      return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    } else {
      return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    }
  }
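
Here is a minimal illustration of why a single stale open transaction blocks compaction of every newer delta. The numbers are invented, but the constructor and method are the ones quoted above:

import org.apache.hadoop.hive.common.ValidCompactorTxnList;
import org.apache.hadoop.hive.common.ValidTxnList.RangeResponse;

public class StaleOpenTxnSketch {
  public static void main(String[] args) {
    // Txn 100 died abnormally but is still recorded as 'o' (open) in TXNS,
    // so createValidCompactTxnList() picks it up as minOpenTxn.
    long[] exceptions = {100L};
    ValidCompactorTxnList txns = new ValidCompactorTxnList(exceptions, 100L, 200L);

    // A delta directory written by txns 150..160: minOpenTxn (100) is not
    // greater than maxTxnId (160), so the whole range is reported as NONE
    // and the delta is never eligible for compaction.
    RangeResponse r = txns.isTxnRangeValid(150L, 160L);
    System.out.println(r);   // NONE
  }
}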

Therefore, when the TXNS table contains transactions that ended abnormally (for example, the Thrift metastore process crashed) but whose state is still left as 'o' (open), compaction requests cannot be submitted for the affected tables. Normally such leftover 'o' transactions time out and are switched to 'a' (aborted) when the next transaction is submitted; from then on they fall under hive.compactor.abortedtxn.threshold, which triggers a compaction once the number of aborted transactions on a table reaches that threshold.

Strangely, though, in our environment there are always a few of these abnormal 'o' transactions that never get changed to 'a'. This blocks the compaction process, the tables accumulate too many small files, and read/write performance suffers. We have not yet tracked down the root cause; the only workaround so far is to manually delete the stale, expired 'o'-state transactions from the TXNS table so that compaction can proceed.
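
For reference, the cleanup we run looks roughly like the JDBC sketch below, executed directly against the metastore database. This is illustrative only: the TXNS column names (TXN_STATE, TXN_LAST_HEARTBEAT) follow the standard metastore transaction schema but should be verified against your own metastore, the cutoff is a placeholder, and the metastore should be backed up before deleting anything.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class StaleTxnCleanupSketch {
  // Illustrative only: remove transactions stuck in the open ('o') state whose
  // last heartbeat is older than the given cutoff (epoch milliseconds).
  static int deleteStaleOpenTxns(String metastoreJdbcUrl, String user, String pass,
                                 long cutoffMillis) throws Exception {
    try (Connection c = DriverManager.getConnection(metastoreJdbcUrl, user, pass);
         PreparedStatement ps = c.prepareStatement(
             "DELETE FROM TXNS WHERE TXN_STATE = 'o' AND TXN_LAST_HEARTBEAT < ?")) {
      ps.setLong(1, cutoffMillis);
      return ps.executeUpdate();
    }
  }
}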
