Notes on a Hive 1.2.1 issue where ORC transactional tables could not get compaction jobs submitted

Under normal circumstances, Hive discovers tables that need compaction through the findPotentialCompactions method of CompactionTxnHandler. As shown below, it scans COMPLETED_TXN_COMPONENTS for tables/partitions touched by committed transactions, and joins TXNS with TXN_COMPONENTS to find tables/partitions with aborted transactions.

/**
   * This will look through the completed_txn_components table and look for partitions or tables
   * that may be ready for compaction.  Also, look through txns and txn_components tables for
   * aborted transactions that we should add to the list.
   * @param maxAborted Maximum number of aborted queries to allow before marking this as a
   *                   potential compaction.
   * @return list of CompactionInfo structs.  These will not have id, type,
   * or runAs set since these are only potential compactions not actual ones.
   */
  public Set<CompactionInfo> findPotentialCompactions(int maxAborted) throws MetaException {
    Connection dbConn = null;
    Set<CompactionInfo> response = new HashSet<CompactionInfo>();
    Statement stmt = null;
    try {
      try {
        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
        stmt = dbConn.createStatement();
        // Check for completed transactions
        String s = "select distinct ctc_database, ctc_table, " +
          "ctc_partition from COMPLETED_TXN_COMPONENTS";
        LOG.debug("Going to execute query <" + s + ">");
        ResultSet rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          response.add(info);
        }

        // Check for aborted txns
        s = "select tc_database, tc_table, tc_partition " +
          "from TXNS, TXN_COMPONENTS " +
          "where txn_id = tc_txnid and txn_state = '" + TXN_ABORTED + "' " +
          "group by tc_database, tc_table, tc_partition " +
          "having count(*) > " + maxAborted;

        LOG.debug("Going to execute query <" + s + ">");
        rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          info.tooManyAborts = true;
          response.add(info);
        }

        LOG.debug("Going to rollback");
        dbConn.rollback();
      } catch (SQLException e) {
        LOG.error("Unable to connect to transaction database " + e.getMessage());
        checkRetryable(dbConn, e, "findPotentialCompactions(maxAborted:" + maxAborted + ")");
      } finally {
        closeDbConn(dbConn);
        closeStmt(stmt);
      }
      return response;
    }
    catch (RetryException e) {
      return findPotentialCompactions(maxAborted);
    }
  }

There is also a method that builds the valid-transaction list; note that it records the minimum transaction id (txn-id) that is still in the open state:

 /**
   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
   * compact the files, and thus treats only open transactions as invalid.
   * @param txns txn list from the metastore
   * @return a valid txn list.
   */
  public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
    long highWater = txns.getTxn_high_water_mark();
    long minOpenTxn = Long.MAX_VALUE;
    long[] exceptions = new long[txns.getOpen_txnsSize()];
    int i = 0;
    for (TxnInfo txn : txns.getOpen_txns()) {
      if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
      exceptions[i++] = txn.getId();
    }
    return new ValidCompactorTxnList(exceptions, minOpenTxn, highWater);
  }

Then, when the Initiator runs and before it submits a compaction task, it checks whether a given table or partition meets the compaction conditions and decides the compaction type (major or minor):

The call chain is roughly as follows.
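The outline below is reconstructed from the methods walked through in this post; the private helper names inside Initiator (checkForCompaction, requestCompaction) are paraphrased and may differ slightly from the exact 1.2.1 source:

Initiator.run()
  -> txnHandler.findPotentialCompactions(abortedThreshold)     // scan COMPLETED_TXN_COMPONENTS and TXNS + TXN_COMPONENTS
  -> createValidCompactTxnList(txnHandler.getOpenTxnsInfo())   // build the valid txn list, keeping the minimum open txn id
  -> for each CompactionInfo:
       checkForCompaction() -> determineCompactionType()       // AcidUtils.getAcidState() -> ValidCompactorTxnList.isTxnRangeValid()
       requestCompaction()                                     // enqueue the request into COMPACTION_QUEUE

Here abortedThreshold is the value of hive.compactor.abortedtxn.threshold mentioned later in this post.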

Within this flow there is a key method, determineCompactionType, that decides the compaction type based on two configurable parameters:

The following parameter sets the trigger condition for a minor compaction:
HIVE_COMPACTOR_DELTA_NUM_THRESHOLD("hive.compactor.delta.num.threshold", 10,
    "Number of delta directories in a table or partition that will trigger a minor\n" +
    "compaction."),
The following parameter sets the trigger condition for a major compaction:
HIVE_COMPACTOR_DELTA_PCT_THRESHOLD("hive.compactor.delta.pct.threshold", 0.1f,
    "Percentage (fractional) size of the delta files relative to the base that will trigger\n" +
    "a major compaction. (1.0 = 100%, so the default 0.1 = 10%.)"), 
private CompactionType determineCompactionType(CompactionInfo ci, ValidTxnList txns,
                                                 StorageDescriptor sd)
      throws IOException, InterruptedException {
    boolean noBase = false;
    Path location = new Path(sd.getLocation());
    FileSystem fs = location.getFileSystem(conf);
    AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns);
    Path base = dir.getBaseDirectory();
    long baseSize = 0;
    FileStatus stat = null;
    if (base != null) {
      stat = fs.getFileStatus(base);
      if (!stat.isDir()) {
        LOG.error("Was assuming base " + base.toString() + " is directory, but it's a file!");
        return null;
      }
      baseSize = sumDirSize(fs, base);
    }

    List<FileStatus> originals = dir.getOriginalFiles();
    for (FileStatus origStat : originals) {
      baseSize += origStat.getLen();
    }

    long deltaSize = 0;
    List<AcidUtils.ParsedDelta> deltas = dir.getCurrentDirectories();
    for (AcidUtils.ParsedDelta delta : deltas) {
      stat = fs.getFileStatus(delta.getPath());
      if (!stat.isDir()) {
        LOG.error("Was assuming delta " + delta.getPath().toString() + " is a directory, " +
            "but it's a file!");
        return null;
      }
      deltaSize += sumDirSize(fs, delta.getPath());
    }

    if (baseSize == 0 && deltaSize > 0) {
      noBase = true;
    } else {
      float deltaPctThreshold = HiveConf.getFloatVar(conf,
          HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD);
      boolean bigEnough =   (float)deltaSize/(float)baseSize > deltaPctThreshold;
      if (LOG.isDebugEnabled()) {
        StringBuffer msg = new StringBuffer("delta size: ");
        msg.append(deltaSize);
        msg.append(" base size: ");
        msg.append(baseSize);
        msg.append(" threshold: ");
        msg.append(deltaPctThreshold);
        msg.append(" will major compact: ");
        msg.append(bigEnough);
        LOG.debug(msg);
      }
      if (bigEnough) return CompactionType.MAJOR;
    }

    int deltaNumThreshold = HiveConf.getIntVar(conf,
        HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD);
    boolean enough = deltas.size() > deltaNumThreshold;
    if (enough) {
      LOG.debug("Found " + deltas.size() + " delta files, threshold is " + deltaNumThreshold +
          (enough ? "" : "not") + " and no base, requesting " + (noBase ? "major" : "minor") +
          " compaction");
      // If there's no base file, do a major compaction
      return noBase ? CompactionType.MAJOR : CompactionType.MINOR;
    }
    return null;
  }
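To make the two thresholds concrete, here is a small worked example with assumed sizes and counts (illustrative values only, mirroring the checks above):

// Illustrative values only, mirroring the checks in determineCompactionType above.
public class CompactionThresholdExample {
  public static void main(String[] args) {
    long baseSize   = 2L * 1024 * 1024 * 1024;   // 2 GB base directory
    long deltaSize  = 300L * 1024 * 1024;        // 300 MB of delta directories in total
    int  deltaCount = 12;                        // number of delta directories
    float deltaPctThreshold = 0.1f;              // hive.compactor.delta.pct.threshold (default)
    int   deltaNumThreshold = 10;                // hive.compactor.delta.num.threshold (default)

    boolean major = (float) deltaSize / (float) baseSize > deltaPctThreshold; // 0.146 > 0.1 -> true
    boolean minor = deltaCount > deltaNumThreshold;                           // 12 > 10 -> true
    // The percentage check runs first, so this case returns MAJOR; with a smaller
    // delta-to-base ratio it would fall through to the count check and return MINOR.
    System.out.println("major=" + major + ", minor=" + minor);
  }
}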

There is a catch, however: the directories being scanned must fall inside the valid transaction range. Remember the valid txn list obtained above? Here the call

AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns); 

checks whether the transactions a directory belongs to have finished normally. Ultimately this goes through the isTxnRangeValid method of ValidCompactorTxnList, shown below:

  @Override
  public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
    if (highWatermark < minTxnId) {
      return RangeResponse.NONE;
    } else if (minOpenTxn < 0) {
      return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    } else {
      return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    }
  }
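To see why a single stale open transaction blocks everything, here is a minimal sketch using the class above (the values are made up, the constructor arguments follow the createValidCompactTxnList call shown earlier, and it assumes hive-common is on the classpath):

import org.apache.hadoop.hive.common.ValidCompactorTxnList;

// Assumed scenario: txn 7 is stuck in the "open" state and the high watermark is 100.
public class StuckOpenTxnExample {
  public static void main(String[] args) {
    ValidCompactorTxnList txnList = new ValidCompactorTxnList(new long[]{7}, 7, 100);

    // A delta covering txns 1..5 lies entirely below the stuck txn: still compactable.
    System.out.println(txnList.isTxnRangeValid(1, 5));   // ALL
    // Any delta whose range reaches txn 7 or beyond is rejected, so getAcidState()
    // never hands it to determineCompactionType and no compaction gets requested.
    System.out.println(txnList.isTxnRangeValid(8, 20));  // NONE
  }
}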

Therefore, when the TXNS table contains transactions that terminated abnormally (for example, because the Thrift service crashed) and whose state is left stuck at 'o' (open), compaction requests can no longer be submitted. Such timed-out 'o' transactions are supposed to be flipped to 'a' (aborted) as later transactions are processed, after which the hive.compactor.abortedtxn.threshold parameter takes over and triggers a compaction once the number of aborted transactions reaches that threshold.

Strangely, in our environment there are always a few abnormal 'o' transactions that never get flipped to 'a'. This blocks the compaction process, small files pile up on the affected tables, and read/write performance suffers. We have not yet found the root cause; for now the only workaround is to manually delete the stale, expired 'o'-state transactions from the TXNS table so that compaction can proceed.
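For reference, a rough diagnostic sketch for spotting such stuck rows before any manual cleanup; the JDBC URL and credentials are placeholders, and the column names (TXN_ID, TXN_STARTED, TXN_LAST_HEARTBEAT) are assumed from the standard metastore transaction schema, so confirm them against your own metastore first:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Lists transactions still marked 'o' in the metastore TXNS table so the stale ones
// (very old TXN_LAST_HEARTBEAT) can be reviewed before being cleaned up by hand.
public class ListOpenTxns {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:mysql://metastore-db:3306/hive";   // placeholder metastore JDBC URL
    try (Connection c = DriverManager.getConnection(url, "hive", "hive");
         Statement st = c.createStatement();
         ResultSet rs = st.executeQuery(
             "select TXN_ID, TXN_STARTED, TXN_LAST_HEARTBEAT from TXNS where TXN_STATE = 'o'")) {
      while (rs.next()) {
        // TXN_STARTED / TXN_LAST_HEARTBEAT are stored as epoch milliseconds.
        System.out.println(rs.getLong(1) + "\t" + rs.getLong(2) + "\t" + rs.getLong(3));
      }
    }
  }
}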
