Under normal operation, Hive obtains the tables that need compaction through the findPotentialCompactions method of CompactionTxnHandler. As shown below, it scans COMPLETED_TXN_COMPONENTS as well as TXNS and TXN_COMPONENTS, collecting the tables touched by committed transactions and those touched by aborted transactions.
  /**
   * This will look through the completed_txn_components table and look for partitions or tables
   * that may be ready for compaction. Also, look through txns and txn_components tables for
   * aborted transactions that we should add to the list.
   * @param maxAborted Maximum number of aborted queries to allow before marking this as a
   *                   potential compaction.
   * @return list of CompactionInfo structs. These will not have id, type,
   * or runAs set since these are only potential compactions not actual ones.
   */
  public Set<CompactionInfo> findPotentialCompactions(int maxAborted) throws MetaException {
    Connection dbConn = null;
    Set<CompactionInfo> response = new HashSet<CompactionInfo>();
    Statement stmt = null;
    try {
      try {
        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
        stmt = dbConn.createStatement();
        // Check for completed transactions
        String s = "select distinct ctc_database, ctc_table, " +
            "ctc_partition from COMPLETED_TXN_COMPONENTS";
        LOG.debug("Going to execute query <" + s + ">");
        ResultSet rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          response.add(info);
        }
        // Check for aborted txns
        s = "select tc_database, tc_table, tc_partition " +
            "from TXNS, TXN_COMPONENTS " +
            "where txn_id = tc_txnid and txn_state = '" + TXN_ABORTED + "' " +
            "group by tc_database, tc_table, tc_partition " +
            "having count(*) > " + maxAborted;
        LOG.debug("Going to execute query <" + s + ">");
        rs = stmt.executeQuery(s);
        while (rs.next()) {
          CompactionInfo info = new CompactionInfo();
          info.dbname = rs.getString(1);
          info.tableName = rs.getString(2);
          info.partName = rs.getString(3);
          info.tooManyAborts = true;
          response.add(info);
        }
        LOG.debug("Going to rollback");
        dbConn.rollback();
      } catch (SQLException e) {
        LOG.error("Unable to connect to transaction database " + e.getMessage());
        checkRetryable(dbConn, e, "findPotentialCompactions(maxAborted:" + maxAborted + ")");
      } finally {
        closeDbConn(dbConn);
        closeStmt(stmt);
      }
      return response;
    } catch (RetryException e) {
      return findPotentialCompactions(maxAborted);
    }
  }
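For context, it is the Initiator that invokes this method on each polling cycle. Below is a minimal sketch of that call, assuming Hive 1.x names (in particular the CompactionTxnHandler(HiveConf) constructor; newer versions obtain the handler through a TxnStore factory instead):

import java.util.Set;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.txn.CompactionInfo;
import org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler;

public class FindPotentialsSketch {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    CompactionTxnHandler txnHandler = new CompactionTxnHandler(conf);
    // maxAborted comes from hive.compactor.abortedtxn.threshold (discussed below)
    int abortedThreshold = HiveConf.getIntVar(conf,
        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
    Set<CompactionInfo> potentials = txnHandler.findPotentialCompactions(abortedThreshold);
    for (CompactionInfo ci : potentials) {
      System.out.println(ci.dbname + "." + ci.tableName +
          (ci.partName == null ? "" : "/" + ci.partName));
    }
  }
}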
There is also a method here that builds the list of valid transaction IDs; note that it records the minimum transaction id (txn-id) that is still in the open state:
  /**
   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse} to a
   * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to
   * compact the files, and thus treats only open transactions as invalid.
   * @param txns txn list from the metastore
   * @return a valid txn list.
   */
  public static ValidTxnList createValidCompactTxnList(GetOpenTxnsInfoResponse txns) {
    long highWater = txns.getTxn_high_water_mark();
    long minOpenTxn = Long.MAX_VALUE;
    long[] exceptions = new long[txns.getOpen_txnsSize()];
    int i = 0;
    for (TxnInfo txn : txns.getOpen_txns()) {
      if (txn.getState() == TxnState.OPEN) minOpenTxn = Math.min(minOpenTxn, txn.getId());
      exceptions[i++] = txn.getId();
    }
    return new ValidCompactorTxnList(exceptions, minOpenTxn, highWater);
  }
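As a hand-worked illustration of what this produces (values invented for the example): if the high water mark is 12 and the open-txn snapshot contains txn 7 still OPEN and txn 9 ABORTED, then exceptions becomes [7, 9] and minOpenTxn becomes 7, so the result is equivalent to the list constructed below. The constructor arguments follow the return statement above; writeToString() is the ValidTxnList serialization method.

import org.apache.hadoop.hive.common.ValidCompactorTxnList;

public class CompactTxnListSketch {
  public static void main(String[] args) {
    // exceptions = [7, 9], minOpenTxn = 7, highWater = 12
    ValidCompactorTxnList txns =
        new ValidCompactorTxnList(new long[] { 7L, 9L }, 7L, 12L);
    System.out.println(txns.writeToString());
  }
}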
Then, before the Initiator submits a compaction task, it checks whether the table or partition actually meets the compaction conditions and decides the compaction type (major or minor). The key method in that check is determineCompactionType, shown below; its decision is governed by two configurable parameters:
The following parameter sets the trigger condition for a minor compaction:

HIVE_COMPACTOR_DELTA_NUM_THRESHOLD("hive.compactor.delta.num.threshold", 10,
    "Number of delta directories in a table or partition that will trigger a minor\n" +
    "compaction."),

And this one sets the trigger condition for a major compaction:

HIVE_COMPACTOR_DELTA_PCT_THRESHOLD("hive.compactor.delta.pct.threshold", 0.1f,
    "Percentage (fractional) size of the delta files relative to the base that will trigger\n" +
    "a major compaction. (1.0 = 100%, so the default 0.1 = 10%.)"),
  private CompactionType determineCompactionType(CompactionInfo ci, ValidTxnList txns,
                                                 StorageDescriptor sd)
      throws IOException, InterruptedException {
    boolean noBase = false;
    Path location = new Path(sd.getLocation());
    FileSystem fs = location.getFileSystem(conf);
    AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns);
    Path base = dir.getBaseDirectory();
    long baseSize = 0;
    FileStatus stat = null;
    if (base != null) {
      stat = fs.getFileStatus(base);
      if (!stat.isDir()) {
        LOG.error("Was assuming base " + base.toString() + " is directory, but it's a file!");
        return null;
      }
      baseSize = sumDirSize(fs, base);
    }
    List<FileStatus> originals = dir.getOriginalFiles();
    for (FileStatus origStat : originals) {
      baseSize += origStat.getLen();
    }
    long deltaSize = 0;
    List<AcidUtils.ParsedDelta> deltas = dir.getCurrentDirectories();
    for (AcidUtils.ParsedDelta delta : deltas) {
      stat = fs.getFileStatus(delta.getPath());
      if (!stat.isDir()) {
        LOG.error("Was assuming delta " + delta.getPath().toString() + " is a directory, " +
            "but it's a file!");
        return null;
      }
      deltaSize += sumDirSize(fs, delta.getPath());
    }
    if (baseSize == 0 && deltaSize > 0) {
      noBase = true;
    } else {
      float deltaPctThreshold = HiveConf.getFloatVar(conf,
          HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD);
      boolean bigEnough = (float)deltaSize/(float)baseSize > deltaPctThreshold;
      if (LOG.isDebugEnabled()) {
        StringBuffer msg = new StringBuffer("delta size: ");
        msg.append(deltaSize);
        msg.append(" base size: ");
        msg.append(baseSize);
        msg.append(" threshold: ");
        msg.append(deltaPctThreshold);
        msg.append(" will major compact: ");
        msg.append(bigEnough);
        LOG.debug(msg);
      }
      if (bigEnough) return CompactionType.MAJOR;
    }
    int deltaNumThreshold = HiveConf.getIntVar(conf,
        HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_NUM_THRESHOLD);
    boolean enough = deltas.size() > deltaNumThreshold;
    if (enough) {
      LOG.debug("Found " + deltas.size() + " delta files, threshold is " + deltaNumThreshold +
          (enough ? "" : "not") + " and no base, requesting " + (noBase ? "major" : "minor") +
          " compaction");
      // If there's no base file, do a major compaction
      return noBase ? CompactionType.MAJOR : CompactionType.MINOR;
    }
    return null;
  }
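A hand-worked example of the two thresholds above (numbers invented for illustration): with a 100 MB base and 12 MB of deltas, 12/100 = 0.12 exceeds the default pct threshold of 0.1, so a major compaction is requested no matter how few delta directories there are; with, say, only 5 MB of deltas the ratio test fails, and a minor compaction is requested only once the number of delta directories exceeds 10. A standalone sketch of that arithmetic:

public class ThresholdSketch {
  public static void main(String[] args) {
    long baseSize = 100L << 20;     // 100 MB in the base directory
    long deltaSize = 12L << 20;     // 12 MB across delta directories
    int numDeltas = 4;
    float pctThreshold = 0.1f;      // hive.compactor.delta.pct.threshold
    int numThreshold = 10;          // hive.compactor.delta.num.threshold

    if (baseSize > 0 && (float) deltaSize / (float) baseSize > pctThreshold) {
      System.out.println("MAJOR");  // 0.12 > 0.1, so this branch fires
    } else if (numDeltas > numThreshold) {
      System.out.println("MINOR");  // would need more than 10 delta dirs
    } else {
      System.out.println("no compaction");
    }
  }
}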
There is a catch here, though: the directories scanned must fall within the valid transaction range. Remember the valid txn list obtained above? The call

AcidUtils.Directory dir = AcidUtils.getAcidState(location, conf, txns);

checks whether the transactions each directory belongs to have finished normally, and it ultimately does so through ValidCompactorTxnList's isTxnRangeValid method, shown below:
  @Override
  public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
    if (highWatermark < minTxnId) {
      return RangeResponse.NONE;
    } else if (minOpenTxn < 0) {
      return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    } else {
      return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
    }
  }
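This is exactly where a stale open transaction bites. In the following hedged sketch (constructor arguments as in createValidCompactTxnList above; the numbers are invented), txn 80 crashed while open and was never aborted. Even though txns 85 through 90 committed long ago, the range check returns NONE, so getAcidState will not treat their deltas as compactable:

import org.apache.hadoop.hive.common.ValidCompactorTxnList;
import org.apache.hadoop.hive.common.ValidTxnList.RangeResponse;

public class StaleOpenTxnDemo {
  public static void main(String[] args) {
    // txn 80 is stuck in state 'o'; the high water mark is 100
    ValidCompactorTxnList txns =
        new ValidCompactorTxnList(new long[] { 80L }, 80L, 100L);
    // deltas written by long-committed txns 85..90:
    RangeResponse r = txns.isTxnRangeValid(85L, 90L);
    System.out.println(r);  // NONE, because minOpenTxn (80) is not > maxTxnId (90)
  }
}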
As a result, when the TXNS table contains transactions that ended abnormally (for example, because the Thrift service crashed) and whose state is left stuck at 'o' (open), compaction requests for the affected tables can no longer be submitted. Such stale 'o' transactions are supposed to be timed out and flipped to 'a' (aborted), at which point the hive.compactor.abortedtxn.threshold parameter takes over: once the number of aborted transactions on a table or partition reaches that threshold, a compaction is triggered.
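For reference, the corresponding HiveConf entry (quoted from memory of the Hive 1.x sources; verify against your version) looks like:

HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000,
    "Number of aborted transactions involving a given table or partition that will trigger\n" +
    "a major compaction."),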
Strangely, though, in our environment there are always a few abnormal 'o' transactions that never get flipped to 'a'. This blocks the compaction pipeline, the affected tables accumulate too many small files, and read/write performance degrades. We have not yet tracked down the root cause; for now the only workaround is to manually delete the stale, expired 'o' transactions from the TXNS table so that compaction can proceed.
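For completeness, here is a hedged sketch of that manual cleanup, expressed as JDBC against the metastore's backing database. The table and column names (TXNS.TXN_STATE, TXNS.TXN_LAST_HEARTBEAT, TXN_COMPONENTS.TC_TXNID) follow the standard metastore schema, but the JDBC URL and credentials are placeholders and the one-day cutoff is arbitrary. Back up the metastore first, and remove the matching TXN_COMPONENTS rows along with the TXNS rows:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PurgeStaleOpenTxns {
  public static void main(String[] args) throws Exception {
    // only touch 'o' rows whose heartbeat is older than one day (arbitrary cutoff)
    long cutoff = System.currentTimeMillis() - 24L * 3600 * 1000;
    try (Connection c = DriverManager.getConnection(
        "jdbc:mysql://metastore-db/hive", "hive", "secret")) {  // placeholders
      try (PreparedStatement ps = c.prepareStatement(
          "delete from TXN_COMPONENTS where TC_TXNID in " +
          "(select TXN_ID from TXNS where TXN_STATE = 'o' and TXN_LAST_HEARTBEAT < ?)")) {
        ps.setLong(1, cutoff);
        ps.executeUpdate();
      }
      try (PreparedStatement ps = c.prepareStatement(
          "delete from TXNS where TXN_STATE = 'o' and TXN_LAST_HEARTBEAT < ?")) {
        ps.setLong(1, cutoff);
        ps.executeUpdate();
      }
    }
  }
}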