Presto source code analysis (Hive partition handling)

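While modifying the source code I ran into a problem with how partitions are handled. As described in the previous article, "Predicate handling in Presto joins", for a join query whose predicate contains OR, the partition column is in some cases treated as an ordinary column and is not pushed down to the table scan. So how is this case handled on the Hive side?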

1 Call stack when Hive processes partitions

[Figure: call stack screenshot]

1.1 Code analysis

        HiveTableHandle hiveTableHandle = checkType(tableHandle, HiveTableHandle.class, "tableHandle");
        requireNonNull(effectivePredicate, "effectivePredicate is null");

        SchemaTableName tableName = hiveTableHandle.getSchemaTableName();
        Table table = getTable(metastore, tableName);
        Optional<HiveBucketHandle> hiveBucketHandle = getHiveBucketHandle(connectorId, table, forceIntegralToBigint);

        List<HiveColumnHandle> partitionColumns = getPartitionKeyColumnHandles(connectorId, table, forceIntegralToBigint);
        Optional<HiveBucketing.HiveBucket> bucket = getHiveBucket(table, TupleDomain.extractFixedValues(effectivePredicate).get());

        TupleDomain<HiveColumnHandle> compactEffectivePredicate = toCompactTupleDomain(effectivePredicate, domainCompactionThreshold);

        if (effectivePredicate.isNone()) {
            return new HivePartitionResult(partitionColumns, ImmutableList.of(), TupleDomain.none(), TupleDomain.none(), hiveBucketHandle);
        }

        if (partitionColumns.isEmpty()) {
            return new HivePartitionResult(
                    partitionColumns,
                    ImmutableList.of(new HivePartition(tableName, compactEffectivePredicate, bucket)),
                    effectivePredicate,
                    TupleDomain.none(),
                    hiveBucketHandle);
        }

        // effectivePredicate is a TupleDomain: it holds the predicate broken down into one domain per column.
        // For example, with "where l_dt='20151122' and l_partkey>100" it contains two entries,
        // one for l_dt and one for l_partkey.
        List<String> partitionNames = getFilteredPartitionNames(metastore, tableName, partitionColumns, effectivePredicate);
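
To make the "two entries" remark concrete, here is a minimal, self-contained sketch of a predicate stored as one domain per column. It does not use the Presto classes: `EffectivePredicateSketch`, `ColumnDomain`, `examplePredicate` and `fixedValues` are hypothetical names invented for illustration, and the model is much simpler than the real `TupleDomain` and `Domain`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a simplified stand-in for a per-column predicate summary.
class EffectivePredicateSketch {
    // Hypothetical stand-in for a column domain: either one fixed value or an open range.
    static final class ColumnDomain {
        final String singleValue;   // null when the column is not pinned to one value
        final String description;

        private ColumnDomain(String singleValue, String description) {
            this.singleValue = singleValue;
            this.description = description;
        }

        static ColumnDomain fixed(String value) {
            return new ColumnDomain(value, "= " + value);
        }

        static ColumnDomain range(String description) {
            return new ColumnDomain(null, description);
        }

        @Override
        public String toString() {
            return description;
        }
    }

    // Models: where l_dt='20151122' and l_partkey>100
    static Map<String, ColumnDomain> examplePredicate() {
        Map<String, ColumnDomain> domains = new LinkedHashMap<>();
        domains.put("l_dt", ColumnDomain.fixed("20151122"));
        domains.put("l_partkey", ColumnDomain.range("> 100"));
        return domains;
    }

    // Keeps only the columns that are pinned to a single value.
    static Map<String, String> fixedValues(Map<String, ColumnDomain> domains) {
        Map<String, String> fixed = new LinkedHashMap<>();
        for (Map.Entry<String, ColumnDomain> entry : domains.entrySet()) {
            if (entry.getValue().singleValue != null) {
                fixed.put(entry.getKey(), entry.getValue().singleValue);
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        Map<String, ColumnDomain> domains = examplePredicate();
        System.out.println("columns with a domain: " + domains.keySet());
        System.out.println("columns fixed to one value: " + fixedValues(domains));
    }
}
```

In this simplified model only l_dt is pinned to a single value, which is the kind of entry the partition lookup below can turn into a concrete filter value.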

The getFilteredPartitionNames method:

    private List<String> getFilteredPartitionNames(SemiTransactionalHiveMetastore metastore, SchemaTableName tableName, List<HiveColumnHandle> partitionKeys, TupleDomain<ColumnHandle> effectivePredicate)
    {
        checkArgument(effectivePredicate.getDomains().isPresent());

        List<String> filter = new ArrayList<>();
        // Iterate over all partition columns and look up each one's constraint in effectivePredicate
        for (HiveColumnHandle partitionKey : partitionKeys) {
            Domain domain = effectivePredicate.getDomains().get().get(partitionKey);
            if (domain != null && domain.isNullableSingleValue()) {
                Object value = domain.getNullableSingleValue();
                if (value == null) {
                    filter.add(HivePartitionKey.HIVE_DEFAULT_DYNAMIC_PARTITION);
                }
                else if (value instanceof Slice) {
                    filter.add(((Slice) value).toStringUtf8());
                }
                else if ((value instanceof Boolean) || (value instanceof Double) || (value instanceof Long)) {
                    if (assumeCanonicalPartitionKeys) {
                        filter.add(value.toString());
                    }
                    else {
                        // Hive treats '0', 'false', and 'False' the same. However, the metastore differentiates between these.
                        filter.add(PARTITION_VALUE_WILDCARD);
                    }
                }
                else {
                    throw new PrestoException(NOT_SUPPORTED, "Only Boolean, Double and Long partition keys are supported");
                }
            }
            else {
                filter.add(PARTITION_VALUE_WILDCARD);
            }
        }

        // fetch the partition names
        return metastore.getPartitionNamesByParts(tableName.getSchemaName(), tableName.getTableName(), filter)
                .orElseThrow(() -> new TableNotFoundException(tableName));
    }
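
The core of the method above is: emit one filter entry per partition column, using the concrete value when the column is pinned to a single value and a wildcard otherwise. The sketch below reproduces only that idea in plain Java so it can be run on its own. `PartitionFilterSketch`, `buildFilter`, `matchingPartitions` and the `"*"` wildcard are hypothetical names and values chosen for illustration; the excerpt does not show the real value of `PARTITION_VALUE_WILDCARD`, and the matching step here is only a guess at what a by-parts lookup amounts to, not the metastore's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: one filter entry per partition column, wildcard when no single value is known.
class PartitionFilterSketch {
    // Assumed placeholder; not the real PARTITION_VALUE_WILDCARD value.
    static final String WILDCARD = "*";

    // fixedValues maps a partition column name to its single value, if the predicate pins one.
    static List<String> buildFilter(List<String> partitionKeys, Map<String, String> fixedValues) {
        List<String> filter = new ArrayList<>();
        for (String key : partitionKeys) {
            String value = fixedValues.get(key);
            filter.add(value != null ? value : WILDCARD);
        }
        return filter;
    }

    // Returns the partitions whose values agree with every non-wildcard filter entry.
    static List<List<String>> matchingPartitions(List<String> filter, List<List<String>> partitions) {
        List<List<String>> matches = new ArrayList<>();
        for (List<String> partition : partitions) {
            boolean ok = true;
            for (int i = 0; i < filter.size(); i++) {
                String expected = filter.get(i);
                if (!expected.equals(WILDCARD) && !expected.equals(partition.get(i))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                matches.add(partition);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("l_dt", "l_region");
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("20151122", "cn"),
                Arrays.asList("20151122", "us"),
                Arrays.asList("20151123", "cn"));

        // l_dt = '20151122': the partition column is pinned, so the lookup is narrowed.
        Map<String, String> fixed = new HashMap<>();
        fixed.put("l_dt", "20151122");
        List<String> narrow = buildFilter(keys, fixed);
        System.out.println(narrow + " -> " + matchingPartitions(narrow, partitions));

        // No single value for any partition column (for example an OR predicate):
        // every entry is a wildcard and the lookup returns all partitions.
        List<String> open = buildFilter(keys, new HashMap<>());
        System.out.println(open + " -> " + matchingPartitions(open, partitions));
    }
}
```

The second case is the one that matters for the OR problem from the introduction: when the predicate does not reduce a partition column to a single value, the filter degenerates to wildcards and the name lookup cannot narrow anything.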

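Once the Hive partition information has been extracted from the SQL predicate, a HiveTableLayoutHandle is constructed. It carries the partition information derived above and is used to build the TableScanNode. The Hive partition data for this TableScanNode is then loaded in visitTableScan of DistributedPlanner.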

2 Loading Hive partitions

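After the partition columns in the predicate have been processed, loading of the Hive partition data is started in the background. The call stack is as follows:
[Figure: call stack screenshot]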
