Presto source code analysis (Hive partition handling)

While modifying the source code I ran into a problem with partition handling. For join queries, as described in the previous article on Presto's predicate handling for joins, in some cases where the predicate contains an OR, the partition column is treated as an ordinary column and the filter is not pushed down to the table scan. So how does the Hive connector actually handle partitions?
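The effect described above can be sketched with a toy model of Presto's TupleDomain (the real class is richer; the map-based domains and all column names here are illustrative assumptions, not Presto's API). Intersecting (AND) two predicates preserves each side's per-column constraints, while a union (OR) keeps a column constrained only if both branches constrain it, so a predicate like `l_dt = '20151122' OR l_partkey > 100` leaves every column unconstrained and the partition column can no longer be pruned:

```java
import java.util.*;

// Illustrative sketch only: a column -> allowed-values map standing in for
// TupleDomain. Column names and values are hypothetical.
public class OrPredicateSketch {
    // Intersection (AND): keep the constraints from both sides.
    static Map<String, Set<String>> and(Map<String, Set<String>> l, Map<String, Set<String>> r) {
        Map<String, Set<String>> out = new HashMap<>(l);
        r.forEach((col, vals) -> out.merge(col, vals, (a, b) -> {
            Set<String> s = new HashSet<>(a);
            s.retainAll(b);
            return s;
        }));
        return out;
    }

    // Union (OR): a column stays constrained only if BOTH sides constrain it;
    // otherwise its domain widens to "all values" and is dropped from the map.
    static Map<String, Set<String>> or(Map<String, Set<String>> l, Map<String, Set<String>> r) {
        Map<String, Set<String>> out = new HashMap<>();
        for (String col : l.keySet()) {
            if (r.containsKey(col)) {
                Set<String> s = new HashSet<>(l.get(col));
                s.addAll(r.get(col));
                out.put(col, s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dtEq = Map.of("l_dt", Set.of("20151122"));
        Map<String, Set<String>> keyGt = Map.of("l_partkey", Set.of("100"));

        // AND keeps the partition column constrained -> prunable
        System.out.println(and(dtEq, keyGt));

        // OR across different columns loses both constraints -> full scan
        System.out.println(or(dtEq, keyGt));
    }
}
```

This mirrors why an OR spanning a partition column and a non-partition column degrades the partition column to an ordinary one: no single-column constraint survives the union.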

1 Call stack for Hive partition handling

(call stack screenshot)

1.1 Code analysis

This code is in HivePartitionManager (the getPartitions method):

        HiveTableHandle hiveTableHandle = checkType(tableHandle, HiveTableHandle.class, "tableHandle");
        requireNonNull(effectivePredicate, "effectivePredicate is null");

        SchemaTableName tableName = hiveTableHandle.getSchemaTableName();
        Table table = getTable(metastore, tableName);
        Optional<HiveBucketHandle> hiveBucketHandle = getHiveBucketHandle(connectorId, table, forceIntegralToBigint);

        List<HiveColumnHandle> partitionColumns = getPartitionKeyColumnHandles(connectorId, table, forceIntegralToBigint);
        Optional<HiveBucketing.HiveBucket> bucket = getHiveBucket(table, TupleDomain.extractFixedValues(effectivePredicate).get());

        TupleDomain<HiveColumnHandle> compactEffectivePredicate = toCompactTupleDomain(effectivePredicate, domainCompactionThreshold);

        if (effectivePredicate.isNone()) {
            return new HivePartitionResult(partitionColumns, ImmutableList.of(), TupleDomain.none(), TupleDomain.none(), hiveBucketHandle);
        }

        if (partitionColumns.isEmpty()) {
            return new HivePartitionResult(
                    partitionColumns,
                    ImmutableList.of(new HivePartition(tableName, compactEffectivePredicate, bucket)),
                    effectivePredicate,
                    TupleDomain.none(),
                    hiveBucketHandle);
        }

        // effectivePredicate is a TupleDomain: a collection of per-column domains
        // extracted from the predicate. For example, for
        // where l_dt='20151122' and l_partkey>100, it contains two entries.
        List<String> partitionNames = getFilteredPartitionNames(metastore, tableName, partitionColumns, effectivePredicate);
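
The role of TupleDomain.extractFixedValues in the snippet above can be sketched with a self-contained toy model (the Domain record and column names here are hypothetical stand-ins, not Presto's actual API). Only columns whose domain is pinned to a single value survive, so for `where l_dt='20151122' and l_partkey>100` only `l_dt` yields a fixed value:

```java
import java.util.*;

// Hypothetical stand-in for TupleDomain: column name -> domain description.
// A domain is either "single value" (l_dt = '20151122') or a range (l_partkey > 100).
public class FixedValueSketch {
    record Domain(String singleValue, boolean isSingle) {}

    // Mimics TupleDomain.extractFixedValues: keep only single-valued columns.
    static Map<String, String> extractFixedValues(Map<String, Domain> domains) {
        Map<String, String> fixed = new LinkedHashMap<>();
        domains.forEach((col, d) -> {
            if (d.isSingle()) {
                fixed.put(col, d.singleValue());
            }
        });
        return fixed;
    }

    public static void main(String[] args) {
        Map<String, Domain> predicate = new LinkedHashMap<>();
        predicate.put("l_dt", new Domain("20151122", true)); // l_dt = '20151122'
        predicate.put("l_partkey", new Domain(null, false)); // l_partkey > 100 (a range)

        System.out.println(extractFixedValues(predicate)); // {l_dt=20151122}
    }
}
```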

The getFilteredPartitionNames method:

private List<String> getFilteredPartitionNames(SemiTransactionalHiveMetastore metastore, SchemaTableName tableName, List<HiveColumnHandle> partitionKeys, TupleDomain<ColumnHandle> effectivePredicate)
    {
        checkArgument(effectivePredicate.getDomains().isPresent());

        List<String> filter = new ArrayList<>();
        for (HiveColumnHandle partitionKey : partitionKeys) {//遍历所有的分区字段,在effectivePredicate中寻找该分区字段的具体条件
            Domain domain = effectivePredicate.getDomains().get().get(partitionKey);
            if (domain != null && domain.isNullableSingleValue()) {
                Object value = domain.getNullableSingleValue();
                if (value == null) {
                    filter.add(HivePartitionKey.HIVE_DEFAULT_DYNAMIC_PARTITION);
                }
                else if (value instanceof Slice) {
                    filter.add(((Slice) value).toStringUtf8());
                }
                else if ((value instanceof Boolean) || (value instanceof Double) || (value instanceof Long)) {
                    if (assumeCanonicalPartitionKeys) {
                        filter.add(value.toString());
                    }
                    else {
                        // Hive treats '0', 'false', and 'False' the same. However, the metastore differentiates between these.
                        filter.add(PARTITION_VALUE_WILDCARD);
                    }
                }
                else {
                    throw new PrestoException(NOT_SUPPORTED, "Only Boolean, Double and Long partition keys are supported");
                }
            }
            else {
                filter.add(PARTITION_VALUE_WILDCARD);
            }
        }

        // fetch the partition names
        return metastore.getPartitionNamesByParts(tableName.getSchemaName(), tableName.getTableName(), filter)
                .orElseThrow(() -> new TableNotFoundException(tableName));
    }
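The filter-building loop above can be condensed into a minimal sketch (class and method names are hypothetical, not Presto's): for each partition column, in declaration order, emit the exact value when the predicate pins it to a single value, and a wildcard otherwise; in Presto the wildcard PARTITION_VALUE_WILDCARD is an empty string, and the metastore matches a wildcard position against any value.

```java
import java.util.*;

// Minimal sketch of the filter-building loop (hypothetical names).
public class PartitionFilterSketch {
    static final String WILDCARD = ""; // stand-in for PARTITION_VALUE_WILDCARD

    static List<String> buildFilter(List<String> partitionColumns,
                                    Map<String, String> singleValuePredicates) {
        List<String> filter = new ArrayList<>();
        for (String column : partitionColumns) {
            // exact value if pinned by the predicate, wildcard otherwise
            filter.add(singleValuePredicates.getOrDefault(column, WILDCARD));
        }
        return filter;
    }

    public static void main(String[] args) {
        // table partitioned by (l_dt, l_region); the predicate pins only l_dt
        List<String> filter = buildFilter(
                List.of("l_dt", "l_region"),
                Map.of("l_dt", "20151122"));
        System.out.println(filter); // [20151122, ]
    }
}
```

The resulting list is positional: its length always equals the number of partition columns, which is why the loop iterates over partitionKeys rather than over the predicate.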

After the Hive partition information has been extracted from the SQL predicate, a HiveTableLayoutHandle is constructed containing the partitions derived above. It is used to build the TableScanNode, and in DistributedPlanner's visitTableScan this TableScanNode drives the loading of the Hive partition data.

2 Loading Hive partitions

After the partition columns in the predicate have been processed, loading of the Hive partition data is kicked off in the background. The call stack is as follows:
(call stack screenshot)
