Building Impala: fixing a Maven compilation error

While building Impala I hit a Maven error; more precisely, it was reported while compiling the testdata module. The command I ran was "./buildall.sh -skiptests -format -testdata", and the error was:

========================================================================
Running mvn  -U package
Directory /home/quanlong/workspace/Impala/testdata
========================================================================
19:54:15 [WARNING] Could not transfer metadata com.cloudera.cdh:cdh-root:6.x-SNAPSHOT/maven-metadata.xml from/to ${distMgmtSnapshotsId} (${distMgmtSnapshotsUrl}): Cannot access ${distMgmtSnapshotsUrl} with type default using the available connector factories: BasicRepositoryConnectorFactory
19:54:26 [WARNING] The POM for org.apache.parquet:parquet-avro:jar:1.10.99-cdh6.x-20200124.115524-1814051 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
19:54:29 [ERROR] COMPILATION ERROR : 
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[39,42] package org.apache.parquet.hadoop.metadata does not exist
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[79,46] cannot access org.apache.parquet.hadoop.ParquetWriter
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[80,26] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[81,26] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[78,47] cannot access org.apache.parquet.hadoop.api.WriteSupport
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[87,15] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[90,13] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[66,9] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[67,26] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[68,26] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[75,15] cannot find symbol
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[78,13] cannot find symbol
19:54:29 [INFO] BUILD FAILURE
19:54:29 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on project impala-testdata: Compilation failure: Compilation failure: 
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[39,42] package org.apache.parquet.hadoop.metadata does not exist
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[79,46] cannot access org.apache.parquet.hadoop.ParquetWriter
19:54:29 [ERROR]   class file for org.apache.parquet.hadoop.ParquetWriter not found
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[80,26] cannot find symbol
19:54:29 [ERROR]   symbol:   variable DEFAULT_BLOCK_SIZE
19:54:29 [ERROR]   location: class org.apache.parquet.avro.AvroParquetWriter
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[81,26] cannot find symbol
19:54:29 [ERROR]   symbol:   variable DEFAULT_PAGE_SIZE
19:54:29 [ERROR]   location: class org.apache.parquet.avro.AvroParquetWriter
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[78,47] cannot access org.apache.parquet.hadoop.api.WriteSupport
19:54:29 [ERROR]   class file for org.apache.parquet.hadoop.api.WriteSupport not found
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[87,15] cannot find symbol
19:54:29 [ERROR]   symbol:   method write(org.apache.avro.generic.GenericRecord)
19:54:29 [ERROR]   location: variable writer of type org.apache.parquet.avro.AvroParquetWriter<org.apache.avro.generic.GenericRecord>
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java:[90,13] cannot find symbol
19:54:29 [ERROR]   symbol:   method close()
19:54:29 [ERROR]   location: variable writer of type org.apache.parquet.avro.AvroParquetWriter<org.apache.avro.generic.GenericRecord>
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[66,9] cannot find symbol
19:54:29 [ERROR]   symbol:   variable CompressionCodecName
19:54:29 [ERROR]   location: class org.apache.impala.datagenerator.RandomNestedDataGenerator
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[67,26] cannot find symbol
19:54:29 [ERROR]   symbol:   variable DEFAULT_BLOCK_SIZE
19:54:29 [ERROR]   location: class org.apache.parquet.avro.AvroParquetWriter
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[68,26] cannot find symbol
19:54:29 [ERROR]   symbol:   variable DEFAULT_PAGE_SIZE
19:54:29 [ERROR]   location: class org.apache.parquet.avro.AvroParquetWriter
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[75,15] cannot find symbol
19:54:29 [ERROR]   symbol:   method write(org.apache.avro.generic.GenericData.Record)
19:54:29 [ERROR]   location: variable writer of type org.apache.parquet.avro.AvroParquetWriter<org.apache.avro.generic.GenericRecord>
19:54:29 [ERROR] /home/quanlong/workspace/Impala/testdata/src/main/java/org/apache/impala/datagenerator/RandomNestedDataGenerator.java:[78,13] cannot find symbol
19:54:29 [ERROR]   symbol:   method close()
19:54:29 [ERROR]   location: variable writer of type org.apache.parquet.avro.AvroParquetWriter<org.apache.avro.generic.GenericRecord>
19:54:29 [ERROR] -> [Help 1]
19:54:29 [ERROR] 
19:54:29 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
19:54:29 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
19:54:29 [ERROR] 
19:54:29 [ERROR] For more information about the errors and possible solutions, please read the following articles:
19:54:29 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
mvn  -U package exited with code 0
ERROR in /home/quanlong/workspace/Impala/bin/create_testdata.sh at line 34: ${IMPALA_HOME}/bin/mvn-quiet.sh package
Generated: /home/quanlong/workspace/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.create_testdata.20200410_02_54_29.xml

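This looks like a missing dependency that leaves the packages and classes unresolved. I happened to have another environment where Impala builds fine, so I ran the following command in both and compared the dependencies: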

(pushd testdata && mvn dependency:tree)
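The comparison can also be done mechanically. The following is a minimal sketch, assuming the dependency:tree output of each environment has been saved to a file; the function names and the /tmp paths are made up for illustration and are not part of Impala's build scripts.

```shell
#!/bin/sh
# list_artifacts FILE: print the sorted, unique artifact coordinates found in
# saved `mvn dependency:tree` output, with the tree-drawing characters removed.
list_artifacts() {
  sed -E 's/^\[INFO\] [-+\\| ]*//' "$1" \
    | grep -E '^[A-Za-z0-9_.-]+(:[A-Za-z0-9_.-]+){3,5}$' \
    | sort -u
}

# missing_artifacts GOOD BAD: print the coordinates that appear in GOOD
# but not in BAD.
missing_artifacts() {
  bad_list=$(mktemp)
  list_artifacts "$2" > "$bad_list"
  list_artifacts "$1" | grep -vxF -f "$bad_list"
  rm -f "$bad_list"
}

# Example (hypothetical paths; run the first line once in each environment):
#   (cd testdata && mvn dependency:tree > /tmp/tree-good.txt)
#   missing_artifacts /tmp/tree-good.txt /tmp/tree-bad.txt
```

Comparing flattened coordinates rather than the raw trees avoids noise from differing tree-drawing prefixes when a parent node loses its children.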

The comparison showed that the dependencies under org.apache.parquet:parquet-avro:jar differ. In the working environment they look like this:

[INFO] +- org.apache.parquet:parquet-avro:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] |  +- org.apache.parquet:parquet-column:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] |  |  +- org.apache.parquet:parquet-common:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] |  |  \- org.apache.parquet:parquet-encoding:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] |  +- org.apache.parquet:parquet-hadoop:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] |  |  +- org.apache.parquet:parquet-jackson:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] |  |  \- commons-pool:commons-pool:jar:1.6:compile
[INFO] |  \- org.apache.parquet:parquet-format-structures:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] |     \- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO] \- org.kitesdk:kite-data-core:jar:1.0.0-cdh6.x-SNAPSHOT:compile
[INFO]    +- org.kitesdk:kite-hadoop-compatibility:jar:1.0.0-cdh6.x-SNAPSHOT:compile
[INFO]    +- org.xerial.snappy:snappy-java:jar:1.1.4:compile
[INFO]    +- net.sf.opencsv:opencsv:jar:2.3:compile
[INFO]    +- org.apache.commons:commons-jexl:jar:2.1.1:compile
[INFO]    +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.10:compile
[INFO]    \- com.fasterxml.jackson.core:jackson-core:jar:2.9.10:compile

In the failing environment they look like this:

[INFO] +- org.apache.parquet:parquet-avro:jar:1.10.99-cdh6.x-SNAPSHOT:compile
[INFO] \- org.kitesdk:kite-data-core:jar:1.0.0-cdh6.x-SNAPSHOT:compile
[INFO]    +- org.kitesdk:kite-hadoop-compatibility:jar:1.0.0-cdh6.x-SNAPSHOT:compile
[INFO]    +- org.xerial.snappy:snappy-java:jar:1.1.4:compile
[INFO]    +- net.sf.opencsv:opencsv:jar:2.3:compile
[INFO]    \- org.apache.commons:commons-jexl:jar:2.1.1:compile

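The transitive dependencies pulled in by parquet-avro:jar are missing. I suspected that the POM files of the two SNAPSHOT packages differed, but a comparison showed they were identical. Only when I searched the output for "parquet" did I notice this warning: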

[WARNING] The POM for org.apache.parquet:parquet-avro:jar:1.10.99-cdh6.x-20200124.115524-1814051 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details

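This explains why the transitive dependencies of parquet-avro were not pulled in in the failing environment. To pin down the cause, I added -X to mvn to turn on debug logging: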

(pushd testdata && mvn -X dependency:tree)

There is a lot of output; the relevant part is:

[WARNING] The POM for org.apache.parquet:parquet-avro:jar:1.10.99-cdh6.x-20200124.115524-1814051 is invalid, transitive dependencies (if any) will not be available: 1 problem was encountered while building the effective model for org.apache.parquet:parquet-avro:1.10.99-cdh6.x-SNAPSHOT
[FATAL] Non-parseable POM /home/quanlong/.m2/repository/org/apache/parquet/parquet/1.10.99-cdh6.x-SNAPSHOT/parquet-1.10.99-cdh6.x-SNAPSHOT.pom: processing instruction started on line 307 and column 14 was not closed (position: START_TAG seen ...<?ignore\n           <execution>... @308:23)  @ /home/quanlong/.m2/repository/org/apache/parquet/parquet/1.10.99-cdh6.x-SNAPSHOT/parquet-1.10.99-cdh6.x-SNAPSHOT.pom, line 308, column 23
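Because the -X output is long, one way to locate lines like these is to save the log to a file and filter it. A minimal sketch, assuming the log was redirected to a file; the function name and the log path are hypothetical:

```shell
#!/bin/sh
# find_pom_problems LOG: print, with line numbers, the lines of a saved Maven
# log that report an invalid or non-parseable POM.
find_pom_problems() {
  grep -n -E 'The POM for .* is invalid|Non-parseable POM' "$1"
}

# Example (hypothetical log path):
#   (cd testdata && mvn -X dependency:tree > /tmp/mvn-debug.log 2>&1)
#   find_pom_problems /tmp/mvn-debug.log
```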

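This says that a processing instruction starting at line 307, column 14 of the POM file was never closed, with the parser giving up at line 308, column 23. I checked by hand and the syntax looked fine; moreover, this POM file is identical to the one on my working machine.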
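For a manual check like this it helps to print only the lines around the reported position. A small sketch; show_context is a hypothetical helper, not a Maven feature:

```shell
#!/bin/sh
# show_context FILE LINE [RADIUS]: print the lines of FILE within RADIUS
# (default 5) lines of LINE, each prefixed with its line number.
show_context() {
  file=$1; line=$2; radius=${3:-5}
  start=$((line - radius))
  if [ "$start" -lt 1 ]; then start=1; fi
  end=$((line + radius))
  awk -v s="$start" -v e="$end" \
    'NR >= s && NR <= e { printf "%6d  %s\n", NR, $0 }' "$file"
}

# Example, using the path and line number from the log above:
#   show_context ~/.m2/repository/org/apache/parquet/parquet/1.10.99-cdh6.x-SNAPSHOT/parquet-1.10.99-cdh6.x-SNAPSHOT.pom 307
```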

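Could it be a Maven bug? I checked the Maven versions in use, and sure enough they differed: the failing environment used apache-maven-3.6.1, while the working one used apache-maven-3.6.2. When I switched the failing environment to Maven 3.6.2, the build succeeded; switching back to 3.6.1 reproduced the same error. So it is a Maven bug beyond doubt. Looking through the Maven 3.6.2 release notes, I found the bug responsible: https://issues.apache.org/jira/browse/MNG-6707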

Summary

Do not use Maven 3.6.1 to build Impala; otherwise compiling the testdata module fails with the errors above.
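A guard along these lines could catch the problem before a long build starts. This is only a sketch: the function names are hypothetical, and it assumes that the first line of `mvn -version` begins with "Apache Maven" followed by the version number.

```shell
#!/bin/sh
# is_affected_maven TEXT: succeed if TEXT (the output of `mvn -version`)
# reports exactly Maven 3.6.1.
is_affected_maven() {
  printf '%s\n' "$1" | grep -Eq '^Apache Maven 3\.6\.1([^0-9.]|$)'
}

# check_maven: fail with a message when the mvn on PATH is the affected release.
check_maven() {
  if is_affected_maven "$(mvn -version 2>/dev/null)"; then
    echo "Maven 3.6.1 detected; switch to 3.6.2 before building testdata" >&2
    return 1
  fi
}

# Example:
#   check_maven && ./buildall.sh -skiptests -format -testdata
```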
