The pivotMaxValues error

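1. The operation that triggered the error

    The error was raised while pivoting a column into multiple columns and filling them with the values of a specified column; the pivot column had more than 10000 distinct values.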

2. The exact error

Exception in thread "main" org.apache.spark.sql.AnalysisException: The pivot column field_name has more than 10000 distinct values, this could indicate an error. If this was intended, set spark.sql.pivotMaxValues to at least the number of distinct values of the pivot column.;
        at org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:327)
        at com.rong360.featureAnalyse.FeatherAnalyseOnlineStep.getPivotData(FeatherAnalyseOnlineStep.java:564)
        at com.rong360.featureAnalyse.FeatherAnalyseOnlineStep.main(FeatherAnalyseOnlineStep.java:73)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
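
For context, the message comes from a guard inside pivot: when no explicit list of values is passed, Spark first collects the distinct values of the pivot column and refuses to continue if there are more than spark.sql.pivotMaxValues of them (10000 by default). The plain-Java sketch below imitates that guard without Spark. The class name, method name and exception type are hypothetical and chosen only for illustration; this is not Spark's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PivotLimitSketch {
    // Default value of spark.sql.pivotMaxValues.
    static final int DEFAULT_PIVOT_MAX_VALUES = 10000;

    // Hypothetical helper: returns the sorted distinct values of the pivot column,
    // or fails when there are more than maxValues of them.
    // Assumes the column contains no null entries.
    static List<String> distinctPivotValues(List<String> pivotColumn, int maxValues) {
        TreeSet<String> distinct = new TreeSet<>(pivotColumn);
        if (distinct.size() > maxValues) {
            throw new IllegalStateException(
                "The pivot column has more than " + maxValues
                + " distinct values; raise the limit if this is intended.");
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("age", "city", "age", "income");

        // Three distinct values, well under the default limit: accepted.
        System.out.println(distinctPivotValues(column, DEFAULT_PIVOT_MAX_VALUES));

        // The same data with a limit of 2: rejected, like the error above.
        try {
            distinctPivotValues(column, 2);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Each distinct value becomes one output column, so the limit is in effect a cap on how wide the pivoted table may get.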

3. Solution

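import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        // allow large results to be collected on the driver
        .config("spark.driver.maxResultSize", "40g")
        // raise the pivot limit above the number of distinct values in the pivot column
        .config("spark.sql.pivotMaxValues", 20000)
        .appName("Features analyse step")
        .enableHiveSupport()
        .getOrCreate();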

Of course, the same setting can also be added to the Spark configuration file instead of being set in code.
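
As a rough sketch of the configuration-file route, assuming the job is launched with spark-submit and picks up the standard conf/spark-defaults.conf, the two settings from the builder above would look like this (the values are the ones used in this post, not recommendations):

```properties
# conf/spark-defaults.conf
spark.driver.maxResultSize   40g
spark.sql.pivotMaxValues     20000

# Alternatively, pass the same settings per job on the spark-submit command line:
#   --conf spark.driver.maxResultSize=40g --conf spark.sql.pivotMaxValues=20000
```

Settings made in code through SparkSession.builder().config(...) take precedence over spark-submit flags, which in turn take precedence over spark-defaults.conf.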