Some common PySpark errors

Original post

2020-04-20 14:30

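1. While the feature columns have not yet been combined into a single "features" column, the tool that combines them is VectorAssembler, not StringIndexer. StringIndexer's inputCol takes one column name, so passing it a list of columns fails.

Failing statement: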

indexer2 = StringIndexer(inputCol=new_columns_names[1:], outputCol='features')

Error:

TypeError: Invalid param value given for param "inputCol". Could not convert <class 'list'> to string type

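2. When all of your feature columns hold continuous values, do not use StringIndexer or VectorIndexer. A VectorAssembler that merges all feature columns into the outputCol "features" is all you need. Indexing a continuous column turns every distinct value into its own category, which is what triggers the error further down.

Failing statements: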

# None of the following is needed when every feature is continuous
old_columns_names = data.columns
new_columns_names = [name + '-new' for name in old_columns_names]
for i in range(len(old_columns_names)):
    indexer = StringIndexer(inputCol=old_columns_names[i], outputCol=new_columns_names[i])
    # or: indexer = VectorIndexer(inputCol=old_columns_names[i], outputCol=new_columns_names[i], maxCategories=5)
    data = indexer.fit(data).transform(data)

Error (the same error can also mean that your maxBins really is set too low and needs to be raised):

'requirement failed: DecisionTree requires maxBins (= 100) to be at least as large as the number of values in each categorical feature, but categorical feature 18 has 7815 values. Considering remove this and other categorical features with a large number of values, or add more training examples.'
In other words, feature 18 was treated as a categorical feature with 7815 distinct values, far more than the maxBins setting of 100 allows.

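3. In Spark ML, pay close attention to the column layout: put the label column at position 0 and the assembled features column at position 1. If a continuous column ends up being used as the label, the classifier rejects it because labels must be non-negative integers.

Error: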

'requirement failed: Classifier found max label value = 1277.4 but requires integers in range [0, ... 2147483647)'
