PySpark vector assembly and Cartesian products

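VectorAssembler (vector assembly): for each row, combines the values of several columns, in the order listed in inputCols, into a single vector.
Interaction (Cartesian product): I'm not sure how best to translate this one. For each row, it first takes the Cartesian product of the input columns' values (a scalar column such as id1 counts as a one-element set), then multiplies the elements of each resulting tuple together. The result is a column whose elements are vectors.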
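To make the two descriptions above concrete, here is a minimal plain-Python sketch of what each transformer computes for a single row. It is not Spark code. The helpers `assemble` and `interact` are hypothetical names introduced only for illustration. It assumes plain numeric inputs: Spark's Interaction one-hot encodes nominal (categorical) features first, and this sketch ignores that case.

```python
from itertools import product
from math import prod

def assemble(row, input_cols):
    """Hypothetical helper mimicking VectorAssembler for numeric columns:
    concatenate the named columns of one row, in order, into a list of floats."""
    return [float(row[c]) for c in input_cols]

def interact(*columns):
    """Hypothetical helper mimicking Interaction for numeric inputs:
    a scalar is treated as a one-element vector, then every tuple in the
    Cartesian product is multiplied out. The last input varies fastest."""
    vectors = [list(c) if isinstance(c, (list, tuple)) else [c] for c in columns]
    return [float(prod(t)) for t in product(*vectors)]

row = {"x": 2, "p": 1, "q": 3, "r": 4, "s": 5}
vec1 = assemble(row, ["p", "q"])   # [1.0, 3.0]
vec2 = assemble(row, ["r", "s"])   # [4.0, 5.0]
# tuples: (2,1,4), (2,1,5), (2,3,4), (2,3,5) -> [8.0, 10.0, 24.0, 30.0]
print(interact(row["x"], vec1, vec2))
```

The output length is the product of the input lengths (1 × 2 × 2 = 4 here), which is why interacting wide vectors grows the feature space quickly.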

from pyspark.ml.feature import Interaction, VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("InteractionExample")\
    .getOrCreate()

df = spark.createDataFrame(
    [(1, 1, 2, 3, 8, 4, 5),
     (2, 4, 3, 8, 7, 9, 8),
     (3, 6, 1, 9, 2, 3, 6),
     (4, 10, 8, 6, 9, 4, 5),
     (5, 9, 2, 7, 10, 7, 3),
     (6, 1, 1, 4, 2, 8, 4)],
    ["id1", "id2", "id3", "id4", "id5", "id6", "id7"])

assembler1 = VectorAssembler(inputCols=["id2", "id3", "id4"], outputCol="vec1")
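assembled1 = assembler1.transform(df)  # assemble ["id2", "id3", "id4"] into one column whose elements are vectors
assembler2 = VectorAssembler(inputCols=["id5", "id6", "id7"], outputCol="vec2")
# assemble ["id5", "id6", "id7"] into a second vector column and keep only the columns Interaction needs
assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2")
# take the Cartesian product of ["id1", "vec1", "vec2"], multiply the elements of each tuple, and output a vector column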
interaction = Interaction(inputCols=["id1", "vec1", "vec2"], outputCol="interactedCol")
interacted = interaction.transform(assembled2)
interacted.show(truncate=False)
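The interactedCol values printed by show() can be checked by hand. The sketch below recomputes them in plain Python (no Spark needed) for the same six rows. It assumes the ordering produced by itertools.product, where id1 varies slowest and vec2 varies fastest, matches Interaction's output order. Each row yields 1 × 3 × 3 = 9 values.

```python
from itertools import product

# Same rows as the DataFrame in the PySpark example: (id1, id2, ..., id7)
rows = [(1, 1, 2, 3, 8, 4, 5),
        (2, 4, 3, 8, 7, 9, 8),
        (3, 6, 1, 9, 2, 3, 6),
        (4, 10, 8, 6, 9, 4, 5),
        (5, 9, 2, 7, 10, 7, 3),
        (6, 1, 1, 4, 2, 8, 4)]

expected = {}
for id1, *rest in rows:
    vec1 = [float(v) for v in rest[0:3]]   # id2, id3, id4 -> vec1
    vec2 = [float(v) for v in rest[3:6]]   # id5, id6, id7 -> vec2
    # id1 is a scalar, so it is a single factor in every product
    expected[id1] = [id1 * a * b for a, b in product(vec1, vec2)]

for id1, values in expected.items():
    print(id1, values)
```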