VectorAssembler (vector assembly): for each row, combine the values of several columns into a single vector.
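The following is a rough plain-Python picture of what VectorAssembler does to one row, so you can see what ends up in a column like vec1. `assemble` is a hypothetical helper, not part of the pyspark API. It follows Spark's behaviour in two respects: numeric inputs become doubles, and vector inputs are flattened into the output rather than nested.

```python
# Hypothetical plain-Python illustration of VectorAssembler's per-row behaviour;
# this is NOT the pyspark API.
def assemble(row, input_cols):
    """Gather the named columns of one row into a flat list of floats."""
    vector = []
    for col in input_cols:
        value = row[col]
        # Vector-valued inputs are flattened; scalar inputs add one element.
        if isinstance(value, (list, tuple)):
            vector.extend(float(v) for v in value)
        else:
            vector.append(float(value))
    return vector


# First row of the example DataFrame
row = {"id1": 1, "id2": 1, "id3": 2, "id4": 3, "id5": 8, "id6": 4, "id7": 5}
print(assemble(row, ["id2", "id3", "id4"]))  # [1.0, 2.0, 3.0]
```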
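Interaction (I'm not sure what to call this one; it works like a Cartesian product): take the Cartesian product of the values in the input columns, multiply together the elements of each resulting tuple, and collect the products into a new column whose values are vectors. Each input column may be a vector column or a plain numeric column, which counts as a single value.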
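Below is a small plain-Python sketch of that idea, using `itertools.product` for the Cartesian product and `math.prod` for the per-tuple product. `interact` is a hypothetical helper, not Spark code. It assumes the ordering Spark uses, with the last input varying fastest, and it ignores one thing Spark does: columns tagged as nominal (categorical) are one-hot encoded before the interaction.

```python
from itertools import product
from math import prod


def interact(*features):
    """Sketch of Interaction's semantics on plain Python values (not Spark).

    Each argument is a number or a list of numbers; a number is treated as a
    one-element list. Returns the product of every tuple in the Cartesian
    product of the inputs, with the last input varying fastest.
    """
    as_lists = [f if isinstance(f, (list, tuple)) else [f] for f in features]
    return [float(prod(combo)) for combo in product(*as_lists)]


# First row of the example: id1 = 1, vec1 = [1, 2, 3], vec2 = [8, 4, 5]
print(interact(1, [1.0, 2.0, 3.0], [8.0, 4.0, 5.0]))
# [8.0, 4.0, 5.0, 16.0, 8.0, 10.0, 24.0, 12.0, 15.0]
```

With one scalar and two 3-element vectors the output has 1 × 3 × 3 = 9 entries.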
from pyspark.ml.feature import Interaction, VectorAssembler
from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .appName("InteractionExample")\
    .getOrCreate()
df = spark.createDataFrame(
[(1, 1, 2, 3, 8, 4, 5),
(2, 4, 3, 8, 7, 9, 8),
(3, 6, 1, 9, 2, 3, 6),
(4, 10, 8, 6, 9, 4, 5),
(5, 9, 2, 7, 10, 7, 3),
(6, 1, 1, 4, 2, 8, 4)],
["id1", "id2", "id3", "id4", "id5", "id6", "id7"])
assembler1 = VectorAssembler(inputCols=["id2", "id3", "id4"], outputCol="vec1")
assembled1 = assembler1.transform(df)  # assemble id2, id3, id4 into a single vector column, vec1
assembler2 = VectorAssembler(inputCols=["id5", "id6", "id7"], outputCol="vec2")
assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2")
# Take the Cartesian product of the elements of id1, vec1 and vec2, multiply the elements within each tuple, and collect the products into a vector column
interaction = Interaction(inputCols=["id1", "vec1", "vec2"], outputCol="interactedCol")
interacted = interaction.transform(assembled2)
interacted.show(truncate=False)

spark.stop()
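To check the show() output by hand, interactedCol can be recomputed without Spark. This is a minimal plain-Python sketch over the same six rows. `expected_interactions` is a hypothetical helper, not part of pyspark. It assumes that Interaction casts everything to double and orders the products with the last input (vec2) varying fastest.

```python
from itertools import product
from math import prod

# Same six rows as the Spark example: (id1, id2, id3, id4, id5, id6, id7)
ROWS = [
    (1, 1, 2, 3, 8, 4, 5),
    (2, 4, 3, 8, 7, 9, 8),
    (3, 6, 1, 9, 2, 3, 6),
    (4, 10, 8, 6, 9, 4, 5),
    (5, 9, 2, 7, 10, 7, 3),
    (6, 1, 1, 4, 2, 8, 4),
]


def expected_interactions(rows):
    """Recompute interactedCol for each row in plain Python (a sketch, not Spark)."""
    results = []
    for id1, *rest in rows:
        vec1 = [float(v) for v in rest[:3]]  # id2, id3, id4
        vec2 = [float(v) for v in rest[3:]]  # id5, id6, id7
        # Cartesian product of [id1] x vec1 x vec2, then multiply each tuple
        results.append([prod(t) for t in product([float(id1)], vec1, vec2)])
    return results


for (id1, *_), vec in zip(ROWS, expected_interactions(ROWS)):
    print(id1, vec)
```

Each printed list should line up with the interactedCol value for the same id1 in the Spark output.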