When applying the same function to several columns of a DataFrame, can the target columns be specified dynamically?
For example:
Compute grouped statistics over the three columns A, B, and C.
1. Initialize Spark and build the DataFrame
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("name")
  .master("local[2]")
  .getOrCreate()
val df = spark.read.json("src\\main\\resources\\json.txt")
2. Static implementation
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

val newDF = df
  .withColumn("cumA", sum("A").over(Window.partitionBy("ID").orderBy("time")))
  .withColumn("cumB", sum("B").over(Window.partitionBy("ID").orderBy("time")))
  .withColumn("cumC", sum("C").over(Window.partitionBy("ID").orderBy("time")))
3. Dynamic implementation
3.1 Method 1: via select
import spark.implicits._

df.select($"*" +: Seq("A", "B", "C").map(c =>
  sum(c).over(Window.partitionBy("ID").orderBy("time")).alias(s"cum$c")
): _*)
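The trick that makes Method 1 work is Scala's collection-to-varargs splat: `select` is a varargs method (`select(cols: Column*)`), so the `Seq` of derived columns must be prepended with `$"*"` and then expanded with `: _*`. Below is a minimal plain-Scala sketch of that idiom, with no Spark dependency; the `select` function here is a hypothetical stand-in for `DataFrame.select`, and the strings stand in for `Column` objects.

```scala
// Stand-in for a varargs API like DataFrame.select(cols: Column*).
def select(cols: String*): Seq[String] = cols

val base    = Seq("A", "B", "C")
// Same shape as mapping each column name to an aliased window expression.
val derived = base.map(c => s"cum$c")           // Seq("cumA", "cumB", "cumC")

// Prepend the "keep everything" column, then splat the Seq into varargs.
val all = select(("*" +: derived): _*)

println(all)                                    // List(*, cumA, cumB, cumC)
```

Without `: _*`, the compiler would try to pass the whole `Seq` as a single argument and fail to type-check against the varargs parameter; this is the piece most often dropped when the one-liner gets copied around.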