Spark: converting rows to columns

StructType

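Note: this approach handles expanding a nested column shaped like myScore in the schema below, that is, an array of structs.
The input data is JSON.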

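import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.explode

/*
root
 |-- age: long (nullable = true)
 |-- myScore: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- score1: long (nullable = true)
 |    |    |-- score2: long (nullable = true)
 |-- name: string (nullable = true)
Sample input (one JSON object per line):
{"name":"Michael", "age":25,"myScore":[{"score1":19,"score2":23},{"score1":58,"score2":50}]}
{"name":"Andy", "age":30,"myScore":[{"score1":29,"score2":33},{"score1":38,"score2":52},{"score1":88,"score2":71}]}
{"name":"Justin", "age":19,"myScore":[{"score1":39,"score2":43},{"score1":28,"score2":53}]}
*/
// Explode an array<struct> column into one row per element, then lift the
// listed struct fields (e.g. score1, score2) into top-level columns.
def explodeDataFrameStruct(df: DataFrame, explodeCol: String, nextLevelCols: Array[String]): DataFrame = {
  // Each array element becomes its own row; rows with a null or empty array are dropped
  // (explode_outer would keep them).
  val exploded = df.withColumn(explodeCol, explode(df(explodeCol)))
  // "myScore.score1" resolves the nested struct field by name.
  val dfScore = nextLevelCols.foldLeft(exploded) { (acc, field) =>
    acc.withColumn(field, acc(explodeCol + "." + field))
  }
  dfScore.show(2) // show() prints directly and returns Unit, so it must not be wrapped in println
  dfScore
}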
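To make the effect of explode concrete, here is a rough model of the same transformation on plain Scala collections, not Spark code. StructExplodeModel, Score, and Person are hypothetical names introduced only for this illustration; they mirror the JSON sample above. Each array element yields one output row, the outer fields (name, age) are repeated, and the struct fields become their own columns.

```scala
// A plain-Scala model of explode on an array<struct> column (no Spark involved).
// Score and Person are hypothetical types mirroring the JSON sample.
object StructExplodeModel {
  final case class Score(score1: Long, score2: Long)
  final case class Person(name: String, age: Long, myScore: Seq[Score])

  val people: Seq[Person] = Seq(
    Person("Michael", 25, Seq(Score(19, 23), Score(58, 50))),
    Person("Andy", 30, Seq(Score(29, 33), Score(38, 52), Score(88, 71))),
    Person("Justin", 19, Seq(Score(39, 43), Score(28, 53)))
  )

  // Like explode: one row per array element, outer fields repeated,
  // struct fields lifted into top-level columns (name, age, score1, score2).
  // A person with an empty myScore would produce no rows, as with explode.
  val exploded: Seq[(String, Long, Long, Long)] =
    people.flatMap(p => p.myScore.map(s => (p.name, p.age, s.score1, s.score2)))
}
```

With this sample the three input records become seven rows, matching the total number of array elements.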

String

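For a string column, split the value with regular expressions and then apply explode: the string is first split into entries with splitRule(0), each entry becomes its own row, and each entry is then split again with splitRule(1) into the columns named in nextLevelCols.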

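import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode, split}

// splitRule(0): separator between entries (";"); splitRule(1): separator between fields in an entry (",").
// Both are Java regular expressions, so metacharacters such as "|" or "." must be escaped.
def explodeDataFrameString(df: DataFrame, explodeCol: String, splitRule: Array[String], nextLevelCols: Array[String]): DataFrame = {
  /*
  Example data (toDF requires import spark.implicits._):
  val dft = Seq((1, "scene_id1,scene_name1;scene_id2,scene_name2", "michal"),
                (2, "scene_id1,scene_name1;scene_id2,scene_name2;scene_id3,scene_name3", "john"),
                (3, "scene_id4,scene_name4;scene_id2,scene_name2", "mary"),
                (4, "scene_id6,scene_name6;scene_id5,scene_name5", "lily")
               ).toDF("id", "int_id", "name")
  */
  // Split the string into an array of entries and give each entry its own row.
  val exploded = df.withColumn(explodeCol, explode(split(col(explodeCol), splitRule(0))))
  // Split each entry again and put the i-th piece into the i-th new column.
  nextLevelCols.zipWithIndex.foldLeft(exploded) { case (acc, (name, i)) =>
    acc.withColumn(name, split(col(explodeCol), splitRule(1))(i))
  }
}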
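The same two-stage split can be modeled on plain Scala collections to show which rows come out, again as an illustration rather than Spark code. StringExplodeModel is a hypothetical name; the rows copy the dft example, and splitRule is assumed to be Array(";", ","). Like Spark's split, Java's String.split interprets its argument as a regular expression.

```scala
// A plain-Scala model of split + explode on a delimited string column (no Spark involved).
object StringExplodeModel {
  // Rows shaped like the dft example: (id, int_id, name).
  val rows: Seq[(Int, String, String)] = Seq(
    (1, "scene_id1,scene_name1;scene_id2,scene_name2", "michal"),
    (2, "scene_id1,scene_name1;scene_id2,scene_name2;scene_id3,scene_name3", "john"),
    (3, "scene_id4,scene_name4;scene_id2,scene_name2", "mary"),
    (4, "scene_id6,scene_name6;scene_id5,scene_name5", "lily")
  )

  // Assumed rules: entries separated by ";", fields inside an entry by ",".
  val splitRule: Array[String] = Array(";", ",")

  // One output row per entry: (id, name, scene_id, scene_name).
  val exploded: Seq[(Int, String, String, String)] =
    rows.flatMap { case (id, packed, name) =>
      packed.split(splitRule(0)).toSeq.map { entry =>
        val fields = entry.split(splitRule(1))
        (id, name, fields(0), fields(1))
      }
    }
}
```

The four input rows expand to nine rows, one per "scene_id,scene_name" pair, with id and name repeated on each.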