RDD Application Example: Word Frequency Analysis

1. Data

It hurts to love someone and not be loved in return. But what is more painful is to love someone and never find the courage to let that person know how you feel.
A sad thing in life is when you meet someone who means a lot to you,only to find out in the end that it was never meant to be and you just have to let go.
It's true that we don't know what we've got until we lose it, but it's also true that we don't know what we've been losing until it arrives.
Dream what you want to dream;go where you want to go;be what you want to be,because you have only one life and one chance to do all the things you want to do.

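2. Requirements

Requirement 1: Find the maximum number of words on a single line.
Requirement 2: Count word frequencies and sort the results.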
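To make the two requirements concrete, here is a minimal sketch that computes both on plain Scala collections, without Spark. The object name `WordStats`, its method names, and the two sample lines are hypothetical and chosen only for illustration. Sorting by descending count is one possible reading of "sort the results".

```scala
object WordStats {
  // Requirement 1: the largest number of space-separated words on any line.
  // Assumes `lines` is non-empty; `max` throws on an empty collection.
  def maxWordsPerLine(lines: Seq[String]): Int =
    lines.map(_.split(" ").length).max

  // Requirement 2: (word, count) pairs, most frequent first, ties broken alphabetically.
  def wordFrequencies(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines, not the data set above.
    val lines = Seq("to be or not to be", "that is the question")
    println(maxWordsPerLine(lines)) // prints 6
    wordFrequencies(lines).foreach(println)
  }
}
```

The RDD versions follow the same shape, with `reduce` in place of `max` and `reduceByKey` in place of `groupBy`.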

3. Code

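  • Requirement 1
    // `sc` is an existing SparkContext (spark-shell provides one under this name).
    val rdd = sc.textFile("E:\\data\\spark\\rdd\\test\\read\\word.log")
    // Count the space-separated words on each line, then keep the largest count.
    val maxLineWordNum = rdd.map(line => line.split(" ").length).reduce((a, b) => if (a > b) a else b)
    println(maxLineWordNum)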
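  • Requirement 2
    val rdd = sc.textFile("E:\\data\\spark\\rdd\\test\\read\\word.log")
    // Replace periods and semicolons with spaces, split into words, and count each word.
    val flatMapRdd = rdd.flatMap(line => line.replace(".", " ").replace(";", " ").split(" "))
                        .filter(word => word.nonEmpty) // drop empty tokens left by consecutive spaces
                        .map(word => (word, 1))
                        .reduceByKey(_ + _)
    flatMapRdd.foreach(println)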

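    // Sorted version
    val rdd = sc.textFile("E:\\data\\spark\\rdd\\test\\read\\word.log")
    val flatMapRdd = rdd.flatMap(line => line.replace(".", " ").replace(";", " ").split(" "))
                        .filter(word => word.nonEmpty) // drop empty tokens left by consecutive spaces
                        .map(x => (x, 1))
                        .reduceByKey(_ + _)
                        .map(x => (x._2, x._1))        // swap to (count, word) so the count is the key
                        .sortByKey()                   // ascending by count
                        .map(x => (x._2, x._1))        // swap back to (word, count)
    // collect() returns the results to the driver in sorted order; foreach on the RDD
    // itself runs per partition and does not guarantee the order of the printed lines.
    flatMapRdd.collect().foreach(println)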
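
The replace-and-split tokenization in the code above turns only periods and semicolons into separators. Commas are not replaced, so a token such as `you,only` from the sample data stays fused. A period followed by a space also becomes two consecutive spaces, which `split(" ")` turns into an empty token unless it is filtered out. The sketch below contrasts that approach with a regex-based tokenizer. The object name `Tokenizer` and both method names are hypothetical, and the definition of a "word" (a run of ASCII letters or apostrophes, compared case-insensitively) is an assumption rather than something the original code specifies.

```scala
object Tokenizer {
  // Mirrors the replace-and-split approach: only '.' and ';' become separators.
  def naiveTokenize(line: String): Seq[String] =
    line.replace(".", " ").replace(";", " ").split(" ").toSeq

  // Assumption: a word is a run of ASCII letters or apostrophes, lowercased.
  // Everything else (spaces, commas, periods, semicolons) acts as a separator.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("[^a-z']+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val sample = "you,only to find. But go"
    println(naiveTokenize(sample).mkString("[", "|", "]")) // prints [you,only|to|find||But|go]
    println(tokenize(sample).mkString("[", "|", "]"))      // prints [you|only|to|find|but|go]
  }
}
```

In the Spark pipeline, such a function could take the place of the tokenization step, for example `rdd.flatMap(line => Tokenizer.tokenize(line))`. If a most-frequent-first ordering is wanted, `sortBy(_._2, ascending = false)` on the `(word, count)` RDD avoids the two swap steps.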