Spark GraphX graph computation demo, with results

Spark GraphX graph computation: working through the examples in the official guide:
http://spark.apache.org/docs/latest/graphx-programming-guide.html




import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD


import org.apache.spark.graphx.GraphLoader


val graph = GraphLoader.edgeListFile(sc, "/data/graphx/followers.txt")
val temp = graph.mapVertices((id,attr) => attr.toInt  * 2)
temp.vertices.take(10)


Or, with an explicit type annotation:
val temp : Graph[Int,Int] =  graph.mapVertices((_,attr) => attr  * 2)
temp.vertices.take(10)
// A 140 MB data set:
val graph = GraphLoader.edgeListFile(sc, "hdfs://server1:9000/data/graphx/followers.txt",numEdgePartitions=4)
graph.vertices.count
graph.edges.count


------property operators---end====================================
structural operators-----start


reverse-----flips the direction of every edge
subgraph-----filters edges and/or vertices with predicates
val subGraph = graph.subgraph(epred = e =>e.srcId > e.dstId)
val subGraph = graph.subgraph(epred = e =>e.srcId > e.dstId,vpred=(id,_) => id>500000)


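mask-----restricts the graph to the vertices and edges that also appear in another graph
groupEdges-----merges parallel edges (several edges between the same pair of vertices) into one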
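
To make the four operators above concrete, here is a minimal plain-Scala sketch that models them on an in-memory edge list. It is not the GraphX implementation and does not need Spark; the Edge case class and the StructuralSketch object are hypothetical names used only for illustration.

```scala
// Plain-Scala model of the structural operators; this is NOT GraphX itself.
// Edge and StructuralSketch are hypothetical names for this sketch.
object StructuralSketch {
  final case class Edge(srcId: Long, dstId: Long, attr: Int)

  val edges: Seq[Edge] =
    Seq(Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(3L, 2L, 1), Edge(2L, 1L, 1))

  // reverse: swap source and destination of every edge
  def reverse(es: Seq[Edge]): Seq[Edge] =
    es.map(e => Edge(e.dstId, e.srcId, e.attr))

  // subgraph: keep an edge only if it passes epred and both endpoints pass vpred
  def subgraph(es: Seq[Edge])(epred: Edge => Boolean,
                              vpred: Long => Boolean = _ => true): Seq[Edge] =
    es.filter(e => epred(e) && vpred(e.srcId) && vpred(e.dstId))

  // mask: keep only the edges that also occur in the other edge list
  def mask(es: Seq[Edge], other: Seq[Edge]): Seq[Edge] = {
    val keep = other.map(e => (e.srcId, e.dstId)).toSet
    es.filter(e => keep.contains((e.srcId, e.dstId)))
  }

  // groupEdges: merge parallel edges (same source and destination) with a merge function
  def groupEdges(es: Seq[Edge])(merge: (Int, Int) => Int): Seq[Edge] =
    es.groupBy(e => (e.srcId, e.dstId)).toSeq
      .map { case ((s, d), group) => Edge(s, d, group.map(_.attr).reduce(merge)) }
      .sortBy(e => (e.srcId, e.dstId))
}
```

For example, StructuralSketch.groupEdges(StructuralSketch.edges)(_ + _) collapses the two 3 -> 2 edges into a single edge whose attribute is 2.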
structural operators-----end=============
computing degree-----start=============
val tmp = graph.degrees
val temp = graph.inDegrees // number of edges for which the vertex is the destination
temp.take(10)
val temp = graph.outDegrees // number of edges for which the vertex is the source
temp.take(10)


val tmp = graph.degrees
// Keep whichever (vertexId, degree) pair has the larger degree
def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = if (a._2 > b._2) a else b


graph.degrees.reduce(max)
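// Business meaning: which page has the most links overall. A degree of about 500 suggests a link-directory (navigation) site of the "123" kind.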
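
As a cross-check on what the degree operators return, the following plain-Scala sketch computes the same three quantities on a tiny in-memory edge list and then applies the max reducer from above. It is only a model without Spark; DegreeSketch and its sample edges are assumptions made for illustration.

```scala
// Plain-Scala model of inDegrees / outDegrees / degrees; not GraphX itself.
// DegreeSketch and the sample edges are hypothetical.
object DegreeSketch {
  type VertexId = Long

  // Directed edges as (source, destination) pairs
  val edges: Seq[(VertexId, VertexId)] = Seq((1L, 2L), (1L, 3L), (2L, 3L), (4L, 3L))

  // Count how often each vertex id occurs in a list of ids
  private def count(ids: Seq[VertexId]): Map[VertexId, Int] =
    ids.groupBy(identity).map { case (id, occurrences) => (id, occurrences.size) }

  // A vertex that never appears as a destination (or source) has no entry at all
  val inDegrees: Map[VertexId, Int] = count(edges.map(_._2))
  val outDegrees: Map[VertexId, Int] = count(edges.map(_._1))
  val degrees: Map[VertexId, Int] = count(edges.flatMap { case (s, d) => Seq(s, d) })

  // Same reducer as in the notes: keep the pair with the larger degree
  def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
    if (a._2 > b._2) a else b

  val maxDegree: (VertexId, Int) = degrees.toSeq.reduce(max)
}
```

Here vertex 3 has in-degree 3 and out-degree 0, so it wins the reduce with a total degree of 3.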
computing degree-----end========================


collecting neighbors-------start=============
def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]]
def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]]
collecting neighbors-------end=============


join operators-------start=============
def joinVertices[U: ClassTag](table: RDD[(VertexId, U)])(mapFunc: (VertexId, VD, U) => VD)
    : Graph[VD, ED]
def outerJoinVertices[U: ClassTag, VD2: ClassTag](other: RDD[(VertexId, U)])
    (mapFunc: (VertexId, VD, Option[U]) => VD2)(implicit eq: VD =:= VD2 = null)
    : Graph[VD2, ED]
val rawGraph = graph.mapVertices((id,attr) => 0) // set every vertex attribute to 0
val outDeg = rawGraph.outDegrees
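// joinVertices keeps the vertex type; vertices without a matching out-degree keep their old value (0)
val tmp = rawGraph.joinVertices[Int](outDeg){(_, _, deg) => deg}
// outerJoinVertices passes an Option, so a default is supplied for unmatched vertices
val tmp = rawGraph.outerJoinVertices[Int, Int](outDeg){(_, _, optDeg) => optDeg.getOrElse(0)}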
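
The difference between the two joins is easiest to see on plain maps. The sketch below models both on Scala Map values instead of RDDs; JoinSketch and its helper functions are hypothetical stand-ins that only mirror the shape of the GraphX signatures, and -1 is used as the default so that unmatched vertices are visible.

```scala
// Plain-Scala model of joinVertices vs. outerJoinVertices; not GraphX itself.
// JoinSketch and its helpers are hypothetical names for this sketch.
object JoinSketch {
  type VertexId = Long

  val vertices: Map[VertexId, Int] = Map(1L -> 0, 2L -> 0, 3L -> 0)
  val outDeg: Map[VertexId, Int] = Map(1L -> 2, 2L -> 1) // vertex 3 has no out-edges

  // joinVertices: unmatched vertices keep their old attribute; the type VD cannot change
  def joinVertices[VD, U](vs: Map[VertexId, VD], table: Map[VertexId, U])(
      f: (VertexId, VD, U) => VD): Map[VertexId, VD] =
    vs.map { case (id, attr) =>
      (id, table.get(id).map(u => f(id, attr, u)).getOrElse(attr))
    }

  // outerJoinVertices: f sees an Option and may produce a new attribute type VD2
  def outerJoinVertices[VD, U, VD2](vs: Map[VertexId, VD], table: Map[VertexId, U])(
      f: (VertexId, VD, Option[U]) => VD2): Map[VertexId, VD2] =
    vs.map { case (id, attr) => (id, f(id, attr, table.get(id))) }

  val joined: Map[VertexId, Int] =
    joinVertices(vertices, outDeg)((_, _, deg) => deg)
  val outerJoined: Map[VertexId, Int] =
    outerJoinVertices(vertices, outDeg)((_, _, optDeg) => optDeg.getOrElse(-1))
}
```

Vertex 3 ends up as 0 in joined (its old value) but as -1 in outerJoined (the explicit default).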
join operators-------end=============


Map-----a mining tool!
map reduce triplets-------start=============
MapReduce-style operations on a graph


http://spark.apache.org/docs/latest/graphx-programming-guide.html
import org.apache.spark.graphx.util.GraphGenerators


// Create a graph with "age" as the vertex property.
// Here we use a random graph for simplicity.
val graph: Graph[Double, Int] =
  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) => id.toDouble )
// Compute the number of older followers and their total age
val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, Double)](
  triplet => { // Map Function
    if (triplet.srcAttr > triplet.dstAttr) {
      // Send message to destination vertex containing counter and age
      triplet.sendToDst((1, triplet.srcAttr))
    }
  },
  // Add counter and age
  (a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function
)
// Divide total age by number of older followers to get average age of older followers
val avgAgeOfOlderFollowers: VertexRDD[Double] =
  olderFollowers.mapValues( (id, value) =>
    value match { case (count, totalAge) => totalAge / count } )
// Display the results
avgAgeOfOlderFollowers.collect.foreach(println(_))
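
The map and reduce phases of the example can be traced by hand on a small input. The following plain-Scala sketch replays the same logic on in-memory collections; it does not use Spark, and AggregateSketch, the ages and the edges are made-up illustration data.

```scala
// Plain-Scala model of the aggregateMessages example; not GraphX itself.
// AggregateSketch and its sample data are hypothetical.
object AggregateSketch {
  type VertexId = Long

  val ages: Map[VertexId, Double] = Map(1L -> 30.0, 2L -> 20.0, 3L -> 40.0, 4L -> 10.0)
  // (follower, followed) edges
  val edges: Seq[(VertexId, VertexId)] =
    Seq((1L, 2L), (3L, 2L), (4L, 2L), (2L, 4L), (4L, 1L))

  // Map phase: an older follower sends (1, its age) to the destination vertex
  val messages: Seq[(VertexId, (Int, Double))] =
    edges.collect { case (src, dst) if ages(src) > ages(dst) => (dst, (1, ages(src))) }

  // Reduce phase: add up counters and ages per destination vertex
  val olderFollowers: Map[VertexId, (Int, Double)] =
    messages.groupBy(_._1).map { case (dst, msgs) =>
      (dst, msgs.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2)))
    }

  // Average age of the older followers; vertices that got no message are absent
  val avgAgeOfOlderFollowers: Map[VertexId, Double] =
    olderFollowers.map { case (id, (count, totalAge)) => (id, totalAge / count) }
}
```

Vertices 1 and 3 receive no message, so they do not appear in the result, just as a VertexRDD returned by aggregateMessages only contains vertices that received at least one message.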


map reduce triplets-------end=============


pregel api-------start=============
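Why does GraphX provide a Pregel API?
To make iterative computations more convenient. With the plain GraphX operators you have to cache manually, which is hard to control because vertices and edges are cached separately;
the Pregel API handles the caching automatically.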
  def pregel[A: ClassTag](
      initialMsg: A,
      maxIterations: Int = Int.MaxValue,
      activeDirection: EdgeDirection = EdgeDirection.Either)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED] = {
    Pregel(graph, initialMsg, maxIterations, activeDirection)(vprog, sendMsg, mergeMsg)
  }
  It processes the graph one superstep at a time, which suits workloads with many iterations.
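
The superstep loop behind this signature can be sketched without Spark. The code below is a simplified model, not the GraphX source: sendMsg takes the triplet fields as plain arguments instead of an EdgeTriplet, edges carry a Double weight, and PregelSketch with its sample graph is hypothetical. It runs single-source shortest paths from vertex 1.

```scala
// Plain-Scala model of a Pregel-style superstep loop; not the GraphX implementation.
// PregelSketch, its simplified sendMsg signature and the sample graph are hypothetical.
object PregelSketch {
  type VertexId = Long

  def pregel[VD, A](
      vertices: Map[VertexId, VD],
      edges: Seq[(VertexId, VertexId, Double)],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: (VertexId, VD, VertexId, VD, Double) => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A): Map[VertexId, VD] = {
    // Superstep 0: every vertex receives the initial message
    var state = vertices.map { case (id, attr) => (id, vprog(id, attr, initialMsg)) }
    var active: Set[VertexId] = state.keySet
    var i = 0
    while (active.nonEmpty && i < maxIterations) {
      // Only edges touching a vertex that got a message last round may send
      val inbox: Map[VertexId, A] = edges.iterator
        .filter { case (s, d, _) => active(s) || active(d) }
        .flatMap { case (s, d, w) => sendMsg(s, state(s), d, state(d), w) }
        .toSeq
        .groupBy(_._1)
        .map { case (id, msgs) => (id, msgs.map(_._2).reduce(mergeMsg)) }
      // Run the vertex program on every vertex that received a message
      state = state ++ inbox.map { case (id, msg) => (id, vprog(id, state(id), msg)) }
      active = inbox.keySet
      i += 1
    }
    state
  }

  val edges: Seq[(VertexId, VertexId, Double)] =
    Seq((1L, 2L, 1.0), (2L, 3L, 2.0), (1L, 3L, 5.0), (3L, 4L, 1.0))
  val inf: Double = Double.PositiveInfinity
  val initial: Map[VertexId, Double] = Map(1L -> 0.0, 2L -> inf, 3L -> inf, 4L -> inf)

  // Single-source shortest paths from vertex 1
  val shortest: Map[VertexId, Double] =
    pregel(initial, edges, inf)(
      (_, dist, newDist) => math.min(dist, newDist),
      (_, srcDist, dst, dstDist, w) =>
        if (srcDist + w < dstDist) Iterator((dst, srcDist + w)) else Iterator.empty,
      (a, b) => math.min(a, b))
}
```

The loop stops as soon as no vertex receives a message, which is why the default maxIterations of Int.MaxValue is safe for computations that converge.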
pregel api--------end=============


graphX設計--------start=============
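edge cut: one way to split a graph for distributed processing
vertex cut: the other way to split a graph for distributed processing; this is the one GraphX uses!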


The PartitionStrategy class offers four strategies.
GraphX uses vertex cut, so one vertex may appear in several partitions, while each edge lives in exactly one partition.
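
The consequence of a vertex cut can be seen with a toy partitioner. The sketch below assigns each edge to exactly one partition and then lists the partitions in which each vertex has to be present. The rule (src + dst) % numPartitions is a made-up assignment for illustration, not one of the real PartitionStrategy implementations, and VertexCutSketch is a hypothetical name.

```scala
// Plain-Scala illustration of a vertex cut; not GraphX's partitioning code.
// VertexCutSketch and the partitionOf rule are hypothetical.
object VertexCutSketch {
  type VertexId = Long

  val numPartitions = 2
  val edges: Seq[(VertexId, VertexId)] = Seq((1L, 2L), (1L, 3L), (1L, 4L), (2L, 3L))

  // Made-up assignment rule: every edge maps to exactly one partition
  def partitionOf(src: VertexId, dst: VertexId): Int =
    ((src + dst) % numPartitions).toInt

  // Each edge appears in one partition only
  val edgesByPartition: Map[Int, Seq[(VertexId, VertexId)]] =
    edges.groupBy { case (s, d) => partitionOf(s, d) }

  // A vertex must be present in every partition that holds one of its edges
  val vertexPartitions: Map[VertexId, Set[Int]] =
    edges
      .flatMap { case (s, d) =>
        val p = partitionOf(s, d)
        Seq((s, p), (d, p))
      }
      .groupBy(_._1)
      .map { case (id, ps) => (id, ps.map(_._2).toSet) }
}
```

In this example vertices 1 and 3 are replicated in both partitions, while vertices 2 and 4 live in one partition only.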
graphX設計--------end=============
pagerank   triangleCount--------start=============
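PageRank is not covered in detail; it is the simplest one.
val ranks = graph.pageRank(0.01).vertices // 0.01 is the convergence tolerance: smaller is more precise; with large data sets, set it a bit larger. The algorithm is already implemented for you.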
ranks.take(10).mkString("\n")
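Very handy in practice, for example for recommendations in social networks. A tolerance of 0.01 is not too large. Sort by rank to get the top vertices.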
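
To show what the tolerance controls, here is a small plain-Scala PageRank iteration that stops once no rank changes by more than tol. It is a generic sketch, not the GraphX implementation; it assumes a reset probability of 0.15 and unnormalized ranks starting at 1.0, and PageRankSketch with its three-vertex graph is hypothetical.

```scala
// Plain-Scala PageRank iteration with a convergence tolerance; not GraphX itself.
// PageRankSketch and the sample graph are hypothetical.
object PageRankSketch {
  type VertexId = Long

  val edges: Seq[(VertexId, VertexId)] = Seq((1L, 3L), (2L, 3L), (3L, 1L))
  val vertexIds: Set[VertexId] = edges.flatMap { case (s, d) => Seq(s, d) }.toSet
  val outDegree: Map[VertexId, Int] =
    edges.groupBy(_._1).map { case (id, es) => (id, es.size) }

  def pageRank(tol: Double, resetProb: Double = 0.15): Map[VertexId, Double] = {
    var ranks: Map[VertexId, Double] = vertexIds.map(id => (id, 1.0)).toMap
    var maxDelta = Double.MaxValue
    // Iterate until the largest rank change in one round drops below tol
    while (maxDelta >= tol) {
      val next = vertexIds.map { id =>
        // Each in-neighbor contributes its rank divided by its out-degree
        val incoming = edges.collect { case (s, d) if d == id => ranks(s) / outDegree(s) }.sum
        (id, resetProb + (1.0 - resetProb) * incoming)
      }.toMap
      maxDelta = vertexIds.map(id => math.abs(next(id) - ranks(id))).max
      ranks = next
    }
    ranks
  }

  val ranks: Map[VertexId, Double] = pageRank(0.01)
  // Vertex ids sorted from highest to lowest rank
  val ranked: Seq[VertexId] = ranks.toSeq.sortBy(-_._2).map(_._1)
}
```

A smaller tol means more rounds before the loop exits, which is the precision versus cost trade-off mentioned above.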
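triangleCount: measures how tightly connected vertices are by counting the triangles that pass through each vertex. The edges must be in canonical orientation (srcId < dstId).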
val graph = GraphLoader.edgeListFile(sc, "hdfs://server1:9000/data/graphx/followers.txt",true)
val c = graph.triangleCount().vertices
c.take(10)
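
What the per-vertex triangle count means can be checked on a handful of edges. The plain-Scala sketch below first puts the edges into canonical orientation (srcId < dstId, duplicates removed) and then counts, for every vertex, the triangles it belongs to. It is a model for illustration only; TriangleSketch and the sample edges are hypothetical.

```scala
// Plain-Scala model of per-vertex triangle counting; not GraphX itself.
// TriangleSketch and the sample edges are hypothetical.
object TriangleSketch {
  type VertexId = Long

  val edges: Seq[(VertexId, VertexId)] =
    Seq((2L, 1L), (1L, 3L), (2L, 3L), (3L, 4L), (1L, 3L))

  // Canonical orientation: srcId < dstId, no self-loops, no duplicate edges
  val canonical: Set[(VertexId, VertexId)] =
    edges.collect { case (s, d) if s != d => (math.min(s, d), math.max(s, d)) }.toSet

  // Undirected neighbor set of every vertex
  val neighbors: Map[VertexId, Set[VertexId]] =
    canonical.toSeq
      .flatMap { case (s, d) => Seq((s, d), (d, s)) }
      .groupBy(_._1)
      .map { case (id, ns) => (id, ns.map(_._2).toSet) }

  // A neighbor shared by both ends of an edge closes a triangle. Summing over
  // all neighbors of a vertex counts each of its triangles twice, hence "/ 2".
  val triangleCount: Map[VertexId, Int] =
    neighbors.map { case (id, ns) =>
      val closed = ns.toSeq.map(n => (neighbors(n) & ns).size).sum
      (id, closed / 2)
    }
}
```

Vertices 1, 2 and 3 form the only triangle, so each of them gets a count of 1 and vertex 4 gets 0.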




pagerank   triangleCount--------end=============