1. Creating a Graph

1. Create from vertices and edges.

def apply[VD: ClassTag, ED: ClassTag](
      vertices: RDD[(VertexId, VD)],
      edges: RDD[Edge[ED]],
      defaultVertexAttr: VD = null.asInstanceOf[VD],
      edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
      vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
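A minimal sketch of this constructor, assuming Spark with GraphX is on the classpath. The toy data and the names vertices, edges, g and attrs are made up for illustration. It shows that a vertex mentioned only in the edge list receives defaultVertexAttr.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graph-apply-sketch").setMaster("local[*]"))

// Hypothetical toy data: vertex 3 appears only in the edge list.
val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq((1L, "a"), (2L, "b")))
val edges: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(1L, 2L, 10), Edge(2L, 3L, 20)))

// Vertices missing from `vertices` get the default attribute.
val g: Graph[String, Int] = Graph(vertices, edges, defaultVertexAttr = "unknown")

val attrs: Map[VertexId, String] = g.vertices.collect().toMap
attrs.toSeq.sortBy(_._1).foreach(println)
```

Here vertex 3 ends up with the attribute "unknown", while vertices 1 and 2 keep "a" and "b".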

2. Create from edges only.

All vertices get the same attribute: defaultValue, of type VD.

def fromEdges[VD: ClassTag, ED: ClassTag](
      edges: RDD[Edge[ED]],
      defaultValue: VD,
      edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
      vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
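A small sketch of fromEdges under the same assumptions (Spark with GraphX available; the edge data and the names edges, g and vs are illustrative only). The vertex set is derived from the edge endpoints.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("from-edges-sketch").setMaster("local[*]"))

// Hypothetical edges; no vertex RDD is supplied.
val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(3L, 2L, "follows")))

// Every endpoint becomes a vertex whose attribute is defaultValue.
val g: Graph[String, String] = Graph.fromEdges(edges, defaultValue = "n/a")

val vs = g.vertices.collect().toList.sortBy(_._1)
vs.foreach(println)
```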

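3. Create from raw edges (pairs of vertex IDs only).

Every vertex attribute is defaultValue. Every edge attribute is 1 by default. If uniqueEdges is set to a PartitionStrategy, duplicate edges between the same pair of vertices are merged and the edge attribute becomes the number of duplicates.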

def fromEdgeTuples[VD: ClassTag](
      rawEdges: RDD[(VertexId, VertexId)],
      defaultValue: VD,
      uniqueEdges: Option[PartitionStrategy] = None,
      edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
      vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, Int]
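The effect of uniqueEdges can be sketched as follows, assuming Spark with GraphX is available. The raw edge list and the names rawEdges, separate, merged, separateEdges and mergedEdges are invented for the example; RandomVertexCut is just one of the built-in strategies.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy}

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("from-edge-tuples-sketch").setMaster("local[*]"))

// Hypothetical raw edges: the pair (1, 2) appears twice.
val rawEdges = sc.parallelize(Seq((1L, 2L), (1L, 2L), (2L, 3L)))

// Without uniqueEdges, duplicates stay separate and every edge attribute is 1.
val separate = Graph.fromEdgeTuples(rawEdges, defaultValue = 0)

// With uniqueEdges, duplicates are merged and the attribute counts them.
val merged = Graph.fromEdgeTuples(
  rawEdges, defaultValue = 0,
  uniqueEdges = Some(PartitionStrategy.RandomVertexCut))

val separateEdges =
  separate.edges.collect().map(e => (e.srcId, e.dstId, e.attr)).toList.sorted
val mergedEdges =
  merged.edges.collect().map(e => (e.srcId, e.dstId, e.attr)).toList.sorted

println(separateEdges)
println(mergedEdges)
```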

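2. Graph Transformations

1. Basic information

numEdges: returns the number of edges.
numVertices: returns the number of vertices.
inDegrees: VertexRDD[Int] returns the in-degree of each vertex. A VertexRDD[Int] is an RDD[(VertexId, Int)]; the Int is the in-degree.
outDegrees: VertexRDD[Int] returns the out-degree of each vertex. The result is an RDD[(VertexId, Int)]; the Int is the out-degree.
degrees: VertexRDD[Int] returns the sum of the in-degree and out-degree of each vertex. The result is an RDD[(VertexId, Int)]; the Int is the degree.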
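A short sketch of these properties on a toy graph, assuming Spark with GraphX is available (the edges and the names g, inDeg, outDeg and allDeg are illustrative). One detail worth seeing in practice: a vertex with no incoming edges has no entry in inDegrees, and likewise for outDegrees.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("degrees-sketch").setMaster("local[*]"))

// Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 3.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(1L, 3L, 0), Edge(2L, 3L, 0)))
val g = Graph.fromEdges(edges, defaultValue = 0)

val inDeg: Map[VertexId, Int] = g.inDegrees.collect().toMap   // vertex 1 is absent
val outDeg: Map[VertexId, Int] = g.outDegrees.collect().toMap // vertex 3 is absent
val allDeg: Map[VertexId, Int] = g.degrees.collect().toMap

println(s"edges=${g.numEdges} vertices=${g.numVertices}")
println(s"in=$inDeg out=$outDeg all=$allDeg")
```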

2. Transformation operations

def mapVertices[VD2: ClassTag](map: (VertexId, VD) => VD2)
    (implicit eq: VD =:= VD2 = null): Graph[VD2, ED]

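Applies map to every vertex in the graph. The vertex ID cannot change, but the vertex attribute may be changed to a different type.
For example (attr is a pair here): scala> graph.mapVertices((id,attr)=>attr._1+":"+attr._2)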

def mapEdges[ED2: ClassTag](map: Edge[ED] => ED2): Graph[VD, ED2]

Applies map to every edge in the graph. The edge direction cannot change, but the edge attribute may be changed to a different type.

def mapTriplets[ED2: ClassTag](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]

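Applies map to every triplet in the graph. Only the edge attribute can be modified; the function can read the source and destination vertex attributes.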
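The difference between mapEdges and mapTriplets can be sketched like this, assuming Spark with GraphX is available. The one-edge graph and the names g, labeled, summed, labeledAttrs and summedAttrs are made up for the example; the lambda parameter types are written out only to keep overload resolution obvious.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeTriplet, Graph, VertexId}

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("map-edges-sketch").setMaster("local[*]"))

// Hypothetical graph: vertex 1 (attr 10) -> vertex 2 (attr 20), edge attr 5.
val vertices = sc.parallelize(Seq[(VertexId, Int)]((1L, 10), (2L, 20)))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 5)))
val g: Graph[Int, Int] = Graph(vertices, edges)

// mapEdges sees only the edge; here the edge attribute type changes to String.
val labeled: Graph[Int, String] = g.mapEdges((e: Edge[Int]) => s"w=${e.attr}")

// mapTriplets also sees both endpoint attributes.
val summed: Graph[Int, Int] =
  g.mapTriplets((t: EdgeTriplet[Int, Int]) => t.srcAttr + t.attr + t.dstAttr)

val labeledAttrs = labeled.edges.map(_.attr).collect().toList
val summedAttrs = summed.edges.map(_.attr).collect().toList
println(s"$labeledAttrs $summedAttrs")
```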

3. Structural operations

reverse: reverse the graph

def reverse: Graph[VD, ED]

Reverses the whole graph by flipping the direction of every edge.
For example:

graph.reverse.triplets.map(x=>"["+x.srcId+":"+x.srcAttr+"-->"+x.attr+"-->"+x.dstId+":"+x.dstAttr+"]").collect.foreach(println)

subgraph: extract a subgraph

def subgraph(
      epred: EdgeTriplet[VD, ED] => Boolean = (x => true),
      vpred: (VertexId, VD) => Boolean = ((v, d) => true))
    : Graph[VD, ED]

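Extracts a subgraph. Arguments can be passed by parameter name. An edge is removed automatically when either of its endpoint vertices is not kept in the subgraph.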

graph.subgraph(vpred=(id,attr)=>attr._2 == "professor").triplets.map(x=>"["+x.srcId+":"+x.srcAttr+"-->"+x.attr+"-->"+x.dstId+":"+x.dstAttr+"]").collect.foreach(println)

Vertices that are left without edges are not removed automatically.

graph.subgraph((x=>false)).numVertices

mask: intersection of two graphs

def mask[VD2: ClassTag, ED2: ClassTag](other: Graph[VD2, ED2]): Graph[VD, ED]

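Intersects the current graph with other and returns a new graph that contains only the vertices and edges present in both. Where attributes differ between the two graphs, the attributes of the current graph are kept.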

val other =graph.subgraph(vpred=(id,attr)=>attr._2 == "professor").mapVertices((id,attr)=>attr._1 +":"+attr._2)
other.triplets.map(x=>"["+x.srcId+":"+x.srcAttr+"-->"+x.attr+"-->"+x.dstId+":"+x.dstAttr+"]").collect.foreach(println) //print other
graph.mask(other).triplets.map(x=>"["+x.srcId+":"+x.srcAttr+"-->"+x.attr+"-->"+x.dstId+":"+x.dstAttr+"]").collect.foreach(println)

groupEdges: merge parallel edges

def groupEdges(merge: (ED, ED) => ED): Graph[VD, ED]

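Merges parallel edges (edges with the same source and destination), combining their attributes with the merge function. [Note: the edges to be merged must be in the same partition, so call partitionBy before groupEdges.]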
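A sketch of the partitionBy-then-groupEdges pattern, assuming Spark with GraphX is available. The edge data and the names g, merged and mergedEdges are illustrative; EdgePartition2D is one built-in strategy that puts identical (source, destination) pairs in the same partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("group-edges-sketch").setMaster("local[*]"))

// Hypothetical edges: two parallel edges 1 -> 2 with weights 3 and 4.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 3), Edge(1L, 2L, 4), Edge(2L, 3L, 5)))
val g = Graph.fromEdges(edges, defaultValue = 0)

// Repartition so parallel edges are co-located, then sum their weights.
val merged = g.partitionBy(PartitionStrategy.EdgePartition2D).groupEdges(_ + _)

val mergedEdges =
  merged.edges.collect().map(e => (e.srcId, e.dstId, e.attr)).toList.sorted
println(mergedEdges)
```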

4. Aggregation

collectNeighbors

def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]]

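Collects the neighbors of each vertex in the given direction. The result is an RDD[(VertexId, Array[(VertexId, VD)])]: for each vertex, an array holding the ID and attribute of each neighboring vertex.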

graph.collectNeighbors(EdgeDirection.In).collect

collectNeighborIds

def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]]

Like the previous method, but collects only the neighbor IDs.

graph.collectNeighborIds(EdgeDirection.In).collect

aggregateMessages

def aggregateMessages[A: ClassTag](
      sendMsg: EdgeContext[VD, ED, A] => Unit,
      mergeMsg: (A, A) => A,
      tripletFields: TripletFields = TripletFields.All)
    : VertexRDD[A]

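sendMsg is called on every edge and can send messages to either endpoint; every vertex combines the messages it receives with mergeMsg. tripletFields declares which fields of the EdgeContext are actually read, so that unused fields need not be shipped, which reduces communication.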

//initialize the vertex collection
  val vertexArray = Array(
    (1L, ("Alice", 28)),
    (2L, ("Bob", 27)),
    (3L, ("Charlie", 65)),
    (4L, ("David", 42)),
    (5L, ("Ed", 55)),
    (6L, ("Fran", 50))
  )
  //create the vertex RDD
  val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)

  //initialize the edge collection
  val edgeArray = Array(
    Edge(2L, 1L, 7),
    Edge(2L, 4L, 2),
    Edge(3L, 2L, 4),
    Edge(3L, 6L, 3),
    Edge(4L, 1L, 1),
    Edge(2L, 5L, 2),
    Edge(5L, 3L, 8),
    Edge(5L, 6L, 3)
  )

  //create the edge RDD
  val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)

  //create the graph
  val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)
  //for each vertex, collect the (id, (name, age)) of every in-neighbor
  graph.aggregateMessages[Array[(VertexId, (String, Int))]](
    ctx => ctx.sendToDst(Array((ctx.srcId, ctx.srcAttr))),
    _ ++ _
  ).collect.foreach { case (id, neighbors) =>
    println(s"id: $id")
    for ((nId, (name, age)) <- neighbors) {
      println(s"    $nId (name: $name  age: $age)")
    }
  }
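The role of tripletFields can be sketched with a smaller aggregation, assuming Spark with GraphX is available. The edge data and the names g and inWeight are invented for the example. Because sendMsg reads only the edge attribute, TripletFields.EdgeOnly is enough and the vertex attributes do not have to be shipped.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexId}

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("aggregate-messages-sketch").setMaster("local[*]"))

// Hypothetical weighted edges: 1 -> 2 (3), 3 -> 2 (4), 2 -> 3 (5).
val edges = sc.parallelize(Seq(Edge(1L, 2L, 3), Edge(3L, 2L, 4), Edge(2L, 3L, 5)))
val g = Graph.fromEdges(edges, defaultValue = "unused")

// Sum the weights of incoming edges; only the edge attribute is read.
val inWeight: Map[VertexId, Int] = g.aggregateMessages[Int](
  ctx => ctx.sendToDst(ctx.attr),
  _ + _,
  TripletFields.EdgeOnly
).collect().toMap

println(inWeight)
```

Vertex 1 receives no message, so it has no entry in the result.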

5. Join operations

joinVertices

def joinVertices[U: ClassTag](table: RDD[(VertexId, U)])(mapFunc: (VertexId, VD, U) => VD): Graph[VD, ED]

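Joins the graph's vertices with table on vertex ID and merges each matching value of type U into the vertex attribute of type VD. The type VD cannot change. One approach is to make VD a case class and have mapFunc fill in its missing fields. Vertices with no match in table keep their original attribute.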
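A minimal sketch of joinVertices, assuming Spark with GraphX is available. The data and the names g, bonus, joined and joinedAttrs are illustrative only. Vertex 3 has no entry in the joined RDD, so its attribute is left as it was.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("join-vertices-sketch").setMaster("local[*]"))

// Hypothetical graph 1 -> 2 -> 3 where every vertex attribute starts at 0.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0)))
val g: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)

// Extra data for vertices 1 and 2 only.
val bonus: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 10), (2L, 20)))

// The result type stays Graph[Int, Int]; unmatched vertices are untouched.
val joined = g.joinVertices(bonus)((id, old, extra) => old + extra)

val joinedAttrs: Map[VertexId, Int] = joined.vertices.collect().toMap
println(joinedAttrs)
```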

outerJoinVertices

 def outerJoinVertices[U: ClassTag, VD2: ClassTag](other: RDD[(VertexId, U)])( mapFunc: (VertexId, VD, Option[U]) => VD2)(implicit eq: VD =:= VD2 = null) : Graph[VD2, ED]

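Similar to joinVertices, except that mapFunc runs on every vertex: a vertex with no match in other receives None, and the vertex attribute type may change to VD2.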
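A sketch of outerJoinVertices that attaches each vertex's out-degree, assuming Spark with GraphX is available. The data and the names g, withOutDeg and outAttrs are made up for the example. Vertex 3 has no outgoing edges, so it is missing from outDegrees and receives None.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Reuse the running context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("outer-join-vertices-sketch").setMaster("local[*]"))

// Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 3; every vertex attribute is "v".
val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(1L, 3L, 0), Edge(2L, 3L, 0)))
val g: Graph[String, Int] = Graph.fromEdges(edges, defaultValue = "v")

// The vertex attribute type changes from String to (String, Int).
val withOutDeg: Graph[(String, Int), Int] =
  g.outerJoinVertices(g.outDegrees)((id, name, degOpt) => (name, degOpt.getOrElse(0)))

val outAttrs: Map[VertexId, (String, Int)] = withOutDeg.vertices.collect().toMap
println(outAttrs)
```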

6. Pregel

Background:

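A vertex is in one of two states: 1. inactive, similar to sleeping, in which it does nothing; 2. active, in which it does work.
A vertex becomes active by receiving a message. In GraphX, the vertices that received a message in one superstep form the active set of the next; the computation ends when no messages are sent or maxIterations is reached.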

def pregel[A: ClassTag](
      initialMsg: A,
      maxIterations: Int = Int.MaxValue,
      activeDirection: EdgeDirection = EdgeDirection.Either)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED]

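initialMsg: the message that every vertex receives when the computation starts, so all vertices are active in the first superstep.
maxIterations: the maximum number of iterations.
activeDirection: the direction of edges, relative to a vertex that received a message in the previous round, on which sendMsg runs. For example, with EdgeDirection.Out only the out-edges of such vertices run sendMsg.
Curried (second parameter list):
vprog: runs on each vertex that received a message and merges the aggregated message into the vertex attribute.
sendMsg: runs on the edges of active vertices selected by activeDirection and returns the messages to send, or an empty iterator to send nothing.
mergeMsg: if a vertex receives several messages, mergeMsg combines them into a single message before vprog runs. It is not called when a vertex receives only one message.
Example: find the shortest distance from vertex 5 to every other vertex.

Implementation: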

package com.dengdan.practice

import org.apache.log4j.{Level, Logger}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Practice extends App {
  //suppress logging
  Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
  Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

  //set up a SparkConf
  val conf = new SparkConf().setAppName("simpleGraphx").setMaster("local[*]")
  val sc = new SparkContext(conf)

  //initialize the vertex collection
  val vertexArray = Array(
    (1L, ("Alice", 28)),
    (2L, ("Bob", 27)),
    (3L, ("Charlie", 65)),
    (4L, ("David", 42)),
    (5L, ("Ed", 55)),
    (6L, ("Fran", 50))
  )
  //create the vertex RDD
  val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)
  //initialize the edge collection
  val edgeArray = Array(
    Edge(2L, 1L, 7),
    Edge(2L, 4L, 2),
    Edge(3L, 2L, 4),
    Edge(3L, 6L, 3),
    Edge(4L, 1L, 1),
    Edge(2L, 5L, 2),
    Edge(5L, 3L, 8),
    Edge(5L, 6L, 3)
  )
  //create the edge RDD
  val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)

  //create the graph
  val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)
  //***************************  practical operations    ****************************************
  println("Aggregation")
  println("**********************************************************")
  val sourceId: VertexId = 5L //the source vertex
  val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

  initialGraph.triplets.collect() foreach (println)

  println("Find the shortest distance from vertex 5 to each vertex")
  val sssp = initialGraph.pregel(Double.PositiveInfinity, Int.MaxValue, EdgeDirection.Out)(
    //id is the vertex ID, dist is the vertex's current attribute, newDist is the (merged) incoming message
    (id, dist, newDist) => {
      println("||||" + id + "  received message")
      math.min(dist, newDist)
    },
    triplet => {
      println(">>>>" + triplet.srcId + "  sending message")
      //source attribute + edge attribute < destination attribute
      if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
        //a shorter path was found, so send
        //the Iterator carries the message: sent to dstId, with value triplet.srcAttr + triplet.attr
        Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
      } else {
        //nothing to send
        Iterator.empty
      }
    },
    (a, b) => {
      println("$$$$$")
      math.min(a, b)
    } //the shortest of all distances arriving at this vertex
  )
  println("--------------graph info------------------")
  sssp.triplets.collect().foreach(println)

  println("-------------shortest distance from vertex 5 to each vertex---")
  println(sssp.vertices.collect.mkString("\n"))

  sc.stop()
}

Output (lines starting with # are annotations added by hand, not program output):

Aggregation
**********************************************************
((2,Infinity),(1,Infinity),7)
((2,Infinity),(4,Infinity),2)
((3,Infinity),(2,Infinity),4)
((3,Infinity),(6,Infinity),3)
((4,Infinity),(1,Infinity),1)
((2,Infinity),(5,0.0),2)
((5,0.0),(3,Infinity),8)
((5,0.0),(6,Infinity),3)
Find the shortest distance from vertex 5 to each vertex
# initialization
||||4  received message
||||6  received message
||||5  received message
||||2  received message
||||3  received message
||||1  received message
>>>>4  sending message
>>>>2  sending message
>>>>2  sending message
>>>>5  sending message
>>>>3  sending message
>>>>2  sending message
>>>>3  sending message
>>>>5  sending message
# iteration 1
||||3  received message
||||6  received message
>>>>3  sending message
>>>>3  sending message
# iteration 2
||||2  received message
>>>>2  sending message
>>>>2  sending message
>>>>2  sending message
# iteration 3
||||1  received message
||||4  received message
>>>>4  sending message
# iteration 4
||||1  received message
# iterations finished
--------------graph info------------------
((2,12.0),(1,15.0),7)
((2,12.0),(4,14.0),2)
((3,8.0),(2,12.0),4)
((3,8.0),(6,3.0),3)
((4,14.0),(1,15.0),1)
((2,12.0),(5,0.0),2)
((5,0.0),(3,8.0),8)
((5,0.0),(6,3.0),3)
-------------shortest distance from vertex 5 to each vertex---
(1,15.0)
(2,12.0)
(3,8.0)
(4,14.0)
(5,0.0)
(6,3.0)
