MaxCompute Graph User Guide (Part 2)

Example programs

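Strongly connected components
A directed graph is strongly connected if every vertex can reach every other vertex along the edges of the graph. A maximal strongly connected subgraph of a directed graph is called a strongly connected component. This example is based on the parallel Coloring algorithm.
Each vertex holds two values:
colorID: stores the color of vertex v during forward traversal. When the computation ends, vertices with the same colorID belong to the same strongly connected component.

transposeNeighbors: stores the IDs of the neighbors of vertex v in the transpose of the input graph.

The algorithm has four parts:
Transpose graph generation: two supersteps. In the first superstep, each vertex sends its ID to the neighbors on its outgoing edges. In the second superstep, each vertex stores the IDs it receives as its transposeNeighbors value.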

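Trimming: one superstep. Every vertex that has no incoming edges or no outgoing edges (and therefore cannot lie on a cycle) sets its colorID to its own ID and becomes inactive. Messages sent to that vertex afterwards are ignored.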

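Forward traversal: two sub-phases (supersteps), start and rest. In the start phase, each vertex sets its colorID to its own ID and sends its ID to the neighbors on its outgoing edges. In the rest phase, each vertex updates its colorID to the largest colorID it has received and propagates the new colorID, until the colorIDs converge. Once they converge, the master process sets the global phase to backward traversal.

Backward traversal: also two sub-phases, start and rest. In the start phase, every vertex whose ID equals its colorID sends its ID to its neighbors in the transpose graph and becomes inactive; messages sent to it afterwards can be ignored. In each rest superstep, every vertex that receives a message matching its colorID propagates its colorID in the transpose graph and then becomes inactive. If any vertices are still active when this phase ends, the algorithm returns to the trimming step.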
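The four parts above can be illustrated with a small single-machine simulation. This is only a sketch under stated assumptions: it uses plain Java collections rather than the MaxCompute Graph API, it replaces supersteps and messages with sequential loops, and the class name SccColoringSketch, the method colorComponents, and the int[][] adjacency input are hypothetical names introduced for illustration. Trimming is assumed to deactivate vertices with no active incoming or no active outgoing neighbors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical single-machine sketch; not part of the MaxCompute Graph SDK.
final class SccColoringSketch {

    /** edges[u] lists the targets of u's outgoing edges; returns the colorID of every vertex. */
    static int[] colorComponents(int[][] edges) {
        int n = edges.length;
        // Part 1: build the transpose graph alongside the original adjacency lists.
        List<List<Integer>> out = new ArrayList<>();
        List<List<Integer>> in = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            out.add(new ArrayList<>());
            in.add(new ArrayList<>());
        }
        for (int u = 0; u < n; u++) {
            for (int v : edges[u]) {
                out.get(u).add(v);
                in.get(v).add(u);
            }
        }
        int[] color = new int[n];
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);
        int remaining = n;
        while (remaining > 0) {
            // Part 2: trimming. A vertex without active in- or out-neighbors is its own component.
            for (int v = 0; v < n; v++) {
                if (active[v] && (noneActive(in.get(v), active) || noneActive(out.get(v), active))) {
                    color[v] = v;
                    active[v] = false;
                    remaining--;
                }
            }
            // Part 3: forward traversal. Start with the own ID, then spread the largest colorID.
            for (int v = 0; v < n; v++) {
                if (active[v]) {
                    color[v] = v;
                }
            }
            boolean changed = true;
            while (changed) {
                changed = false;
                for (int u = 0; u < n; u++) {
                    if (!active[u]) {
                        continue;
                    }
                    for (int v : out.get(u)) {
                        if (active[v] && color[v] < color[u]) {
                            color[v] = color[u];
                            changed = true;
                        }
                    }
                }
            }
            // Part 4: backward traversal. Roots (ID == colorID) claim matching vertices
            // along transpose edges; every claimed vertex becomes inactive.
            Deque<Integer> queue = new ArrayDeque<>();
            for (int v = 0; v < n; v++) {
                if (active[v] && color[v] == v) {
                    active[v] = false;
                    remaining--;
                    queue.add(v);
                }
            }
            while (!queue.isEmpty()) {
                int v = queue.poll();
                for (int u : in.get(v)) {
                    if (active[u] && color[u] == color[v]) {
                        active[u] = false;
                        remaining--;
                        queue.add(u);
                    }
                }
            }
        }
        return color;
    }

    private static boolean noneActive(List<Integer> neighbors, boolean[] active) {
        for (int w : neighbors) {
            if (active[w]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Edges: 0->1, 1->2, 2->0, 2->3, 3->4, 4->3; vertex 5 is isolated.
        int[][] edges = {{1}, {2}, {0, 3}, {4}, {3}, {}};
        System.out.println(Arrays.toString(colorComponents(edges))); // prints [2, 2, 2, 4, 4, 5]
    }
}
```

In this run, vertices 0, 1, and 2 share colorID 2, vertices 3 and 4 share colorID 4, and the isolated vertex 5 is trimmed with its own ID. A distributed implementation would exchange these values as messages between supersteps instead of reading shared arrays.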

Code example
The code for strongly connected components is as follows:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import com.aliyun.odps.data.TableInfo;
import com.aliyun.odps.graph.Aggregator;
import com.aliyun.odps.graph.ComputeContext;
import com.aliyun.odps.graph.GraphJob;
import com.aliyun.odps.graph.GraphLoader;
import com.aliyun.odps.graph.MutationContext;
import com.aliyun.odps.graph.Vertex;
import com.aliyun.odps.graph.WorkerContext;
import com.aliyun.odps.io.BooleanWritable;
import com.aliyun.odps.io.IntWritable;
import com.aliyun.odps.io.LongWritable;
import com.aliyun.odps.io.NullWritable;
import com.aliyun.odps.io.Tuple;
import com.aliyun.odps.io.Writable;
import com.aliyun.odps.io.WritableRecord;
/**

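Connected components
Two vertices are connected if a path exists between them. An undirected graph G is a connected graph if every pair of its vertices is connected; otherwise it is a disconnected graph. A maximal connected subgraph of G is called a connected component.
This algorithm computes the connected component that each vertex belongs to. It propagates the smallest vertex ID along the edges to every vertex in the component, so the final value of each vertex is the smallest vertex ID in its component, and that ID identifies the component in the output.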
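The propagation of the smallest vertex ID can be sketched on a single machine as follows. This is an illustration under assumptions, not the SDK-based job: the class MinIdPropagationSketch, the method componentLabels, and the edge-pair input format are hypothetical, and each pass of the loop stands in for one superstep in which every vertex sends its current label along its edges.

```java
import java.util.Arrays;

// Hypothetical single-machine sketch; not part of the MaxCompute Graph SDK.
final class MinIdPropagationSketch {

    /** edges holds undirected pairs {u, v}; returns the smallest vertex ID in each vertex's component. */
    static int[] componentLabels(int vertexCount, int[][] edges) {
        int[] label = new int[vertexCount];
        for (int v = 0; v < vertexCount; v++) {
            label[v] = v; // every vertex starts with its own ID
        }
        boolean changed = true;
        while (changed) { // one pass plays the role of one superstep
            changed = false;
            int[] next = label.clone();
            for (int[] edge : edges) {
                int u = edge[0];
                int v = edge[1];
                // Labels travel in both directions; each vertex keeps the minimum it sees.
                if (label[u] < next[v]) {
                    next[v] = label[u];
                    changed = true;
                }
                if (label[v] < next[u]) {
                    next[u] = label[v];
                    changed = true;
                }
            }
            label = next;
        }
        return label;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {3, 4}};
        System.out.println(Arrays.toString(componentLabels(6, edges))); // prints [0, 0, 0, 3, 3, 5]
    }
}
```

The loop stops once a full pass changes no label, which mirrors a job in which every vertex has voted to halt and no messages remain.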

Code example
The code for connected components is as follows:

import java.io.IOException;
import com.aliyun.odps.data.TableInfo;
import com.aliyun.odps.graph.ComputeContext;
import com.aliyun.odps.graph.GraphJob;
import com.aliyun.odps.graph.GraphLoader;
import com.aliyun.odps.graph.MutationContext;
import com.aliyun.odps.graph.Vertex;
import com.aliyun.odps.graph.WorkerContext;
import com.aliyun.odps.graph.examples.SSSP.MinLongCombiner;
import com.aliyun.odps.io.LongWritable;
import com.aliyun.odps.io.NullWritable;
import com.aliyun.odps.io.WritableRecord;
/**

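Topological sorting
For the directed edges (u, v) of a graph, a vertex sequence in which u comes before v for every edge is called a topological sequence. Topological sorting is the algorithm that finds a topological sequence of a directed graph. The steps are as follows:

Find a vertex in the graph that has no incoming edges and output it.
Remove that vertex and all of its outgoing edges from the graph.
Repeat the preceding steps until all vertices have been output.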
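The three steps above correspond to what is commonly known as Kahn's algorithm. The following is a minimal single-machine sketch of it, assuming an adjacency-array input; the class KahnTopologicalSortSketch and the method sort are hypothetical names, and throwing an exception on a cycle is a choice made for this sketch only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-machine sketch; not part of the MaxCompute Graph SDK.
final class KahnTopologicalSortSketch {

    /** out[u] lists the targets of u's outgoing edges; returns the vertices in topological order. */
    static List<Integer> sort(int[][] out) {
        int n = out.length;
        int[] indegree = new int[n];
        for (int[] targets : out) {
            for (int v : targets) {
                indegree[v]++;
            }
        }
        // Step 1: collect the vertices that have no incoming edges.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (indegree[v] == 0) {
                ready.add(v);
            }
        }
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.poll();
            order.add(u); // output the vertex
            // Step 2: removing u's outgoing edges lowers the indegree of its targets.
            for (int v : out[u]) {
                if (--indegree[v] == 0) {
                    ready.add(v);
                }
            }
        }
        // Step 3 ends here; vertices on a cycle never reach indegree 0.
        if (order.size() < n) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        // Edges: 0->1, 0->2, 1->3, 2->3.
        int[][] out = {{1, 2}, {3}, {3}, {}};
        System.out.println(sort(out)); // prints [0, 1, 2, 3]
    }
}
```

Comparing the number of output vertices with the number of input vertices is the same idea the job uses to tell whether the graph has a cycle.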

Code example
The code for the topological sorting algorithm is as follows:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import com.aliyun.odps.data.TableInfo;
import com.aliyun.odps.graph.Aggregator;
import com.aliyun.odps.graph.Combiner;
import com.aliyun.odps.graph.ComputeContext;
import com.aliyun.odps.graph.GraphJob;
import com.aliyun.odps.graph.GraphLoader;
import com.aliyun.odps.graph.MutationContext;
import com.aliyun.odps.graph.Vertex;
import com.aliyun.odps.graph.WorkerContext;
import com.aliyun.odps.io.LongWritable;
import com.aliyun.odps.io.NullWritable;
import com.aliyun.odps.io.BooleanWritable;
import com.aliyun.odps.io.WritableRecord;
public class TopologySort {
  private final static Log LOG = LogFactory.getLog(TopologySort.class);
  public static class TopologySortVertex extends
      Vertex<LongWritable, LongWritable, NullWritable, LongWritable> {
    @Override
    public void compute(
        ComputeContext<LongWritable, LongWritable, NullWritable, LongWritable> context,
        Iterable<LongWritable> messages) throws IOException {
      // in superstep 0, each vertex sends message whose value is 1 to its
      // neighbors
      if (context.getSuperstep() == 0) {
        if (hasEdges()) {
          context.sendMessageToNeighbors(this, new LongWritable(1L));
        }
      } else if (context.getSuperstep() >= 1) {
        // compute each vertex's indegree
        long indegree = getValue().get();
        for (LongWritable msg : messages) {
          indegree += msg.get();
        }
        setValue(new LongWritable(indegree));
        if (indegree == 0) {
          voteToHalt();
          if (hasEdges()) {
            context.sendMessageToNeighbors(this, new LongWritable(-1L));
          }
          context.write(new LongWritable(context.getSuperstep()), getId());
          LOG.info("vertex: " + getId());
        }
        context.aggregate(new LongWritable(indegree));
      }
    }
  }
  public static class TopologySortVertexReader extends
      GraphLoader<LongWritable, LongWritable, NullWritable, LongWritable> {
    @Override
    public void load(
        LongWritable recordNum,
        WritableRecord record,
        MutationContext<LongWritable, LongWritable, NullWritable, LongWritable> context)
        throws IOException {
      TopologySortVertex vertex = new TopologySortVertex();
      vertex.setId((LongWritable) record.get(0));
      vertex.setValue(new LongWritable(0));
      String[] edges = record.get(1).toString().split(",");
      for (int i = 0; i < edges.length; i++) {
        long edge = Long.parseLong(edges[i]);
        // a negative value (-1) marks a vertex without outgoing edges
        if (edge >= 0) {
          vertex.addEdge(new LongWritable(edge), NullWritable.get());
        }
      }
      LOG.info(record.toString());
      context.addVertexRequest(vertex);
    }
  }
  public static class LongSumCombiner extends
      Combiner<LongWritable, LongWritable> {
    @Override
    public void combine(LongWritable vertexId, LongWritable combinedMessage,
        LongWritable messageToCombine) throws IOException {
      combinedMessage.set(combinedMessage.get() + messageToCombine.get());
    }
  }
  public static class TopologySortAggregator extends
      Aggregator<BooleanWritable> {
    @SuppressWarnings("rawtypes")
    @Override
    public BooleanWritable createInitialValue(WorkerContext context)
        throws IOException {
      return new BooleanWritable(true);
    }
    @Override
    public void aggregate(BooleanWritable value, Object item)
        throws IOException {
      // the value stays true only while every aggregated indegree is nonzero
      boolean hasCycle = value.get();
      boolean inDegreeNotZero = ((LongWritable) item).get() != 0;
      value.set(hasCycle && inDegreeNotZero);
    }
    @Override
    public void merge(BooleanWritable value, BooleanWritable partial)
        throws IOException {
      value.set(value.get() && partial.get());
    }
    @SuppressWarnings("rawtypes")
    @Override
    public boolean terminate(WorkerContext context, BooleanWritable value)
        throws IOException {
      if (context.getSuperstep() == 0) {
        // since the initial aggregator value is true, and in superstep 0 we
        // don't do aggregate
        return false;
      }
      return value.get();
    }
  }
  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.out.println("Usage : <inputTable> <outputTable>");
      System.exit(-1);
    }
    // Input table format:
    // 0 1,2
    // 1 3
    // 2 3
    // 3 -1
    // The first column is the vertex ID. The second column lists the destination
    // vertex IDs of the vertex's edges; -1 means the vertex has no outgoing edges.
    // Output table format:
    // 0 0
    // 1 1
    // 1 2
    // 2 3
    // The first column is the superstep value, which implies the topological order.
    // The second column is the vertex ID.
    // TopologySortAggregator is used to determine whether the graph contains a cycle.
    // If the input graph contains a cycle, the iteration ends once every active
    // vertex has a nonzero indegree.
    // Users can compare the record counts of the input and output tables to tell
    // whether a directed graph contains a cycle.
    GraphJob job = new GraphJob();
    job.setGraphLoaderClass(TopologySortVertexReader.class);
    job.setVertexClass(TopologySortVertex.class);
    job.addInput(TableInfo.builder().tableName(args[0]).build());
    job.addOutput(TableInfo.builder().tableName(args[1]).build());
    job.setCombinerClass(LongSumCombiner.class);
    job.setAggregatorClass(TopologySortAggregator.class);
    long startTime = System.currentTimeMillis();
    job.run();
    System.out.println("Job Finished in "
        + (System.currentTimeMillis() - startTime) / 1000.0 + " seconds");
  }
}

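Linear regression
In statistics, linear regression is a statistical analysis method for determining the dependency between two or more variables. Unlike classification algorithms, which predict discrete values, regression algorithms predict continuous values.
The linear regression algorithm defines the loss function as the sum of squared errors over the sample set and solves for the weight vector by minimizing the loss function.
A common solution is gradient descent, which works as follows:

Initialize the weight vector, and specify the learning rate and the number of iterations (or a convergence condition).
For each sample, compute the squared error.
Sum the squared errors and update the weights according to the learning rate.
Repeat the iteration until convergence.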
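The gradient descent procedure above can be sketched in plain Java as follows. This is an illustration under assumptions rather than the job's implementation: the class GradientDescentSketch and its methods are hypothetical, the weights start at zero, a fixed iteration count replaces a convergence test, and the gradient is averaged over the samples (equivalent to scaling the learning rate) to keep the step size independent of the sample count.

```java
import java.util.Arrays;

// Hypothetical single-machine sketch; not part of the MaxCompute Graph SDK.
final class GradientDescentSketch {

    /** Fits y = w[0] + w[1] * x[0] + ... + w[d] * x[d - 1] by batch gradient descent. */
    static double[] fit(double[][] features, double[] targets, double learningRate, int iterations) {
        int n = features.length;
        int d = features[0].length;
        double[] w = new double[d + 1]; // w[0] is the intercept; all weights start at 0
        for (int it = 0; it < iterations; it++) {
            double[] gradient = new double[d + 1];
            for (int i = 0; i < n; i++) {
                // Per-sample error; its square is the sample's contribution to the loss.
                double error = predict(w, features[i]) - targets[i];
                gradient[0] += error;
                for (int j = 0; j < d; j++) {
                    gradient[j + 1] += error * features[i][j];
                }
            }
            // Sum over all samples, then step against the averaged gradient.
            for (int j = 0; j <= d; j++) {
                w[j] -= learningRate * gradient[j] / n;
            }
        }
        return w;
    }

    static double predict(double[] w, double[] x) {
        double y = w[0];
        for (int j = 0; j < x.length; j++) {
            y += w[j + 1] * x[j];
        }
        return y;
    }

    public static void main(String[] args) {
        // Samples drawn from y = 1 + 2x, so the fitted weights should approach [1, 2].
        double[][] x = {{0}, {1}, {2}, {3}, {4}};
        double[] y = {1, 3, 5, 7, 9};
        System.out.println(Arrays.toString(fit(x, y, 0.05, 5000)));
    }
}
```

In a distributed job, the per-sample gradient terms would be computed on the workers and summed through an aggregator before the weights are updated.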

Code example
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import com.aliyun.odps.data.TableInfo;
import com.aliyun.odps.graph.Aggregator;
import com.aliyun.odps.graph.ComputeContext;
import com.aliyun.odps.graph.GraphJob;
import com.aliyun.odps.graph.MutationContext;
import com.aliyun.odps.graph.WorkerContext;
import com.aliyun.odps.graph.Vertex;
import com.aliyun.odps.graph.GraphLoader;
import com.aliyun.odps.io.DoubleWritable;
import com.aliyun.odps.io.LongWritable;
import com.aliyun.odps.io.NullWritable;
import com.aliyun.odps.io.Tuple;
import com.aliyun.odps.io.Writable;
import com.aliyun.odps.io.WritableRecord;
/**

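Triangle counting
The triangle counting algorithm counts the number of triangles that pass through each vertex.
The algorithm works as follows:

Each vertex sends its ID to all of its outgoing neighbors.
Each vertex stores its incoming and outgoing neighbors and sends them to its outgoing neighbors.
For each edge, count the neighbors shared by its two endpoints, sum the counts, and write the result to the output table.
Sum the results in the table and divide by three to obtain the number of triangles.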
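The last two steps rest on the fact that every triangle is found once from each of its three edges. A minimal single-machine sketch of that counting rule follows, assuming the graph is treated as undirected and given as a list of vertex pairs; the class TriangleCountSketch and the method countTriangles are hypothetical names, and message passing is replaced by direct set lookups.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-machine sketch; not part of the MaxCompute Graph SDK.
final class TriangleCountSketch {

    /** edges holds undirected pairs {u, v}; returns the number of triangles in the graph. */
    static long countTriangles(int vertexCount, int[][] edges) {
        List<Set<Integer>> neighbors = new ArrayList<>();
        for (int v = 0; v < vertexCount; v++) {
            neighbors.add(new HashSet<>());
        }
        for (int[] edge : edges) {
            if (edge[0] != edge[1]) { // self-loops cannot form triangles
                neighbors.get(edge[0]).add(edge[1]);
                neighbors.get(edge[1]).add(edge[0]);
            }
        }
        long sharedNeighborSum = 0;
        for (int u = 0; u < vertexCount; u++) {
            for (int v : neighbors.get(u)) {
                if (u < v) { // visit each undirected edge exactly once
                    for (int w : neighbors.get(u)) {
                        if (neighbors.get(v).contains(w)) {
                            sharedNeighborSum++; // w closes a triangle over edge (u, v)
                        }
                    }
                }
            }
        }
        // Each triangle was counted once per edge, that is, three times.
        return sharedNeighborSum / 3;
    }

    public static void main(String[] args) {
        // One triangle (0, 1, 2) plus a pendant edge 2-3.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {2, 3}};
        System.out.println(countTriangles(4, edges)); // prints 1
    }
}
```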

Code example
The code for the triangle counting algorithm is as follows:

import java.io.IOException;
import com.aliyun.odps.data.TableInfo;
import com.aliyun.odps.graph.ComputeContext;
import com.aliyun.odps.graph.Edge;
import com.aliyun.odps.graph.GraphJob;
import com.aliyun.odps.graph.GraphLoader;
import com.aliyun.odps.graph.MutationContext;
import com.aliyun.odps.graph.Vertex;
import com.aliyun.odps.graph.WorkerContext;
import com.aliyun.odps.io.LongWritable;
import com.aliyun.odps.io.NullWritable;
import com.aliyun.odps.io.Tuple;
import com.aliyun.odps.io.Writable;
import com.aliyun.odps.io.WritableRecord;
/**

Vertex table input example
The code for loading a vertex table is as follows:

import java.io.IOException;
import com.aliyun.odps.conf.Configuration;
import com.aliyun.odps.data.TableInfo;
import com.aliyun.odps.graph.ComputeContext;
import com.aliyun.odps.graph.GraphJob;
import com.aliyun.odps.graph.GraphLoader;
import com.aliyun.odps.graph.Vertex;
import com.aliyun.odps.graph.VertexResolver;
import com.aliyun.odps.graph.MutationContext;
import com.aliyun.odps.graph.VertexChanges;
import com.aliyun.odps.graph.Edge;
import com.aliyun.odps.io.LongWritable;
import com.aliyun.odps.io.WritableComparable;
import com.aliyun.odps.io.WritableRecord;
/**

Edge table input example
The code for loading an edge table is as follows:

import java.io.IOException;
import com.aliyun.odps.conf.Configuration;
import com.aliyun.odps.data.TableInfo;
import com.aliyun.odps.graph.ComputeContext;
import com.aliyun.odps.graph.GraphJob;
import com.aliyun.odps.graph.GraphLoader;
import com.aliyun.odps.graph.Vertex;
import com.aliyun.odps.graph.VertexResolver;
import com.aliyun.odps.graph.MutationContext;
import com.aliyun.odps.graph.VertexChanges;
import com.aliyun.odps.graph.Edge;
import com.aliyun.odps.io.LongWritable;
import com.aliyun.odps.io.WritableComparable;
import com.aliyun.odps.io.WritableRecord;
/**
