GCN/GAT Research Overview

Dataset

The official Cora dataset

The Cora dataset consists of Machine Learning papers.
These papers are classified into one of the following seven classes:
- Case_Based
- Genetic_Algorithms
- Neural_Networks
- Probabilistic_Methods
- Reinforcement_Learning
- Rule_Learning
- Theory
The papers were selected in a way such that in the final corpus every paper cites or is cited by at least one other paper.
There are 2708 papers in the whole corpus, so the dataset has 2708 sample points.
Vocabulary: After stemming and removing stopwords we were left with a vocabulary of size 1433 unique words (All words with document frequency less than 10 were removed).

THE DIRECTORY CONTAINS TWO FILES

.content file: contains descriptions of the papers in the following format:
```
  <paper_id> <word_attributes>+ <class_label>
```
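- The first entry in each line contains the unique string ID of the paper, followed by binary values indicating whether each word in the vocabulary is present (indicated by 1) or absent (indicated by 0) in the paper.
- The file has 2708 lines; each line is one sample point, that is, one paper.
- Each paper is represented by a 1433-dimensional word vector, so each sample point has 1433 features.
- Each element of the word vector corresponds to one word and takes only the value 0 or 1.
- A value of 0 means the corresponding word does not appear in the paper; a value of 1 means it does.
- All words come from a dictionary of 1433 words.
- Finally, the last entry in the line contains the class label of the paper.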
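As a minimal sketch of the line format described above, the following splits one `.content` line into its three parts. The function name `parse_content_line`, the whitespace-based splitting, and the toy paper ID are assumptions made for illustration; the toy line uses a 5-word vocabulary instead of 1433 to stay readable.

```python
def parse_content_line(line, vocab_size=1433):
    """Split one .content line into (paper_id, features, label).

    Assumes fields are separated by whitespace (tabs or spaces).
    """
    fields = line.split()
    if len(fields) != vocab_size + 2:
        raise ValueError(
            f"expected {vocab_size + 2} fields, got {len(fields)}"
        )
    paper_id = fields[0]                       # unique string ID of the paper
    features = [int(v) for v in fields[1:-1]]  # one 0/1 value per vocabulary word
    if any(v not in (0, 1) for v in features):
        raise ValueError("word attributes must be 0 or 1")
    label = fields[-1]                         # class label, e.g. Neural_Networks
    return paper_id, features, label


# Toy line with a made-up ID and a 5-word vocabulary.
paper_id, features, label = parse_content_line(
    "12345 0 1 0 0 1 Neural_Networks", vocab_size=5
)
```

Applied to the real file with the default `vocab_size=1433`, each of the 2708 lines would yield one feature vector of length 1433.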
.cites file: contains the citation graph of the corpus. Each line describes a link in the following format:
```
  <ID of cited paper> <ID of citing paper>
```
- The file has 5429 lines. Each line contains two paper IDs.
- The first entry is the ID of the paper being cited and the second ID stands for the paper which contains the citation.
- The direction of the link is from right to left. If a line is represented by “paper1 paper2” then the link is “paper2->paper1”.
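Every paper cites, or is cited by, at least one other paper, so no sample point is completely unrelated to the rest. If the sample points are viewed as nodes of a graph, the graph has no isolated nodes. This alone does not make the graph connected: it can still split into several components.
If the papers are viewed as nodes of a graph, the 5429 lines are the 5429 edges between them.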
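The sketch below turns `.cites` lines into directed edges following the "paper2->paper1" convention above, then checks for isolated nodes and counts connected components. The helper names and the toy paper IDs (`p1` to `p5`) are hypothetical. The toy graph is chosen so that it has no isolated node yet still falls into two components, which shows that the two properties are different.

```python
def parse_cites(lines):
    """Return directed edges (citing, cited) from '<cited> <citing>' lines."""
    edges = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        cited, citing = fields
        edges.append((citing, cited))  # link direction: citing -> cited
    return edges


def isolated_nodes(paper_ids, edges):
    """Papers that appear in no edge at all."""
    touched = {node for edge in edges for node in edge}
    return sorted(set(paper_ids) - touched)


def count_components(paper_ids, edges):
    """Number of connected components, ignoring edge direction."""
    neighbors = {p: set() for p in paper_ids}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen, components = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return components


toy_ids = ["p1", "p2", "p3", "p4", "p5"]
toy_lines = ["p1 p2", "p1 p3", "p4 p5"]  # "<cited> <citing>"
edges = parse_cites(toy_lines)
isolated = isolated_nodes(toy_ids, edges)
components = count_components(toy_ids, edges)
```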

https://blog.csdn.net/yeziand01/java/article/details/93374216

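The mini-batch idea

Parallelization over a mini-batch is achieved by creating a sparse block-diagonal adjacency matrix
and concatenating the node feature matrices and target matrices along the node dimension. This composition makes batched operations fairly easy, because graphs of different sizes can share one batch.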
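One way to picture this, as a sketch rather than any library's actual implementation: if each graph's adjacency is stored sparsely as an edge list, shifting the node indices of every graph by the number of nodes that came before it is equivalent to placing its adjacency block on the diagonal of one large matrix. The function name `batch_graphs` and the `batch` vector that records which graph each node came from are assumptions made for illustration.

```python
def batch_graphs(graphs):
    """Merge several small graphs into one big disconnected graph.

    graphs: list of (features, edges, targets), where
      features is a list with one feature vector per node,
      edges is a list of (src, dst) pairs using local node indices,
      targets is a list with one target per node.
    """
    x, edge_index, y, batch = [], [], [], []
    offset = 0  # number of nodes already placed in the batch
    for graph_id, (features, edges, targets) in enumerate(graphs):
        x.extend(features)   # concatenate features along the node dimension
        y.extend(targets)    # concatenate targets along the node dimension
        # Shifting indices puts this graph's adjacency on the block diagonal.
        edge_index.extend((s + offset, d + offset) for s, d in edges)
        batch.extend([graph_id] * len(features))
        offset += len(features)
    return x, edge_index, y, batch


g1 = ([[1.0], [2.0]], [(0, 1)], [0, 1])                   # 2 nodes, 1 edge
g2 = ([[3.0], [4.0], [5.0]], [(0, 1), (1, 2)], [1, 0, 1])  # 3 nodes, 2 edges
x, edge_index, y, batch = batch_graphs([g1, g2])
```

Because no edge ever crosses from one block to another, message passing on the merged graph never mixes information between the original graphs.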

https://zhuanlan.zhihu.com/p/78452993

GCN

Layer-to-layer propagation rule
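For reference, the propagation rule commonly given for GCN (Kipf and Welling), assumed here to be the one this heading refers to, is $H^{(l+1)} = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)})$, where $\tilde{A} = A + I$ is the adjacency matrix with self-loops added and $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$. The sketch below implements one such layer in plain Python with ReLU as $\sigma$; the function names and the toy values are illustrative assumptions, and the weight matrix is fixed by hand rather than learned.

```python
import math


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(D~^-1/2 (A + I) D~^-1/2 H W)."""
    n = len(adj)
    # Add self-loops: A~ = A + I
    a_tilde = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Degrees of A~ (row sums); always >= 1 because of the self-loop
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization: entry (i, j) divided by sqrt(d_i * d_j)
    a_hat = [
        [a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU


adj = [[0, 1], [1, 0]]  # two nodes joined by one edge
h = [[1.0], [3.0]]      # one input feature per node
w = [[1.0]]             # hand-picked 1x1 weight matrix
out = gcn_layer(adj, h, w)
```

In this toy case every normalized entry equals 0.5, so each node's output is the average of its own feature and its neighbor's feature.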

GAT

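For node 3, the only neighbors are node 2 and node 4, but that does not mean the two are equally important to node 3. This "importance" can be quantified and, better still, learned by training the network. The paper calls it attention, and it is the core innovation of GAT.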
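To make the node-3 example concrete, here is a minimal sketch of single-head attention coefficients in the form used by GAT: a score LeakyReLU(a · [W h_i || W h_j]) per neighbor, normalized with a softmax. All numbers below (features, `w`, `a`) are made up and fixed by hand, whereas in a real model `w` and `a` are learned. Following the text, only nodes 2 and 4 are treated as neighbors; the GAT paper also includes the node itself in its own neighborhood.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(h_center, h_neighbors, w, a):
    """Attention of one node over its neighbors.

    h_center: feature vector of the center node
    h_neighbors: dict mapping neighbor id -> feature vector
    w: weight matrix as a list of rows; a: attention vector of length 2 * len(w)
    """
    def project(h):
        return [sum(wi * hi for wi, hi in zip(row, h)) for row in w]

    z_center = project(h_center)
    scores = {}
    for node_id, h in h_neighbors.items():
        concat = z_center + project(h)  # [W h_i || W h_j]
        scores[node_id] = leaky_relu(sum(ai * ci for ai, ci in zip(a, concat)))
    # Softmax over the neighbors (shifted by the max for numerical stability)
    top = max(scores.values())
    exps = {node_id: math.exp(s - top) for node_id, s in scores.items()}
    total = sum(exps.values())
    return {node_id: e / total for node_id, e in exps.items()}


w = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.5, -0.5, 1.0, 0.0]      # hand-picked attention vector
h3 = [1.0, 0.0]
alpha = attention_weights(h3, {2: [2.0, 0.0], 4: [0.0, 1.0]}, w, a)
```

With these toy values node 2 receives a larger weight than node 4, so node 3's updated representation would lean more heavily on node 2.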

https://zhuanlan.zhihu.com/p/99927545
