GCN/GAT Research Overview

Dataset

The official Cora dataset

  • The Cora dataset consists of Machine Learning papers.

  • These papers are classified into one of the following seven classes:

    • Case_Based
    • Genetic_Algorithms
    • Neural_Networks
    • Probabilistic_Methods
    • Reinforcement_Learning
    • Rule_Learning
    • Theory
  • The papers were selected in such a way that in the final corpus every paper cites or is cited by at least one other paper.

  • There are 2708 papers in the whole corpus, so the dataset has 2708 sample points.

  • Vocabulary: After stemming and removing stopwords we were left with a vocabulary of 1433 unique words (all words with document frequency less than 10 were removed).

THE DIRECTORY CONTAINS TWO FILES

  • .content file: contains descriptions of the papers in the following format:

      <paper_id> <word_attributes>+ <class_label>
    
    • The first entry in each line contains the unique string ID of the paper followed by binary values indicating whether each word in the vocabulary is present (indicated by 1) or absent (indicated by 0) in the paper.
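  • There are 2708 lines in total; each line represents one sample point, i.e. one paper.

  • Each paper is represented by a 1433-dimensional word vector, so each sample point has 1433 features.

    • Each element of the word vector corresponds to one word and takes only the value 0 or 1.

    • 0 means the corresponding word does not appear in the paper; 1 means it does.

    • All words come from a dictionary of 1433 words.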

    • Finally, the last entry in the line contains the class label of the paper.

  • .cites file: contains the citation graph of the corpus. Each line describes a link in the following format:

      <ID of cited paper> <ID of citing paper>
    
    • There are 5429 lines in total. Each line contains two paper IDs.
    • The first entry is the ID of the paper being cited and the second ID stands for the paper which contains the citation.
    • The direction of the link is from right to left. If a line is represented by “paper1 paper2” then the link is “paper2->paper1”.
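  • Every paper cites, or is cited by, at least one other paper, so no sample point is completely unrelated to the rest. If the sample points are viewed as nodes of a graph, the graph has no isolated nodes. (This alone does not guarantee that the whole graph is connected.)

  • If the papers are viewed as nodes of a graph, these 5429 lines are the 5429 edges between them.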
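
The two file layouts described above can be read with a few lines of Python. The following is only a minimal sketch: the sample rows, the paper IDs p1..p3, and the helper names parse_content, parse_cites and isolated_papers are made up for illustration, and each sample row carries 4 word attributes instead of the real 1433.

```python
# Made-up rows in the .content layout: <paper_id> <word_attributes>+ <class_label>
# (real Cora rows have 1433 binary attributes; 4 are used here for brevity).
SAMPLE_CONTENT = """
p1 0 1 0 1 Neural_Networks
p2 1 0 0 0 Theory
p3 0 0 1 1 Rule_Learning
"""

# Made-up rows in the .cites layout: <ID of cited paper> <ID of citing paper>
SAMPLE_CITES = """
p1 p2
p1 p3
p2 p3
"""


def parse_content(text):
    """Return (paper_ids, features, labels) from .content-style text."""
    paper_ids, features, labels = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        paper_ids.append(fields[0])                       # first entry: paper ID
        features.append([int(v) for v in fields[1:-1]])   # binary word vector
        labels.append(fields[-1])                         # last entry: class label
    return paper_ids, features, labels


def parse_cites(text):
    """Return directed edges (citing, cited) from .cites-style text."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        cited, citing = fields
        edges.append((citing, cited))  # the link points from right to left
    return edges


def isolated_papers(paper_ids, edges):
    """Return the papers that appear in no edge at all."""
    touched = set()
    for citing, cited in edges:
        touched.add(citing)
        touched.add(cited)
    return [p for p in paper_ids if p not in touched]


paper_ids, features, labels = parse_content(SAMPLE_CONTENT)
edges = parse_cites(SAMPLE_CITES)
print(len(paper_ids), "papers,", len(features[0]), "features each,", len(edges), "edges")
print("isolated papers:", isolated_papers(paper_ids, edges))
```

On the real files one would pass the file contents instead of the sample strings; the isolated_papers check is a quick way to confirm the "no isolated nodes" property mentioned above.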

https://blog.csdn.net/yeziand01/java/article/details/93374216

The mini-batch idea

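  • Parallelism over a mini-batch is achieved by building one sparse block-diagonal adjacency matrix from the graphs in the batch.
  • The node feature matrices and the target matrices are concatenated along the node dimension, which makes it relatively easy to process the different graphs of a batch together.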
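
The batching scheme above can be sketched with NumPy as follows. This is an illustration under assumptions, not any library's implementation: the function name batch_graphs and the extra batch vector (which records the graph each node came from) are hypothetical, and a dense matrix is used for readability where a real implementation would keep the block-diagonal matrix sparse.

```python
import numpy as np


def batch_graphs(adjs, feats, targets):
    """Merge several small graphs into one large disconnected graph.

    adjs:    list of (n_i, n_i) adjacency matrices
    feats:   list of (n_i, F) node feature matrices
    targets: list of (n_i,) node target vectors
    """
    sizes = [a.shape[0] for a in adjs]
    total = sum(sizes)
    # Dense here for clarity only; the off-diagonal blocks stay zero,
    # so no messages flow between different graphs.
    big_adj = np.zeros((total, total), dtype=adjs[0].dtype)
    batch = np.empty(total, dtype=np.int64)
    offset = 0
    for graph_idx, (a, n) in enumerate(zip(adjs, sizes)):
        big_adj[offset:offset + n, offset:offset + n] = a  # place block on diagonal
        batch[offset:offset + n] = graph_idx               # node -> graph index
        offset += n
    big_x = np.concatenate(feats, axis=0)    # stack features along the node dimension
    big_y = np.concatenate(targets, axis=0)  # stack targets along the node dimension
    return big_adj, big_x, big_y, batch


# Two toy graphs: an edge (2 nodes) and a 3-node path.
adj_a = np.array([[0, 1], [1, 0]])
x_a, y_a = np.ones((2, 3)), np.array([0, 1])
adj_b = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
x_b, y_b = np.zeros((3, 3)), np.array([1, 1, 0])

big_adj, big_x, big_y, batch = batch_graphs([adj_a, adj_b], [x_a, x_b], [y_a, y_b])
print(big_adj)
print(batch)
```

Because the graphs may have different numbers of nodes, nothing needs to be padded: each graph simply occupies its own block.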

https://zhuanlan.zhihu.com/p/78452993

GCN

Layer-to-layer propagation rule

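
A commonly cited form of the GCN propagation rule, assumed here to be the one this section refers to, is

    H(l+1) = sigma( D_tilde^(-1/2) A_tilde D_tilde^(-1/2) H(l) W(l) )

where A_tilde = A + I is the adjacency matrix with self-loops added, D_tilde is the diagonal degree matrix of A_tilde, H(l) is the node feature matrix at layer l, W(l) is the layer's weight matrix, and sigma is a nonlinearity. A minimal NumPy sketch of one such layer, with ReLU chosen as sigma and a hypothetical function name gcn_layer, could look like this:

```python
import numpy as np


def gcn_layer(adj, h, w):
    """One GCN propagation step with ReLU as the nonlinearity (an assumption)."""
    a_tilde = adj + np.eye(adj.shape[0])   # A_tilde = A + I (add self-loops)
    deg = a_tilde.sum(axis=1)              # diagonal of D_tilde, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalisation: a_hat[i, j] = a_tilde[i, j] / sqrt(deg[i] * deg[j])
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ h @ w, 0.0)  # ReLU(A_hat H W)


# Toy example: a 3-node path graph, one-hot features, all-ones weights.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
h = np.eye(3)
w = np.ones((3, 2))
out = gcn_layer(adj, h, w)
print(out)
```

In this toy setup each output row is simply the row sum of the normalised adjacency matrix; in a trained network w would be learned.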

GAT

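  • Node 3 has only two adjacent nodes, node 2 and node 4, but that does not mean the two are equally important to node 3. This “importance” can be quantified and, moreover, learned by training the network. The paper calls it attention, and it is the core innovation of GAT.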
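
To make the idea of a learned “importance” concrete, here is a minimal single-head sketch in NumPy. It follows the usual GAT scoring e_ij = LeakyReLU(a^T [W h_i || W h_j]) followed by a softmax over the neighbours of node i. The function name gat_attention, the random (untrained) parameters, and the 5-node chain are assumptions made for illustration; to match the example above, a node's neighbourhood here excludes the node itself, and every node is assumed to have at least one neighbour.

```python
import numpy as np


def gat_attention(adj, h, w, a, negative_slope=0.2):
    """Return (alpha, aggregated) for one attention head.

    alpha[i, j] is the attention node i pays to neighbour j;
    aggregated is sum_j alpha[i, j] * (W h_j), before any nonlinearity.
    """
    z = h @ w                                    # transformed features, shape (N, F')
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a[:F'] . z_i + a[F':] . z_j
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = src[:, None] + dst[None, :]              # raw scores e_ij, shape (N, N)
    e = np.where(e > 0, e, negative_slope * e)   # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)            # mask out non-neighbours
    e = e - e.max(axis=1, keepdims=True)         # stabilise the softmax
    weights = np.exp(e)                          # exp(-inf) = 0 for masked entries
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return alpha, alpha @ z


# A 5-node chain 1-2-3-4-5 (indices 0..4), so node 3 (index 2) has neighbours 2 and 4.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))   # input node features
w = rng.normal(size=(4, 3))   # shared weight matrix W (random, not trained)
a = rng.normal(size=6)        # attention vector a (random, not trained)

alpha, aggregated = gat_attention(adj, h, w, a)
print("attention of node 3 over all nodes:", alpha[2])
```

The row for node 3 is non-zero only at nodes 2 and 4 and sums to 1; in general the two weights differ, and training adjusts w and a so that they reflect how useful each neighbour is.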


https://zhuanlan.zhihu.com/p/99927545
