star-history Source Code Reading Notes (01): Project Introduction, GitHub's stargazers API and Pagination, and the Approach to Fetching Star History

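This post was first published on my GitHub blog.
This is the first post in my series of notes on the star-history source code. It covers:

An introduction to the project and the purpose of this series
GitHub's stargazers API
Pagination in the GitHub API
The approach to fetching star history

The code analyzed here is based on the commit "first commit" (deecd92) by timqian.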

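Project Introduction and Purpose of This Series

Project Introduction

First, a note: I am not the author of the project, so please forgive any inaccuracies in my descriptions and analysis.

star-history (live demo) is a tool for charting how a GitHub repository's stars have grown over time. It comes as a web app and as a Chrome extension.

The project exists because GitHub itself offers no way to view a repository's star history.

Purpose of This Series

To consolidate what I learn
- Judging from the demo, the front end looks good and works well, and I'd like to learn from it
To squander my real-life time, to! the! full!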

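GitHub's stargazers API

GitHub offers a set of REST APIs (and has been moving toward GraphQL). Through the REST API we can retrieve much of the information on GitHub and build all kinds of apps on top of it; star-history is one of them.

Although GitHub doesn't show a repository's star history directly, it does provide the stargazers endpoint, which comes in two forms:

1. List all users who have starred a repository
2. The same, plus the time at which each user starred it

Both forms share the same REST URL. The difference is:

Form 2 requires the HTTP request header Accept: application/vnd.github.v3.star+json

The REST URL and the returned JSON for each form are: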

GET /repos/:owner/:repo/stargazers
# Without star time
[
  {
    "login": "octocat",
    "id": 1,
    "node_id": "MDQ6VXNlcjE=",
    "avatar_url": "https://github.com/images/error/octocat_happy.gif",
    "gravatar_id": "",
    "url": "https://api.github.com/users/octocat",
    "html_url": "https://github.com/octocat",
    "followers_url": "https://api.github.com/users/octocat/followers",
    "following_url": "https://api.github.com/users/octocat/following{/other_user}",
    "gists_url": "https://api.github.com/users/octocat/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/octocat/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/octocat/subscriptions",
    "organizations_url": "https://api.github.com/users/octocat/orgs",
    "repos_url": "https://api.github.com/users/octocat/repos",
    "events_url": "https://api.github.com/users/octocat/events{/privacy}",
    "received_events_url": "https://api.github.com/users/octocat/received_events",
    "type": "User",
    "site_admin": false
  }
]

GET /repos/:owner/:repo/stargazers
Header:
Accept: application/vnd.github.v3.star+json
# With star time (starred_at)
[
  {
    "starred_at": "2011-01-16T19:06:43Z",
    "user": {
      "login": "octocat",
      "id": 1,
      "node_id": "MDQ6VXNlcjE=",
      "avatar_url": "https://github.com/images/error/octocat_happy.gif",
      "gravatar_id": "",
      "url": "https://api.github.com/users/octocat",
      "html_url": "https://github.com/octocat",
      "followers_url": "https://api.github.com/users/octocat/followers",
      "following_url": "https://api.github.com/users/octocat/following{/other_user}",
      "gists_url": "https://api.github.com/users/octocat/gists{/gist_id}",
      "starred_url": "https://api.github.com/users/octocat/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/octocat/subscriptions",
      "organizations_url": "https://api.github.com/users/octocat/orgs",
      "repos_url": "https://api.github.com/users/octocat/repos",
      "events_url": "https://api.github.com/users/octocat/events{/privacy}",
      "received_events_url": "https://api.github.com/users/octocat/received_events",
      "type": "User",
      "site_admin": false
    }
  }
]
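To make the difference between the two forms concrete, here is a minimal sketch in JavaScript, the language the project uses. The helper names buildStargazersRequest and toStarEvents are hypothetical; only the URL pattern and the Accept media type come from the API described above. The sketch builds the request and normalizes both response shapes into { login, starredAt } records, without doing any network I/O.

```javascript
// Hypothetical helper names; only the endpoint URL and the Accept media type
// come from the GitHub API described above.
const STAR_MEDIA_TYPE = "application/vnd.github.v3.star+json";

// Build the request for the stargazers endpoint.
// withStarTime = true selects form 2 (starred_at included).
function buildStargazersRequest(owner, repo, withStarTime = true) {
  const url = `https://api.github.com/repos/${owner}/${repo}/stargazers`;
  const headers = withStarTime ? { Accept: STAR_MEDIA_TYPE } : {};
  return { url, headers };
}

// Normalize either response shape into { login, starredAt }.
// Form 2 wraps the user object in "user"; form 1 returns the user object
// directly and carries no star time, so starredAt is null there.
function toStarEvents(items) {
  return items.map((item) =>
    item.user
      ? { login: item.user.login, starredAt: item.starred_at }
      : { login: item.login, starredAt: null }
  );
}
```

With Node 18 or later, the request could be sent as fetch(req.url, { headers: req.headers }) and the parsed JSON array passed to toStarEvents.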

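Pagination in the GitHub API

A repository may well have been starred by tens or even hundreds of thousands of users. If a single request to
GET /repos/:owner/:repo/stargazers
returned all of that information at once, it would cause:

Excessive network and memory load
Inflexibility: a caller that only wants part of the data would still pay the I/O cost of fetching all of it

For this reason, many GitHub APIs paginate their results.

A pagination scheme has to answer a few key questions:

How do we know how many pages a URL's results are split into?
How do we know which page we are on?
How do we know how many items a page holds?
How do we fetch an arbitrary page of a URL's results?

Let's first look at how the GitHub REST API accepts and provides pagination information.

Accepting pagination parameters

For each URL, we can append the page and per_page query parameters:

per_page sets how many items a page holds
- It is optional; different endpoints have different defaults (some 30, some 100), so check the documentation
- Not every endpoint accepts it; again, check the documentation
page selects which page of results to return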
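A small sketch of appending these parameters, using a hypothetical helper buildPageUrl. It relies on the standard URL and URLSearchParams classes, so an existing query string (such as the q parameter of the search endpoint) is preserved and values are encoded properly.

```javascript
// Hypothetical helper: append page / per_page to any GitHub API URL.
// URLSearchParams.set keeps existing query parameters (e.g. q=...) intact.
function buildPageUrl(baseUrl, page, perPage) {
  const url = new URL(baseUrl);
  url.searchParams.set("page", String(page));
  if (perPage !== undefined) {
    // Only send per_page when asked; otherwise the endpoint default applies.
    url.searchParams.set("per_page", String(perPage));
  }
  return url.toString();
}
```

For example, buildPageUrl("https://api.github.com/repos/octocat/hello-world/stargazers", 3, 100) asks for the third page with 100 items per page, assuming the endpoint accepts per_page.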

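Providing pagination information

In the HTTP response, the GitHub API adds a Link header, which looks roughly like this:

# Requests without a page parameter also get a Link header; this one asks for page 14,
# which is why prev and next below point to pages 13 and 15
GET https://api.github.com/search/code?q=addClass+user%3Amozilla&page=14

# Response header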
Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=15>; rel="next",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=1>; rel="first",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=13>; rel="prev"

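Here rel describes how each URL relates to the current one:

prev: the previous page
next: the next page
last: the last page, whose page number is the total number of pages
first: the first page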
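The Link header can be turned into a lookup table keyed by rel. The sketch below uses hypothetical helpers (parseLinkHeader, pageOf), not code from star-history, and assumes the <url>; rel="name" format shown above.

```javascript
// Hypothetical helper: turn a Link header into an object keyed by rel.
// Assumes comma-separated <url>; rel="name" entries as shown above;
// whitespace and line breaks between entries are tolerated.
function parseLinkHeader(header) {
  const links = {};
  if (!header) return links;
  for (const m of header.matchAll(/<([^>]+)>;\s*rel="([^"]+)"/g)) {
    links[m[2]] = m[1];
  }
  return links;
}

// Read the page query parameter out of one of those URLs (null if absent).
function pageOf(url) {
  const page = new URL(url).searchParams.get("page");
  return page === null ? null : Number(page);
}
```

Applied to the header above, pageOf(parseLinkHeader(header).last) gives 34, the total number of pages.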

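Answering the questions

With this, the questions raised earlier can be answered:

How do we know how many pages a URL's results are split into?
- Send a request without the page parameter, then take the page parameter from the URL marked last in the Link header (if there is no Link header at all, everything fits on one page)
How do we know which page we are on?
- The page parameter of the current URL is the current page
- The page parameter of the URL marked next is the next page
How do we know how many items a page holds?
- Check the documentation for the default
- Check the documentation; if the endpoint accepts per_page, set it yourself (note that there may be a maximum)
How do we fetch an arbitrary page of a URL's results?
- Add the page parameter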
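The first two answers can be folded into one hypothetical helper, pageInfo, which takes the request URL and its Link header. It assumes that a missing Link header means a single page, and that the last page carries no rel="last" link, in which case the current page is also the total.

```javascript
// Hypothetical helper: current page and total page count for one response.
function pageInfo(requestUrl, linkHeader) {
  // No page parameter means page 1.
  const current = Number(new URL(requestUrl).searchParams.get("page") ?? "1");
  // Assumption: no Link header means all results fit on a single page.
  if (!linkHeader) return { current, total: 1 };
  const last = /<([^>]+)>;\s*rel="last"/.exec(linkHeader);
  // Assumption: the last page has no rel="last" link,
  // so the current page is then also the total.
  const total = last ? Number(new URL(last[1]).searchParams.get("page")) : current;
  return { current, total };
}
```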

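Approach to Fetching Star History

With the stargazers endpoint and its pagination understood, we can work out how to get the star history:

Call the stargazers endpoint, using the form that includes star times
Sort the entries by star time
Record each moment the star count changed (i.e., when a user starred the repository) and the star count at that moment (the entry's position in the sorted list)
Plot the time of each change on the x-axis and the star count at that time on the y-axis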
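Without any sampling, these four steps (minus the drawing) fit in a few lines. This sketch uses a hypothetical name, starHistoryFromDates, and assumes all starred_at strings have already been fetched and share GitHub's fixed-width ISO 8601 UTC format, so plain string sorting orders them chronologically.

```javascript
// Hypothetical helper: full (unsampled) star history from starred_at strings.
function starHistoryFromDates(starredAts) {
  // Copy before sorting; same-format ISO 8601 UTC strings sort by time.
  const sorted = [...starredAts].sort();
  // After sorting, the entry at 0-based index i brings the count to i + 1.
  return sorted.map((date, i) => ({ date, starNum: i + 1 }));
}
```

Each returned { date, starNum } pair is one point of the chart: date on the x-axis, starNum on the y-axis.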

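This is essentially correct, but there is one more thing to consider:

If a repository has thousands, tens of thousands, or hundreds of thousands of stars, do we really plot that many points?

We could, but for popular repositories the memory and network cost is too high and processing takes too long (at the default 30 stars per page, 100,000 stars already means more than 3,300 requests), which gets in the way of development and debugging in the early stages of a project.

So we can use pagination to sample instead.

For example, if we choose 20 sample points:

For repositories with fewer than 20 stars,
- fetch everything and plot every point
For repositories with more than 20 stars (say N stars),
- find the times at which the star count reached 0, N/20, 2N/20, 3N/20, …, N
- then plot these roughly twenty (time, star count) points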
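A sketch of sampling by star count, assuming the full history is already in memory as an array sorted by time. sampleHistory is a hypothetical name. It skips the star count 0, which has no star event of its own, and keeps the points at star counts of roughly N/20, 2N/20, …, N.

```javascript
// Hypothetical helper: keep about sampleNum points out of a sorted history.
// history[i] is assumed to be the point at which the star count reached i + 1.
function sampleHistory(history, sampleNum = 20) {
  const n = history.length;
  // Small repositories: plot every point.
  if (n <= sampleNum) return history.slice();
  const points = [];
  for (let k = 1; k <= sampleNum; k++) {
    // The point where the count reached round(k * n / sampleNum);
    // because n > sampleNum, these indices are distinct and increasing.
    points.push(history[Math.round((k * n) / sampleNum) - 1]);
  }
  return points;
}
```

This still requires downloading every star first, which is exactly the cost that sampling by page avoids.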

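That is how sampling works, but what does it have to do with pagination?

The answer: we don't need the total number of stars, only the total number of pages.

For a repository whose stargazers endpoint has N pages,
- fetch page 1 and the pages at roughly N/20, 2N/20, 3N/20, …, N, and take the earliest star time on each
- then plot these roughly twenty points, using as the star count (page number - 1) × items per page, i.e., the number of stars before the first entry on that page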
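The page-based version can be sketched as a pure function that decides which pages to request and what star count each one stands for. samplePages is hypothetical and assumes GitHub's default of 30 items per page. It spreads the samples evenly from page 1 to the last page, which is a variation of the idea rather than the project's own code.

```javascript
// Hypothetical helper: choose which stargazers pages to request.
// perPage = 30 assumes GitHub's default page size; sampleNum must be >= 2.
function samplePages(totalPages, sampleNum = 20, perPage = 30) {
  let pages;
  if (totalPages <= sampleNum) {
    // Few pages: request every one of them.
    pages = Array.from({ length: totalPages }, (_, i) => i + 1);
  } else {
    // Spread sampleNum pages evenly from page 1 to the last page.
    pages = Array.from({ length: sampleNum }, (_, i) =>
      1 + Math.round((i * (totalPages - 1)) / (sampleNum - 1))
    );
  }
  // Pages are 1-based, so the first entry on page p is star number
  // (p - 1) * perPage + 1; starsBefore counts the stars ahead of it.
  return pages.map((page) => ({ page, starsBefore: (page - 1) * perPage }));
}
```

Only these pages need to be fetched; the first starred_at on each page supplies the x-coordinate, and starsBefore the y-coordinate.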

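Code Analysis

In fact, the project's code does exactly this (truth be told, I reverse-engineered the approach above from the code, awkward laugh).

In generateUrls.js: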

// Form 2 of the stargazers endpoint: ask for starred_at timestamps
const getConfig = {
  headers: {
    Accept: 'application/vnd.github.v3.star+json',
  },
};

export default async function(repo) {
  // First request has no page parameter, i.e. page 1
  const initUrl = `https://api.github.com/repos/${repo}/stargazers`;
  const res = await axios.get(initUrl, getConfig).catch(e => {
    //...
  });
  //...
}

This shows that we are using the form of the stargazers endpoint that includes star times.

  const link = res.headers.link;
  if (!link) {
      //...
  } else {
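    // After "next", the first "page=" belongs to the rel="last" URL,
    // so pageNum is the total number of pages (captured as a string)
    const pageNumArray = /next.*?page=(\d*).*?last/.exec(link);
    const pageNum = pageNumArray[1];
    let samplePageUrls = [];
    let pageIndexes = [];
    // <= and the arithmetic below coerce the string pageNum to a number
    if (pageNum <= sampleNum) {
      // Few pages: every page from 2 to the last
      for (let i = 2; i <= pageNum; i++) {
        pageIndexes.push(i);
        samplePageUrls.push(initUrl + '?page=' + i);
      }
    } else {
      // Many pages: sampleNum - 1 evenly spaced pages
      for (let i = 1; i < sampleNum; i++) {
        let pageIndex = Math.round(i / sampleNum * pageNum);
        pageIndexes.push(pageIndex);
        samplePageUrls.push(initUrl + '?page=' + pageIndex);
      }
    }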
    //...
    return {
      samplePageUrls, pageIndexes,
    };
  }

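This code uses a regular expression to extract the total number of pages from the Link response header, then chooses the pages to sample: every page from 2 onward when there are no more than sampleNum pages, otherwise sampleNum - 1 evenly spaced pages.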
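The regular expression relies on GitHub listing the next link before the last link: after "next", the first "page=" it meets is the one inside the rel="last" URL. The sketch below (lastPageByRegex and lastPageByRel are hypothetical names) compares it with a parser that looks for rel="last" explicitly. The two agree on the header the project actually receives, since initUrl has no per_page; if the URLs carried per_page, the lazy match would stop inside "per_page=" and capture the wrong number.

```javascript
// The regex from generateUrls.js, wrapped so a missing match returns null.
function lastPageByRegex(link) {
  const m = /next.*?page=(\d*).*?last/.exec(link);
  return m ? Number(m[1]) : null;
}

// Hypothetical alternative: find the rel="last" entry and read its page param.
function lastPageByRel(link) {
  const m = /<([^>]+)>;\s*rel="last"/.exec(link);
  return m ? Number(new URL(m[1]).searchParams.get("page")) : null;
}
```

Both are fine for the project's first request; the rel-based version simply makes fewer assumptions about ordering and query parameters.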

In getStarHistory.js:

export default async function(repo) {
  const {
    samplePageUrls, pageIndexes
  } = await generateUrls(repo).catch(e => {
    console.log(e); // throw don't work
  });

  const getArray = samplePageUrls.map(url => axios.get(url, getConfig));

  const resArray = await Promise.all(getArray).catch(e => {
    console.log(e); // throw don't work
  });

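  const starHistory = pageIndexes.map((p, i) => {
    return {
      // First entry of page p is its earliest star; keep only YYYY-MM-DD
      date: resArray[i].data[0].starred_at.slice(0, 10),
      // 30 is the default per_page, so this counts the stars before page p
      starNum: 30 * (p - 1),
    };
  });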
  console.log(starHistory);

  return starHistory;
}

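This code:

fetches all of the sampled URLs produced by generateUrls.js, issuing the requests concurrently with Promise.all
takes the earliest star time on each page (its first entry) and pairs it with the star count that page represents

And with that, we have a repository's star history.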
