The DHT Protocol as a Companion to the BT Protocol


Let's take a look at a ranking of BT magnet search engines by average indexing speed (compiled and shared, updated daily).

The engine uses a distributed architecture:

Front-end client: ASP.NET MVC with EF 6.0

Back-end Java system:

* J2EE core components: JSP, Servlet, JDBC, EJB, JNDI
* Data exchange: the XML markup language
* Front-end pages: HTML, DHTML, Vue, and the Ext open-source framework
* Controller layer: Spring MVC
* Business logic layer: Spring core
* Data persistence layer: MyBatis and Hibernate
* Middleware: EJB (2.0)
* Operating system: Windows Server 2008
* Databases: DB2, Oracle
* Application servers: JBoss AS 7, Tomcat
* Development tools: WebSphere Studio Application Developer (WSAD), Eclipse
* Search: dual engines, Elasticsearch 6.2.2 + Solr

.NET system: built with Enterprise Library 5.0 and EF 6.0, combined with a WCF architecture.

How it works:

The back end is a crawler system written in Java.

It has collected more than 80 million infohash records.

 

Implementing a DHT Network Crawler

DHT protocol fundamentals, with analysis of some key points:

To build a DHT crawler you first need a thorough understanding of DHT; only then will you know which algorithm to apply to which problem. For details of the DHT protocol and the most important background articles, see the references at the end of this post.

The DHT protocol, as a companion to the BT protocol, is great fun. Its main job is to locate torrents and BT resources when a BT download actually starts. Traditionally this required a central server holding the torrents, which not only wastes server resources but is also prone to all the usual single-point-of-failure problems. The DHT network decentralizes this: at any given moment some nodes in the network are up, and by querying those live nodes you can join the DHT network yourself.

Implementing a crawler for the DHT protocol takes three main steps. Step one is obtaining resource identifiers (the infohash: 160 bits, i.e. 20 bytes, encodable as a 40-character hexadecimal string). Step two is confirming that these infohashes are valid. Step three is using the valid infohashes to download the BT torrent files, which give a complete description of each resource.
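A quick sketch of those sizes in Python (the infohash here is just random bytes standing in for a real one):

```python
import os

infohash = os.urandom(20)      # 160 bits = 20 raw bytes (a stand-in value)

hex_form = infohash.hex()      # the 40-character hexadecimal form
assert len(hex_form) == 40
assert bytes.fromhex(hex_form) == infohash   # round-trips back to 20 bytes
```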

Step one is driven by other nodes sending the crawler get_peers requests from the DHT protocol; step two by other nodes sending the crawler announce_peer requests. Step three can be done in several ways: for example, downloading the torrent directly by infohash from sites that archive torrents, or downloading it from the node that sent the announce_peer. How exactly you do it is up to your crawler.

The main operations in the DHT protocol:

 

They are responsible for talking to external nodes over UDP, wrapping the requests and responses of the four basic operations.

ping: check whether a node is "alive"

A crawler uses ping in two main places: when initializing the routing table, and when verifying that a node is still alive.

find_node: ask a node to look up other nodes

A crawler likewise uses find_node in two main places: when initializing the routing table, and when checking whether a bucket is still alive.

get_peers: ask a node to look up a resource

When another node sends the crawler a get_peers query, the crawler not only replies like a normal node, but also treats the queried info_hash as an opportunity to get to know as many other nodes as possible. As the figure shows, the step that normally follows get_peers is announce_peer, but since a crawler must not announce_peer, get_peers effectively degenerates into a find_node operation.
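A minimal sketch of such a handler, assuming a bencode library (bencodepy) and a hypothetical routing_table object with closest()/compact() helpers:

```python
import hashlib
import os
import bencodepy  # assumed bencode library

MY_ID = os.urandom(20)    # this crawler's node ID (random for the sketch)
SECRET = os.urandom(8)    # secret for tokens (see the BEP 5 text below)
harvested = set()         # infohashes collected so far

def make_token(addr):
    # BEP 5 suggests SHA1(IP + secret); a minimal version of that idea.
    return hashlib.sha1(addr[0].encode() + SECRET).digest()[:8]

def on_get_peers(msg, addr, routing_table, sock):
    """Respond to get_peers like a normal node and harvest the infohash."""
    info_hash = msg[b"a"][b"info_hash"]
    harvested.add(info_hash)  # step one of the crawl: collect the infohash
    # A crawler holds no peers, so it always answers with the K closest
    # nodes -- which is why get_peers degenerates into find_node here.
    nodes = b"".join(n.compact() for n in routing_table.closest(info_hash, k=8))
    resp = {b"t": msg[b"t"], b"y": b"r",
            b"r": {b"id": MY_ID, b"token": make_token(addr), b"nodes": nodes}}
    sock.sendto(bencodepy.encode(resp), addr)
```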

announce_peer: notify a node that you have started downloading a certain resource

A crawler must not use announce_peer: doing so amounts to announcing fake resources, and the other side can easily tell from context that your announcement is fake and ban you.

A few points in the DHT protocol that deserve clarification:

1. Nodes and infohashes use the same 160-bit representation. 160 bits means the identifier space holds 2^160 = 1461501637330902918203684832716283019655932542976 values, a 49-digit decimal number, i.e. on the order of 10^48. A space that large is plenty for your host's node ID and any resource's identifier.

2. Every node keeps a routing table. Each routing table is made up of K-buckets: a bucket holds at most K nodes, 8 by default. The buckets are stored in something like a prefix tree, so a routing table with K = 8 contains at most 160 - 4 = 156 K-buckets.

3. The DHT protocol gives every infohash a position, so two infohashes have a distance between them, defined as their XOR: infohash1 xor infohash2. If the high-order bits agree the distance is small, otherwise it is large, so the distance between two nodes can be computed quickly. What is this distance for? In a DHT network, the closer a resource's infohash is to a node's ID, the more likely that node holds information about the resource. Why? Because everyone uses the same distance metric to recursively query nodes close to the resource, and any node that responds will eventually receive an announce. In other words, nodes close to a resource's infohash have a higher probability of learning about that resource.
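In code, the metric is one line; a small sketch:

```python
def distance(a: bytes, b: bytes) -> int:
    """Kademlia XOR distance between two 160-bit IDs (20-byte strings)."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

# IDs sharing a long common high-order prefix are "close":
node = bytes.fromhex("ffff" + "00" * 18)
near = bytes.fromhex("ffff" + "00" * 17 + "01")
far  = bytes.fromhex("0000" + "00" * 17 + "01")
assert distance(node, near) < distance(node, far)
```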

4. Because of this, lookups in the DHT are leaping lookups: they cross distant buckets in large strides and then home in on the target bucket. The reason jumps are large far away and small nearby is that each node's routing table stores more nodes the closer those nodes are to itself, as the figure below shows.

5. Crawling a DHT network is not easy. Unlike an ordinary crawler, you cannot simply fetch whatever resources you see; the ways of obtaining resources (get_peers, announce_peer) are passive. So the crawler's job changes: respond to other nodes' queries like a normal node, collect the data those queries carry, and the work is done. The only active thing a crawler can do is get to know as many other nodes as possible, so that more nodes come to it with questions.

6. Some suggest that enlarging the capacity K of the crawler's buckets would increase the chance of obtaining resources. Not so: as analyzed above, the crawler's most important information sources are all passive. You cannot enlarge other nodes' K, so the farther away a node is, the smaller the probability that it stores you, and correspondingly the smaller the probability that it will query you.

The main components (the real implementation is more complex and has other modules; only the main ones are listed here):

   DHT crawler:

This is the crawler's main logic. To simplify threading it shares a producer-consumer model with the server: the crawler plays the consumer and reuses the server's port.

Its main tasks are initialization, including initializing the routing table and issuing the initial requests, and handling every incoming message event. Thanks to the producer-consumer model, the operations inside are essentially single-threaded, which removes a lot of problems and should also be faster than locking everywhere (strictly speaking the locking just moves into the queue, but for a producer that never stops producing, a ring buffer can boost performance considerably).

   DHT server:

This is the crawler's server side. A node in a DHT network is not just a client but also a server, so the server plays the producer role. Originally each consumer had its own producer, but it turned out that IO multiplexing can deliver the message events instead, which greatly reduces the number of threads in the system. If possible, the client should be organized the same way; the system should then be much faster. (Not yet verified.)
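A minimal sketch of that producer with Python's selectors module (port and names are illustrative):

```python
import queue
import selectors
import socket

# One selector watches the UDP socket and pushes raw packets onto a
# queue that the crawler (the consumer) drains.
events_q = queue.Queue()
sel = selectors.DefaultSelector()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 6881))
sock.setblocking(False)
sel.register(sock, selectors.EVENT_READ)

def produce_once(timeout=1.0):
    """Poll the socket once and enqueue whatever arrived."""
    for key, _ in sel.select(timeout):
        data, addr = key.fileobj.recvfrom(65536)
        events_q.put((data, addr))  # handed to the single consumer thread
```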

   DHT route table:

Responsible for routing-table operations.

The routing table supports the following operations:

init: runs when the routing table is first created. Two cases:

1. If it was initialized before and the previous table's data was saved, just load the saved data.

2. If it was never initialized, it must be bootstrapped first.

First you need an entry point: to join the network you must know some node i already in it and add i to your routing table. Then you send i a find_node query asking for your own node ID. The clever part is that, in theory, a bounded number of queries will find nodes very close to you (i.e. the process converges after a certain number of steps). The point of this find_node is to populate your table as early as possible and to make other nodes on the network aware of you: if nobody knows you, nobody sends you messages, which means you get none of the information you want.
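A sketch of that bootstrap step, again assuming the bencodepy library; router.bittorrent.com is just a well-known example entry point:

```python
import os
import bencodepy  # assumed bencode library

MY_ID = os.urandom(20)

def bootstrap(sock, entry=("router.bittorrent.com", 6881)):
    """Join the DHT by asking a known node for nodes near our own ID."""
    query = {b"t": b"aa", b"y": b"q", b"q": b"find_node",
             b"a": {b"id": MY_ID, b"target": MY_ID}}
    sock.sendto(bencodepy.encode(query), entry)
    # Each batch of returned nodes is queried the same way; in theory the
    # responses converge on the neighborhood of MY_ID after a few rounds.
```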

search: an important method, used to locate the bucket a given infohash belongs to. It is called by the various wrapper methods below.

findNodes: find the k nodes in the routing table closest to a given infohash

getPeer: check whether the resource in question has peers (i.e. whether someone is downloading it, which is to say whether anyone has announced it)

announcePeer: record that a resource is being downloaded

   DHT bucket:

activeNode: the logic has several branches, listed below (and sketched in code after the list).

 

1. Find the bucket in the routing table that covers the node to be added. If the bucket is not full, add the node.

2. If the bucket is full, check whether it contains the crawler's own node. If not, discard the node to be added.

3. If the bucket does contain our own node, split the bucket evenly in two.
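A compact sketch of those three branches (IDs as 160-bit integers; the Bucket class is illustrative):

```python
K = 8  # bucket capacity

class Bucket:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi      # ID range [lo, hi)
        self.nodes = []                # at most K entries

    def covers(self, node_id: int) -> bool:
        return self.lo <= node_id < self.hi

def active_node(table, my_id: int, node_id: int):
    """Sketch of the three branches above (activeNode in the text)."""
    bucket = next(b for b in table if b.covers(node_id))
    if len(bucket.nodes) < K:                      # 1. not full: add
        bucket.nodes.append(node_id)
    elif not bucket.covers(my_id):                 # 2. full, not ours: drop
        return
    else:                                          # 3. full and ours: split
        mid = (bucket.lo + bucket.hi) // 2
        lo_b, hi_b = Bucket(bucket.lo, mid), Bucket(mid, bucket.hi)
        for n in bucket.nodes:
            (lo_b if lo_b.covers(n) else hi_b).nodes.append(n)
        i = table.index(bucket)
        table[i:i + 1] = [lo_b, hi_b]
        active_node(table, my_id, node_id)         # retry after the split
```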

Others such as locateNode, replaceNode, updateNode, and removeNode are not described one by one.

   DHT torrent parser:  

It parses a few important fields out of a BT torrent file: name, size, and the file list (sub file name, sub file size). This is straightforward: just bencode-decode the file.
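A sketch of such a parser, assuming the bencodepy library:

```python
import bencodepy  # assumed bencode library

def parse_torrent(path):
    """Pull name, size, and the file list out of a .torrent file."""
    with open(path, "rb") as f:
        info = bencodepy.decode(f.read())[b"info"]
    name = info[b"name"].decode("utf-8", "replace")
    if b"files" in info:               # multi-file torrent
        files = [(b"/".join(f[b"path"]).decode("utf-8", "replace"),
                  f[b"length"]) for f in info[b"files"]]
        size = sum(length for _, length in files)
    else:                              # single-file torrent
        files = [(name, info[b"length"])]
        size = info[b"length"]
    return name, size, files
```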

   Utils:

distance: compute the distance between two resources; in Kademlia this is a xor b

 

To make things harder I chose Python, a language I wasn't very familiar with. It was slow going, but I also came to appreciate how concise and powerful Python is. The implementation raised many interesting problems. For example, how to store all the buckets of a routing table: I came up with several schemes, and to save resources even considered a bit array plus a dict, but dropped it because I judged the final operations would be neither convenient nor intuitive and would invite bugs. The structure I settled on is a prefix tree, and it has indeed been painless to work with.

On timeouts, such as bucket timeouts and node timeouts, I kept looking for an approach that is efficient yet reasonably elegant. A synchronous call that waits for its own timeout would work but is clearly inefficient, especially since I wasn't using many threads: one blocked call would block every event on that port. So the operations must be asynchronous, but then it is hard to control exactly when a timeout fires. I could check for timeouts each time an event arrives, but that is wasteful and inefficient too. What remains is the Tomcat-style approach: add a monitoring thread, preferably a global one that watches the timeouts of every transaction in every crawler. Note also that mishandled timeouts easily leave memory unreclaimed, i.e. a memory leak, and whether the timeout thread interferes with other threads needs careful checking.
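A minimal sketch of such a global monitoring thread (names are illustrative):

```python
import threading
import time

class TimeoutMonitor(threading.Thread):
    """One global thread that expires pending transactions for all crawlers."""
    def __init__(self, interval=1.0):
        super().__init__(daemon=True)
        self.interval = interval
        self.pending = {}          # transaction id -> (deadline, callback)
        self.lock = threading.Lock()

    def watch(self, tid, ttl, on_timeout):
        with self.lock:
            self.pending[tid] = (time.monotonic() + ttl, on_timeout)

    def cancel(self, tid):
        with self.lock:
            self.pending.pop(tid, None)   # drop it so memory is reclaimed

    def run(self):
        while True:
            time.sleep(self.interval)
            now = time.monotonic()
            with self.lock:
                expired = [t for t, (d, _) in self.pending.items() if d <= now]
                callbacks = [self.pending.pop(t)[1] for t in expired]
            for cb in callbacks:          # run callbacks outside the lock
                cb()
```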

At first I didn't get timeout control right and ran into a ping storm: after running for a while most buckets were full, and following the protocol to the letter meant the bulk of events were pings confirming that nodes were still OK, so most of the CPU went into handling pings and ping responses. On closer study it turns out the crawler doesn't need to check node liveness at all: liveness exists only so you can hand good nodes to whoever asks. Given that, each incoming node can simply replace the oldest node in its bucket, so we always keep the freshest nodes.

The search algorithm also puzzled me for a while. In short, the point of searching is not really to find resources, but to meet the nodes that can store you. Why "can store you"? Because the farther a node is from you, the fewer of its buckets cover your region, so getting into its buckets is relatively hard. The search should therefore prefer nearby nodes, though distant nodes may store you too, for example when they are initializing or when nodes in their buckets time out; it's just less likely. So the search shouldn't fire blindly without judgment, but it shouldn't be strictly confined to the neighborhood either. It is a trade-off, and I haven't found an ideal scheme; for now, searches toward distant nodes happen with some probability while searches toward nearby nodes always happen.

There is also search speed. The structure of a DHT network means the nodes a given node knows are necessarily a limited set of nearby nodes, so the number of resources one node can collect in a given time span is bounded. Multiple nodes should therefore crawl in parallel, and the volume of resources collected depends largely on how many nodes you deploy.

The last thing worth optimizing is the findnodes method. The old approach pulled all the nodes out of a bucket, sorted them, and returned the top K, but that does a lot of unnecessary work. This is the classic top-N problem, and sorting is clearly a waste of time; since the operation is extremely frequent, even a small total of stored nodes ((160 - 4) * 8) adds up. The algorithm I adopted comes from a paper, 可擴展的 DHT 網絡爬蟲設計和優化 ("Design and Optimization of a Scalable DHT Network Crawler"). The basic identity is IDj = IDi xor 2^(160 - i): knowing IDi and i gives you IDj, and knowing IDi and IDj gives you i. This makes it fast to find the buckets adjacent to a bucket A (the nodes in the buckets nearest A in the tree are obviously the next closest to A), which is considerably more efficient than scanning everything.
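A sketch of both directions of that identity, treating IDs as 160-bit integers (the exact bucket-indexing convention here is an assumption):

```python
def neighbor_id(id_i: int, i: int) -> int:
    """IDj = IDi xor 2^(160 - i): an ID that first differs from IDi
    at bit (160 - i), i.e. an ID landing in the bucket at depth i."""
    return id_i ^ (1 << (160 - i))

def bucket_index(id_i: int, id_j: int) -> int:
    """Inverse direction: recover i from two IDs differing in one bit."""
    return 160 - (id_i ^ id_j).bit_length() + 1

my_id = 0x1234 << 144
for i in (1, 2, 3):
    assert bucket_index(my_id, neighbor_id(my_id, i)) == i
```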

  

The DHT protocol: http://www.bittorrent.org/beps/bep_0005.html, with a Chinese translation at http://gobismoon.blog.163.com/blog/static/5244280220100893055533/

A network crawler based on the DHT protocol: http://codemacro.com/2013/05/19/crawl-dht/

An analysis of how the DHT protocol works, very good, worth a read: http://blog.sina.com.cn/s/blog_5384aaf00100a88k.html

 

BitTorrent uses a "distributed sloppy hash table" (DHT) for storing peer contact information for "trackerless" torrents. In effect, each peer becomes a tracker. The protocol is based on Kademlia [1] and is implemented over UDP.

Please note the terminology used in this document to avoid confusion. A "peer" is a client/server listening on a TCP port that implements the BitTorrent protocol. A "node" is a client/server listening on a UDP port implementing the distributed hash table protocol. The DHT is composed of nodes and stores the location of peers. BitTorrent clients include a DHT node, which is used to contact other nodes in the DHT to get the location of peers to download from using the BitTorrent protocol.

Overview

Each node has a globally unique identifier known as the "node ID." Node IDs are chosen at random from the same 160-bit space as BitTorrent infohashes[2]. A "distance metric" is used to compare two node IDs or a node ID and an infohash for "closeness." Nodes must maintain a routing table containing the contact information for a small number of other nodes. The routing table becomes more detailed as IDs get closer to the node's own ID. Nodes know about many other nodes in the DHT that have IDs that are "close" to their own but have only a handful of contacts with IDs that are very far away from their own.

In Kademlia, the distance metric is XOR and the result is interpreted as an unsigned integer: distance(A, B) = |A xor B|. Smaller values are closer.

When a node wants to find peers for a torrent, it uses the distance metric to compare the infohash of the torrent with the IDs of the nodes in its own routing table. It then contacts the nodes it knows about with IDs closest to the infohash and asks them for the contact information of peers currently downloading the torrent. If a contacted node knows about peers for the torrent, the peer contact information is returned with the response. Otherwise, the contacted node must respond with the contact information of the nodes in its routing table that are closest to the infohash of the torrent. The original node iteratively queries nodes that are closer to the target infohash until it cannot find any closer nodes. After the search is exhausted, the client then inserts the peer contact information for itself onto the responding nodes with IDs closest to the infohash of the torrent.

The return value for a query for peers includes an opaque value known as the "token." For a node to announce that its controlling peer is downloading a torrent, it must present the token received from the same queried node in a recent query for peers. When a node attempts to "announce" a torrent, the queried node checks the token against the querying node's IP address. This is to prevent malicious hosts from signing up other hosts for torrents. Since the token is merely returned by the querying node to the same node it received the token from, the implementation is not defined. Tokens must be accepted for a reasonable amount of time after they have been distributed. The BitTorrent implementation uses the SHA1 hash of the IP address concatenated onto a secret that changes every five minutes and tokens up to ten minutes old are accepted.
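A minimal sketch of that token scheme in Python (helper names and the eight-byte secret length are illustrative; rotating the secret every five minutes is left to a timer):

```python
import hashlib
import os

_secrets = [os.urandom(8)]   # current secret; grows to [current, previous]

def rotate_secret():
    """Call every five minutes; keeping two secrets accepts ~10-minute-old tokens."""
    _secrets.insert(0, os.urandom(8))
    del _secrets[2:]

def make_token(ip: str) -> bytes:
    # SHA1 of the IP address concatenated onto the current secret.
    return hashlib.sha1(ip.encode() + _secrets[0]).digest()

def check_token(ip: str, token: bytes) -> bool:
    # Accept tokens minted under the current or the previous secret.
    return any(hashlib.sha1(ip.encode() + s).digest() == token for s in _secrets)
```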

Routing Table

Every node maintains a routing table of known good nodes. The nodes in the routing table are used as starting points for queries in the DHT. Nodes from the routing table are returned in response to queries from other nodes.

Not all nodes that we learn about are equal. Some are "good" and some are not. Many nodes using the DHT are able to send queries and receive responses, but are not able to respond to queries from other nodes. It is important that each node's routing table must contain only known good nodes. A good node is a node that has responded to one of our queries within the last 15 minutes. A node is also good if it has ever responded to one of our queries and has sent us a query within the last 15 minutes. After 15 minutes of inactivity, a node becomes questionable. Nodes become bad when they fail to respond to multiple queries in a row. Nodes that we know are good are given priority over nodes with unknown status.

The routing table covers the entire node ID space from 0 to 2^160. The routing table is subdivided into "buckets" that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2^160. When a node with ID "N" is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming "full." When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2^159 and 2^159..2^160.

When the bucket is full of good nodes, the new node is simply discarded. If any nodes in the bucket are known to have become bad, then one is replaced by the new node. If there are any questionable nodes in the bucket that have not been seen in the last 15 minutes, the least recently seen node is pinged. If the pinged node responds then the next least recently seen questionable node is pinged until one fails to respond or all of the nodes in the bucket are known to be good. If a node in the bucket fails to respond to a ping, it is suggested to try once more before discarding the node and replacing it with a new good node. In this way, the table fills with stable long running nodes.

Each bucket should maintain a "last changed" property to indicate how "fresh" the contents are. When a node in a bucket is pinged and it responds, or a node is added to a bucket, or a node in a bucket is replaced with another node, the bucket's last changed property should be updated. Buckets that have not been changed in 15 minutes should be "refreshed." This is done by picking a random ID in the range of the bucket and performing a find_nodes search on it. Nodes that are able to receive queries from other nodes usually do not need to refresh buckets often. Nodes that are not able to receive queries from other nodes usually will need to refresh all buckets periodically to ensure there are good nodes in their table when the DHT is needed.

Upon inserting the first node into its routing table and when starting up thereafter, the node should attempt to find the closest nodes in the DHT to itself. It does this by issuing find_node messages to closer and closer nodes until it cannot find any closer. The routing table should be saved between invocations of the client software.

BitTorrent Protocol Extension

The BitTorrent protocol has been extended to exchange node UDP port numbers between peers that are introduced by a tracker. In this way, clients can get their routing tables seeded automatically through the download of regular torrents. Newly installed clients who attempt to download a trackerless torrent on the first try will not have any nodes in their routing table and will need the contacts included in the torrent file.

Peers supporting the DHT set the last bit of the 8-byte reserved flags exchanged in the BitTorrent protocol handshake. A peer receiving a handshake indicating the remote peer supports the DHT should send a PORT message. It begins with byte 0x09 and has a two byte payload containing the UDP port of the DHT node in network byte order. Peers that receive this message should attempt to ping the node on the received port and IP address of the remote peer. If a response to the ping is received, the node should attempt to insert the new contact information into their routing table according to the usual rules.

Torrent File Extensions

A trackerless torrent dictionary does not have an "announce" key. Instead, a trackerless torrent has a "nodes" key. This key should be set to the K closest nodes in the torrent generating client's routing table. Alternatively, the key could be set to a known good node such as one operated by the person generating the torrent. Please do not automatically add "router.bittorrent.com" to torrent files or automatically add this node to clients routing tables.

nodes = [["<host>", <port>], ["<host>", <port>], ...]
nodes = [["127.0.0.1", 6881], ["your.router.node", 4804]]

KRPC Protocol

The KRPC protocol is a simple RPC mechanism consisting of bencoded dictionaries sent over UDP. A single query packet is sent out and a single packet is sent in response. There is no retry. There are three message types: query, response, and error. For the DHT protocol, there are four queries: ping, find_node, get_peers, and announce_peer.

A KRPC message is a single dictionary with two keys common to every message and additional keys depending on the type of message. Every message has a key "t" with a string value representing a transaction ID. This transaction ID is generated by the querying node and is echoed in the response, so responses may be correlated with multiple queries to the same node. The transaction ID should be encoded as a short string of binary numbers, typically 2 characters are enough as they cover 2^16 outstanding queries. The other key contained in every KRPC message is "y" with a single character value describing the type of message. The value of the "y" key is one of "q" for query, "r" for response, or "e" for error.
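As a sketch, building and decoding such a message with a bencode library (bencodepy here is an assumed choice):

```python
import bencodepy  # assumed bencode library

# A ping query: "t" is the transaction ID, "y" marks it as a query,
# and "q"/"a" carry the method name and arguments (see below).
query = {b"t": b"aa", b"y": b"q", b"q": b"ping",
         b"a": {b"id": b"abcdefghij0123456789"}}
packet = bencodepy.encode(query)
# -> d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe

msg = bencodepy.decode(packet)
kind = {b"q": "query", b"r": "response", b"e": "error"}[msg[b"y"]]
assert kind == "query" and msg[b"t"] == b"aa"
```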

Contact Encoding

Contact information for peers is encoded as a 6-byte string. Also known as "Compact IP-address/port info" the 4-byte IP address is in network byte order with the 2 byte port in network byte order concatenated onto the end.

Contact information for nodes is encoded as a 26-byte string. Also known as "Compact node info" the 20-byte Node ID in network byte order has the compact IP-address/port info concatenated to the end.
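Both encodings are easy to express with struct; a sketch:

```python
import socket
import struct

def encode_peer(ip: str, port: int) -> bytes:
    """6-byte "Compact IP-address/port info": 4-byte IP + 2-byte port, big-endian."""
    return socket.inet_aton(ip) + struct.pack("!H", port)

def decode_node(data: bytes):
    """26-byte "Compact node info": 20-byte node ID + compact IP/port."""
    node_id = data[:20]
    ip = socket.inet_ntoa(data[20:24])
    (port,) = struct.unpack("!H", data[24:26])
    return node_id, ip, port

assert encode_peer("127.0.0.1", 6881) == b"\x7f\x00\x00\x01\x1a\xe1"
```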

Queries

Queries, or KRPC message dictionaries with a "y" value of "q", contain two additional keys; "q" and "a". Key "q" has a string value containing the method name of the query. Key "a" has a dictionary value containing named arguments to the query.

Responses

Responses, or KRPC message dictionaries with a "y" value of "r", contain one additional key "r". The value of "r" is a dictionary containing named return values. Response messages are sent upon successful completion of a query.

Errors

Errors, or KRPC message dictionaries with a "y" value of "e", contain one additional key "e". The value of "e" is a list. The first element is an integer representing the error code. The second element is a string containing the error message. Errors are sent when a query cannot be fulfilled. The following table describes the possible error codes:

Code Description
201 Generic Error
202 Server Error
203 Protocol Error, such as a malformed packet, invalid arguments, or bad token
204 Method Unknown

Example Error Packets:

generic error = {"t":"aa", "y":"e", "e":[201, "A Generic Error Ocurred"]}
bencoded = d1:eli201e23:A Generic Error Ocurrede1:t2:aa1:y1:ee

DHT Queries

All queries have an "id" key and value containing the node ID of the querying node. All responses have an "id" key and value containing the node ID of the responding node.

ping

The most basic query is a ping. "q" = "ping" A ping query has a single argument, "id" the value is a 20-byte string containing the senders node ID in network byte order. The appropriate response to a ping has a single key "id" containing the node ID of the responding node.

arguments:  {"id" : "<querying nodes id>"}

response: {"id" : "<queried nodes id>"}

Example Packets

ping Query = {"t":"aa", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}}
bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe
Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re

find_node

Find node is used to find the contact information for a node given its ID. "q" == "find_node" A find_node query has two arguments, "id" containing the node ID of the querying node, and "target" containing the ID of the node sought by the queryer. When a node receives a find_node query, it should respond with a key "nodes" and value of a string containing the compact node info for the target node or the K (8) closest good nodes in its own routing table.

arguments:  {"id" : "<querying nodes id>", "target" : "<id of target node>"}

response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}

Example Packets

find_node Query = {"t":"aa", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}
bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t2:aa1:y1:qe
Response = {"t":"aa", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456..."}}
bencoded = d1:rd2:id20:0123456789abcdefghij5:nodes9:def456...e1:t2:aa1:y1:re

get_peers

Get peers associated with a torrent infohash. "q" = "get_peers" A get_peers query has two arguments, "id" containing the node ID of the querying node, and "info_hash" containing the infohash of the torrent. If the queried node has peers for the infohash, they are returned in a key "values" as a list of strings. Each string containing "compact" format peer information for a single peer. If the queried node has no peers for the infohash, a key "nodes" is returned containing the K nodes in the queried nodes routing table closest to the infohash supplied in the query. In either case a "token" key is also included in the return value. The token value is a required argument for a future announce_peer query. The token value should be a short binary string.

arguments:  {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>"}

response: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "values" : ["<peer 1 info string>", "<peer 2 info string>"]}

or: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "nodes" : "<compact node info>"}

Example Packets:

get_peers Query = {"t":"aa", "y":"q", "q":"get_peers", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456"}}
bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz123456e1:q9:get_peers1:t2:aa1:y1:qe
Response with peers = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "values": ["axje.u", "idhtnm"]}}
bencoded = d1:rd2:id20:abcdefghij01234567895:token8:aoeusnth6:valuesl6:axje.u6:idhtnmee1:t2:aa1:y1:re
Response with closest nodes = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "nodes": "def456..."}}
bencoded = d1:rd2:id20:abcdefghij01234567895:nodes9:def456...5:token8:aoeusnthe1:t2:aa1:y1:re

announce_peer

Announce that the peer, controlling the querying node, is downloading a torrent on a port. announce_peer has four arguments: "id" containing the node ID of the querying node, "info_hash" containing the infohash of the torrent, "port" containing the port as an integer, and the "token" received in response to a previous get_peers query. The queried node must verify that the token was previously sent to the same IP address as the querying node. Then the queried node should store the IP address of the querying node and the supplied port number under the infohash in its store of peer contact information.

There is an optional argument called implied_port whose value is either 0 or 1. If it is present and non-zero, the port argument should be ignored and the source port of the UDP packet should be used as the peer's port instead. This is useful for peers behind a NAT that may not know their external port, and for peers supporting uTP, which accept incoming connections on the same port as the DHT port.

arguments:  {"id" : "<querying nodes id>",
  "implied_port": <0 or 1>,
  "info_hash" : "<20-byte infohash of target torrent>",
  "port" : <port number>,
  "token" : "<opaque token>"}

response: {"id" : "<queried nodes id>"}

Example Packets:

announce_peer Query = {"t":"aa", "y":"q", "q":"announce_peer", "a": {"id":"abcdefghij0123456789", "implied_port": 1, "info_hash":"mnopqrstuvwxyz123456", "port": 6881, "token": "aoeusnth"}}
bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz1234564:porti6881e5:token8:aoeusnthe1:q13:announce_peer1:t2:aa1:y1:qe
Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re
 
 