Common Wireshark Capture Messages Explained

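1.   TCP Out-Of-Order (TCP segments arrive out of sequence)

Answer:

1) There can be many causes, but most often the culprit is network congestion: packets sent in sequence arrive at different times, are delayed too long, or are lost, so the data units have to be reassembled. The packets may also have reached your computer over different paths.

2) Last week a colleague from CRM IT reported a problem to me: messages that their customer-service system sends to customers through the mail server often fail to be delivered. I checked the log and found that for a normal message the server records the size of the received mail once it has finished receiving DATA, and then starts processing it for onward delivery. For the failing messages, however, the server logs that it has read EOF before DATA has been fully transmitted and then closes the connection, so the message never counts as successfully delivered to the server.

A first look ruled out a timeout, because the connections were dropped well before the configured connection timeout. Since the CRM system was written by an outside vendor, the only way to clarify the problem was to capture packets and see whether the client itself was sending the command that ends the transfer.

The capture looked like this: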

bao1.jpg

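The transfer of the whole message contained a large number of TCP Retransmission and Segment Lost entries, and later TCP Out-Of-Order appeared as well, so it looked like a network problem. The usual explanations found online for TCP Out-Of-Order are that some packets were lost and then retransmitted, or that there are two network paths between client and server, as in a load-balancing setup: if two packets take different paths and the one sent later arrives before the one sent earlier, the result is Out-Of-Order.

Having concluded that the network was the likely cause, and knowing that my colleague had bonded the CRM system's two network cards into a single virtual interface, I asked him to remove the bonding and run on a single card. The problem disappeared, and the observed throughput was even higher than with the two cards bonded into one virtual interface. Whether Microsoft's NIC bonding needs some further tuning remains unknown, but at least the cause of the mass delivery failures was found.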
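The two-path explanation can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: segments alternate between two hypothetical paths with fixed one-way delays, and the function names `arrival_order` and `out_of_order` are made up for the example. Wireshark's own analysis also takes timing and retransmissions into account.

```python
def arrival_order(segments, send_gap_ms, path_delays_ms):
    """Return the segments in the order they arrive at the receiver.

    segments: list of (seq, length) tuples in send order.
    Segment i leaves at i * send_gap_ms and travels over
    path i % len(path_delays_ms), which adds a fixed delay.
    """
    arrivals = []
    for i, seg in enumerate(segments):
        delay = path_delays_ms[i % len(path_delays_ms)]
        arrivals.append((i * send_gap_ms + delay, i, seg))
    arrivals.sort()  # by arrival time, ties broken by send order
    return [seg for _, _, seg in arrivals]


def out_of_order(received):
    """Return the segments that start below the highest sequence end seen so far."""
    flagged = []
    next_expected = None
    for seq, length in received:
        if next_expected is not None and seq < next_expected:
            flagged.append((seq, length))
        next_expected = max(next_expected or 0, seq + length)
    return flagged


sent = [(1, 1460), (1461, 1460), (2921, 1460), (4381, 1460)]
# Assumed delays: path 0 is slow (30 ms), path 1 is fast (5 ms).
received = arrival_order(sent, send_gap_ms=1, path_delays_ms=[30, 5])
late = out_of_order(received)
print("arrival order:", received)
print("out of order: ", late)
```

With a single path (one entry in `path_delays_ms`) nothing is flagged, which matches the observation that the problem went away once the bonding was removed.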

2.   tcp segment of a reassembled PDU

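Answer: 1) When a connection is established, the SYN packets carry each side's TCP maximum segment size (MSS), which on a LAN is usually 1460. Data longer than the MSS has to be split into several segments, and Wireshark marks those segments “TCP segment of a reassembled PDU”. See the figure below: the marked packets carry the same ACK number as the rest of the PDU, while their sequence numbers advance with the data: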

bao2.jpg
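As a rough illustration of the splitting described above, the following sketch cuts one application message into MSS-sized pieces and assigns each piece a sequence number. The function name `segment_pdu` and the starting sequence number of 1 (Wireshark's relative numbering) are assumptions for the example, not how a real TCP stack is structured.

```python
def segment_pdu(payload: bytes, mss: int, initial_seq: int = 1):
    """Split payload into (seq, chunk) pairs of at most mss bytes each."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    seq = initial_seq
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        segments.append((seq, chunk))
        seq += len(chunk)  # the sequence number advances by the bytes sent
    return segments


pdu = bytes(4000)  # a 4000-byte message from the upper layer
segs = segment_pdu(pdu, mss=1460)
for seq, chunk in segs:
    print(f"Seq={seq} Len={len(chunk)}")
```

Every piece except possibly the last is exactly one MSS long, which is why a capture of a large transfer shows long runs of equally sized segments.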

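2) Last week at work I ran into a problem. Capturing the data our system reports to the network management station with Wireshark, I found many packets labelled “TCP segment of a reassembled PDU”, each of them 180 bytes long. Seeing that label, I took it for IP fragmentation and assumed the MTU of the system's interface had been set too small. Querying it from the command line showed 1500, so it had never been changed, and I was puzzled.

Looking into it afterwards, I found my understanding was wrong. “TCP segment of a reassembled PDU” does not refer to fragmentation at the IP layer; Wireshark labels IP fragments “Fragmented IP protocol”. It means that the TCP layer received a large block of data from the layer above and split it into segments before sending. That raises a question: TCP could hand the whole block to the IP layer and let IP do the fragmenting, so why split at the TCP layer? The answer lies in TCP's MSS (Maximum Segment Size). In the TCP header of the first packet it sends when opening a connection, each end uses the MSS option to tell the peer the largest segment it can receive (this size counts TCP payload only). On Ethernet the value is usually 1460, because 1460 bytes of payload + 20 bytes of TCP header + 20 bytes of IP header = 1500 bytes, exactly the maximum the link layer allows.

How can you tell that a received packet is a “TCP segment”? If several packets carry the same ACK number, their sequence numbers all differ, and each sequence number equals the previous packet's sequence number plus the previous packet's TCP payload length, they are almost certainly segments of the same PDU. For packets without the ACK flag this cannot be determined.

Since all the received TCP packets were 180-byte segments, the PC side presumably announced an MSS of 180 bytes during negotiation. Why it did so can only be investigated after another capture confirms that the MSS is the issue. One other situation could produce the same symptom: the system under test setting its MSS to 180 bytes because its MTU is 220 bytes. That can be ruled out here, since, as mentioned above, the MTU was already checked and is 1500.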
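The arithmetic behind the 1460 figure can be written out as a tiny helper. It is a sketch that assumes IPv4 and TCP headers without options (20 bytes each); the name `mss_for_mtu` is made up for the example.

```python
IPV4_HEADER_BYTES = 20  # assumed: no IP options
TCP_HEADER_BYTES = 20   # assumed: no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    overhead = IPV4_HEADER_BYTES + TCP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU too small to carry any TCP payload")
    return mtu - overhead


for mtu in (1500, 220):
    print(f"MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
```

The same subtraction explains why 180-byte segments would correspond to an MTU of 220 bytes.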
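The rule of thumb above (same ACK number, each sequence number continuing exactly where the previous payload ended) can be expressed as a short check. The `Packet` tuple and the function name are assumptions for illustration; Wireshark itself decides based on the upper-layer dissector, not on this heuristic.

```python
from typing import NamedTuple, Sequence


class Packet(NamedTuple):
    seq: int          # TCP sequence number
    ack: int          # TCP acknowledgment number
    payload_len: int  # TCP payload length in bytes


def looks_like_one_pdu(packets: Sequence[Packet]) -> bool:
    """True if the packets share one ACK number and their payloads are contiguous."""
    if len(packets) < 2:
        return False
    expected_seq = packets[0].seq
    for pkt in packets:
        if pkt.ack != packets[0].ack or pkt.seq != expected_seq:
            return False
        expected_seq = pkt.seq + pkt.payload_len  # next byte after this payload
    return True


# Three 180-byte segments like the ones in the capture described above.
capture = [Packet(1, 1, 180), Packet(181, 1, 180), Packet(361, 1, 180)]
print(looks_like_one_pdu(capture))
```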

3.   TCP Previous segment lost (an earlier TCP segment was lost)

Answer:

(1) “TCP Previous segment lost” errors are not “fatal” errors. They simply indicate that the sequence number in the arriving packet is higher than the next-expected sequence number, indicating that at least one segment was dropped/lost. The receiving station remedies this situation by sending duplicate ACKs for each additional packet it receives until the sender retransmits the missing packet(s). TCP is designed to recover from this situation, which is why the image is downloaded correctly despite having a (briefly) missing packet.

If you are getting a large number of lost packets, then there is likely a communication problem between the sender and receiver. A common cause of this is mismatched duplex settings between the PC and the switch.
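The receiver behaviour described in this answer can be sketched as follows: a segment whose sequence number is above the next expected one reveals a gap, and the receiver keeps acknowledging the same byte until the gap is filled. This is a simplified model with hypothetical names; it ignores window sizes, SACK and delayed ACKs.

```python
def receiver_acks(arrivals, initial_seq=1):
    """Simulate a receiver.

    arrivals: list of (seq, length) in arrival order.
    Returns one (seq, ack_sent, above_expected) tuple per arriving segment.
    """
    next_expected = initial_seq
    buffered = {}  # segments held until the gap before them is filled
    log = []
    for seq, length in arrivals:
        above_expected = seq > next_expected  # an earlier segment is missing
        if above_expected:
            buffered[seq] = length
        elif seq == next_expected:
            next_expected += length
            # Deliver anything that was waiting behind the gap.
            while next_expected in buffered:
                next_expected += buffered.pop(next_expected)
        # seq < next_expected: old data, nothing new to deliver
        log.append((seq, next_expected, above_expected))
    return log


# The segment starting at 101 is lost and only arrives after a retransmission.
log = receiver_acks([(1, 100), (201, 100), (301, 100), (101, 100)])
acks = [ack for _, ack, _ in log]
for entry in log:
    print(entry)
```

The ACK number stays at the missing byte while later segments arrive, then jumps past all buffered data once the retransmission fills the gap, which is why the transfer still completes correctly.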

We (our lab) recently upgraded to Ethereal 0.10.14 with WinPCap 3.1.  If I remember correctly, we had previously been using 0.10.2 with WinPCap 3.0.  However, since the upgrade we have been noticing several issues.

The first issue is with “TCP Previous segment lost” and “TCP CHECKSUM INCORRECT” messages appearing in the Packet Listing window.  We do not remember seeing these in the previous version of Ethereal, or at least not nearly as many as we are seeing now.  For example, one task for the student instructional part of the lab involves visiting a website containing two images and observing the network activity.  After the two GET requests are sent for the images, it is not uncommon for one image to be returned with a typical 200 OK response packet, but the response packet for the other image will be displayed as “TCP Previous segment lost.”  However, both images are downloaded and displayed perfectly fine in the browser.  I would think that the segment lost error would mean the object wasn’t returned correctly and shouldn’t be able to be displayed, but apparently that is not the case.  (The cache had been cleared when this was performed, so it was not defaulting to a local copy of the image.)

Another problem we’ve been noticing is that some packets simply aren’t displayed in the Packet Listing window, even when they are obviously received.  Using the same example as above, after the two GET requests are sent for the images, it is not uncommon for one image to be returned with a typical 200 OK response, but the other response will not appear.  Yet both images are successfully displayed in the browser.  Is this a problem with Ethereal not detecting the packets?

I’m not sure how typical this is, but we seem to be experiencing these issues often with 0.10.14 while we never did with 0.10.2.  Could it also be an issue with WinPCap, and not necessarily Ethereal?  I’m just trying to find some answers as to why we are seeing a sudden abundance of TCP related errors and uncaptured packets.  Thanks.

(2) I have a network client application that runs fine while I am debugging (no TCP errors), but when I run the release version, it runs incredibly slow.  It runs as a series of transactions, where each transaction is a separate connection to the server.  Wireshark analysis has determined that about 50% of all transactions involve the series:

TCP Previous Segment Lost
TCP Dup ACK
RST

The RST consumes 3 seconds per transaction, which is a Big Deal.  So to prevent it, I must prevent the initial “TCP Previous Segment Lost” (which seems, on the surface, to merely be a time-out on a particular segment).

In the following clip, the SYN packet suffers from the “TCP Previous Segment Lost” condition.  0.000640 seconds seems like too short of a time to declare this condition, as many previous successful transactions took much longer to be successfully SYN-ACK’ed.

Can somebody explain “TCP Previous Segment Lost” in this context to help me troubleshoot my problem?

Any help would be appreciated.

Here is a clip of a problem transaction:

[packet clip missing from the source]

4.   TCP ACKed lost segment (an ACK for a segment that was lost or not captured)

5.   TCP Window Update

6.   TCP Dup ACK (TCP duplicate acknowledgment)

TCP may generate an immediate acknowledgment (a duplicate ACK) when an out-of-order segment is received. This duplicate ACK should not be delayed. The purpose of this duplicate ACK is to let the other end know that a segment was received out of order, and to tell it what sequence number is expected.

Since TCP does not know whether a duplicate ACK is caused by a lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to be received. It is assumed that if there is just a reordering of the segments, there will be only one or two duplicate ACKs before the reordered segment is processed, which will then generate a new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. TCP then performs a retransmission of what appears to be the missing segment, without waiting for a retransmission timer to expire.
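The "wait for three duplicates" rule can be sketched from the sender's point of view. This is a deliberately minimal model: it only compares ACK numbers, whereas a real stack also checks that the ACK carries no data and does not change the window. The names and the threshold constant are chosen for the example.

```python
DUP_ACK_THRESHOLD = 3  # duplicates needed before retransmitting early


def fast_retransmits(acks, threshold=DUP_ACK_THRESHOLD):
    """Return the sequence numbers a sender would retransmit without waiting for a timer.

    acks: ACK numbers in the order the sender receives them.
    """
    retransmit = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmit.append(ack)  # the peer is still waiting for this byte
        else:
            last_ack = ack
            dup_count = 0
    return retransmit


# Two duplicates: treated as reordering, nothing is retransmitted.
print(fast_retransmits([101, 101, 101, 401]))
# Three duplicates: the segment starting at 101 is presumed lost.
print(fast_retransmits([101, 101, 101, 101, 401]))
```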

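7.   TCP Keep-Alive

TCP has a keep-alive mechanism for detecting dead connections. The principle is simple: after a connection has been idle for a certain time, TCP sends a probe to the peer:

1. If the host is reachable, the peer replies with an ACK and the connection is considered alive.

2. If the host is reachable but the application has exited, the peer replies with an RST and the TCP connection is torn down.

3. If the host is reachable but the application has crashed, the peer sends a FIN.

4. If the peer host answers with neither ACK nor RST, TCP keeps sending probes until it times out and then tears down the connection. The idle time before probing begins defaults to two hours.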

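uses WinSock2;

// Enable TCP keep-alive on every new client connection (Indy TIdTCPServer).
procedure TForm1.IdTCPServer1Connect(AThread: TIdPeerThread);
type
  // Same layout as the Winsock tcp_keepalive structure.
  TCP_KeepAlive = record
    OnOff: Cardinal;             // non-zero enables keep-alive
    KeepAliveTime: Cardinal;     // idle time before the first probe, in ms
    KeepAliveInterval: Cardinal; // delay between unanswered probes, in ms
  end;
var
  Val: TCP_KeepAlive;
  Ret: DWord;
begin
  Val.OnOff := 1;
  Val.KeepAliveTime := 6000;     // 6 s
  Val.KeepAliveInterval := 6000; // 6 s
  // IOC_IN or IOC_VENDOR or 4 is the SIO_KEEPALIVE_VALS control code.
  WSAIoctl(AThread.Connection.Socket.Binding.Handle, IOC_IN or IOC_VENDOR or 4,
    @Val, SizeOf(Val), nil, 0, @Ret, nil, nil);
end;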
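For comparison, roughly the same settings can be applied from Python's standard `socket` module. This is a sketch rather than a drop-in equivalent: the helper name `enable_keepalive` is made up, the Windows branch uses the same SIO_KEEPALIVE_VALS control code as the Delphi code, and on other platforms the options take whole seconds and are only set where the constants exist.

```python
import socket
import sys


def enable_keepalive(sock, idle_ms=6000, interval_ms=6000):
    """Turn on TCP keep-alive with the given idle time and probe interval."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32":
        # (onoff, keepalivetime, keepaliveinterval), both times in milliseconds.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_ms, interval_ms))
        return
    # Non-Windows options are expressed in seconds.
    idle_s = max(1, idle_ms // 1000)
    interval_s = max(1, interval_ms // 1000)
    if hasattr(socket, "TCP_KEEPIDLE"):       # Linux and several BSDs
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):    # macOS name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
```

The options are set per socket, so other connections on the same machine keep the system-wide defaults.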

——————————————————–

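KeepAliveTime controls how often TCP/IP tries to verify that an idle connection is still intact. If there has been no activity for this long, a keep-alive signal is sent. If the network is working and the receiver is alive, it responds. If you need to be sensitive to lost receivers, in other words to discover a lost receiver sooner, consider lowering this value. If long-idle connections are common and lost receivers are rare, you may want to raise it to reduce overhead. By default, Windows sends a keep-alive message when an idle connection has seen no activity for 7200000 milliseconds (2 hours). A value of 1800000 milliseconds is often preferred, so that a dead connection is noticed after about 30 minutes of idleness instead of 2 hours.

KeepAliveInterval defines how often TCP/IP repeats the keep-alive signal when no response has been received from the receiver. Once the number of consecutive unanswered keep-alive signals exceeds the value of TcpMaxDataRetransmissions, the connection is abandoned. If you expect long response times, you may need to raise this value to reduce overhead. If you need to spend less time verifying that a receiver is lost, consider lowering this value or TcpMaxDataRetransmissions. By default, Windows waits 1000 milliseconds (1 second) for a response before resending the keep-alive message.

Set KeepAliveTime to whatever suits your needs, for example 10 minutes, and remember to convert it to milliseconds.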
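How the two values combine can be estimated with simple arithmetic: one idle period, then one interval per unanswered probe. The sketch below is an approximation with a hypothetical function name, and the probe count of 5 used in the examples is an assumed value standing in for TcpMaxDataRetransmissions, not a documented default.

```python
def keepalive_detection_ms(keepalive_time_ms, keepalive_interval_ms, max_probes):
    """Approximate time from the last activity until a silent peer is given up on."""
    return keepalive_time_ms + keepalive_interval_ms * max_probes


ASSUMED_PROBES = 5  # assumption for illustration only

default_ms = keepalive_detection_ms(7_200_000, 1_000, ASSUMED_PROBES)
tuned_ms = keepalive_detection_ms(6_000, 6_000, ASSUMED_PROBES)
print(f"Windows defaults: about {default_ms / 60_000:.1f} minutes")
print(f"6 s / 6 s as in the code above: about {tuned_ms / 1_000:.0f} seconds")
```

The idle time dominates with the default values, which is why lowering KeepAliveTime has by far the largest effect on how quickly a lost receiver is noticed.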

XXX stands for the size of this interval value.

8.   TCP Retransmission

As a reliable transport protocol, the Transmission Control Protocol (TCP) requires the destination host to acknowledge every packet the sending host transmits. If the sender does not receive that acknowledgment within a certain amount of time, it acts under the assumption that the packet did not reach its destination and retransmits the packet.
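The timeout-and-resend loop can be sketched with a simulated channel. Everything here is hypothetical scaffolding for the illustration: `lossy_channel` drops a fixed number of transmissions, time is counted rather than actually waited, and the timeout stays constant, whereas real TCP stacks derive the timer from measured round-trip times and back off after each loss.

```python
def lossy_channel(drops):
    """Return a channel function that loses the first `drops` transmissions."""
    remaining = drops

    def channel(packet):
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return False  # no acknowledgment comes back
        return True       # acknowledgment received

    return channel


def send_with_retransmission(packet, channel, timeout_s=1.0, max_attempts=5):
    """Send until acknowledged; return (attempts, simulated seconds spent waiting)."""
    waited = 0.0
    for attempt in range(1, max_attempts + 1):
        if channel(packet):
            return attempt, waited
        waited += timeout_s  # timer expired: assume the packet was lost
    raise TimeoutError(f"no acknowledgment after {max_attempts} attempts")


attempts, waited = send_with_retransmission(b"data", lossy_channel(2))
print(f"acknowledged on attempt {attempts} after waiting {waited:.1f} s")
```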

Reposted from: http://www.xianren.org/net/wireshark-q.html
