CocoaAsyncSocket Documentation 2: Common Pitfalls

Original: https://github.com/robbiehanson/CocoaAsyncSocket/wiki/CommonPitfalls

Common Pitfalls - Don’t Be A Victim

Over the years we’ve noticed that many issues arise from general confusion about the TCP protocol. Arm yourself with knowledge so you don’t lose time in the future.

TCP is a stream


The TCP protocol is modeled on the concept of a single continuous stream of unlimited length. This is a very important concept to understand, and is the number one cause of confusion that we see.

What exactly does this mean, and how does it affect developers?

Imagine that you’re trying to send a few messages over the socket. So you do something like this (in pseudocode):

socket.write("Hi Sandy.");
socket.write("Are you busy tonight?");

How does the data show up on the other end? If you think the other end will receive two separate sentences in two separate reads, then you’ve just fallen victim to a common pitfall! Gasp! Read on.

TCP does not treat the writes as separate data. TCP considers all writes to be part of a single continuous stream. So when you issue the above writes, TCP will simply copy the data into its buffer:

TCP_Buffer = "Hi Sandy.Are you busy tonight?"

and then proceed to send the data as fast as possible. And in order to send data over the network, TCP and other networking protocols will be required to break that data into small pieces that can be transmitted over the medium (ethernet, WiFi, etc). In doing so, TCP may break apart the data in any way it sees fit. Here are some examples of how that data might be broken apart and sent:

"Hi San" , "dy.Ar" , "e you " , "busy to" , "night?"
"Hi Sandy.Are you busy" , " tonight?"
"Hi Sandy.Are you busy tonight?"

The above examples also demonstrate how the data will arrive at the other end. Let’s consider example 1 for a moment.

Sandy has issued a socket.read() command, and is waiting for data to arrive. So the result of her first read might be “Hi San”. Sandy will likely begin to process that data. And while the application is processing the data, the TCP stream continues to receive the 2nd and 3rd packets. Sandy then issues another socket.read() command, and this time she gets “dy.Are you ”.

This highlights the continuous stream nature of TCP. The TCP protocol, at the developer API level, has absolutely no concept of packets or separation of data.
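
The point can be illustrated with a small simulation. This is only a sketch: `split_stream` is a hypothetical helper (not part of any socket API), and the cut points are chosen by hand to reproduce example 1 above. A real network picks its own boundaries.

```python
def split_stream(stream, cut_points):
    """Split a byte stream at the given offsets, the way a network might."""
    bounds = [0, *sorted(cut_points), len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:])]


# Two application-level writes become one undifferentiated byte stream.
stream = b"Hi Sandy." + b"Are you busy tonight?"

# Hand-picked offsets that reproduce example 1.
chunks = split_stream(stream, [6, 11, 17, 24])
print(chunks)

# However the stream is cut, joining the reads restores the same bytes,
# and nothing in the chunks records where one write ended and the next began.
assert b"".join(chunks) == stream
```

Notice that the boundary between the two original writes (offset 9) falls in the middle of the second chunk.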

But isn’t this a major shortcoming? How do all those other protocols that use TCP work?

HTTP is a great example because it’s so simple, and because most everyone has seen it before. When a client connects to a server and sends a request, it does so in a very specific manner. It sends an HTTP header, and each line of the header is terminated with a CRLF (carriage return, line feed). So something like this:

GET /page.html HTTP/1.1
Host: google.com

Furthermore, the end of the HTTP header is signaled by two CRLFs in a row. Since the protocol specifies the terminators, it is easy to read data from a TCP socket until the terminators are reached.

Then the server sends the response:

HTTP/1.1 200 OK
Content-Length: 216

{ Exactly 216 bytes of data go here }

Again, the HTTP protocol makes it easy to use TCP. Read data until you get back-to-back CRLF. That’s your header. Then parse the content-length from the header, and now you can simply read a certain number of bytes.
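
As a rough sketch of that two-step read, the following keeps appending whatever arrives to a buffer and only reports a result once the header terminator and the full body are present. The function name `parse_response`, the 7-byte chunk size, and the 5-byte body are assumptions made for illustration, and the sketch ignores real-world HTTP features such as chunked transfer encoding.

```python
def parse_response(buffer):
    """Return (headers, body) once a full response is buffered, else None."""
    head, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None  # back-to-back CRLF not seen yet: keep reading
    lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    if len(rest) < length:
        return None  # header complete, body still arriving
    return headers, rest[:length]


response = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"

# Simulate the stream arriving in arbitrary 7-byte pieces.
buffer = b""
result = None
for i in range(0, len(response), 7):
    buffer += response[i:i + 7]
    result = parse_response(buffer)
    if result is not None:
        break

headers, body = result
print(headers["content-length"], body)
```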

Returning to our original example, we could simply use a designated terminator for our messages:

socket.write("Hi Sandy.\n");
socket.write("Are you busy tonight?\n");

And if Sandy was using AsyncSocket she would be in luck! Because AsyncSocket provides really easy-to-use read methods that allow you to specify the terminator to look for. AsyncSocket does the rest for you, and would deliver two separate sentences in two separate reads!
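
The idea behind terminator-based reads can be sketched in a few lines. This is not AsyncSocket's implementation, just an illustration of the technique: `LineFramer` is a hypothetical class that buffers raw reads and hands back only complete messages. In CocoaAsyncSocket itself the equivalent job is done by read methods such as `readDataToData:withTimeout:tag:`.

```python
class LineFramer:
    """Buffers raw reads and returns complete terminator-delimited messages."""

    def __init__(self, terminator=b"\n"):
        self.terminator = terminator
        self.buffer = b""

    def feed(self, chunk):
        """Add one read's worth of bytes; return any messages now complete."""
        self.buffer += chunk
        # Everything before the last terminator is complete; the tail is kept.
        *complete, self.buffer = self.buffer.split(self.terminator)
        return complete


framer = LineFramer()
messages = []

# The two terminated writes, arriving split the same way as example 1.
for chunk in [b"Hi San", b"dy.\nAr", b"e you ", b"busy to", b"night?\n"]:
    messages.extend(framer.feed(chunk))

print(messages)
```

No matter how the stream is split in transit, the receiver ends up with exactly two messages.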

Writes


What happens when you write data to a TCP socket? When the write is complete, does that mean the other party received that data? Can we at least assume the computer has sent the data? The answer is NO and NO.

Recall two things:

  • All data sent and received must get broken into little pieces in order to send it over the network.
  • TCP handles a lot of complicated issues such as resending lost packets, and providing in-order delivery so information arrives in the proper sequence.

So when you issue a write, the data is simply copied into an underlying buffer within the OS networking stack. At that point the TCP software will begin its magic, which consists of all the cool stuff mentioned earlier such as:

  • breaking the data into small pieces such that they can be sent over the network
  • ensuring that lost pieces get properly resent
  • ensuring that your data arrives at the remote destination in the proper order
  • watching out for congestion in the network
  • employing fancy algorithms to accomplish all of this as fast as possible

So when you issue the command, “write this data” the operating system responds with “I have your data, and I will do everything in my power to deliver this to the remote destination.”
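
A small experiment makes this concrete. The sketch below uses `socket.socketpair()` as a stand-in for a real TCP connection (an assumption made so the example runs in a single process): the write completes even though the other side has not read anything yet.

```python
import socket

# A connected pair of stream sockets inside one process.
sender, receiver = socket.socketpair()

# sendall() returns as soon as the OS has accepted the bytes into its buffer.
sender.sendall(b"Hi Sandy.\n")
write_completed = True  # reached although receiver has not called recv()

# The data only reaches the "application" when it finally decides to read.
data = receiver.recv(1024)
print(write_completed, data)

sender.close()
receiver.close()
```

If the receiving side had crashed or never called `recv()`, the write would have completed in exactly the same way.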

BUT… how do I know when the remote destination has received my data?

And this is exactly where most people run into problems. A good way to think about it is like this:

Imagine you want to send a letter to a friend. Not an email, but traditional snail mail. You know, through the post office. So you write the letter and put it in your mailbox. The mailman later comes by and picks it up. You can rest assured at this point that the post office will make every effort to deliver the letter to your friend. But how do you know for sure if your friend received the letter? I suppose if the letter came back with “return to sender” stamped on it you can be certain your friend didn’t receive it. But what if it doesn’t come back? Is it enough to know that it made it into your friend’s mailbox? (Assume this is a really, really important letter.) The answer is no. Maybe it never leaves the mailbox. Maybe his roommate picks it up and accidentally throws it away. And if the roommate was responsible and left the letter on your friend’s desk? Would that be enough? What if your friend was on vacation and your letter gets lost in a pile of junk mail? So the only way to truly know if your friend received the letter is when you receive their response.

This is a great metaphor for sockets. When you write data to a socket, that is like putting the letter in the mailbox. The operating system is like the local mailman that comes by and picks up the letter. The giant post office system that routes the letter toward its destination is like the network. And the mailman that drops off your letter in your friend’s mailbox is like the operating system on your friend’s computer. It is then up to the application on your friend’s computer to read the data from the OS and process it (fetch the letter from the mailbox, and actually read it).

So how do I know when the remote destination has received my data? This is not something that TCP can tell you. At best, it can only tell you that the letter was delivered into their mailbox. It can’t tell you if the application has read that data and processed it. Maybe the application on the remote side crashed. Or maybe the remote user quit the application before it had a chance to read the data. Or maybe the remote user experienced a power outage. Long story short, it is up to the application layer to answer this question if need be.
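
One common way for the application layer to answer the question is an explicit acknowledgment: the remote application replies only after it has read and processed the message. The sketch below is a minimal illustration under stated assumptions: `socket.socketpair()` stands in for a TCP connection, a thread plays the remote application, and the `ACK\n` reply and the helper names `read_line` and `remote_app` are invented for this example.

```python
import socket
import threading


def read_line(sock):
    """Read until a newline arrives (assumes one message in flight at a time)."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("peer closed before a full line arrived")
        buffer += chunk
    return buffer


def remote_app(sock, processed):
    """Hypothetical remote application: read, process, then acknowledge."""
    message = read_line(sock)
    processed.append(message)  # "processing" the message
    sock.sendall(b"ACK\n")     # the reply letter


local, remote = socket.socketpair()
processed = []
worker = threading.Thread(target=remote_app, args=(remote, processed))
worker.start()

local.sendall(b"Are you busy tonight?\n")  # completing this proves nothing
local.settimeout(5.0)                      # do not wait forever for a reply
reply = read_line(local)                   # the reply is the confirmation
worker.join()
print(reply)

local.close()
remote.close()
```

If the remote side crashes before replying, the read times out or fails, and the sender knows not to assume delivery.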
