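Notes on the SMTP, POP3 and MIME Protocols

Preface

Mail was originally transported purely over SMTP. For historical reasons, however, many gateways on the Internet could not correctly relay characters that use all 8 bits, such as Chinese characters, so mail content had to be encoded. As a result, alongside SMTP and POP, the mail protocols gained MIME, which deals with encoding.

In short, SMTP and POP handle sending and receiving: together they are responsible for transporting mail. MIME concerns the content of a message (here meaning the sender information, the To/Cc recipient information, the body text and the attachments) and defines the format of the message being transported. Put another way, SMTP and POP do the postman's job, while MIME settles the format of the letter, envelope included. Before MIME the postman could only deliver mail to Americans; with MIME the postman can offer international express service.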

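1.  SMTP

SMTP (Simple Mail Transfer Protocol) is a set of rules for transferring mail from a source address to a destination address; it governs how messages are relayed. SMTP belongs to the TCP/IP protocol suite and helps each computer find the next hop when sending or relaying a message.

For a detailed description of SMTP see RFC 821, http://tools.ietf.org/html/rfc821

and RFC 2821, http://tools.ietf.org/html/rfc2821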

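Authentication exchange

>auth login ---request user authentication
<334 VXNlcm5hbWU6 ---Base64 encoding of "Username:"
>Y29zdGFAYW1heGl0Lm5ldA== ---client sends the Base64-encoded user name
<334 UGFzc3dvcmQ6 ---Base64 encoding of "Password:"
>MTk4MjIxNA== ---client sends the Base64-encoded password
<235 auth successfully ---authentication succeeded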
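
The Base64 strings in the exchange above can be reproduced with a few lines of Python. This is only a sketch: the helper names `b64_text` and `auth_login_lines` are made up for illustration, and the code builds the lines a client would send without opening a connection. Note that Base64 is an encoding, not encryption, so the credentials are effectively sent in the clear.

```python
import base64


def b64_text(text: str) -> str:
    """Base64-encode an ASCII string and return the result as text."""
    return base64.b64encode(text.encode("ascii")).decode("ascii")


def auth_login_lines(username: str, password: str) -> list:
    """Lines a client sends during AUTH LOGIN (hypothetical helper).

    The server answers each step with a 334 prompt whose text is the
    Base64 form of "Username:" or "Password:".
    """
    return ["AUTH LOGIN", b64_text(username), b64_text(password)]


# The two server prompts seen in the transcript.
username_prompt = b64_text("Username:")
password_prompt = b64_text("Password:")
```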

 

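Client commands:

HELO/EHLO                open the session by identifying the client to the server

AUTH LOGIN               authenticate the user

MAIL FROM:               sender information

RCPT TO:                 recipient information, telling the server who the mail is for; may be repeated to send to several recipients

DATA                     message content

QUIT                     end the session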
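
To make the command order concrete, here is a small sketch that lists the commands a client would issue for one message, with one RCPT TO per recipient. The function name `smtp_command_sequence` and the example addresses are assumptions for illustration; the message text that follows DATA and the server replies are left out.

```python
def smtp_command_sequence(helo_domain: str, sender: str, recipients: list) -> list:
    """Return the SMTP commands for one message, in the order they are sent.

    Illustrative only: a real client waits for the server reply after
    every command and sends the message text after DATA.
    """
    commands = [f"HELO {helo_domain}", f"MAIL FROM:<{sender}>"]
    # RCPT TO is repeated once for each recipient.
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    commands += ["DATA", "QUIT"]
    return commands


sequence = smtp_command_sequence(
    "client.example", "alice@example.com",
    ["bob@example.com", "carol@example.com"],
)
```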

Server replies:

220 <domain> Service ready

221 <domain> Service closing transmission channel

250 Requested mail action okay, completed

354 Start mail input; end with <CRLF>.<CRLF>       reply to the DATA command

For the other codes, see RFC 821 and RFC 2821.

 

Example:

R: 220 USC-ISI.ARPA Simple Mail Transfer Service Ready

S: HELO LBL-UNIX.ARPA

R: 250 USC-ISI.ARPA

 

S: MAIL FROM:<[email protected]>

R: 250 OK

 

S: RCPT TO:<[email protected]>

R: 250 OK

 

S: DATA

R: 354 Start mail input; end with <CRLF>.<CRLF>

S: Blah blah blah...

S: ...etc. etc. etc.

S: .

R: 250 OK

 

S: QUIT

R: 221 USC-ISI.ARPA Service closing transmission channel

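[Note] After the DATA command, a 354 reply from the mail server means it is ready to receive data. The client then sends the message data as one continuous stream and ends it with <CRLF>.<CRLF>. Message content cannot collide with this terminator, because the sender prefixes an extra "." to every data line that begins with "." and the receiver strips it again (the transparency procedure of RFC 821, section 4.5.2). MIME encodings such as Base64 make such lines rare, but it is this dot-stuffing that guarantees the terminator stays unambiguous.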

      The mail data may contain any of the 128 ASCII character codes, although experience has indicated that use of control characters other than SP, HT, CR, and LF may cause problems and SHOULD be avoided when possible.
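
One way a sender can make sure the message text never contains the <CRLF>.<CRLF> terminator is the dot-stuffing rule of RFC 821, section 4.5.2. The sketch below is a minimal illustration of that rule; the function names `dot_stuff` and `frame_data` are hypothetical, and the input is assumed to use CRLF line endings already.

```python
def dot_stuff(body: bytes) -> bytes:
    """Prefix an extra '.' to every line that starts with '.'."""
    lines = body.split(b"\r\n")
    stuffed = [b"." + line if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(stuffed)


def frame_data(body: bytes) -> bytes:
    """Return the bytes sent after the 354 reply, terminator included."""
    payload = dot_stuff(body)
    if not payload.endswith(b"\r\n"):
        payload += b"\r\n"
    # A line holding a single '.' marks the end of the mail data.
    return payload + b".\r\n"


framed = frame_data(b"hello\r\n.\r\nbye")
```

The receiving side reverses the step by dropping the first "." of any line that starts with one, after checking for the terminator line.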

 

2.  POP

POP stands for Post Office Protocol. It is used to retrieve e-mail and runs on TCP port 110.

See RFC 1939, http://tools.ietf.org/html/rfc1939

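Common commands

Most mail servers authenticate with a plaintext user name and password.

Command  Arguments       State           Description
--------------------------------------------------------------------------
USER     username        Authorization   If this command and the PASS command below succeed, the session changes state
PASS     password        Authorization
APOP     name digest     Authorization   digest is an MD5 message digest
--------------------------------------------------------------------------
STAT     None            Transaction     Asks the server for mailbox statistics, i.e. the number of messages and the total number of bytes
UIDL     [msg]           Transaction     Returns the unique identifier of a message; each identifier is unique within the mailbox
LIST     [msg]           Transaction     Returns the message numbers and the size of each message
RETR     msg             Transaction     Returns the full text of the message identified by the argument
DELE     msg             Transaction     Marks the message identified by the argument as deleted; the deletion is carried out by QUIT
RSET     None            Transaction     Unmarks every message marked as deleted; used to undo DELE
TOP      msg n           Transaction     Returns the headers and the first n body lines of the message; n must be a non-negative integer
NOOP     None            Transaction     The server returns a positive response
--------------------------------------------------------------------------
QUIT     None            Update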
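
As an illustration of the APOP row in the table, RFC 1939 defines the digest as the MD5 hash of the timestamp from the server greeting (angle brackets included) immediately followed by the shared secret, written as lower-case hexadecimal. The sketch below assumes that definition; the function name `apop_digest` and the sample values are made up.

```python
import hashlib


def apop_digest(timestamp_banner: str, shared_secret: str) -> str:
    """Compute the digest argument of the APOP command.

    timestamp_banner is the "<...>" token from the server greeting.
    """
    material = (timestamp_banner + shared_secret).encode("ascii")
    return hashlib.md5(material).hexdigest()


# Hypothetical greeting timestamp and secret, for illustration only.
digest = apop_digest("<1234.567890@pop.example>", "not-a-real-secret")
apop_command = f"APOP someuser {digest}"
```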

 

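[Note] Messages marked as deleted are only actually removed after the QUIT command has been issued. If the connection is interrupted and QUIT is never sent, the messages are not deleted even though DELE was executed.

After the client issues a command such as RETR 305, the server returns the data immediately; the data may be spread over several consecutive packets. The message content ends with <CRLF>.<CRLF>.

For example: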

+OK 2281 octets

Received: from mail-pz0-f178.google.com ([209.85.222.178])

         by oa.legendsec.com (Lotus Domino Release 6.5.3)

         with ESMTP id 2009063010503284-48548 ;

         Tue, 30 Jun 2009 10:50:32 +0800

Received: by pzk8 with SMTP id 8so621168pzk.28

       for <[email protected]>; Mon, 29 Jun 2009 19:50:21 -0700 (PDT)

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;

.............

MIME-Version: 1.0

Received: by 10.142.139.9 with SMTP id m9mr316739wfd.174.1246330221459; Mon,

29 Jun 2009 19:50:21 -0700 (PDT)

Date: Tue, 30 Jun 2009 10:50:21 +0800

Message-ID: <3627518b0906291950v104c242

 

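The mail content has to be parsed out of the returned message data.

The message format is the same as for mail sent over SMTP and is described in the MIME section below.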
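
A rough sketch of that parsing step, using Python's standard `email` package: drop the "+OK" status line, stop at the terminating "." line, undo the byte-stuffing described in RFC 1939, and hand the rest to the parser. The function name `parse_retr_response` and the sample response are assumptions for illustration.

```python
from email import message_from_bytes


def parse_retr_response(raw: bytes):
    """Turn a complete RETR response into an email.message.Message."""
    lines = raw.split(b"\r\n")
    message_lines = []
    for line in lines[1:]:            # lines[0] is the "+OK ..." status line
        if line == b".":              # terminator: end of the message
            break
        if line.startswith(b"."):     # byte-stuffed line: drop the extra dot
            line = line[1:]
        message_lines.append(line)
    return message_from_bytes(b"\r\n".join(message_lines))


sample = (b"+OK 60 octets\r\n"
          b"Subject: hi\r\n"
          b"MIME-Version: 1.0\r\n"
          b"\r\n"
          b"..dot line\r\n"
          b"body\r\n"
          b".\r\n")
parsed = parse_retr_response(sample)
```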

3.  MIME

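The RFC documents describe MIME in detail.

3.1. MIME message format

Reference:

RFC 4021, Registration of Mail and MIME Header Fields,

 http://www.apps.ietf.org/rfc/rfc4021.html

Broadly, a MIME message consists of two parts, the message header and the message body. Here we call them the mail header and the mail body.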

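3.1.1. Mail header

The mail header carries important information such as the sender, recipients, subject, date, MIME version and the type of the mail content. Each item is called a field and consists of the field name followed by ": " and the field content. A field may fit on one line, and longer ones may span several lines. The first line of a field must start in column one, with no white space (space or tab) to its left. Continuation lines must start with a white-space character; when decoding, the line break in front of that white space is removed so that the field is rejoined into a single line.

Blank lines are not allowed inside the mail header. Some messages are not recognized by mail clients and are displayed as raw source precisely because their first line is blank.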
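
The folding rule can be sketched as follows. This follows the unfolding procedure of RFC 2822, section 2.2.3 (remove a line break that is immediately followed by white space) and stops at the first blank line, which ends the header. The function name `unfold_headers` is hypothetical, and repeated fields such as Received simply overwrite each other in this simplified version.

```python
import re


def unfold_headers(header_block: str) -> dict:
    """Unfold continuation lines and split each field into name and value."""
    # A line break followed by a space or tab continues the previous field.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", header_block)
    fields = {}
    for line in unfolded.splitlines():
        if not line:                  # blank line: end of the header
            break
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


fields = unfold_headers(
    'Content-Type: text/plain;\r\n'
    '      charset="gb2312"\r\n'
    'Subject: attach\r\n'
    '\r\n'
    'body: not a header\r\n'
)
```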

Typical header fields look like this:

Date: Mon, 29 Jun 2009 18:39:03 +0800

From: "=?gb2312?B?26zQocHB?=" <[email protected]>

To: "moreorless" <[email protected]>

Cc: "gxl0620" <[email protected]>

BCC: "=?gb2312?B?26zQocHB?=" <[email protected]>

Subject: attach

Message-ID: <[email protected]>

X-mailer: Foxmail 6, 15, 201, 21 [cn]

Mime-Version: 1.0

 

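Date:             date

From:             sender information

To:               recipient information

Cc:               carbon-copy recipient information

BCC:              blind-carbon-copy recipient information

Subject:          subject

X-mailer:         name of the client software

Non-standard, user-defined field names all begin with X-, for example X-Mailer and X-MSMail-Priority. Usually their meaning is only understood when the receiving and the sending program are the same.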

 

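About Bcc: there are three ways to implement it.

1.        Before the mail server delivers the message, the Bcc line is removed from the copies sent to all To, Cc and Bcc recipients.

2.        Before the mail server delivers the message, the Bcc field is removed from the copies for the To and Cc recipients; only the copies received by the Bcc recipients contain it. With several Bcc recipients, the field may list all Bcc addresses or only the recipient's own address.

3.        The Bcc field is sent with no addresses in it, which only tells the recipients that blind copies were sent to someone.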

The "Bcc:" field (where the "Bcc" means "Blind Carbon Copy") contains addresses of recipients of the message whose addresses are not to be revealed to other recipients of the message. There are three ways in which the "Bcc:" field is used.

In the first case, when a message containing a "Bcc:" field is prepared to be sent, the "Bcc:" line is removed even though all of the recipients (including those specified in the "Bcc:" field) are sent a copy of the message.

In the second case, recipients specified in the "To:" and "Cc:" lines each are sent a copy of the message with the "Bcc:" line removed as above, but the recipients on the "Bcc:" line get a separate copy of the message containing a "Bcc:" line. (When there are multiple recipient addresses in the "Bcc:" field, some implementations actually send a separate copy of the message to each recipient with a "Bcc:" containing only the address of that particular recipient.)

Finally, since a "Bcc:" field may contain no addresses, a "Bcc:" field can be sent without any addresses indicating to the recipients that blind copies were sent to someone. Which method to use with "Bcc:" fields is implementation dependent, but refer to the "Security Considerations" section of this document for a discussion of each.

(Source: http://www.apps.ietf.org/rfc/rfc2822.html#sec-3.6.3)

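3.1.2. Mail body

The mail body uses, roughly, the following fields:

Field name                       Meaning

  Content-Type                     type of the part body

  Content-Transfer-Encoding        transfer encoding of the part body

  Content-Disposition              how the part body is to be presented

  Content-ID                       ID of the part body

  Content-Location                 location (path) of the part body

  Content-Base                     base location of the part body

Besides a value, some fields also carry parameters. The value and the parameters, and the parameters among themselves, are separated by ";". A parameter name and its value are separated by "=".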
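
As a small illustration of the value/parameter syntax, Python's standard `email` package can split a Content-Type field into its value and parameters. The header text below is taken from the mail body example in this document; the variable names are arbitrary.

```python
from email import message_from_string

# A folded Content-Type field followed by the blank line that ends the header.
raw_header = 'Content-Type: text/plain;\n      charset="gb2312"\n\n'
part = message_from_string(raw_header)

content_type = part.get_content_type()   # the value before the first ";"
charset = part.get_param("charset")      # one parameter, quotes removed
all_params = part.get_params()           # list of (name, value) pairs
```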

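The mail body holds the content of the message; its type is given by the "Content-Type" field of the mail header. Common simple types are text/plain (plain text) and text/html (hypertext).

The multipart types are the essence of MIME mail. The mail body is split into several parts, each of which again consists of a part header and a part body, also separated by a blank line. Three multipart types are common: multipart/mixed, multipart/related and multipart/alternative. Their meaning and use are easy to infer from the names. In terms of nesting, multipart/mixed is the outermost container, multipart/related sits inside it, and multipart/alternative is the innermost.

 

It follows that a message with attachments must define a multipart/mixed part; a message with embedded (inline) resources must define at least a multipart/related part; and a message carrying both plain text and hypertext must define at least a multipart/alternative part.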
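
The nesting can be observed by building a message with Python's standard `email.message.EmailMessage`. This is a sketch with placeholder content, not the message from this document: adding an HTML alternative wraps the text in multipart/alternative, and adding an attachment wraps everything in multipart/mixed.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "attach"
msg.set_content("plain text body")                        # text/plain
msg.add_alternative("<p>html body</p>", subtype="html")   # -> multipart/alternative
msg.add_attachment(                                       # -> multipart/mixed
    b"abcdedf",
    maintype="application",
    subtype="octet-stream",
    filename="readme.txt",
)

# Walk the tree depth-first and record the type of every part.
structure = [part.get_content_type() for part in msg.walk()]
```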

Mail body text

Content-Type: text/plain;

      charset="gb2312"

Content-Transfer-Encoding: base64

 

DQoNCjIwMDktMDctMDEgDQoNCg0KDQrbrNChwcEgDQo=

 

The mail body text above uses the gb2312 character set and Base64 encoding.
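
Decoding such a part takes two steps: undo the transfer encoding (Base64), then interpret the bytes in the declared character set. A minimal sketch using the encoded text shown above:

```python
import base64

encoded_body = "DQoNCjIwMDktMDctMDEgDQoNCg0KDQrbrNChwcEgDQo="

raw_bytes = base64.b64decode(encoded_body)   # step 1: Content-Transfer-Encoding
body_text = raw_bytes.decode("gb2312")       # step 2: charset parameter
```

The decoded text consists of a date line followed by a short signature in Chinese.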

 

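Attachment handling

multipart/mixed: the parts of the document are mixed, which describes the relationship between the body text and the attachments. If the MIME type of a message is multipart/mixed, the message carries attachments.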

Content-Disposition          Intended content disposition and file name

Indicates whether a MIME body part is to be shown inline or is an attachment; can also indicate a suggested filename for use when saving an attachment to a file.

Examples:

1. Attachment name: readme.txt

Content-Transfer-Encoding: base64

Content-Disposition: attachment;

      filename="readme.txt"

2. Attachment name: 郵件內容 ("mail content")

Content-Transfer-Encoding: base64

Content-Disposition: attachment;

      filename="=?gb2312?B?08q8/sTayN0udHh0?="

What follows filename is the encoded attachment file name.
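
A sketch of how such a file name could be decoded with `email.header.decode_header` from the Python standard library. The wrapper name `decode_filename` is made up for illustration; plain ASCII names pass through unchanged.

```python
from email.header import decode_header


def decode_filename(value: str) -> str:
    """Decode a filename parameter that may contain encoded-words."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-word: bytes plus the charset named in the header.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            pieces.append(chunk)
    return "".join(pieces)


plain_name = decode_filename("readme.txt")
encoded_name = decode_filename("=?gb2312?B?08q8/sTayN0udHh0?=")
```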

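3.2.   MIME encoding

See RFC 2047, MIME Part Three: Message Header Extensions for Non-ASCII Text

 http://tools.ietf.org/html/rfc2047

 

 The two MIME encoding methods:

    Mail was first encoded because many gateways on the Internet could not correctly relay 8-bit characters such as Chinese characters. Encoding converts the 8-bit content into a 7-bit form that can be transported correctly; after receiving it, the receiver restores the 8-bit content.

    MIME stands for Multipurpose Internet Mail Extensions. Before MIME, mail was encoded with schemes such as UUENCODE, but because the MIME algorithms are simple and easy to extend, MIME has become the mainstream way of encoding mail. It is used not only for 8-bit characters but also for binary files such as images and audio in attachments, and many applications have been built on top of it.

 

In terms of encoding, MIME defines two methods: Base64 and QP (Quoted-Printable).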

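3.2.1. Base64

      Base64 is a general-purpose method with a simple principle: every three bytes of data are represented by four bytes, each of which carries only 6 bits of the original data, so the limit of transporting only 7-bit characters no longer matters. Base64 is usually abbreviated "B".

Base64 encodes an input string or block of data into a string made up only of the 64 characters {'A'-'Z', 'a'-'z', '0'-'9', '+', '/'}, with '=' used for padding. The input stream is read 6 bits at a time, and the 6-bit value (0-63) is used as an index into a table to output the corresponding character. Every 3 bytes are thus encoded as 4 characters (3×8 → 4×6), and a final group shorter than 4 characters is padded with '='. In practice the algorithm places the input bytes in order into a 24-bit buffer, filling missing bytes with zeros, then cuts the buffer into 4 pieces of 6 bits each, most significant first, and represents each piece with one of the 64 characters. If only one or two input bytes remain, the output is padded with "=" signs. The padding keeps any data appended afterwards from being confused with the encoded text.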
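
The 24-bit buffer procedure can be written out directly. The sketch below is for illustration only (the function name `b64_encode` is made up); in practice the standard `base64` module does the same job, and it is imported here only to compare results.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def b64_encode(data: bytes) -> str:
    """Encode bytes as Base64 using an explicit 24-bit buffer."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Put up to 3 bytes into a 24-bit buffer, zero-filling the gap.
        buffer = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Cut the buffer into four 6-bit pieces, most significant first.
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # n input bytes yield n + 1 significant characters; pad the rest.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


attachment_text = b64_encode(b"abcdedf")
```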

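3.2.2. QP

The other method is QP (Quoted-Printable), usually abbreviated "Q". It represents an 8-bit character as two hexadecimal digits preceded by "=". A QP-encoded file therefore typically looks like this: =B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1

QP requires that no encoded line exceed 76 characters. When a line would exceed this limit, a soft line break is used: "=" marks the break at the end of the encoded line and is followed by CRLF. (The 76-character limit includes the "=".)

The equals sign "=" is encoded as "=3D".

A tab or space at the end of a line must be encoded as "=09" (tab) or "=20" (space).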

 

Any 8-bit byte value may be encoded with 3 characters, an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, a US-ASCII form feed character (decimal value 12) can be represented by "=0C", and a US-ASCII equal sign (decimal value 61) is represented by "=3D". All characters except printable ASCII characters or end of line characters must be encoded in this fashion.

All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).

ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters appear at the end of a line. If one of these characters appears at the end of a line it must be encoded as "=09" (tab) or "=20" (space).

If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values. Conversely if byte values 10 and 13 have meanings other than end of line then they must be encoded as =0A and =0D.

Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not cause a line break in the decoded text.
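
These rules can be tried out with the `quopri` module from the Python standard library. The sketch decodes the sample string shown earlier in this section and encodes a short text containing "="; the variable names are arbitrary.

```python
import quopri

sample = b"=B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1"

# Each "=XX" triplet becomes the single byte with hexadecimal value XX.
decoded_bytes = quopri.decodestring(sample)

# The equals sign itself must be escaped as "=3D".
encoded_equals = quopri.encodestring(b"a=b")

# Bytes above 127 survive an encode/decode round trip.
high_bytes = bytes([0xD6, 0xD0, 0xCE, 0xC4])
round_trip = quopri.decodestring(quopri.encodestring(high_bytes))
```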

 

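Encoding format: encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

The encoded information is enclosed between "=?" and "?=". After "=?" comes the character set name, after the next "?" the encoding method, and after one more "?" the encoded string. Neither the character set nor the encoding method is case-sensitive.

The character set can be any character set the system supports (iso-8859-1, utf-8, gb2312, gbk, gb18030, ...).

There are two encoding methods: "B" or "b" stands for Base64, and "Q" or "q" stands for QP.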

Generally, an "encoded-word" is a sequence of printable ASCII characters that begins with "=?", ends with "?=", and has two "?"s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.

 

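Here is an example:

Subject: =?gb2312?B?xOO6w6Oh?=

This is the content of a Subject field. The field is not encoded as a whole: only part of it is encoded, and that part is enclosed by the two markers =? and ?=. What follows =? states that the character set of the text is GB2312, and the B after the next ? indicates Base64 encoding.

Another example: =?iso-8859-1?q?this=20is=20some=20text?=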
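
The encoded-word grammar is simple enough to parse by hand. The following sketch (the names `ENCODED_WORD` and `decode_encoded_word` are made up) splits one encoded-word into charset, encoding and text, then decodes it; `email.header.decode_header` in the standard library is the more complete tool for real headers.

```python
import base64
import quopri
import re

# "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=")


def decode_encoded_word(word: str) -> str:
    """Decode a single RFC 2047 encoded-word into text."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an encoded-word: %r" % word)
    charset, encoding, text = match.groups()
    if encoding.lower() == "b":
        raw = base64.b64decode(text)
    else:
        # header=True also maps "_" to a space, as the Q encoding requires.
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    return raw.decode(charset)


latin_text = decode_encoded_word("=?iso-8859-1?q?this=20is=20some=20text?=")
subject_text = decode_encoded_word("=?gb2312?B?xOO6w6Oh?=")
```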

 

 

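4.  The relationship between SMTP and MIME

 

The sender and recipient addresses each appear twice: once in the SMTP commands (the SMTP email address) and once in the message itself (the MIME email address). Note that:

1.        The message may carry display names (aliases) for the sender and the recipients; the SMTP commands may not.

2.        The Bcc addresses do not necessarily appear in the message. Different clients implement this differently.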
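
The split between the two sets of addresses can be sketched with Python's standard `email` package. The function name `split_envelope` and the addresses are hypothetical; the sketch derives bare SMTP addresses from the header fields and then removes the Bcc field before the message would be sent, which is one of the possible Bcc behaviours.

```python
from email.message import Message
from email.utils import getaddresses, parseaddr


def split_envelope(msg: Message):
    """Return (MAIL FROM address, RCPT TO addresses) and drop the Bcc field."""
    sender = parseaddr(msg["From"])[1]          # address without display name
    header_values = []
    for name in ("To", "Cc", "Bcc"):
        header_values.extend(msg.get_all(name, []))
    recipients = [addr for _, addr in getaddresses(header_values)]
    del msg["Bcc"]                              # hide blind-copy recipients
    return sender, recipients


mail = Message()
mail["From"] = '"Alice" <alice@example.com>'
mail["To"] = '"Bob" <bob@example.com>'
mail["Bcc"] = "carol@example.com"
envelope_sender, envelope_recipients = split_envelope(mail)
```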

 

5.  Some test data

1.     UTF-8

1.        Mail subject: smtp_utf8測試 ("smtp_utf8 test")

 

From - Tue Jun 30 18:13:22 2009

X-Mozilla-Status: 0001

X-Mozilla-Status2: 00800000

X-Mozilla-Keys:                                                                                

Message-ID: <[email protected]>

Date: Tue, 30 Jun 2009 18:13:12 +0800

From: =?UTF-8?B?6YOc5bCP5Lqu?= <[email protected]>

User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)

MIME-Version: 1.0

To: [email protected]

Subject: =?UTF-8?B?c210cF91dGY45rWL6K+V?=

Content-Type: text/plain; charset=UTF-8; format=flowed

Content-Transfer-Encoding: 8bit

 

2.        Mail subject: smtp_utf8

From - Tue Jun 30 18:13:22 2009

X-Mozilla-Status: 0001

X-Mozilla-Status2: 00800000

X-Mozilla-Keys:                                                                                

Message-ID: <[email protected]>

Date: Tue, 30 Jun 2009 18:13:12 +0800

From: =?UTF-8?B?6YOc5bCP5Lqu?= <[email protected]>

User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)

MIME-Version: 1.0

To: [email protected]

Subject: =?UTF-8?B?c210cF91dGY45rWL6K+V?=

Content-Type: text/plain; charset=UTF-8; format=flowed

Content-Transfer-Encoding: 8bit

 

3.        Mail subject: smtp (no encoding needed; transferred as 7bit)

From - Tue Jun 30 18:19:25 2009

X-Mozilla-Status: 0001

X-Mozilla-Status2: 00800000

X-Mozilla-Keys:                                                                                

Message-ID: <[email protected]>

Date: Tue, 30 Jun 2009 18:19:25 +0800

From: =?UTF-8?B?6YOc5bCP5Lqu?= <[email protected]>

User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)

MIME-Version: 1.0

To: [email protected]

Subject: smtp

Content-Type: text/plain; charset=UTF-8; format=flowed

Content-Transfer-Encoding: 7bit

 

2.     GB2312

Mail subject: 中文 ("Chinese")

From - Tue Jun 30 18:32:03 2009

X-Mozilla-Status: 0001

X-Mozilla-Status2: 00800000

X-Mozilla-Keys:                                                                                

Message-ID: <[email protected]>

Date: Tue, 30 Jun 2009 18:32:02 +0800

From: =?GB2312?B?26zQocHB?= <[email protected]>

User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)

MIME-Version: 1.0

To: [email protected]

Subject: =?GB2312?B?1tDOxA==?=

Content-Type: text/plain; charset=GB2312

Content-Transfer-Encoding: 7bit

 

 

3.     GB18030

Mail subject: 中文 ("Chinese")

From - Tue Jun 30 18:33:47 2009

X-Mozilla-Status: 0001

X-Mozilla-Status2: 00800000

X-Mozilla-Keys:                                                                                

Message-ID: <[email protected]>

Date: Tue, 30 Jun 2009 18:33:47 +0800

From: =?gb18030?Q?=DB=AC=D0=A1=C1=C1?= <[email protected]>

User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)

MIME-Version: 1.0

To: [email protected]

Subject: =?gb18030?Q?=D6=D0=CE=C4?=

Content-Type: text/plain; charset=GB18030; format=flowed

Content-Transfer-Encoding: 7bit
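
The Subject values in the samples above can be reproduced in a few lines. This is a sketch with made-up helper names; the Q variant escapes every byte as "=XX", which matches the sample because all its bytes are above 127. The Chinese subjects are written as Unicode escapes to keep the code ASCII-only.

```python
import base64


def b_encoded_word(text: str, charset: str) -> str:
    """Build an encoded-word using the B (Base64) encoding."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def q_encoded_word(text: str, charset: str) -> str:
    """Build an encoded-word using Q, escaping every byte as =XX."""
    payload = "".join(f"={byte:02X}" for byte in text.encode(charset))
    return f"=?{charset}?Q?{payload}?="


chinese = "\u4e2d\u6587"                    # the two-character subject
utf8_subject = "smtp_utf8\u6d4b\u8bd5"      # subject of the first UTF-8 sample

gb2312_word = b_encoded_word(chinese, "GB2312")
gb18030_word = q_encoded_word(chinese, "gb18030")
utf8_word = b_encoded_word(utf8_subject, "UTF-8")
```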

 

6.  The complete MIME source of one message

Date: Mon, 29 Jun 2009 18:39:03 +0800

From: "=?gb2312?B?26zQocHB?=" <[email protected]>

To: "moreorless" <[email protected]>

Cc: "gxl0620" <[email protected]>

BCC: "=?gb2312?B?26zQocHB?=" <[email protected]>

Subject: attach

Message-ID: <[email protected]>

X-mailer: Foxmail 6, 15, 201, 21 [cn]

Mime-Version: 1.0

Content-Type: multipart/mixed;

      boundary="=====001_Dragon777814155473_====="

 

This is a multi-part message in MIME format.

 

--=====001_Dragon777814155473_=====

Content-Type: multipart/alternative;

      boundary="=====003_Dragon777814155473_====="

 

 

--=====003_Dragon777814155473_=====

Content-Type: text/plain;

      charset="gb2312"

Content-Transfer-Encoding: base64

 

DQoNCjIwMDktMDYtMjkgDQoNCg0KDQrbrNChwcEgDQo=

 

--=====003_Dragon777814155473_=====

Content-Type: text/html;

      charset="gb2312"

Content-Transfer-Encoding: base64

 

PCFET0NUWVBFIEhUTUwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMCBUcmFuc2l0aW9uYWwv

L0VOIj4NCjxIVE1MPjxIRUFEPg0KPE1FVEEgY29udGVudD0idGV4dC9odG1sOyBjaGFyc2V0PWdi

MjMxMiIgaHR0cC1lcXVpdj1Db250ZW50LVR5cGU+DQo8TUVUQSBuYW1lPUdFTkVSQVRPUiBjb250

ZW50PSJNU0hUTUwgOC4wMC42MDAxLjE4NzAyIj48TElOSyByZWw9c3R5bGVzaGVldCANCmhyZWY9

IkJMT0NLUVVPVEV7bWFyZ2luLVRvcDogMHB4OyBtYXJnaW4tQm90dG9tOiAwcHg7IG1hcmdpbi1M

ZWZ0OiAyZW19Ij48L0hFQUQ+DQo8Qk9EWSBzdHlsZT0iTUFSR0lOOiAxMHB4OyBGT05ULUZBTUlM

WTogdmVyZGFuYTsgRk9OVC1TSVpFOiAxMHB0Ij4NCjxESVY+PEZPTlQgc2l6ZT0yIGZhY2U9VmVy

ZGFuYT48L0ZPTlQ+Jm5ic3A7PC9ESVY+DQo8RElWPjxGT05UIHNpemU9MiBmYWNlPVZlcmRhbmE+

PC9GT05UPiZuYnNwOzwvRElWPg0KPERJViBhbGlnbj1sZWZ0PjxGT05UIGNvbG9yPSNjMGMwYzAg

c2l6ZT0yIGZhY2U9VmVyZGFuYT4yMDA5LTA2LTI5IA0KPC9GT05UPjwvRElWPjxGT05UIHNpemU9

MiBmYWNlPVZlcmRhbmE+DQo8SFIgc3R5bGU9IldJRFRIOiAxMjJweDsgSEVJR0hUOiAycHgiIGFs

aWduPWxlZnQgU0laRT0yPg0KDQo8RElWPjxGT05UIGNvbG9yPSNjMGMwYzAgc2l6ZT0yIGZhY2U9

VmVyZGFuYT48U1BBTj7brNChwcE8L1NQQU4+IA0KPC9GT05UPjwvRElWPjwvRk9OVD48L0JPRFk+

PC9IVE1MPg0K

 

--=====003_Dragon777814155473_=====--

--=====001_Dragon777814155473_=====

Content-Type: application/octet-stream;

      name="readme.txt"

Content-Transfer-Encoding: base64

Content-Disposition: attachment;

      filename="readme.txt"

 

YWJjZGVkZg==

 

--=====001_Dragon777814155473_=====--
