RFC 822 Chinese Edition: MIME Parsing Basics (4) (Pages 5-6)

Page 5

--------------------------------------------------------------------------------------------------------------------------------------
3. LEXICAL ANALYSIS OF MESSAGES

3.1. GENERAL DESCRIPTION

          A message consists of header fields and, optionally, a body.
     The body is simply a sequence of lines containing ASCII charac-
     ters. It is separated from the headers by a null line (i.e., a
     line with nothing preceding the CRLF).

3.1.1. LONG HEADER FIELDS

        Each header field can be viewed as a single, logical line of
        ASCII characters, comprising a field-name and a field-body.
        For convenience, the field-body portion of this conceptual
        entity can be split into a multiple-line representation; this
        is called "folding". The general rule is that wherever there
        may be linear-white-space (NOT simply LWSP-chars), a CRLF
        immediately followed by AT LEAST one LWSP-char may instead be
        inserted. Thus, the single line

To: "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

can be represented as:

To: "Joe & J. Harvey" <ddd @ Org>,
        JJV@BBN

and

            To: "Joe & J. Harvey"
                            <ddd@ Org>, JJV
             @BBN

and

To: "Joe &
 J. Harvey" <ddd @ Org>, JJV @ BBN

             The process of moving from this folded multiple-line
        representation of a header field to its single line represen-
        tation is called "unfolding". Unfolding is accomplished by
        regarding CRLF immediately followed by a LWSP-char as
        equivalent to the LWSP-char.

        Note: While the standard permits folding wherever linear-
               white-space is permitted, it is recommended that struc-
               tured fields, such as those containing addresses, limit
               folding to higher-level syntactic breaks. For address
               fields, it is recommended that such folding occur
               between addresses, after the separating comma.

--------------------------------------------------------------------------------------------------------------------------------------
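3. Lexical analysis of messages

3.1. General description

          A message consists of header fields and an optional body. The body
     is a sequence of lines of ASCII characters. A null line separates the
     body from the headers (a null line is a line with nothing before its
     terminating CRLF).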
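
     The split at the null line can be sketched in Python as follows. This
     is a minimal illustration, not part of RFC 822: the function name
     split_message is hypothetical, and the sketch assumes the raw message
     is a bytes object that uses CRLF line endings throughout.

```python
from typing import Tuple


def split_message(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw message into (header block, body) at the first null line.

    Hypothetical helper for illustration; assumes CRLF line endings.
    """
    if raw.startswith(b"\r\n"):
        # The message opens with a null line, so it has no header fields.
        return b"", raw[2:]
    # The first CRLF ends the last header line, the second one is the null line.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # No null line found: everything is header fields, the body is absent.
        return raw, b""
    return head + b"\r\n", body
```

     The returned header block may still contain folded lines; it has to be
     unfolded before individual fields are examined.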

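3.1.1. Long header fields

        Each header field can be viewed as a single logical line of ASCII
        characters, made up of a field-name and a field-body. For convenience,
        the field-body may be spread over several lines; this is called
        "folding". The folding rule: wherever linear-white-space is allowed,
        a CRLF (translator's note: carriage return plus line feed) followed
        immediately by at least one LWSP-char (translator's note: a LWSP-char
        is a space or a horizontal tab) may be inserted.
        (Translator's note: a field-name cannot contain spaces and a header
        field must start in the first column, so a normal header field never
        begins with whitespace; a line that does is a continuation of the
        previous line.)

        Thus, the single-line header field
            To: "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

        can also be written as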

To: "Joe & J. Harvey" <ddd @ Org>,
        JJV@BBN

or

            To: "Joe & J. Harvey"
                            <ddd@ Org>, JJV
             @BBN

or

To: "Joe &
 J. Harvey" <ddd @ Org>, JJV @ BBN

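             Turning this folded, multiple-line form back into a single line
        is called "unfolding". Unfolding amounts to replacing every
        CRLF + LWSP-char sequence in the header field with the LWSP-char alone.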
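
        Unfolding as described above can be sketched with one regular
        expression. This is an illustrative sketch rather than a complete
        parser: the name unfold is hypothetical, and the input is assumed
        to be a header block already decoded to str with CRLF line endings.

```python
import re

# A CRLF that is immediately followed by a LWSP-char (space or horizontal tab).
# The lookahead keeps the LWSP-char in place, so only the CRLF is removed.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_block: str) -> str:
    """Return the header block with every folded field joined into one line."""
    return _FOLD.sub("", header_block)


folded = 'To: "Joe & J. Harvey" <ddd @ Org>,\r\n    JJV@BBN\r\nSubject: hi\r\n'
unfolded_lines = unfold(folded).split("\r\n")
```

        After unfolding, each remaining CRLF ends exactly one header field,
        which is why the result can be split on CRLF.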

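        Note: Although the standard permits folding wherever linear-white-space
        is permitted, structured fields that contain addresses, such as To and
        Cc, should be folded at higher-level syntactic breaks. For address
        fields, folding between addresses, after the separating comma, is
        recommended.
        (Translator's note, example:  To: "tom"<[email protected]>,"jack"<[email protected]>
             can also be written as:  To: "tom"<[email protected]>,
                                          "jack"<[email protected]>
             Folding this way is tidier and easier to read; the standard
             recommends the second form.)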
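
        The recommendation to fold address fields after the separating comma
        could be followed with a helper like the one below. It is a sketch
        under assumptions: fold_addresses and its indent parameter are
        hypothetical names, and each address is assumed to be already
        formatted as a string.

```python
from typing import List


def fold_addresses(field_name: str, addresses: List[str], indent: str = "    ") -> str:
    """Build an address field folded between addresses, after each comma.

    Hypothetical helper; `indent` is the white space that starts each
    continuation line and must consist of LWSP-chars only.
    """
    if not indent or indent.strip(" \t"):
        raise ValueError("indent must be one or more spaces or tabs")
    # Each fold is: comma, CRLF, then at least one LWSP-char.
    separator = ",\r\n" + indent
    return field_name + ": " + separator.join(addresses) + "\r\n"
```

        For example, fold_addresses("To", ['"Joe & J. Harvey" <ddd@Org>',
        "JJV@BBN"]) puts JJV@BBN on its own indented continuation line.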

--------------------------------------------------------------------------------------------------------------------------------------

Page 6

--------------------------------------------------------------------------------------------------------------------------------------

3.1.2. STRUCTURE OF HEADER FIELDS

        Once a field has been unfolded, it may be viewed as being com-
        posed of a field-name followed by a colon (":"), followed by a
        field-body, and terminated by a carriage-return/line-feed.
        The field-name must be composed of printable ASCII characters
        (i.e., characters that have values between 33. and 126.,
        decimal, except colon). The field-body may be composed of any
        ASCII characters, except CR or LF. (While CR and/or LF may be
        present in the actual text, they are removed by the action of
        unfolding the field.)

        Certain field-bodies of headers may be interpreted according
        to an internal syntax that some systems may wish to parse.
        These fields are called "structured fields". Examples
        include fields containing dates and addresses. Other fields,
        such as "Subject" and "Comments", are regarded simply as
        strings of text.

        Note: Any field which has a field-body that is defined as
               other than simply <text> is to be treated as a struc-
               tured field.

               Field-names, unstructured field bodies and structured
               field bodies each are scanned by their own, independent
               "lexical" analyzers.

3.1.3. UNSTRUCTURED FIELD BODIES

        For some fields, such as "Subject" and "Comments", no struc-
        turing is assumed, and they are treated simply as <text>s, as
        in the message body. Rules of folding apply to these fields,
        so that such field bodies which occupy several lines must
        therefore have the second and successive lines indented by at
        least one LWSP-char.

3.1.4. STRUCTURED FIELD BODIES

        To aid in the creation and reading of structured fields, the
        free insertion of linear-white-space (which permits folding
        by inclusion of CRLFs) is allowed between lexical tokens.
        Rather than obscuring the syntax specifications for these
        structured fields with explicit syntax for this linear-white-
        space, the existence of another "lexical" analyzer is assumed.
        This analyzer does not apply for unstructured field bodies
        that are simply strings of text, as described above. The
        analyzer provides an interpretation of the unfolded text
        composing the body of the field as a sequence of lexical sym-
        bols.

--------------------------------------------------------------------------------------------------------------------------------------

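3.1.2. Structure of header fields

        Once a header field has been unfolded, it consists of two parts
        separated by a colon (":"): the field-name on the left and the
        field-body on the right, terminated by a CRLF. The field-name must be
        composed of printable ASCII characters (values 0x21 to 0x7E, excluding
        the colon, 0x3A). The field-body may be composed of any ASCII
        characters except CR and LF (translator's note: CR is carriage return,
        '\r', value 0x0D; LF is line feed, '\n', value 0x0A). (CR and LF may
        be present when the field-body spans several lines, but unfolding
        removes them.)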
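
        The structure rules above translate into a small validation routine.
        The sketch below is illustrative only: parse_field is a hypothetical
        name, the input is assumed to be one unfolded field without its
        terminating CRLF, and trimming white space around the field-body is
        a convenience of this sketch, not something the text above requires.

```python
from typing import Tuple


def parse_field(line: str) -> Tuple[str, str]:
    """Split one unfolded header field into (field-name, field-body)."""
    # The field-name ends at the first colon, so it can never contain one.
    name, colon, body = line.partition(":")
    if not colon:
        raise ValueError("missing colon after field-name")
    # Printable ASCII only: decimal 33 through 126 (0x21 to 0x7E).
    if not name or any(not (33 <= ord(ch) <= 126) for ch in name):
        raise ValueError("invalid field-name: %r" % name)
    # An unfolded field-body must not contain CR or LF any more.
    if "\r" in body or "\n" in body:
        raise ValueError("field-body contains CR or LF; unfold it first")
    # Assumption of this sketch: surrounding spaces and tabs are dropped.
    return name, body.strip(" \t")
```

        Calling parse_field("Subject: Hello world") yields the pair
        ("Subject", "Hello world"), while a name containing a space is
        rejected.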

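        The bodies of some header fields have an internal syntax of their own
        that some systems may wish to parse (translator's note: for example,
        an address encoded as <=?GB18030?B?suLK1A==?=>). Such fields are
        called "structured fields"; examples include fields containing dates
        and addresses. Other fields, such as "Subject" and "Comments", are
        regarded simply as strings of text, not as structured fields.

        Note: Any field whose field-body is defined in the standard as
        something other than plain <text> is to be treated as a structured
        field.

        Field-names, unstructured field bodies and structured field bodies
        are each scanned by their own independent lexical analyzers.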

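3.1.3. Unstructured field bodies

        Some fields, such as "Subject" and "Comments", are assumed to have no
        structure and are treated simply as <text>, like the message body.
        The folding rules apply to them, so when such a field-body occupies
        several lines, every line after the first must begin with at least
        one LWSP-char.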
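
        Folding an unstructured field so that every continuation line starts
        with a LWSP-char might look like this. It is a sketch under
        assumptions: fold_unstructured and the width limit are hypothetical
        (the text above sets no line length), and folds are placed only at
        existing single spaces so that unfolding restores the original text.

```python
def fold_unstructured(name: str, text: str, width: int = 72) -> str:
    """Fold an unstructured field at spaces, keeping lines within `width`.

    Hypothetical helper; a single word longer than `width` is left unbroken.
    """
    words = text.split(" ")
    lines = [name + ": " + words[0]]
    for word in words[1:]:
        if word and len(lines[-1]) + 1 + len(word) > width:
            # Start a continuation line; the space that separated the words
            # becomes the leading LWSP-char, so unfolding restores the text.
            lines.append(" " + word)
        else:
            lines[-1] += " " + word
    return "\r\n".join(lines) + "\r\n"
```

        Because each fold only adds a CRLF in front of a space that was
        already there, removing the CRLFs again gives back the unfolded field.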

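3.1.4. Structured field bodies

        To aid in creating and reading structured fields, linear-white-space
        (translator's note: linear-white-space is 1*([CRLF] LWSP-char), which
        is what permits folding) may be inserted freely between lexical
        tokens. Rather than clutter the syntax of these structured fields with
        explicit rules for that white space, the standard assumes a separate
        lexical analyzer for them. This analyzer does not apply to the
        unstructured field bodies described above, which are simply strings of
        text. It interprets the unfolded text of the field-body as a sequence
        of lexical symbols.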
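
        A lexical analyzer of this kind can be sketched as below. The symbol
        types used here (special characters, quoted-strings, comments, atoms)
        come from later parts of RFC 822 that this excerpt does not show, so
        treat them as an assumption. The name tokenize is hypothetical, the
        input is assumed to be an unfolded field-body, and domain-literals
        are not handled ("[" and "]" are emitted as plain specials).

```python
from typing import List, Tuple

SPECIALS = set('()<>@,;:\\".[]')


def tokenize(body: str) -> List[Tuple[str, str]]:
    """Turn an unfolded structured field-body into (kind, text) symbols."""
    tokens = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch in " \t":
            i += 1  # white space between symbols carries no meaning
        elif ch == '"':
            j = i + 1
            while j < n and body[j] != '"':
                j += 2 if body[j] == "\\" else 1  # skip backslash-quoted chars
            if j >= n:
                raise ValueError("unterminated quoted-string")
            tokens.append(("quoted-string", body[i:j + 1]))
            i = j + 1
        elif ch == "(":
            depth, j = 1, i + 1
            while j < n and depth:
                if body[j] == "\\":
                    j += 1  # the character after a backslash is taken literally
                elif body[j] == "(":
                    depth += 1  # comments may nest
                elif body[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated comment")
            tokens.append(("comment", body[i:j]))
            i = j
        elif ch in SPECIALS:
            tokens.append(("special", ch))
            i += 1
        else:
            j = i
            while j < n and body[j] not in SPECIALS and body[j] not in " \t":
                j += 1
            tokens.append(("atom", body[i:j]))
            i = j
    return tokens
```

        Fed the field-body of the To example from section 3.1.1, this yields
        one quoted-string followed by atoms and specials, with the white
        space around "@" dropped.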

--------------------------------------------------------------------------------------------------------------------------------------
