RFC 822 Chinese Edition: MIME Parsing Basics (4) (Pages 5-6)

Page 5

--------------------------------------------------------------------------------------------------------------------------------------
3. LEXICAL ANALYSIS OF MESSAGES

3.1. GENERAL DESCRIPTION

          A message consists of header fields and, optionally, a body.
     The body is simply a sequence of lines containing ASCII charac-
     ters. It is separated from the headers by a null line (i.e., a
     line with nothing preceding the CRLF).

3.1.1. LONG HEADER FIELDS

        Each header field can be viewed as a single, logical line of
        ASCII characters, comprising a field-name and a field-body.
        For convenience, the field-body portion of this conceptual
        entity can be split into a multiple-line representation; this
        is called "folding". The general rule is that wherever there
        may be linear-white-space (NOT simply LWSP-chars), a CRLF
        immediately followed by AT LEAST one LWSP-char may instead be
        inserted. Thus, the single line

To: "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

can be represented as:

            To: "Joe & J. Harvey" <ddd @ Org>,
                    JJV@BBN

and

            To: "Joe & J. Harvey"
                            <ddd@ Org>, JJV
             @BBN

and

            To: "Joe &
             J. Harvey" <ddd @ Org>, JJV @ BBN

             The process of moving from this folded   multiple-line
        representation of a header field to its single line represen-
        tation is called "unfolding". Unfolding is accomplished by
        regarding   CRLF   immediately followed by a LWSP-char as
        equivalent to the LWSP-char.

        Note: While the standard permits folding wherever linear-
               white-space is permitted, it is recommended that struc-
               tured fields, such as those containing addresses, limit
               folding to higher-level syntactic breaks. For address
               fields, it is recommended that such folding occur
               between addresses, after the separating comma.

--------------------------------------------------------------------------------------------------------------------------------------
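3. Lexical analysis of messages

3.1. General description

          A message consists of header fields and an optional body. The body
     is simply a sequence of lines of ASCII characters. It is separated from
     the headers by an empty line (that is, a line with nothing before its
     CRLF).

3.1.1. Long header fields

        Each header field can be viewed as a single logical line of ASCII
        characters, made up of a field-name and a field-body. For
        convenience, the field-body may be spread over several lines; this
        is called "folding". The rule is that wherever linear-white-space
        may appear, a CRLF (note: carriage return plus line feed)
        immediately followed by at least one LWSP-char (note: an LWSP-char
        is a space or a horizontal tab) may be inserted instead.
        (Note: a field-name cannot contain spaces and a header field must
        start in the first column, so a well-formed header field never
        begins with white space. A line that does begin with white space is
        a continuation of the previous line.)

        Thus the single-line header field

            To: "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

        can also be written as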

            To: "Joe & J. Harvey" <ddd @ Org>,
                    JJV@BBN

        or

            To: "Joe & J. Harvey"
                            <ddd@ Org>, JJV
             @BBN

        or

            To: "Joe &
             J. Harvey" <ddd @ Org>, JJV @ BBN

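             Turning this folded, multiple-line form back into a single line
        is called "unfolding". Unfolding is done by treating every CRLF that
        is immediately followed by an LWSP-char as equivalent to that
        LWSP-char alone.

        Note: Although the standard permits folding wherever
        linear-white-space is permitted, structured fields that carry
        addresses, such as To and Cc, should preferably be folded only at
        higher-level syntactic breaks. For address fields, the recommended
        place is between addresses, after the separating comma.
        (Note: for example,  To: "tom"<[email protected]>,"jack"<[email protected]>
             can also be written as
                       To: "tom"<[email protected]>,
                           "jack"<[email protected]>
             The folded form is easier to read, and it is the one the
             standard recommends.)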
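
        The unfolding rule can be sketched in a few lines of Python. This is
        only an illustration: the function name unfold and the sample string
        are assumptions made for this example, and the input is assumed to
        use CRLF line endings as the standard requires.

```python
import re

# A CRLF that is immediately followed by a space or tab (an LWSP-char)
# marks a folded continuation line. The lookahead leaves the LWSP-char
# in place, so "CRLF + LWSP-char" collapses to just the LWSP-char.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(raw: str) -> str:
    """Return the header text with every folded line joined to its predecessor."""
    return _FOLD.sub("", raw)


folded = 'To: "Joe & J. Harvey" <ddd @ Org>,\r\n\tJJV@BBN\r\n'
print(repr(unfold(folded)))
# 'To: "Joe & J. Harvey" <ddd @ Org>,\tJJV@BBN\r\n'
```

        The CRLF that ends the field is kept, because it is not followed by
        an LWSP-char.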

--------------------------------------------------------------------------------------------------------------------------------------

Page 6

--------------------------------------------------------------------------------------------------------------------------------------

3.1.2. STRUCTURE OF HEADER FIELDS

        Once a field has been unfolded, it may be viewed as being com-
        posed of a field-name followed by a colon (":"), followed by a
        field-body, and terminated by a carriage-return/line-feed.
        The field-name must be composed of printable ASCII characters
        (i.e., characters that have values between 33. and 126.,
        decimal, except colon). The field-body may be composed of any
        ASCII characters, except CR or LF. (While CR and/or LF may be
        present in the actual text, they are removed by the action of
        unfolding the field.)

        Certain field-bodies of headers may be interpreted according
        to an internal syntax that some systems may wish to parse.
        These fields are called "structured   fields".    Examples
        include fields containing dates and addresses. Other fields,
        such as "Subject" and "Comments", are regarded simply as
        strings of text.

        Note: Any field which has a field-body that is defined as
               other than simply <text> is to be treated as a struc-
               tured field.

               Field-names, unstructured field bodies and structured
               field bodies each are scanned by their own, independent
               "lexical" analyzers.

3.1.3. UNSTRUCTURED FIELD BODIES

        For some fields, such as "Subject" and "Comments", no struc-
        turing is assumed, and they are treated simply as <text>s, as
        in the message body. Rules of folding apply to these fields,
        so that such field bodies which occupy several lines must
        therefore have the second and successive lines indented by at
        least one LWSP-char.

3.1.4. STRUCTURED FIELD BODIES

        To aid in the creation and reading of structured fields, the
        free insertion   of linear-white-space (which permits folding
        by inclusion of CRLFs) is allowed between lexical tokens.
        Rather than obscuring the syntax specifications for these
        structured fields with explicit syntax for this linear-white-
        space, the existence of another "lexical" analyzer is assumed.
        This analyzer does not apply for unstructured field bodies
        that are simply strings of text, as described above. The
        analyzer provides an interpretation of the unfolded text
        composing the body of the field as a sequence of lexical sym-
        bols.

--------------------------------------------------------------------------------------------------------------------------------------

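3.1.2. Structure of header fields

        Once a field has been unfolded, it consists of a field-name, a
        colon (":"), and a field-body, and it ends with a CRLF. The
        field-name must be made up of printable ASCII characters (that is,
        characters in the range 0x21 to 0x7E, excluding the colon, 0x3A).
        The field-body may contain any ASCII characters except CR and LF.
        (Note: CR is carriage return, '\r', 0x0D; LF is line feed, '\n',
        0x0A.) (CR and LF may be present in the raw text when the
        field-body spans several lines, but unfolding removes them.)

        The field-bodies of some headers follow an internal syntax that
        some systems may wish to parse. (Note: for example, a name in an
        address field may appear as an encoded word such as
        =?GB18030?B?suLK1A==?=.) Such fields are called "structured
        fields"; fields containing dates and addresses are examples. Other
        fields, such as "Subject" and "Comments", are regarded simply as
        strings of text rather than as structured fields.

        Note: Any field whose field-body is defined in the standard as
        something other than plain <text> is to be treated as a structured
        field.

              Field-names, unstructured field-bodies, and structured
        field-bodies are each scanned by their own independent lexical
        analyzer.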
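
        As a rough illustration of the structure described in 3.1.2, the
        sketch below splits one unfolded field into its field-name and
        field-body and checks the character ranges. The function name
        split_field and its error messages are assumptions for this example,
        not part of the standard.

```python
def split_field(unfolded: str) -> tuple:
    """Split one unfolded header field into (field-name, field-body)."""
    # Drop the terminating CRLF if it is present.
    line = unfolded[:-2] if unfolded.endswith("\r\n") else unfolded

    # The first colon separates the field-name from the field-body.
    name, colon, body = line.partition(":")
    if not colon:
        raise ValueError("header field has no colon")

    # field-name: one or more printable ASCII characters, decimal 33..126.
    # The colon is excluded automatically because we split on it.
    if not name or any(not (33 <= ord(ch) <= 126) for ch in name):
        raise ValueError("field-name must be printable ASCII without spaces")

    # field-body: any ASCII character except CR and LF (after unfolding).
    if any(ord(ch) > 127 or ch in "\r\n" for ch in body):
        raise ValueError("field-body must be ASCII without CR or LF")

    return name, body


print(split_field("Subject: Hello\r\n"))
```

        The leading space of the field-body is kept here; whether to strip
        it is left to the analyzer that handles the body.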

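3.1.3. Unstructured field bodies

        Some fields, such as "Subject" and "Comments", are assumed to have
        no structure. They are treated simply as <text>, just like the
        message body. The folding rules apply to these fields too, so when
        such a field-body occupies several lines, the second and every
        later line must begin with at least one LWSP-char.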
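
        Going the other way, a long unstructured field can be folded at its
        existing spaces. The sketch below is one possible approach, not a
        method prescribed by the standard: the function name fold and the
        default limit of 72 characters are assumptions chosen for the
        example, and a line with no usable space is simply left long.

```python
def fold(field: str, limit: int = 72) -> str:
    """Fold an unfolded header field (given without its final CRLF) at spaces."""
    lines = []
    current = field
    while len(current) > limit:
        # Look for the last space that keeps the line within the limit.
        # The search starts at index 1 so a leading LWSP-char is never used.
        cut = current.rfind(" ", 1, limit + 1)
        if cut <= 0 or not current[:cut].strip():
            break  # no usable break point; leave the rest on one line
        lines.append(current[:cut])
        # The space stays at the start of the next line, so every
        # continuation line begins with an LWSP-char.
        current = current[cut:]
    lines.append(current)
    return "\r\n".join(lines) + "\r\n"


subject = "Subject: " + " ".join(["word"] * 30)
print(fold(subject, limit=40))
```

        Because each break is placed directly before an existing space,
        removing the inserted CRLFs gives back the original text.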

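3.1.4. Structured field bodies

        To make structured fields easier to create and read,
        linear-white-space (note: defined as 1*([CRLF] LWSP-char), which is
        what makes folding possible) may be inserted freely between lexical
        tokens. Rather than clutter the syntax specifications of these
        fields with explicit rules for this white space, the standard
        assumes a separate lexical analyzer for structured fields. That
        analyzer is not applied to unstructured field-bodies, which are
        plain strings of text as described above. It interprets the
        unfolded text of the field-body as a sequence of lexical symbols.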
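
        To show what "a sequence of lexical symbols" might look like, here
        is a deliberately simplified scanner for an unfolded structured
        field-body. It is a sketch under stated assumptions: it recognizes
        only quoted strings, single special characters, and runs of other
        characters, and it does not handle comments or domain literals as
        units. The names SPECIALS and tokenize are made up for this example.

```python
# Characters that form one-character tokens in this simplified scanner.
SPECIALS = set('()<>@,;:\\".[]')


def tokenize(body: str) -> list:
    """Split an unfolded structured field-body into a list of token strings."""
    tokens = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch in " \t":
            # White space between tokens carries no meaning; skip it.
            i += 1
        elif ch == '"':
            # Quoted string: scan to the closing quote, honoring backslash
            # escapes. White space inside the quotes is preserved.
            j = i + 1
            while j < n and body[j] != '"':
                j += 2 if body[j] == "\\" else 1
            if j >= n:
                raise ValueError("unterminated quoted string")
            tokens.append(body[i:j + 1])
            i = j + 1
        elif ch in SPECIALS:
            tokens.append(ch)
            i += 1
        else:
            # A run of ordinary characters, ended by white space or a special.
            j = i
            while j < n and body[j] not in SPECIALS and body[j] not in " \t":
                j += 1
            tokens.append(body[i:j])
            i = j
    return tokens


print(tokenize(' "Joe & J. Harvey" <ddd @ Org>,\tJJV @ BBN'))
```

        With this scanner, "<ddd @ Org>" and "<ddd@Org>" yield the same
        tokens, which is why white space can be inserted freely between
        tokens without changing the meaning of the field.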

--------------------------------------------------------------------------------------------------------------------------------------
