Reading Email Content with Python 3

Introduction

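Mail is usually retrieved over POP3 (Post Office Protocol, which downloads messages from the server to the client) or IMAP (Internet Message Access Protocol, which lets the client work with messages interactively while they stay on the server). Python provides the corresponding modules poplib and imaplib. Although POP3 is widely supported, it is obsolescent, and the quality of POP3 server implementations varies widely, with many of them quite poor. So if our mail server supports IMAP, we are better off using imaplib.IMAP4, because IMAP servers tend to be better implemented. Practically all mainstream mail providers support IMAP, including QQ, 163, Gmail and Outlook, so we use IMAP to implement the mail-reading script.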

Implementation

Logging in to the mailbox and reading the raw message

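The login is done with the imaplib library, so first import it (import imaplib), then use its methods to log in to the mailbox and read a message. Parsing the fetched message also needs the email module (import email).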
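```
import email
import imaplib


def get_mail(email_address, password):
    # Choose the IMAP server that matches your mail provider
    server = imaplib.IMAP4_SSL("imap.gmail.com")
    server.login(email_address, password)
    # Mailbox folder to open; the default is 'INBOX'
    server.select("INBOX")
    # Search for matching messages. The first argument is the charset
    # (None sends no CHARSET, so the criteria are plain ASCII); the second
    # is the search criterion, and 'ALL' matches every message
    typ, data = server.search(None, "ALL")
    # data[0] is a bytes string of message numbers separated by spaces
    msg_list = data[0].split()
    # Numbers ascend from the oldest message, so the last one is the newest
    latest = msg_list[-1]
    typ, datas = server.fetch(latest, "(RFC822)")
    server.logout()
    # Parse the raw bytes directly: decoding them as UTF-8 first fails
    # for messages that contain 8-bit text in another charset
    message = email.message_from_bytes(datas[0][1])
    return message
```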
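get_mail reads only the newest message. A search() result is just a list holding one space-separated bytes string, so other selections are ordinary list operations. The sketch below uses two hypothetical helpers, latest_ids and since_criterion (neither is part of imaplib), and assumes the server accepts RFC 3501 SEARCH syntax such as SINCE 05-Jan-2024.

```python
import datetime

# Hypothetical helpers for illustration; neither is part of imaplib.
_MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')


def since_criterion(day):
    """Build an IMAP SEARCH 'SINCE' criterion (RFC 3501 date format DD-Mon-YYYY)."""
    # Month names are spelled out here so the result does not depend on the locale
    return 'SINCE %02d-%s-%d' % (day.day, _MONTHS[day.month - 1], day.year)


def latest_ids(search_data, n=1):
    """Return up to n message numbers from a search() result, newest first."""
    # search() returns data shaped like [b'1 2 3']; an empty mailbox gives [b'']
    ids = search_data[0].split()
    return ids[::-1][:n]


print(since_criterion(datetime.date(2024, 1, 5)))  # SINCE 05-Jan-2024
print(latest_ids([b'1 2 3'], n=2))                  # [b'3', b'2']
print(latest_ids([b'']))                            # []
```

With a logged-in server, these might be combined as `typ, data = server.search(None, since_criterion(datetime.date(2024, 1, 1)))`, followed by `server.fetch(num, '(RFC822)')` for each `num` in `latest_ids(data, 5)`.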
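get_mail returns an email.message object, i.e. the raw message. If we print it, we find mostly encoded text that cannot be read directly, so next we need to convert the raw message into a readable one.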
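To see what "unreadable" means without connecting to a server, the sketch below parses a hand-written raw message of the kind fetch(..., '(RFC822)') returns (the addresses and text are placeholders). The Subject is still an RFC 2047 encoded word and the body is still base64 until it is decoded explicitly.

```python
import email

# A hand-written raw message; addresses and content are placeholders
raw = (b'From: sender@example.com\r\n'
       b'To: receiver@example.com\r\n'
       b'Subject: =?utf-8?b?5L2g5aW9?=\r\n'
       b'MIME-Version: 1.0\r\n'
       b'Content-Type: text/plain; charset=utf-8\r\n'
       b'Content-Transfer-Encoding: base64\r\n'
       b'\r\n'
       b'5L2g5aW9\r\n')

msg = email.message_from_bytes(raw)
subject = msg['Subject']            # still MIME-encoded: '=?utf-8?b?5L2g5aW9?='
encoded_body = msg.get_payload()    # still base64 text
body = msg.get_payload(decode=True).decode('utf-8')  # transfer encoding removed

print(subject)
print(encoded_body.strip())
print(body)                         # 你好
```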

About email.message

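An email message consists of headers and a payload (also called the content). Headers are RFC 5322 or RFC 6532 style field names and values. The payload may be a simple text message, a binary object, or a structured sequence of sub-messages, each with its own set of headers and its own payload. The latter type of payload is indicated by the message having a MIME type such as multipart/* or message/rfc822.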

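The conceptual model provided by an EmailMessage object is an ordered dictionary of headers coupled with a payload that represents the RFC 5322 body of the message, which might be a list of sub-EmailMessage objects. In addition to the normal dictionary methods for accessing header names and values, there are methods for accessing specialized information from the headers (for example the MIME content type), for operating on the payload, for generating a serialized version of the message, and for recursively walking over the object tree.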
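The tree described above can be seen by building a small message locally with the standard EmailMessage class and walking it (the addresses and the attachment name are placeholders). set_content, add_alternative and add_attachment turn the message into nested multipart containers, and walk() visits every node depth-first, containers included.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'sender@example.com'      # placeholder addresses
msg['To'] = 'receiver@example.com'
msg['Subject'] = 'Report'
msg.set_content('Plain-text body')                        # a single text/plain payload
msg.add_alternative('<p>HTML body</p>', subtype='html')   # becomes multipart/alternative
msg.add_attachment(b'\x00\x01', maintype='application',   # wrapped in multipart/mixed
                   subtype='octet-stream', filename='data.bin')

# walk() yields the message itself first, then every sub-part depth-first
types = [part.get_content_type() for part in msg.walk()]
for part in msg.walk():
    print(part.get_content_type(), part.is_multipart())
```

The expected order is multipart/mixed, multipart/alternative, text/plain, text/html, application/octet-stream: the attachment forces an outer mixed container that holds the alternative pair.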

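The dictionary-like interface of EmailMessage is indexed by header names, which must be ASCII values. The values of the dictionary are strings with some extra methods. Headers are stored and returned in case-preserving form, but field names are matched case-insensitively. Unlike a real dict, the keys are ordered and there can be duplicate keys; additional methods are provided for working with headers that have duplicate keys. (get_mail returns the older email.message.Message class used by the default compat32 policy, which offers the same dictionary-style access to headers.)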
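A small sketch of the dictionary behaviour described above, using made-up header values: lookups ignore case, assignment appends rather than replaces, get_all() returns every value of a repeated header, and del removes all of them.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'Hello'
# Field names are matched case-insensitively
same = msg['subject'] == msg['SUBJECT'] == 'Hello'

# Unlike a dict, assigning appends a header, so a name can occur more than once
msg['Received'] = 'from a.example.com'
msg['Received'] = 'from b.example.com'
received = [str(v) for v in msg.get_all('received')]   # every value, in order

# replace_header() changes the first match in place; del removes all matches
msg.replace_header('Subject', 'Hi')
del msg['Received']

print(same)           # True
print(received)       # ['from a.example.com', 'from b.example.com']
print(msg.keys())     # ['Subject']
```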

Converting the raw message into a readable one

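The Subject and the names in address headers such as From and To are encoded strings (RFC 2047 encoded words such as =?utf-8?b?...?=), so they must be decoded before they can be displayed properly. Define a decode function: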
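```
from email.header import decode_header


def decode_str(s):
    # Decode every (value, charset) pair from decode_header(), not just the first
    parts = []
    for value, charset in decode_header(s):
        if isinstance(value, bytes):
            value = value.decode(charset or 'ascii', errors='replace')
        parts.append(value)
    return ''.join(parts)
```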
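A quick offline check of what decode_header returns. The subject text, name and address are placeholders, and the encoded words are generated locally with email.header.Header rather than taken from a real message. Encoded parts come back as bytes with their charset and unencoded parts as bytes with charset None, so a header that mixes both yields more than one pair. As an alternative to decoding the pairs by hand, the standard make_header function can join them back into one string.

```python
from email.header import Header, decode_header, make_header
from email.utils import parseaddr

# Placeholder headers, encoded locally as RFC 2047 encoded words
subject = 'Re: ' + Header('你好', 'utf-8').encode()
sender = '%s <zhang@example.com>' % Header('张三', 'utf-8').encode()

# Mixed header: one unencoded pair and one encoded pair
pairs = decode_header(subject)
print(pairs)    # [(b'Re: ', None), (b'\xe4\xbd\xa0\xe5\xa5\xbd', 'utf-8')]

# make_header() rebuilds a Header from the pairs; str() gives readable text
print(str(make_header(pairs)))   # Re: 你好

# For address headers, split off the address first, then decode the display name
name, addr = parseaddr(sender)
print(str(make_header(decode_header(name))), addr)   # 张三 zhang@example.com
```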

To handle messages that are not encoded in UTF-8, define a function that detects a message's character set:

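```
def guess_charset(msg):
    # get_charset() returns an email.charset.Charset object or None;
    # for parsed messages it is usually None, so read the header instead
    charset = msg.get_charset()
    if charset is not None:
        return str(charset)
    content_type = msg.get('Content-Type', '').lower()
    pos = content_type.find('charset=')
    if pos >= 0:
        # Keep only the value: cut at the next ';' (e.g. '; format=flowed')
        # and drop whitespace and quotes. str.strip() removes a set of
        # characters, not a trailing substring, so it cannot do this.
        return content_type[pos + 8:].split(';')[0].strip().strip('"')
    return None
```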
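A small offline check of why the charset parameter has to be cut at ';' rather than stripped. str.strip() treats its argument as a set of characters, so a charset that begins or ends with one of them is damaged: windows-1252 loses its leading 'w'. The standard Message.get_content_charset() method parses the parameter itself and could replace most of this helper. The message below is a made-up placeholder.

```python
import email

# Made-up message whose charset starts with 'w', a character in the strip() set
raw = (b'Content-Type: text/plain; charset=windows-1252; format=flowed\r\n'
       b'\r\n'
       b'caf\xe9\r\n')
msg = email.message_from_bytes(raw)

content_type = msg.get('Content-Type', '').lower()
pos = content_type.find('charset=')
# strip() removes any of the listed characters from both ends, not a suffix
naive = content_type[pos + 8:].strip('; format=flowed; delsp=yes')
# Cutting at the next ';' keeps the whole value
split_based = content_type[pos + 8:].split(';')[0].strip()
# The standard library parses the parameter and lowercases it
charset = msg.get_content_charset()

print(naive)         # indows-1252
print(split_based)   # windows-1252
print(charset)       # windows-1252
print(msg.get_payload(decode=True).decode(charset).strip())   # café
```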

Next, traverse the parts of the message to read its content:

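```
from email.utils import parseaddr


# indent controls the indentation used for nested parts.
# The text is returned as a string instead of being collected in a global
# variable, so calling the function twice does not mix up two messages.
def print_info(msg, indent=0):
    mail_content = ''
    if indent == 0:
        for header in ['From', 'To', 'Subject']:
            value = msg.get(header, '')
            if value:
                if header == 'Subject':
                    value = decode_str(value)
                else:
                    hdr, addr = parseaddr(value)
                    name = decode_str(hdr)
                    value = '%s <%s>' % (name, addr)
            mail_content += '%s%s: %s\n' % ('  ' * indent, header, value)
    if msg.is_multipart():
        # The payload of a multipart message is a list of sub-messages;
        # read each one recursively with a deeper indent
        for part in msg.get_payload():
            mail_content += print_info(part, indent + 1)
    else:
        content_type = msg.get_content_type()
        if content_type == 'text/plain':
            content = msg.get_payload(decode=True)
            # Use the charset declared by this part, falling back to UTF-8
            charset = guess_charset(msg) or 'utf-8'
            content = content.decode(charset, errors='replace')
            mail_content += '%sText:\n%s\n' % ('  ' * indent, content)
        else:
            # Parts other than text/plain (usually text/html or attachments)
            # are not read; only their content type is recorded
            mail_content += '%sAttachment: %s\n' % ('  ' * indent, content_type)
    return mail_content
```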
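The recursion above works with the compat32 Message API. As a sketch of an alternative (not the approach used in this post), parsing with policy=email.policy.default returns EmailMessage objects whose headers are decoded on access and whose get_body() and get_content() methods handle part selection, transfer encoding and charset. The raw message here is a hand-written placeholder; with get_mail, the same call would apply to datas[0][1].

```python
import email
from email import policy

# Hand-written placeholder message with a plain-text and an HTML alternative
raw = (b'From: sender@example.com\r\n'
       b'Subject: =?utf-8?b?5L2g5aW9?=\r\n'
       b'MIME-Version: 1.0\r\n'
       b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
       b'\r\n'
       b'--XYZ\r\n'
       b'Content-Type: text/plain; charset=utf-8\r\n'
       b'Content-Transfer-Encoding: base64\r\n'
       b'\r\n'
       b'5L2g5aW9\r\n'
       b'--XYZ\r\n'
       b'Content-Type: text/html; charset=utf-8\r\n'
       b'\r\n'
       b'<p>hi</p>\r\n'
       b'--XYZ--\r\n')

# policy.default makes the parser build EmailMessage objects
msg = email.message_from_bytes(raw, policy=policy.default)
subject = msg['Subject']             # encoded words are decoded on access
# get_body() picks the preferred text part from the MIME tree
body = msg.get_body(preferencelist=('plain', 'html'))
# get_content() undoes the transfer encoding and applies the charset
text = body.get_content()

print(subject)                       # 你好
print(body.get_content_type())       # text/plain
print(text)                          # 你好
```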

Finally, call get_mail and print_info to output the mail content:

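```
if __name__ == '__main__':
    # Placeholder credentials: use your own address and password. Some
    # providers (e.g. QQ Mail, 163, Gmail) require an authorization code
    # or app password for IMAP instead of the normal login password.
    email_addr = "your_name@example.com"
    password = "mypassword"
    test = print_info(get_mail(email_addr, password))
    print("mail content is: %s" % test)
```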
