Python中的bytes、str以及unicode區別

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"從Python發展歷史談起"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Python3和Python2表示字符序列的方式有所不同。"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Python3字符序列的兩種表示爲"},{"type":"text","marks":[{"type":"italic"},{"type":"size","attrs":{"size":16}},{"type":"strong"}],"text":"byte和str"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"。"},{"type":"text","marks":[{"type":"italic"},{"type":"size","attrs":{"size":14}},{"type":"strong"}],"text":"前者的實例包含原始的8位值,即原始的字節;後者的實例包括Unicode字符。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Python2字符序列的兩種表示爲"},{"type":"text","marks":[{"type":"italic"},{"type":"size","attrs":{"size":16}},{"type":"strong"}],"text":"str和unicode"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"。"},{"type":"text","marks":[{"type":"italic"},{"type":"size","attrs":{"size":14}},{"type":"strong"}],"text":"與Python3不同的是,str實例包含原始的8位值;而unicode的實例,則包含Unicode字符。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"類型轉換"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"把Unicode字符表示爲二進制數據有許多方法。做常見的編碼方式是UTF-8。但是python3的str實例和Python2的unicode實例都沒有和特定的二進制編碼形式相關聯。要想把Unicode字符轉換爲二進制數據,就必須使用encode方法。要想把二進制數據轉換成Unicode字符,則必須使用decode方法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,在編寫大型複雜的Python程序的時候,一般把編碼和解碼的相關操作放到最外層來做。程序核心部分使用Unicode字符類型,也就是Python3的str以及Python2的unicode,而且不要對字符編碼做任何假設。這種辦法既可以讓程序接受多種類型的文本編碼,又可以保證輸出的文本信息只採用一種編碼形式(最好是UTF-8),靈活性很高。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以,可以編寫兩個輔助函數,以便對序列進行轉換,使得轉換後的輸入數據符合開發者預期。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"對於Python3"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Python3中,我們需要編寫一個接收str或bytes,並總是返回str的方法:"}]},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"def to_str(bytes_or_str):\n if isinstance(bytes_or_str, bytes):\n return bytes_or_str.decode('utf-8')\n return bytes_or_str # instance of str"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以及一個總是返回bytes的方法:"}]},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"def to_bytes(bytes_or_str):\n if isinstance(bytes_or_str, str):\n return bytes_or_str.encode('utf-8')\n return bytes_or_str # instance of bytes"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"對於Python2"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Python2中,我們需要編寫一個接收str或unicode,並總是返回unicode的方法:"}]},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"def to_unicode(unicode_or_str):\n if isinstance(unicode_or_str, str):\n return unicode_or_str.decode('utf-8')\n return unicode_or_str # instance of unicode"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以及一個總是返回str的方法:"}]},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"def to_str(unicode_or_str):\n if isinstance(unicode_or_str, unicode):\n return unicode_or_str.encode('utf-8')\n return unicode_or_str # instance of str"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"面臨的問題"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Python中使用原始8位值與Unicode字符時,通常有兩個問題需要注意。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"underline"},{"type":"strong"}],"text":"第一個問題"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一個問題通常出現在Python2中,如果你用的是Python3,可以暫且忽略這個問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果str只包含7位的ASCII字符,那麼unicode和str實例似乎就成了同一種類型。"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以用+操作符把str與unicode連接起來"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以進行比較操作比如, "},{"type":"text","marks":[{"type":"strong"}],"text":"=="},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"strong"}],"text":"!=、"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這些行爲一位着,只處理7位的ASCII的情況下,如果某個函數接受str,那麼可以給它傳入unicode;如果某個函數接收unicode,也可以傳入str。而在Python3中,bytes和str實例絕對不會等價,更不能進行比較,即使是空字符串也不可以,因爲這是完全不同的兩個類型。所以,在傳入字符序列的時候必須留意其類型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"underline"},{"type":"strong"}],"text":"第二個問題"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二個問題可能會出現在Python3上,也是經常遇到的一些問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果通過內置的open函數獲取謳歌文件句柄,那麼需要注意的是,該句柄默認會採用UTF-8編碼格式來操作文件。而在Python2中,文件操作的默認格式是二進制的,這可能會導致程序出現奇怪的錯誤。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如,現在要向文件中隨機寫入二進制數據,下面這個寫法在Python2中不會有什麼問題,但在Python3中會有異常,提示"},{"type":"codeinline","content":[{"type":"text","text":"TypeError: write() argument must be str, not bytes"}]}]},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"with open('./random.bin', 'w') as f:\n f.write(os.urandom(10))\n\n>>> TypeError: write() argument must be str, not bytes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"發生上述問題的原因是Python3給open函數添加了encode參數,而這個新參數的默認值是UTF-8。這樣一來,系統就會要求開發者必須傳入包含Unicode字符的str實例,而不是包含二進制數據的bytes實例。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了解決這個問題,我們必須用二進制寫入模式,即將原來的'w'修改爲'wb',這樣一來就可以同時適配Python2和Python3。從文件中讀取數據的時候也會有類似的問題,解決方法與寫入類似,使用"},{"type":"text","marks":[{"type":"strong"}],"text":"'rb'"},{"type":"text","text":"模式打開文件,而不是"},{"type":"text","marks":[{"type":"strong"}],"text":"'r'"},{"type":"text","text":"模式。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"總結"}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"在Python3中,bytes是一種包含8位值的序列,str是一種包含Unicode字符的序列。開發者不用"},{"type":"text","marks":[{"type":"strong"}],"text":"比較操作"},{"type":"text","text":"來混合處理。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"在Python2中,str是一種包含8位值的序列,unicode是一種包含Unicode字符的序列。如果str只有7位ASCII字符,那麼可以進行比較運算以及連接操作。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"在對輸入的數據操作之前,使用輔助函數來保證字符序列的類型與開發者的期望一致。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"從文件中讀取二進制數據,或向其中寫入二進制數據時,總應該以'rb'或'wb'等二進制模式來開啓文件。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考閱讀:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://book.douban.com/subject/26312313/","title":""},"content":[{"type":"text","text":"《Effective Python》"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章