Unicode (UTF-8) reading and writing to files in Python

Question:

I'm having some brain failure in understanding how to read and write text to a file (Python 2.4).

# The string, which has an a-acute in it.
ss = u'Capit\xe1n'
ss8 = ss.encode('utf8')
repr(ss), repr(ss8)

("u'Capit\\xe1n'", "'Capit\\xc3\\xa1n'")

print ss, ss8
print >> open('f1','w'), ss8

>>> file('f1').read()
'Capit\xc3\xa1n\n'

So I type Capit\xc3\xa1n into my favorite editor and save it as file f2.

Then:

>>> open('f1').read()
'Capit\xc3\xa1n\n'
>>> open('f2').read()
'Capit\\xc3\\xa1n\n'
>>> open('f1').read().decode('utf8')
u'Capit\xe1n\n'
>>> open('f2').read().decode('utf8')
u'Capit\\xc3\\xa1n\n'

What am I not understanding here? Clearly there is some vital bit of magic (or good sense) that I'm missing. What does one type into text files to get proper conversions?
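One possible reading of the f1/f2 difference, sketched in Python 3 rather than the Python 2.4 used in the question (that version choice is an assumption; the bytes on disk are the same either way). f1 holds the two UTF-8 bytes 0xC3 0xA1, while f2 holds the eight ASCII characters `\xc3\xa1` exactly as typed. The file names `f1_demo` and `f2_demo` are hypothetical stand-ins for f1 and f2.

```python
import os
import tempfile

text = 'Capit\xe1n'  # the 7-character string with an a-acute

tmpdir = tempfile.mkdtemp()
f1_path = os.path.join(tmpdir, 'f1_demo')  # hypothetical file names
f2_path = os.path.join(tmpdir, 'f2_demo')

# f1: the real UTF-8 encoding; a-acute becomes the two bytes 0xC3 0xA1.
with open(f1_path, 'wb') as f:
    f.write(text.encode('utf-8') + b'\n')

# f2: what an editor saves when backslash, x, c, 3, ... are typed literally.
with open(f2_path, 'wb') as f:
    f.write(b'Capit\\xc3\\xa1n\n')

with open(f1_path, 'rb') as f:
    raw1 = f.read()
with open(f2_path, 'rb') as f:
    raw2 = f.read()

print(len(raw1), raw1)  # 9 bytes, two of them non-ASCII
print(len(raw2), raw2)  # 15 bytes, all plain ASCII

# f1 decodes directly as UTF-8.
decoded1 = raw1.decode('utf-8')

# f2 needs an extra step: interpret the backslash escapes first,
# turn the resulting code points back into bytes, then decode as UTF-8.
decoded2 = raw2.decode('unicode_escape').encode('latin-1').decode('utf-8')
```

Under this reading, the `\xc3\xa1` notation is only how Python displays those two bytes in a `repr`; typing it into an editor stores the backslashes themselves. Python 2 offered a `string_escape` codec for the first step of the f2 conversion, which Python 3 no longer has.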

What I'm truly failing to grok here is what the point of the UTF-8 representation is if you can't actually get Python to recognize it when it comes from outside. Maybe I should just JSON-dump the string and use that instead, since that has an ASCII-able representation! More to the point, is there an ASCII representation of this Unicode object that Python will recognize and decode when it comes in from a file? If so, how do I get it?

>>> print simplejson.dumps(ss)
"Capit\u00e1n"
>>> print >> file('f3','w'), simplejson.dumps(ss)
>>> simplejson.load(open('f3'))
u'Capit\xe1n'
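On the question of an ASCII-only representation, here is a minimal sketch of two options, assuming Python 3 and the standard-library `json` module in place of `simplejson` (both are assumptions, not what the question ran). It shows that JSON escaping and the `unicode_escape` codec each produce pure-ASCII text that round-trips back to the original string.

```python
import json

text = 'Capit\xe1n'

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
as_json = json.dumps(text)
print(as_json)  # "Capit\u00e1n"
json_round_trip = json.loads(as_json)

# The unicode_escape codec yields an ASCII-only byte string instead.
as_escape = text.encode('unicode_escape')
print(as_escape)  # b'Capit\\xe1n'
escape_round_trip = as_escape.decode('unicode_escape')
```

Either form can be pasted into a text file with any editor, but the reader has to apply the matching decoder (`json.loads` or `unicode_escape`); a plain UTF-8 decode leaves the escapes untouched.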

Solution:
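The page records no answer text here, so the following is only a hedged sketch of one common approach, not the accepted solution: let the file object do the encoding and decoding so the program only ever handles Unicode strings. It assumes Python 3, where the built-in `open` takes an `encoding` argument; in Python 2.4, `codecs.open(path, mode, encoding)` plays the same role. The path `f_demo.txt` is hypothetical.

```python
import os
import tempfile

text = 'Capit\xe1n'
path = os.path.join(tempfile.mkdtemp(), 'f_demo.txt')  # hypothetical path

# Encoding to UTF-8 happens on write ...
with open(path, 'w', encoding='utf-8') as f:
    f.write(text + '\n')

# ... and decoding from UTF-8 happens on read.
with open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read()

# Reading in binary mode shows the bytes that actually reached the disk.
with open(path, 'rb') as f:
    raw = f.read()

print(repr(round_tripped))
print(raw)
```

With this pattern, a file saved by an editor in UTF-8 containing the actual character á (not the backslash notation) reads back as the same string.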

Reference 1: https://en.stackoom.com/question/23yD
Reference 2: https://stackoom.com/question/23yD