Unicode (UTF-8) reading and writing to files in Python

Question:

I'm having some brain failure in understanding reading and writing text to a file (Python 2.4).

# The string, which has an a-acute in it.
ss = u'Capit\xe1n'
ss8 = ss.encode('utf8')
repr(ss), repr(ss8)

("u'Capit\\xe1n'", "'Capit\\xc3\\xa1n'")
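For comparison, the same encode/decode step can be sketched in Python 3. This is an assumption about the reader's version: the question targets Python 2.4, where unicode and str play the roles that str and bytes play below. ascii() is used so the output shows escapes the way Python 2's repr() does.

```python
# Python 3 sketch: str holds Unicode text, bytes holds encoded data.
ss = 'Capit\xe1n'           # text containing U+00E1 (a-acute)
ss8 = ss.encode('utf-8')    # a-acute becomes the two bytes C3 A1

# ascii() escapes non-ASCII characters, much like Python 2's repr().
print(ascii(ss), ascii(ss8))  # 'Capit\xe1n' b'Capit\xc3\xa1n'

# Decoding with the same codec recovers the original text.
assert ss8.decode('utf-8') == ss
print(len(ss), len(ss8))      # 7 8
```

The text is 7 characters long while its UTF-8 encoding is 8 bytes, because the single character á needs two bytes.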

print ss, ss8
print >> open('f1','w'), ss8

>>> file('f1').read()
'Capit\xc3\xa1n\n'
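Instead of encoding by hand before printing to the file, a common approach is to let the file object do the conversion: open it with an encoding, write Unicode text, and read Unicode text back. Below is a minimal Python 3 sketch that uses a throwaway temporary directory in place of the question's f1. In Python 2.4 the closest equivalent is codecs.open(path, mode, encoding), called without the with statement, which is not available in 2.4.

```python
import os
import tempfile

ss = 'Capit\xe1n'
path = os.path.join(tempfile.mkdtemp(), 'f1')  # scratch stand-in for 'f1'

# The file object encodes str -> UTF-8 bytes on write.
# newline='\n' keeps the bytes on disk identical on every platform.
with open(path, 'w', encoding='utf-8', newline='\n') as f:
    f.write(ss + '\n')

# ...and decodes UTF-8 bytes -> str on read.
with open(path, encoding='utf-8') as f:
    text = f.read()

# The raw bytes on disk match the question's f1.
with open(path, 'rb') as f:
    raw = f.read()

print(ascii(text), raw)   # 'Capit\xe1n\n' b'Capit\xc3\xa1n\n'
```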

So I type Capit\xc3\xa1n into my favorite editor and save it as file f2.

Then:

>>> open('f1').read()
'Capit\xc3\xa1n\n'
>>> open('f2').read()
'Capit\\xc3\\xa1n\n'
>>> open('f1').read().decode('utf8')
u'Capit\xe1n\n'
>>> open('f2').read().decode('utf8')
u'Capit\\xc3\\xa1n\n'
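The difference between f1 and f2 is in the bytes on disk. f1 holds the two UTF-8 bytes C3 A1. f2 holds the eight ASCII characters \xc3\xa1 typed literally, and UTF-8 decoding leaves ASCII unchanged. If a file really does contain such escapes, they have to be interpreted before the UTF-8 decode. Here is a Python 3 sketch; in Python 2 the 'string_escape' codec played the role that 'unicode_escape' plays here. The sketch assumes the file is otherwise pure ASCII, because 'unicode_escape' reads any non-escape bytes as Latin-1.

```python
# What the editor saved in f2: plain ASCII, including literal backslashes.
f2_bytes = b'Capit\\xc3\\xa1n\n'
print(len(f2_bytes))   # 15

# UTF-8 decoding leaves ASCII untouched, so the escapes survive as text.
assert f2_bytes.decode('utf-8') == 'Capit\\xc3\\xa1n\n'

# Step 1: interpret the escapes; each \xNN becomes code point U+00NN.
step1 = f2_bytes.decode('unicode_escape')   # 'Capit\xc3\xa1n\n' (mojibake)

# Step 2: Latin-1 maps U+0000..U+00FF one-to-one back to bytes,
# recovering the UTF-8 bytes, which then decode to the intended text.
text = step1.encode('latin-1').decode('utf-8')
print(ascii(text))     # 'Capit\xe1n\n'
```

In practice the simpler fix is to type the actual character á in the editor and save the file as UTF-8, which produces the same bytes as f1.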

What am I not understanding here? Clearly there is some vital bit of magic (or good sense) that I'm missing. What does one type into text files to get proper conversions?

What I'm truly failing to grok here is the point of the UTF-8 representation if you can't actually get Python to recognize it when it comes from outside. Maybe I should just JSON-dump the string and use that instead, since that has an ASCII-able representation! More to the point, is there an ASCII representation of this Unicode object that Python will recognize and decode when it comes in from a file? If so, how do I get it?

>>> import simplejson
>>> print simplejson.dumps(ss)
"Capit\u00e1n"
>>> print >> file('f3','w'), simplejson.dumps(ss)
>>> simplejson.load(open('f3'))
u'Capit\xe1n'
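On the ASCII question: there are ASCII-only forms that Python decodes back exactly. JSON, which escapes non-ASCII characters by default, is one, as the session shows. The 'unicode_escape' codec, which produces the same escapes as a Python string literal, is another. The Python 3 sketch below uses the standard-library json module in place of simplejson. That substitution is an assumption; the question itself uses simplejson.

```python
import json

ss = 'Capit\xe1n'

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
as_json = json.dumps(ss)
print(as_json)                      # "Capit\u00e1n"
assert json.loads(as_json) == ss

# 'unicode_escape' yields Python-literal escapes as ASCII bytes.
as_escape = ss.encode('unicode_escape')
print(as_escape)                    # b'Capit\\xe1n'
assert as_escape.decode('unicode_escape') == ss
```

Either form can be written to a file as plain ASCII and read back without any knowledge of UTF-8. The cost is that the file is no longer readable as ordinary text in an editor.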

Solution:

Reference 1: https://en.stackoom.com/question/23yD
Reference 2: https://stackoom.com/question/23yD