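When writing Chinese text out to an HTML file, it is not enough to generate the page in UTF-8 at output time, as the encode('utf-8') calls in the following two lines do: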
fout.write("<td>%s</td>" % data['title'].encode('utf-8'))
fout.write("<td>%s</td>" % data['summary'].encode('utf-8'))
The browser must also be told to read the content as UTF-8, otherwise the text will come out garbled. The following line does that:
fout.write("<head><meta charset='utf-8'></head>")
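To see why the charset declaration matters, here is a small illustrative sketch (Python 3, not part of the original code). It decodes UTF-8 bytes with a different charset, which is roughly what a browser does when it guesses wrong. The choice of latin-1 as the wrong charset and the sample string are assumptions made purely for the demonstration.

```python
# Illustration only: simulate a reader that assumes the wrong charset.
text = "中文標題"
utf8_bytes = text.encode("utf-8")        # the bytes that end up in the file

garbled = utf8_bytes.decode("latin-1")   # wrong charset: unreadable characters
correct = utf8_bytes.decode("utf-8")     # charset matches the file: original text

print(repr(garbled))
print(correct)
```

The bytes in the file are identical in both cases; only the charset used to interpret them differs.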
The class below implements output to HTML. Its collect_data method takes a data argument, which is a dictionary with the fields 'url', 'title' and 'summary'.
# coding:utf-8
# Output to HTML
class HtmlOutputer(object):
    def __init__(self):
        self.datas = []  # collected records, one dict per table row

    def collect_data(self, data):
        # Ignore empty results
        if data is None:
            return
        self.datas.append(data)

    def output_html(self):
        fout = open('output.html', 'w')
        fout.write("<html>")
        # Tell the browser to decode the page as UTF-8
        fout.write("<head><meta charset='utf-8'></head>")
        fout.write("<body>")
        fout.write("<table>")
        for data in self.datas:
            fout.write("<tr>")
            fout.write("<td>%s</td>" % data['url'])
            # Encode the text to UTF-8 bytes before writing
            # (this pattern assumes Python 2 string handling)
            fout.write("<td>%s</td>" % data['title'].encode('utf-8'))
            fout.write("<td>%s</td>" % data['summary'].encode('utf-8'))
            fout.write("</tr>")
        # Close the tags in the reverse of the order they were opened
        fout.write("</table>")
        fout.write("</body>")
        fout.write("</html>")
        fout.close()
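For comparison, a Python 3 version of the same idea might look like the sketch below. This is an assumption about how one could port the class, not the original author's code. In Python 3, formatting the result of encode('utf-8') with %s would write a b'...' literal, so the encoding is given once to open() instead. The path parameter and the html.escape call are additions made for this illustration.

```python
import html


class HtmlOutputer:
    def __init__(self):
        self.datas = []

    def collect_data(self, data):
        if data is None:
            return
        self.datas.append(data)

    def output_html(self, path="output.html"):  # path is a hypothetical parameter
        # encoding="utf-8" makes the file object encode every str on write,
        # so no per-field .encode('utf-8') is needed.
        with open(path, "w", encoding="utf-8") as fout:
            fout.write("<html>")
            fout.write("<head><meta charset='utf-8'></head>")
            fout.write("<body><table>")
            for data in self.datas:
                fout.write("<tr>")
                for key in ("url", "title", "summary"):
                    # Escape &, <, > and quotes so field text cannot break the markup
                    fout.write("<td>%s</td>" % html.escape(data[key]))
                fout.write("</tr>")
            fout.write("</table></body></html>")
```

The with statement closes the file even if a write fails, and the meta charset line is still required so that the browser decodes the page as UTF-8.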