Crawl GB2312-encoded webpages with Python 3.x

Original

2020-06-21 07:06

The following code, which fetches the page with urllib and lets BeautifulSoup decode the raw bytes, works well.

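from urllib.request import urlopen
import bs4

# urlopen returns a stream of raw bytes, so BeautifulSoup does the decoding itself.
doc = urlopen("http://www.w3school.com.cn/html/html_tables.asp")
# bs4 spells the keyword from_encoding; fromEncoding is the deprecated BeautifulSoup 3 name.
soup = bs4.BeautifulSoup(doc, "html.parser", from_encoding="GB2312")
titles = soup.find_all("title")
print(soup.prettify())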

# The with block closes the file even if the write raises.
with open("C:\\Users\\yfeng14\\Desktop\\betting\\contents.txt", 'w', encoding="UTF-8") as output:
    output.write(soup.prettify())

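If we use the "requests" package instead, it fails. Note that response.text is already a decoded str, and BeautifulSoup ignores from_encoding when it is given a str, so the GB2312 hint below has no effect.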

import requests
import bs4

output = open("C:\\Users\\yfeng14\\Desktop\\betting\\contents.txt", 'w', encoding="UTF-8")
request_link = "http://www.songtaste.com/"
response = requests.get(request_link)
soup = bs4.BeautifulSoup(response.text,"html.parser",from_encoding="GB2312")

output.write(soup.prettify())
output.close()
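A likely reason for the failure above is that the bytes were decoded with the wrong codec before BeautifulSoup ever saw them. requests falls back to ISO-8859-1 for text responses whose Content-Type header declares no charset; whether that is what this particular server sends is an assumption. The sketch below reproduces the effect offline with a made-up page body (the HTML string is illustrative, not taken from either site).

```python
# Simulate a GB2312 page body as it travels over the wire (bytes).
html = "<html><head><title>表格</title></head></html>"
raw = html.encode("gb2312")

# Decoding with the wrong codec: ISO-8859-1 never raises, it just
# maps every byte to some character, so the result is silent mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding with the page's real encoding recovers the text.
right = raw.decode("gb2312")

print(wrong == html)  # False
print(right == html)  # True
```

Because the wrong decode is lossless at the byte level, `wrong.encode("iso-8859-1").decode("gb2312")` gets the original text back, but it is simpler to avoid the bad decode in the first place.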

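So be careful when using the "requests" package: pass response.content (the raw bytes) to BeautifulSoup so that from_encoding is honoured, or set response.encoding = "GB2312" before reading response.text.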
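A minimal sketch of a requests-based version that should behave like the urllib one, assuming the page really is GB2312. The helper names parse_gb2312 and fetch_gb2312_soup, and the 10-second timeout, are illustrative choices and not part of the original post.

```python
import bs4


def parse_gb2312(raw_bytes):
    """Parse raw page bytes, telling BeautifulSoup they are GB2312."""
    # from_encoding only takes effect because raw_bytes is bytes, not str.
    return bs4.BeautifulSoup(raw_bytes, "html.parser", from_encoding="GB2312")


def fetch_gb2312_soup(url):
    """Download url with requests and parse the undecoded body."""
    # Imported here so parse_gb2312 stays usable without requests installed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # response.content is the raw bytes; response.text would already be decoded.
    return parse_gb2312(response.content)


if __name__ == "__main__":
    # Offline check with a hand-made GB2312 document.
    sample = "<html><head><title>表格</title></head></html>".encode("gb2312")
    print(parse_gb2312(sample).title.string)
```

For a live page, `fetch_gb2312_soup("http://www.songtaste.com/").prettify()` can be written to the output file exactly as before, provided the site is still reachable.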
