Life Is Short: A Python Web Scraping Learning Cycle

Original

2020-06-23 22:27

Essential package for scraping: request

from urllib import request
from bs4 import BeautifulSoup  # third-party package, used by parse_data() below

Fetching the data

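def get_data():
    url = ' '  # target URL (left blank in the original)
    # Build a Request object from the URL and the request headers
    headers = {'User-Agent': ' '}  # User-Agent string (left blank in the original)
    req = request.Request(url, headers=headers)
    # Send the request; the User-Agent header travels with it
    response = request.urlopen(req)
    if response.getcode() == 200:  # confirm the request succeeded
        data = response.read()  # read the response body (bytes)
        data = str(data, encoding='utf-8')  # decode to str
        # Write the data to a file
        with open('index.html', mode='w', encoding='utf-8') as f:
            f.write(data)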
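
The function above stops with a traceback if the request fails or the page is not UTF-8. A minimal sketch of a more defensive variant follows. The helper names `build_request` and `fetch_html`, the `timeout` value, and the `demo-agent/0.1` string are illustrative assumptions, not part of the original post.

```python
from urllib import request, error


def build_request(url, user_agent):
    # Hypothetical helper: attach a User-Agent header to the request.
    return request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, user_agent, timeout=10):
    # Hypothetical helper: return the decoded body, or None on failure.
    req = build_request(url, user_agent)
    try:
        with request.urlopen(req, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or 'utf-8'
            return response.read().decode(charset, errors='replace')
    except error.URLError as exc:  # HTTPError is a subclass of URLError
        print(f'request failed: {exc}')
        return None


req = build_request('http://example.com/', 'demo-agent/0.1')
print(req.get_header('User-agent'))  # urllib stores header names capitalized
```

Returning None instead of raising lets the caller decide whether to retry or skip the page.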

Parsing the data

def parse_data():
    with open('index.html', mode='r', encoding='utf-8') as f:
        html = f.read()
    bs = BeautifulSoup(html, 'html.parser')  # use Python's built-in HTML parser
    # 1. find(): returns the first matching tag
    # div = bs.find('div')  # locate the matching element
    # print(div)  # print the element
    # print(type(div))  # check the type of the result

    # 2. find_all(): returns all matching tags
    # metas = bs.find_all('meta')  # a collection of every match
    # print(metas[0])
    # print(bs.find_all(id='hello'))  # look up by id; returns a collection
    # print(bs.find_all(class_='itany'))  # look up by class

    # 3. select(): fetch data with CSS selectors
    # print(bs.select('#hello'))
    # print(bs.select('.itany'))
    # print(bs.select('p#world span'))
    # print(bs.select('[title]'))

    # Extract text
    # print(bs.select('.div')[0].get_text())
    # print(bs.find_all('article'))
    value = bs.select('#article')[0].get_text(strip=True)
    # print(len(value))
    print(value)
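
To see what each lookup above returns without downloading anything, the same calls can be run on an inline page. This is a small sketch. The `SAMPLE_HTML` markup is a hypothetical page invented for illustration, with ids and classes chosen to match the selectors used above. It assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; ids and classes mirror the selectors used above.
SAMPLE_HTML = """
<html>
  <head><meta charset="utf-8"><title>demo</title></head>
  <body>
    <div id="hello" class="itany" title="greeting">Hello</div>
    <p id="world">World <span>inside</span></p>
    <div id="article">  Article body  </div>
  </body>
</html>
"""

bs = BeautifulSoup(SAMPLE_HTML, 'html.parser')

first_div = bs.find('div')              # first matching tag only
all_divs = bs.find_all('div')           # list-like collection of every match
by_id = bs.find_all(id='hello')         # filter on the id attribute
by_class = bs.find_all(class_='itany')  # class_ avoids the Python keyword
spans = bs.select('p#world span')       # CSS: span nested inside p#world
titled = bs.select('[title]')           # tags that carry a title attribute
article = bs.select('#article')[0].get_text(strip=True)

print(first_div['id'], len(all_divs), spans[0].get_text(), article)
```

Note that `find()` returns a single tag, while `find_all()` and `select()` always return a collection, which is why the original code indexes the result with `[0]`.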

The main function

if __name__ == '__main__':
    parse_data()
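
The entry point above only parses, so it relies on index.html already having been written by an earlier call to the fetch step. One way to wire the two steps together is sketched below. `run_pipeline` and the `fake_fetch` / `fake_parse` stand-ins are hypothetical names used so the example runs offline; in the real script the fetch and parse functions would take their place.

```python
import os
import tempfile


def run_pipeline(path, fetch, parse):
    # Download only when no cached copy exists, then parse the cached file.
    if not os.path.exists(path):
        fetch()
    return parse()


# Hypothetical stand-ins for the fetch and parse steps.
calls = []
cache = os.path.join(tempfile.mkdtemp(), 'index.html')


def fake_fetch():
    calls.append('fetch')
    with open(cache, mode='w', encoding='utf-8') as f:
        f.write('<div id="article">cached</div>')


def fake_parse():
    calls.append('parse')
    with open(cache, mode='r', encoding='utf-8') as f:
        return f.read()


run_pipeline(cache, fake_fetch, fake_parse)  # file missing: fetch, then parse
run_pipeline(cache, fake_fetch, fake_parse)  # file present: parse only
print(calls)
```

Caching the page on disk this way also avoids hitting the site again on every run while experimenting with selectors.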
