Python Web Scraping: Day 1

Original post

2020-02-22 10:52

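1. Install the BeautifulSoup library (a third-party library that reduces the need for hand-written regular expressions; I have not yet seen its advantages in practice)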

2. Test 1: Fetch the content of a web page from its URL

import urllib.request

# Open the URL, read the raw bytes, and decode them as UTF-8 text
with urllib.request.urlopen('http://python.org/') as response:
    result = response.read().decode('utf-8')
print(result)
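Test 1 assumes the request succeeds and that the page is UTF-8. As a minimal sketch (the helper name `fetch_text` and its parameters are my own, not part of the original notes), the same fetch can take the charset from the response headers and report a failed request instead of raising:

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10.0, default_charset='utf-8'):
    """Return the decoded body of `url`, or None if the request fails.

    Hypothetical helper for illustration only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Prefer the charset declared in the Content-Type header;
            # fall back to the default when none is declared.
            charset = response.headers.get_content_charset() or default_charset
            raw = response.read()
    except urllib.error.URLError as exc:
        print(f'Request failed: {exc}')
        return None
    # errors='replace' keeps going if a few bytes do not match the charset
    return raw.decode(charset, errors='replace')
```

Calling `fetch_text('http://python.org/')` should then return the same text that Test 1 prints, provided the site is reachable.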

3. Test 2: Extract the hyperlinks/URLs contained in a web page

import urllib.request
import re  # the re module provides regular expressions

response = urllib.request.urlopen('http://www.jd.com')
text = response.read().decode('UTF-8')
print(text)
linkre = re.compile(r'href="(.+?)"')  # compile a pattern that captures each href value
for x in linkre.findall(text):
    if 'http' in x:
        print('New URL --> ' + x)
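The regular expression in Test 2 only sees `href` values written with double quotes, and it picks them up from every tag, not just links. For comparison, here is a rough sketch of the same idea using the standard-library `html.parser` module instead of a regex. This is a swapped-in technique, not the method used above, and the names `LinkCollector`, `extract_links`, and `sample` are made up for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return  # ignore <link>, <img>, and other tags
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == 'href' and value and value.startswith(('http://', 'https://')):
                self.links.append(value)


def extract_links(html_text):
    """Return the absolute links found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Small hand-written sample so the sketch runs without network access
sample = (
    '<a href="https://example.com/a">A</a> '
    "<a href='/relative'>B</a> "
    '<link href="http://example.com/style.css">'
)
print(extract_links(sample))
```

In a real run, the `text` variable from Test 2 would be passed to `extract_links` in place of `sample`.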

4. Regular expressions

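# pattern = re.compile('regex')   compile a pattern; pattern.findall(text) then returns every match
# m = re.match('regex', text)     match only at the start of the string; returns at most one match
# m = re.search('regex', text)    scan the whole string; returns only the first match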
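To make the difference between the three calls concrete, here is a small sketch on a made-up string (the variable names are illustrative only):

```python
import re

text = 'abc123def456'
digits = re.compile(r'\d+')  # one or more digits

# findall returns every non-overlapping match as a list of strings
all_numbers = digits.findall(text)

# match only succeeds if the pattern matches at position 0
match_digits = re.match(r'\d+', text)       # text starts with letters, so no match
match_letters = re.match(r'[a-z]+', text)   # matches the leading 'abc'

# search scans forward and stops at the first match anywhere in the string
first_number = re.search(r'\d+', text)

print(all_numbers)            # ['123', '456']
print(match_digits)           # None
print(match_letters.group())  # abc
print(first_number.group())   # 123
```

Both `re.match` and `re.search` return a match object or `None`, so it is worth checking the result before calling `.group()`.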

benguniang
