Python Web Scraping: Day 1

Original

2020-02-22 10:52

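1. Install the BeautifulSoup library (a third-party library that reduces the need for hand-written regular expressions; I have not yet seen its advantages in practice)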

2. Test 1: Fetch the content of a web page from its URL

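import urllib.request

# Open the URL and read the raw response body (bytes).
response = urllib.request.urlopen('http://python.org/')
# Decode the bytes as UTF-8 to get the HTML as a string.
result = response.read().decode('utf-8')
print(result)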
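The fetch above assumes the request succeeds and that the page is UTF-8. A slightly more defensive sketch, still using only `urllib`, might look like the following. The function name `fetch_text`, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original test.

```python
import urllib.request


def fetch_text(url, timeout=10, default_charset="utf-8"):
    """Return the decoded body of url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            # Prefer the charset declared in the Content-Type header, if any.
            charset = response.headers.get_content_charset() or default_charset
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print(f"request failed: {exc}")
        return None
    # errors="replace" keeps going if a few bytes do not match the charset.
    return raw.decode(charset, errors="replace")
```

Calling `fetch_text('http://python.org/')` should then return the page as a string, or `None` if the site cannot be reached.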

3. Test 2: Extract the hyperlinks/URLs contained in a web page

import urllib.request
import re  # the re module provides regular expressions

response = urllib.request.urlopen('http://www.jd.com')
text = response.read().decode('UTF-8')
print(text)
linkre = re.compile(r'href="(.+?)"')  # compile a pattern that captures each href value
for x in linkre.findall(text):
    if 'http' in x:
        print('New URL --> ' + x)
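The regular expression above only matches double-quoted `href` values, and the `'http' in x` check drops relative links. As a point of comparison, here is a sketch that swaps the regex for the standard library's `html.parser` and resolves relative links with `urljoin`. This is an alternative technique, not the approach used in the test; the names `LinkCollector` and `extract_links` are hypothetical, and only `<a>` tags are collected.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin turns "/help" into an absolute URL and leaves
                # already-absolute URLs unchanged.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all <a href> targets found in html as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Passing the `text` fetched in Test 2 together with `'http://www.jd.com'` as the base URL should give a list of absolute links, including those written with single quotes or as relative paths.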

4. Regular expressions

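# pattern = re.compile('regex')     compiles a pattern; pattern.findall(text) returns every match
# match = re.match('regex', text)   matches only at the start of the string; returns at most one match
# match = re.search('regex', text)  scans the whole string; returns the first match only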
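The difference between the three calls is easiest to see on a small string. The sample text below is made up for illustration.

```python
import re

text = 'see <a href="http://a.example/">A</a> and <a href="/b">B</a>'

# compile + findall: every non-overlapping match, as a list of captured groups.
link_re = re.compile(r'href="(.+?)"')
all_links = link_re.findall(text)

# match: succeeds only if the pattern matches at position 0.
at_start = re.match(r'href', text)    # None, because text starts with "see"
starts_with_see = re.match(r'see', text)

# search: scans forward and stops at the first match anywhere in the string.
first = re.search(r'href="(.+?)"', text)

print(all_links)
print(at_start)
print(first.group(1))
```

Both `re.match` and `re.search` return a match object or `None`, so the result should be checked before calling `.group()`.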

benguniang
