A Basic Python Web Scraper

Original

2020-02-22 01:47


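import re
from urllib.parse import urljoin

import requests

# URL of the page to scrape
url2 = 'http://www.dzu.edu.cn/'

# Send a browser-like User-Agent so the request looks like it comes from Firefox
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"}

response = requests.get(url2, headers=headers, timeout=10)
# The text came back garbled, so set the encoding explicitly
response.encoding = 'utf-8'

# print(response)
# Get the response body as text
html = response.text
# Extract the data with regular expressions
title = re.findall(r'<li><a href="(.*?)">书记信箱</a></li>', html)
imgs = re.findall(r'<img src="(.*?)" alt="">', html)

# print(imgs)

# print(title)

# Save each image to disk as 0.jpg, 1.jpg, ...
for i, x in enumerate(imgs):
    # Resolve the src against the page URL (handles relative and absolute paths)
    imgurl = urljoin(url2, x)
    print(imgurl)
    # Download the image; .content holds the raw bytes of the response
    data = requests.get(imgurl, headers=headers, timeout=10).content
    # Write in binary mode; the with block closes the file when done
    with open('%d.jpg' % i, 'wb') as fb:
        fb.write(data)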
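
The extraction step can be tried without touching the network. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup standing in for the real page, and `extract_image_urls` is a hypothetical helper name, not part of the script above. It applies the same non-greedy `<img src="(.*?)" alt="">` pattern and then uses `urljoin` to turn each `src` into a full URL, whether the path is relative, root-relative, or already absolute.

```python
import re
from urllib.parse import urljoin

# Made-up HTML standing in for the real page (an assumption for illustration)
SAMPLE_HTML = '''
<ul>
<li><a href="/mailbox/index.htm">书记信箱</a></li>
</ul>
<img src="images/banner1.jpg" alt="">
<img src="/images/logo.png" alt="">
<img src="http://example.com/pic.jpg" alt="">
'''

BASE_URL = 'http://www.dzu.edu.cn/'


def extract_image_urls(html, base_url):
    """Return an absolute URL for every <img src="..." alt=""> tag in html."""
    # (.*?) is non-greedy, so each match stops at the first closing quote
    srcs = re.findall(r'<img src="(.*?)" alt="">', html)
    # urljoin resolves relative paths against base_url and leaves absolute URLs alone
    return [urljoin(base_url, src) for src in srcs]


image_urls = extract_image_urls(SAMPLE_HTML, BASE_URL)
for image_url in image_urls:
    print(image_url)
```

Note that the pattern only matches tags written exactly as `src="..." alt=""`; images with other attributes or a non-empty `alt` are skipped, which is a limit of regex-based extraction.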
