Python爬虫突破某360查询网站反爬验证码

某360查询网址同一ip频繁访问的话会弹出验证码,

该验证码特点:get获取的验证码图片和网页展现的不一致(所以无法通过图像识别破解);该验证码是针对ip,输入验证码后会给出一个大概一天有效期的cookie,在此阶段可以大量频繁访问了。

获取cookie:谷歌浏览器(输入验证码之后) >>F12 >> Application(Network右边)>> Cookies>> 然后最好从Network也获取个cookies对比一下,然后测试一下哪几个是关键的,去掉多余的>> 我这里筛选的有QiHooGUID、__guid、HSPK >> 然后就可以在get方式里使用这个cookie了,此cookie只对相同ip下的访问生效。。

cookiesList = [{'QiHooGUID': '4B8401DA0CF874A18E4BFF757EB90AE1.1559194913641','__guid': '15484592.1937232903330800000.1559194915338.2622','HSPK': 'e233796b2ddb62b46b356063af241c0.1559194953.1'},
               {'QiHooGUID': '0EFD4F5BD0052491974CAED4F706DF45.1559179861562','__guid': '15484592.814590072574967200.1559179861246.2769','HSPK': '160ff6e1afae22e60d0321036f692f9.1559179998.1'},
               {'QiHooGUID': '117C9371BC637FDC5C410F7B05C04CBD.1559179860932','__guid': '15484592.1180757161332586200.1559179861157.9478','HSPK': '96db7fb23d1c854555fba80e4c827bc.1559180056.1'},
			   {'QiHooGUID': '397CF97029E2031C7FE185F323BC4591.1559179862317','__guid': '15484592.4234635624280693000.1559179861608.329','HSPK': '788f84234c87aff1807a434c968aa67.1559180114.1'},
			   {'QiHooGUID': '7677BC494C9AEB59BF338C34BE065908.1559179860461','__guid': '15484592.772992706594705000.1559179860839.9106','HSPK': '054120f3987b7a26dff500e099e238a.1559180167.1'}

]

headersPool = [
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
	"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36",
	"Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
	"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
	"Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
	"Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
	"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
	"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
	"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
	"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
	"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
	"Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1",
	"Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070309 Firefox/2.0.0.3",
	"Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070803 Firefox/1.5.0.12",
	"Opera/9.27 (Windows NT 5.2; U; zh-cn)",
	"Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML, like Gecko) Version/3.1 Safari/525.13",
	"Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 ",
	"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-US) AppleWebKit/530.9 (KHTML, like Gecko) Chrome/ Safari/530.9 ",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)",
	"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.10 Chromium/27.0.1453.93 Chrome/27.0.1453.93 Safari/537.36",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
	"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36"
]


	while True:
		try:
			urlAim = 'https://www.so.com/s?q={0}'.format(telNbr)
			print(urlAim)
			headers = {"User-Agent": random.choice(headersPool)}
			cookies = random.choice(cookiesList)
			response = requests.get(urlAim, cookies=cookies, headers=headers)
			selector = etree.HTML(response.text)
			response.close()
			telNbrShow = telAttribution = telType = telMarkNbr = ''
			seeyzm = selector.xpath('//*[@id="container"]/div[1]/p[1]/text()')  # 判断是否出现验证码
			# ['亲,系统检测到您操作过于频繁。']
			if seeyzm:
				print('请检查第 {0} cookie是否过期...'.format(cookiesList.index(cookies)))
			else:
				print('当前cookie  {0} '.format(cookiesList.index(cookies)))

 

 

 

 

 

 

 

 

 

 

 

 

 

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章