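Youdao
Approach
Get the request URL ----> check the request method ----> a POST request always carries form data ----> copy and paste the form, and leave any value you don't understand yet empty for now,
for example salt and sign below: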
def send_request(self):
    form_data = {
        # 'i': '啊哈',
        'i': '',
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        # 'salt': '15932458043921',
        'salt': '',
        # 'sign': '24d1ac950b72ae268b1704034a5c172c',
        'sign': '',
        # 'ts': '1593245804392',  (timestamp)
        'ts': self.ts,
        # 'bv': '02a6ad4308a3443b3732d855273259bf',
        'bv': '',
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_CLICKBUTTION',
    }
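We still have to fill in this form, so we need to work out how these values are generated.
Look through the page's scripts for one that seems related and copy it into PyCharm.
A small tip: when the code in PyCharm is messy, the shortcut Ctrl + Alt + L reformats it and makes it much tidier.
Even after copying it out, the code is essentially unreadable, but we only need to find the parts we care about: use Ctrl + L in PyCharm to search for the key parameters and see whether the corresponding values can be worked out.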
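The same keyword search can be done outside the IDE. The following is a minimal sketch, not part of the original notes: `find_keyword_lines` and the `sample_js` snippet are hypothetical names and data used only for illustration, standing in for whatever script was copied from the page.

```python
def find_keyword_lines(source, keywords):
    """Return (line_number, keyword, stripped_line) for each line containing a keyword."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for keyword in keywords:
            if keyword in line:
                hits.append((number, keyword, line.strip()))
    return hits


# Hypothetical stand-in for a script copied from the page.
sample_js = """
var t = "" + (new Date).getTime();
var i = t + parseInt(10 * Math.random(), 10);
var payload = { ts: t, salt: i };
"""

for number, keyword, line in find_keyword_lines(sample_js, ["salt", "sign"]):
    print(number, keyword, line)
```

Reading the lines around each hit is usually enough to see which variables feed the parameter.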
Once salt is worked out, we can fill in that part of the code first.
After that, continue analysing the remaining parameters.
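As a rough sketch of what "working out salt" amounts to: the values captured in the form above suggest that salt is the 13-digit millisecond timestamp followed by one extra digit. The helper names `make_ts` and `make_salt` are illustrative, and the assumption that the extra digit is a random value from 0 to 9 is inferred from the captured values rather than confirmed.

```python
import random
import time


def make_ts(now=None):
    """Return a millisecond timestamp as a string (13 digits for current dates)."""
    if now is None:
        now = time.time()
    return str(int(now * 1000))


def make_salt(ts, rng=random):
    """Return the timestamp followed by one random digit (assumed range 0-9)."""
    return ts + str(rng.randint(0, 9))


ts = make_ts()
salt = make_salt(ts)
print(ts, salt)

# The values captured from the browser follow the same pattern.
captured_ts = '1593245804392'
captured_salt = '15932458043921'
print(captured_salt.startswith(captured_ts), len(captured_salt) - len(captured_ts))  # True 1
```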
Code
import hashlib
import random
import time

import requests


class YouDaoSpider:
    def __init__(self):
        self.url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        # Content-Length is left out: requests computes it from the form body.
        self.headers = {
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Cookie':'[email protected]; _ntes_nnid=b117014d95a4c5cf6832f8c92a045dcc,1589801960374; OUTFOX_SEARCH_USER_ID_NCOO=598000807.9036449; JSESSIONID=aaa3wgVaNvg1XdFRZ10lx; ___rl__test__cookies=1593256681647',
            'Host': 'fanyi.youdao.com',
            'Origin': 'http://fanyi.youdao.com',
            'Referer': 'http://fanyi.youdao.com/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest',
        }
        # Browser version string; bv is its MD5 digest.
        self.appversion = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        self.kw = input('Enter the word to translate: ')
        # Order matters: salt depends on ts, and sign depends on kw and salt.
        self.ts = self.get_ts()
        self.salt = self.get_salt()
        self.bv = self.get_bv()
        self.sign = self.get_sign()

    def send_request(self):
        form_data = {
            # 'i': '啊哈',
            'i': self.kw,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            # 'salt': '15932458043921',
            'salt': self.salt,
            # 'sign': '24d1ac950b72ae268b1704034a5c172c',
            'sign': self.sign,
            # 'ts': '1593245804392',  (timestamp)
            'ts': self.ts,
            # 'bv': '02a6ad4308a3443b3732d855273259bf',
            'bv': self.bv,
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_CLICKBUTTION',
        }
        response = requests.post(url=self.url, data=form_data, headers=self.headers)
        print(response.text)

    def get_ts(self):
        # The site uses a 13-digit (millisecond) timestamp, while Python's
        # time.time() is in seconds (10 digits), so multiply before truncating.
        return str(int(time.time() * 1000))

    def get_salt(self):
        # salt is the timestamp followed by one random digit (0-9).
        return self.ts + str(random.randint(0, 9))

    def get_bv(self):
        md5 = hashlib.md5()
        md5.update(self.appversion.encode())
        return md5.hexdigest()

    def get_sign(self):
        md5 = hashlib.md5()
        data = "fanyideskweb" + self.kw + self.salt + "mmbP%A-r6U3Nw(n]BjuEU"
        md5.update(data.encode())
        return md5.hexdigest()


if __name__ == '__main__':
    yd = YouDaoSpider()
    yd.send_request()
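One way to check the sign analysis is to recompute sign from the values captured in the browser and compare. The sketch below pulls the hashing step out into a standalone function; `make_sign` is an illustrative name, and the secret string is simply the constant used in the spider above, which the site may have changed since these notes were written. Whether the two digests actually match is not asserted here.

```python
import hashlib

CLIENT = "fanyideskweb"
SECRET = "mmbP%A-r6U3Nw(n]BjuEU"  # constant copied from the spider above; may be outdated


def make_sign(word, salt):
    """MD5 hex digest of client + word + salt + secret, as in get_sign above."""
    raw = CLIENT + word + salt + SECRET
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Values captured from the browser (the commented-out entries in the form).
captured = {
    "i": "啊哈",
    "salt": "15932458043921",
    "sign": "24d1ac950b72ae268b1704034a5c172c",
}

computed = make_sign(captured["i"], captured["salt"])
print("computed:", computed)
print("captured:", captured["sign"])
print("match:", computed == captured["sign"])
```

If the digests differ, either the concatenation order or the secret constant is wrong, and the script needs another look.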
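Product catalog
We could still guess how the Youdao parameters above are produced, but with a site like the product catalog, when we request the page and save the data,
the saved file contains none of the data shown on the web page. All we can tell is that it is JavaScript; the rest (the functions, for example) is hard to follow.
In that case we can simply run the JS directly.
Create a JS file and copy the useful parts into it.
Running JS from Python requires installing a library.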
1. Install
pip install PyExecJS
Install from a mirror: pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple PyExecJS
2. Run JS
import execjs
execjs.eval("Date.now()")
Returns the current time in milliseconds, e.g. 1522847001080
ctx = execjs.compile("""
function add(x, y) {
    return x + y;
}
""")
ctx.call("add", 1, 2)
Returns: 3
node = execjs.get()  # get the JavaScript runtime that PyExecJS uses to run JS from Python
file = 'product.js'
ctx = node.compile(open(file).read())
data = ctx.eval("data")  # evaluate a variable or function defined in the JS file
verify_data = ctx.eval("verify")
Code
import execjs
import requests

# First pass: download the raw page to see what the server returns.
# url = 'http://www.300600900.cn/'
#
# headers = {
#     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
# }
# response = requests.get(url=url, headers=headers)
# with open('product.html', 'w', encoding='utf-8') as f:
#     f.write(response.text)
#

# Run the extracted JS and read the two values it computes.
ej = execjs.get()
js_name = 'product.js'
with open(js_name, encoding='utf-8') as f:
    node = ej.compile(f.read())
cookie_date = node.eval('cookie_date')
security_verify_data = node.eval('security_verify_data')
print(cookie_date)
print(security_verify_data)

url = 'http://www.300600900.cn/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
}
# cookie_date has the form "srcurl=<hex>"; split only on the first "=".
key, value = cookie_date.split('=', 1)
session = requests.Session()
session.get(url=url, headers=headers)
session.cookies.set(key, value)
# Request the verification URL, then fetch the real page with the same session.
full_url = url + security_verify_data
session.get(url=full_url, headers=headers)
response = session.get(url, headers=headers)
with open('product11.html', 'w', encoding='utf-8') as f:
    f.write(response.content.decode())
product.js
// Convert each character's code to hex and concatenate the results.
function stringToHex(str) {
    var val = "";
    for (var i = 0; i < str.length; i++) {
        val += str.charCodeAt(i).toString(16);
    }
    return val;
}

var width = 1400;
var height = 900;
var screendate = width + "," + height;
var cookie_date = "srcurl=" + stringToHex('http://www.300600900.cn/');
var security_verify_data = "/?security_verify_data=" + stringToHex(screendate);
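Because this particular script is so small, its logic can also be ported to Python to cross-check what PyExecJS returns. This is only a sketch: `string_to_hex` is an illustrative port of `stringToHex`, and it matches the JS only for characters in the Basic Multilingual Plane (which covers the ASCII inputs used here), since `charCodeAt` works on UTF-16 code units.

```python
def string_to_hex(text):
    """Concatenate the hex code of each character, without zero padding."""
    return "".join(format(ord(ch), "x") for ch in text)


url = 'http://www.300600900.cn/'
width, height = 1400, 900

cookie_date = "srcurl=" + string_to_hex(url)
security_verify_data = "/?security_verify_data=" + string_to_hex(f"{width},{height}")

print(cookie_date)
print(security_verify_data)  # /?security_verify_data=313430302c393030
```

For larger or obfuscated scripts a port like this is rarely practical, which is why running the original JS is the approach used above.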