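The complete source is in my pixiv-to-weibo project. It implements only the core flow and does no error handling.
Login
Pre-login
This step fetches the RSA public key and the nonce. Sina did not use HTTPS in the past, so the password had to be encrypted by hand. The encoded username su is computed as BASE64(URI-encode(username)).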
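As a quick illustration of the su encoding, here is a minimal standalone sketch. The helper name encode_username and the sample address are made up for this example; the expression itself is the one used in login() later in this post.

```python
import base64
from urllib.parse import quote_plus


def encode_username(username):
    """Return su: URI-encode the username, then Base64 the result."""
    # quote_plus turns '@' into '%40'; Base64 is applied to the encoded text.
    return base64.b64encode(quote_plus(username).encode()).decode()


# 'user@example.com' is a placeholder account, not a real one.
su = encode_username('user@example.com')
print(su)
```

Decoding su with base64.b64decode gives back the URI-encoded username, which is a handy way to check the value against what the browser sends.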
async def _pre_login(self, su):
    async with self._session.get('https://login.sina.com.cn/sso/prelogin.php', params={
        'entry': 'weibo',
        'callback': 'sinaSSOController.preloginCallBack',
        'su': su,
        'rsakt': 'mod',
        'checkpin': '1',
        'client': 'ssologin.js(v1.4.18)'
    }) as r:
        return self.__get_jsonp_response(await r.text())
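The response is JSONP. The useful fields are: pubkey, the public key used for encryption; nonce and servertime, which guard against replay attacks; rsakv, needed by later requests; showpin, which says whether a CAPTCHA is required; and pcid, needed by the CAPTCHA request.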
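The __get_jsonp_response helper is not shown in this post. A minimal sketch of what such a helper could look like follows, assuming the body is a single callback(...) call wrapping one JSON object. The function name parse_jsonp and every value in the sample response are invented for illustration.

```python
import json
import re


def parse_jsonp(text):
    """Extract and decode the JSON argument of a JSONP callback call."""
    # Greedy match: from the first '(' to the last ')' in the body.
    match = re.search(r'\((.*)\)', text, re.DOTALL)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


# Placeholder response; real values come from prelogin.php.
sample = ('sinaSSOController.preloginCallBack({"retcode":0,'
          '"servertime":1555333136,"pcid":"placeholder-pcid",'
          '"nonce":"ABC123","pubkey":"00","rsakv":"0","showpin":0})')
data = parse_jsonp(sample)
print(data['nonce'], data['showpin'])
```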
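Fetching the CAPTCHA
If showpin in the pre-login response is 1, a CAPTCHA is required. The request below fetches it; the response body is the CAPTCHA image data.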
async def _input_verif_code(self, pcid):
    async with self._session.get('https://login.sina.com.cn/cgi/pin.php', params={
        'r': random.randint(0, 100000000),
        's': '0',
        'p': pcid
    }) as r:
        img_data = await r.read()
        self._show_image(img_data)
        return input('Enter the CAPTCHA: ')
Logging in
Computing the encrypted password sp
sp is computed as hex(RSA(servertime + '\t' + nonce + '\n' + password)), where the RSA public key is the one returned by pre-login.
@staticmethod
def _get_secret_password(password, servertime, nonce, pubkey):
    key = rsa.PublicKey(int(pubkey, 16), 65537)
    res = rsa.encrypt(f'{servertime}\t{nonce}\n{password}'.encode(), key)
    res = binascii.b2a_hex(res)
    return res.decode()
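The login request
The login request is shown below. Once it completes you can already post to Weibo. Many Weibo login implementations stop at this step, but the session expires after 24 hours, and without the later steps it cannot be restored.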
async def login(self, username, password):
    su = base64.b64encode(quote_plus(username).encode()).decode()
    data = await self._pre_login(su)
    async with self._session.post('https://login.sina.com.cn/sso/login.php', params={
        'client': 'ssologin.js(v1.4.19)',
    }, data={
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'qrcode_flag': 'false',
        'useticket': '1',
        'pagerefer': 'https://login.sina.com.cn/crossdomain2.php?action=logout&'
                     'r=https%3A%2F%2Fpassport.weibo.com%2Fwbsso%2Flogout%3Fr%3'
                     'Dhttps%253A%252F%252Fweibo.com%26returntype%3D1',
        'vsnf': '1',
        'su': su,
        'service': 'miniblog',
        'servertime': data['servertime'],
        'nonce': data['nonce'],
        'pwencode': 'rsa2',
        'rsakv': data['rsakv'],
        'sp': self._get_secret_password(password, data['servertime'],
                                        data['nonce'], data['pubkey']),
        'sr': '1366*768',
        'encoding': 'UTF-8',
        'prelt': '233',
        'url': 'https://weibo.com/ajaxlogin.php?framelogin=1&callback='
               'parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META',
        'door': '' if data['showpin'] == 0
                else await self._input_verif_code(data['pcid'])
    }) as r:
        return await self.__handle_login_page(str(r.url), await r.text())
The response is a web page containing a script that jumps to the next address:
<html>
<head>
<title>新浪通行證</title>
<meta http-equiv="refresh" content="0; url='https://login.sina.com.cn/crossdomain2.php?action=login&entry=weibo&......'"/>
<meta http-equiv="Content-Type" content="text/html; charset=GBK" />
</head>
<body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000">
<script type="text/javascript" language="javascript">
location.replace("https://login.sina.com.cn/crossdomain2.php?action=login&entry=weibo&......");
</script>
</body>
</html>
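The addresses visited during a Weibo login are not fixed: sometimes the next one is https://login.sina.com.cn/crossdomain2.php, sometimes https://passport.weibo.com/visitor/visitor. The flow cannot be hard-coded, so I wrote a dedicated function to handle these jumps.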
async def __handle_login_page(self, url, res):
    while True:
        # Login page
        if url.startswith('https://login.sina.com.cn/sso/login.php'):
            next_url = self.__get_next_url(res)
        # ...
        # Login finished
        elif url.startswith('https://weibo.com'):
            return '/home' in url
        # Unknown address
        else:
            print('Unknown address: ' + url)
            print(res)
            return False
        async with self._session.get(next_url, headers={
            'Referer': url  # visitor?a=restore must be requested with a Referer
        }) as r:
            url = str(r.url)
            res = await r.text()
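The __get_next_url helper used above is not listed in this post. One plausible sketch, assuming all it has to do is pull the target out of the page's location.replace(...) call, is shown below. The name get_next_url and the example.com URLs are placeholders. Both quote styles are accepted because the login page uses double quotes while the crossdomain2.php page uses single quotes.

```python
import re


def get_next_url(page):
    """Return the URL passed to location.replace(), or None if absent."""
    # (['"]) captures the opening quote; \1 requires the same closing quote.
    match = re.search(r'''location\.replace\(\s*(['"])(.*?)\1\s*\)''', page)
    return match.group(2) if match else None


double_quoted = '<script>location.replace("https://example.com/next?a=1");</script>'
single_quoted = "function(){location.replace('https://example.com/other');}"
print(get_next_url(double_quoted))
print(get_next_url(single_quoted))
```

Returning None instead of raising lets the caller treat a page without a jump as an unknown state, which matches how the loop above already falls back to printing the page.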
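Cross-domain login broadcast
This is what crossdomain2.php, the page the previous step jumps to, does. It is the "Signing in ..." page you occasionally see when opening Weibo. The old SSO login script is not obfuscated and has Chinese comments, so read it yourself if you are interested. arrURL is the list of URLs to request; once the requests finish, the script calls location.replace to move on.
Among the broadcast URLs, the domain to watch is passport.weibo.com, which sets cookies used for restoring the session. A pitfall in Python can prevent these cookies from being set; this is covered later.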
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=GBK" />
<title>Sina Passport</title>
<script charset="utf-8" src="https://i.sso.sina.com.cn/js/ssologin.js"></script>
</head>
<body>
Signing in ...
<script>
try{sinaSSOController.setCrossDomainUrlList({"retcode":0,"arrURL":["......"]});}
catch(e){
var msg = e.message;
var img = new Image();
var type = 1;
img.src = 'https://login.sina.com.cn/sso/debuglog?msg=' + msg +'&type=' + type;
}try{sinaSSOController.crossDomainAction('login',function(){location.replace('https://passport.weibo.com/wbsso/login?......');});}
catch(e){
var msg = e.message;
var img = new Image();
var type = 2;
img.src = 'https://login.sina.com.cn/sso/debuglog?msg=' + msg +'&type=' + type;
}
</script>
</body>
</html>
__handle_login_page handles this page as follows:
# Cross-domain login broadcast
elif url.startswith('https://login.sina.com.cn/crossdomain2.php'):
    async def cross_domain_callback(url_, i):
        async with self._session.get(url_, params={
            'callback': 'sinaSSOController.doCrossDomainCallBack',
            'scriptId': 'ssoscript' + str(i),
            'client': 'ssologin.js(v1.4.2)'
        }) as r_:
            await r_.read()
    url_list = re.search(r'setCrossDomainUrlList\((.*?)\);', res)[1]
    url_list = json.loads(url_list)['arrURL']
    await asyncio.gather(*(
        cross_domain_callback(url, i) for i, url in enumerate(url_list)
    ))
    next_url = self.__get_next_url(res)
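The arrURL extraction can be tried on its own, without any network access. The sketch below reuses the regex from the branch above and adds a guard for pages that do not contain the call, which the original does not have. The function name extract_broadcast_urls and the example URLs are placeholders.

```python
import json
import re


def extract_broadcast_urls(page):
    """Return the arrURL list embedded in a crossdomain2.php page."""
    match = re.search(r'setCrossDomainUrlList\((.*?)\);', page)
    if match is None:
        # Assumption: treat a page without the call as "nothing to broadcast".
        return []
    return json.loads(match.group(1))['arrURL']


# Shortened stand-in for the real page; the URLs are invented.
sample_page = (
    'try{sinaSSOController.setCrossDomainUrlList('
    '{"retcode":0,"arrURL":["https://a.example/sso","https://b.example/sso"]});}'
)
print(extract_broadcast_urls(sample_page))
```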
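Fixing cookies that fail to be set
The cause is that Sina writes the weekday in the cookie's expires attribute as a full name, while the regex in the Python standard library assumes a three-character abbreviation: \w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT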
>>> import http.cookies
>>> cookie = http.cookies.SimpleCookie()
>>> cookie.load('SRF=155......; expires=Sunday, 15-Apr-2029 13:18:56 GMT; path=/; domain=.passport.weibo.com')
>>> cookie
<SimpleCookie: >
>>> cookie.load('SRF=155......; expires=Sun, 15-Apr-2029 13:18:56 GMT; path=/; domain=.passport.weibo.com')
>>> cookie
<SimpleCookie: SRF='155......'>
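My fix is to patch this regex with a hack at the top of the script. Portability is not guaranteed; if it does not work for you, you will have to find another way.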
import re
import http.cookies
http.cookies.BaseCookie._BaseCookie__parse_string.__defaults__ = (
    re.compile(r"""
    \s*                            # Optional whitespace at start of cookie
    (?P<key>                       # Start of group 'key'
    [""" + http.cookies._LegalKeyChars + r"""]+?   # Any word of at least one letter
    )                              # End of group 'key'
    (                              # Optional group: there may not be a value.
    \s*=\s*                          # Equal Sign
    (?P<val>                         # Start of group 'val'
    "(?:[^\\"]|\\.)*"                  # Any doublequoted string
    |                                  # or
    \w{3,},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT  # Special case for "expires" attr
    |                                  # or
    [""" + http.cookies._LegalValueChars + r"""]*      # Any word or empty string
    )                                # End of group 'val'
    )?                             # End of optional value group
    \s*                            # Any number of spaces.
    (\s+|;|$)                      # Ending either at space, semicolon, or EOS.
    """, re.ASCII | re.VERBOSE),   # re.ASCII may be removed if safe.
)
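The effect of the one-character change (\w{3} to \w{3,}) can be checked in isolation, using only the expires fragment of the pattern and the two date strings from the session above. This is just a sanity check of the fragment, not of the whole cookie parser; the constant names are made up for the example.

```python
import re

# The "expires" alternative as quoted from the standard library pattern ...
STRICT_EXPIRES = re.compile(r'\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT')
# ... and the relaxed variant used in the patch.
RELAXED_EXPIRES = re.compile(r'\w{3,},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT')

full_name = 'Sunday, 15-Apr-2029 13:18:56 GMT'
short_name = 'Sun, 15-Apr-2029 13:18:56 GMT'

for label, pattern in (('strict', STRICT_EXPIRES), ('relaxed', RELAXED_EXPIRES)):
    print(label,
          bool(pattern.fullmatch(full_name)),
          bool(pattern.fullmatch(short_name)))
# prints:
# strict False True
# relaxed True True
```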
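After the cross-domain broadcast
When the broadcast finishes, the script jumps to https://passport.weibo.com/wbsso/login, which usually redirects to https://weibo.com/ajaxlogin.php. The redirect field in its response is the next address.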
<html><head><script language='javascript'>parent.sinaSSOController.feedBackUrlCallBack({"result":true,"userinfo":{"uniqueid":"......","userid":null,"displayname":null,"userdomain":"?wvr=5&lf=reg"},"redirect":"https:\/\/weibo.com\/nguide\/interest"});</script></head><body></body></html>
__handle_login_page handles this page as follows:
# Calls parent.sinaSSOController.feedBackUrlCallBack to jump to weibo.com/nguide/interest
elif url.startswith('https://weibo.com/ajaxlogin.php'):
    res_ = self.__get_jsonp_response(res)
    next_url = res_['redirect']
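After a chain of redirects you then land on your Weibo home page, https://weibo.com/....../home
Restoring the session
The session expires after 24 hours. Visiting the Weibo home page then redirects to https://login.sina.com.cn/sso/login.php, and from there the normal login flow applies.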
async def restore_session(self):
    async with self._session.get('https://weibo.com/') as r:
        return await self.__handle_login_page(str(r.url), await r.text())
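Sometimes the redirect goes to the Sina visitor system at https://passport.weibo.com/visitor/visitor?a=enter instead. The script on that page is also unobfuscated and commented in Chinese. Either way, you eventually end up at https://login.sina.com.cn/sso/login.php.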
# Sina visitor system, used to restore cookies
elif url.startswith('https://passport.weibo.com/visitor/visitor'):
    if 'a=enter' in url:
        next_url = 'https://passport.weibo.com/visitor/visitor?a=restore&cb=restore_back&from=weibo'
    else:
        res_ = self.__get_jsonp_response(res)
        next_url = (
            f'https://login.sina.com.cn/sso/login.php?entry=sso&alt={res_["data"]["alt"]}'
            f'&returntype=META&gateway=1&savestate={res_["data"]["savestate"]}'
            f'&url=https%3A%2F%2Fweibo.com%2F%3Fdisplay%3D0%26retcode%3D6102'
        )
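The f-string above inserts alt and savestate into the query string without escaping them. If those values can contain reserved characters (an assumption; I have not checked what the server actually returns), an alternative is to let urllib.parse.urlencode build the query. The sketch below is not what the project does, and the function name and sample values are invented.

```python
from urllib.parse import urlencode


def build_restore_login_url(alt, savestate):
    """Build the login.php URL used after visitor?a=restore."""
    query = urlencode({
        'entry': 'sso',
        'alt': alt,                # percent-encoded by urlencode
        'returntype': 'META',
        'gateway': '1',
        'savestate': savestate,
        # Given unencoded; urlencode produces the escaped form seen above.
        'url': 'https://weibo.com/?display=0&retcode=6102',
    })
    return 'https://login.sina.com.cn/sso/login.php?' + query


# 'ALT-xyz' and 30 are placeholder values.
print(build_restore_login_url('ALT-xyz', 30))
```

For values without reserved characters this yields the same URL as the hand-written f-string, with the parameters in the same order.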