Weibo login and session restore process

For the full source code, see my pixiv-to-weibo project. Only the core parts are implemented here, with no error handling.

Login

Pre-login

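This step fetches the encryption public key and a nonce. Sina did not use HTTPS in the past, so the password had to be encrypted manually on the client. The encoded username su is computed as BASE64(URIEncode(username)).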
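As a quick check of the su formula, here is a minimal sketch using the same standard-library calls as login() further down (`encode_su` is a hypothetical name). Note that quote_plus also turns spaces into '+':

```python
import base64
from urllib.parse import quote_plus


def encode_su(username: str) -> str:
    # su = BASE64(URIEncode(username)); quote_plus does the URI encoding
    return base64.b64encode(quote_plus(username).encode()).decode()
```

For example, an e-mail address "user@example.com" is URI-encoded to "user%40example.com" before Base64, while a username made only of digits passes through the URI-encoding step unchanged.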

    async def _pre_login(self, su):
        async with self._session.get('https://login.sina.com.cn/sso/prelogin.php', params={
            'entry':    'weibo',
            'callback': 'sinaSSOController.preloginCallBack',
            'su':       su,
            'rsakt':    'mod',
            'checkpin': '1',
            'client':   'ssologin.js(v1.4.18)'
        }) as r:
            return self.__get_jsonp_response(await r.text())

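The response is JSONP. The useful fields are: pubkey, the public key used for encryption; nonce and servertime, which guard against replay attacks; rsakv, needed by later requests; showpin, which says whether a captcha is required; and pcid, needed by the captcha request.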
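`__get_jsonp_response` is not shown in this article. JSONP is just a JavaScript function call wrapping a JSON object, so a minimal sketch, assuming the payload is everything between the first `(` and the last `)` of the response, could look like this (`get_jsonp_response` is a hypothetical stand-in for the private method):

```python
import json
import re

# Greedy match: from the first '(' to the last ')' in the text
_JSONP_PAYLOAD = re.compile(r'\((.*)\)', re.S)


def get_jsonp_response(text: str) -> dict:
    # Strip the callback wrapper, e.g. sinaSSOController.preloginCallBack({...})
    m = _JSONP_PAYLOAD.search(text)
    if m is None:
        raise ValueError('no JSONP payload found')
    return json.loads(m.group(1))
```

Because the match is greedy, surrounding HTML such as the `<script>...</script>` wrapper on the ajaxlogin.php page is tolerated as long as no `(` appears before the callback name.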

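Getting the captcha

If showpin in the pre-login response is 1, a captcha is required. The captcha request looks like this; it returns the captcha image data: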

    async def _input_verif_code(self, pcid):
        async with self._session.get('https://login.sina.com.cn/cgi/pin.php', params={
            'r': random.randint(0, 100000000),
            's': '0',
            'p': pcid
        }) as r:
            img_data = await r.read()
        self._show_image(img_data)
        return input('Enter the captcha: ')
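`_show_image` is not shown in the original either. One hypothetical stdlib-only way to implement it is to write the bytes to a temporary file and hand that file to the default viewer through `webbrowser`. This is my assumption, not the project's code, and the image format returned by pin.php is sniffed rather than assumed:

```python
import pathlib
import tempfile
import webbrowser


def guess_image_suffix(data: bytes) -> str:
    # Sniff common magic numbers so the viewer recognises the file type
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return '.png'
    if data.startswith(b'\xff\xd8\xff'):
        return '.jpg'
    if data.startswith((b'GIF87a', b'GIF89a')):
        return '.gif'
    return '.img'


def show_image(img_data: bytes, open_viewer: bool = True) -> pathlib.Path:
    # Save the captcha to a temporary file that is kept after the function returns
    fd, name = tempfile.mkstemp(suffix=guess_image_suffix(img_data))
    with open(fd, 'wb') as f:
        f.write(img_data)
    path = pathlib.Path(name)
    if open_viewer:
        # Let the default browser/viewer display it; the user then types the code into input()
        webbrowser.open(path.as_uri())
    return path
```

The caller is responsible for deleting the temporary file once the captcha has been entered.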

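Login

Computing the encrypted password sp

The sp algorithm is: hex string(RSA(servertime + '\t' + nonce + '\n' + password)). The RSA public key is the pubkey modulus returned by the pre-login, with public exponent 65537.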

    @staticmethod
    def _get_secret_password(password, servertime, nonce, pubkey):
        key = rsa.PublicKey(int(pubkey, 16), 65537)
        res = rsa.encrypt(f'{servertime}\t{nonce}\n{password}'.encode(), key)
        res = binascii.b2a_hex(res)
        return res.decode()
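The `rsa` package's `rsa.encrypt` applies PKCS#1 v1.5 padding (random non-zero filler bytes) before the raw RSA operation, which is why sp changes on every call even for the same password, and why its hex string is always twice the key length in bytes. To make that step visible, here is a stdlib-only sketch of the same computation. The function names are my own, it is for illustration only (no constant-time or side-channel guarantees), and the `rsa` package version above remains the practical choice:

```python
import binascii
import os


def build_sp_plaintext(password: str, servertime, nonce: str) -> bytes:
    # Layout from the text: servertime + '\t' + nonce + '\n' + password
    return f'{servertime}\t{nonce}\n{password}'.encode()


def pkcs1_v15_encrypt(message: bytes, n: int, e: int = 65537) -> bytes:
    # k = modulus length in bytes; the ciphertext is always exactly k bytes
    k = (n.bit_length() + 7) // 8
    if len(message) > k - 11:
        raise ValueError('message too long for this key')
    # PKCS#1 v1.5 type 2 padding: 00 02 <non-zero random bytes> 00 <message>
    padding = bytearray()
    while len(padding) < k - 3 - len(message):
        byte = os.urandom(1)
        if byte != b'\x00':
            padding += byte
    block = b'\x00\x02' + bytes(padding) + b'\x00' + message
    # Raw RSA on the padded block: c = m^e mod n
    c = pow(int.from_bytes(block, 'big'), e, n)
    return c.to_bytes(k, 'big')


def get_secret_password_stdlib(password, servertime, nonce, pubkey_hex: str) -> str:
    # pubkey from pre-login is the modulus in hex; exponent 65537 as in the original code
    n = int(pubkey_hex, 16)
    ciphertext = pkcs1_v15_encrypt(build_sp_plaintext(password, servertime, nonce), n)
    return binascii.b2a_hex(ciphertext).decode()
```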

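The login request

The login request is shown below. Once it completes you can already post to Weibo. Many Weibo login implementations stop at this point, but the session expires after 24 hours, and without the remaining steps it cannot be restored.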

    async def login(self, username, password):
        su = base64.b64encode(quote_plus(username).encode()).decode()
        data = await self._pre_login(su)

        async with self._session.post('https://login.sina.com.cn/sso/login.php', params={
            'client': 'ssologin.js(v1.4.19)',
        }, data={
            'entry':       'weibo',
            'gateway':     '1',
            'from':        '',
            'savestate':   '7',
            'qrcode_flag': 'false',
            'useticket':   '1',
            'pagerefer':   'https://login.sina.com.cn/crossdomain2.php?action=logout&'
                           'r=https%3A%2F%2Fpassport.weibo.com%2Fwbsso%2Flogout%3Fr%3'
                           'Dhttps%253A%252F%252Fweibo.com%26returntype%3D1',
            'vsnf':        '1',
            'su':          su,
            'service':     'miniblog',
            'servertime':  data['servertime'],
            'nonce':       data['nonce'],
            'pwencode':    'rsa2',
            'rsakv':       data['rsakv'],
            'sp':          self._get_secret_password(password, data['servertime'],
                                                     data['nonce'], data['pubkey']),
            'sr':          '1366*768',
            'encoding':    'UTF-8',
            'prelt':       '233',
            'url':         'https://weibo.com/ajaxlogin.php?framelogin=1&callback='
                           'parent.sinaSSOController.feedBackUrlCallBack',
            'returntype':  'META',
            'door':        '' if data['showpin'] == 0
                           else await self._input_verif_code(data['pcid'])
        }) as r:
            return await self.__handle_login_page(str(r.url), await r.text())

The response is a web page containing a script that redirects to the next URL:

		<html>
		<head>
		<title>新浪通行證</title>
		<meta http-equiv="refresh" content="0; url=&#39;https://login.sina.com.cn/crossdomain2.php?action=login&entry=weibo&......&#39;"/>
		<meta http-equiv="Content-Type" content="text/html; charset=GBK" />
		</head>
		<body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000">
		<script type="text/javascript" language="javascript">
		location.replace("https://login.sina.com.cn/crossdomain2.php?action=login&entry=weibo&......");
		</script>
		</body>
		</html>
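`__get_next_url` is not shown in the original. A minimal sketch, assuming the next URL is always the argument of the first `location.replace(...)` call (true for both sample pages in this article, one using double quotes and one single quotes); `get_next_url` is a hypothetical stand-in for the private method:

```python
import re
from typing import Optional

# Capture the quote character so the closing quote must match the opening one
_LOCATION_REPLACE = re.compile(r"""location\.replace\((['"])(.*?)\1\)""")


def get_next_url(page: str) -> Optional[str]:
    # Return the URL passed to the first location.replace(...), or None if absent
    m = _LOCATION_REPLACE.search(page)
    return m.group(2) if m else None
```

The `<meta http-equiv="refresh">` tag carries the same URL, but it is HTML-escaped (`&#39;`), so the script call is the simpler thing to parse.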

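The redirect targets during Weibo login are not fixed: sometimes it is https://login.sina.com.cn/crossdomain2.php, sometimes https://passport.weibo.com/visitor/visitor. The flow therefore cannot be hard-coded, so I wrote a dedicated function that handles these redirects: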

    async def __handle_login_page(self, url, res):
        while True:
            # Login page
            if url.startswith('https://login.sina.com.cn/sso/login.php'):
                next_url = self.__get_next_url(res)

            # ...

            # Login finished
            elif url.startswith('https://weibo.com'):
                return '/home' in url

            # Unknown URL
            else:
                print('Unknown URL: ' + url)
                print(res)
                return False

            async with self._session.get(next_url, headers={
                'Referer': url  # visitor?a=restore must be requested with a Referer header
            }) as r:
                url = str(r.url)
                res = await r.text()
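The order of the elif branches matters: https://weibo.com/ajaxlogin.php also starts with https://weibo.com, so the ajaxlogin.php branch (inserted at the `# ...` placeholder) has to be tested before the final "login finished" branch. A minimal sketch of the same prefix dispatch as an ordered table; the step names are hypothetical labels, and the prefixes are the ones used in this article:

```python
from typing import Optional

# Prefixes are tried top to bottom, so the more specific weibo.com/ajaxlogin.php
# entry has to come before the catch-all 'https://weibo.com' entry.
LOGIN_STEPS = [
    ('https://login.sina.com.cn/sso/login.php', 'login_page'),
    ('https://login.sina.com.cn/crossdomain2.php', 'cross_domain_broadcast'),
    ('https://passport.weibo.com/visitor/visitor', 'visitor_system'),
    ('https://weibo.com/ajaxlogin.php', 'ajaxlogin'),
    ('https://weibo.com', 'finished'),
]


def classify_login_url(url: str) -> Optional[str]:
    # Return the step name for url, or None for an unknown address
    for prefix, step in LOGIN_STEPS:
        if url.startswith(prefix):
            return step
    return None
```

In `__handle_login_page`, a `None` result corresponds to the "unknown URL" branch.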

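Cross-domain login broadcast

This is what crossdomain2.php, the page the previous step redirects to, does: it is the "Signing in ..." page you occasionally see when opening Weibo. The old version of the SSO login script is not obfuscated and has Chinese comments, so have a look yourself if you are interested. arrURL is the list of URLs to request; once all requests finish, the page calls location.replace to redirect.

Among the broadcast URLs, the ones on the passport.weibo.com domain matter most: they set cookies that are later used to restore the session. A pitfall in Python can keep these cookies from being set; this is covered further down.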

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=GBK" />
    <title>Sina Passport</title>


    <script charset="utf-8" src="https://i.sso.sina.com.cn/js/ssologin.js"></script>
    </head>
    <body>
    Signing in ...
    <script>
    try{sinaSSOController.setCrossDomainUrlList({"retcode":0,"arrURL":["......"]});}
            catch(e){
                var msg = e.message;
                var img = new Image();
                var type = 1;
                img.src = 'https://login.sina.com.cn/sso/debuglog?msg=' + msg +'&type=' + type;
            }try{sinaSSOController.crossDomainAction('login',function(){location.replace('https://passport.weibo.com/wbsso/login?......');});}
            catch(e){
                var msg = e.message;
                var img = new Image();
                var type = 2;
                img.src = 'https://login.sina.com.cn/sso/debuglog?msg=' + msg +'&type=' + type;
            }
    </script>
    </body>
    </html>

__handle_login_page handles this page as follows:

            # Cross-domain login broadcast
            elif url.startswith('https://login.sina.com.cn/crossdomain2.php'):
                async def cross_domain_callback(url_, i):
                    async with self._session.get(url_, params={
                        'callback': 'sinaSSOController.doCrossDomainCallBack',
                        'scriptId': 'ssoscript' + str(i),
                        'client': 'ssologin.js(v1.4.2)'
                    }) as r_:
                        await r_.read()

                url_list = re.search(r'setCrossDomainUrlList\((.*?)\);', res)[1]
                url_list = json.loads(url_list)['arrURL']
                await asyncio.gather(*(
                    cross_domain_callback(url, i) for i, url in enumerate(url_list)
                ))

                next_url = self.__get_next_url(res)

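Fixing cookies that fail to be set

This happens because the expires attribute in Sina's cookies spells out the full weekday name, while the regex in the Python standard library assumes a three-letter abbreviation: \w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT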

    >>> import http.cookies
    >>> cookie = http.cookies.SimpleCookie()
    >>> cookie.load('SRF=155......; expires=Sunday, 15-Apr-2029 13:18:56 GMT; path=/; domain=.passport.weibo.com')
    >>> cookie
    <SimpleCookie: >
    >>> cookie.load('SRF=155......; expires=Sun, 15-Apr-2029 13:18:56 GMT; path=/; domain=.passport.weibo.com')
    >>> cookie
    <SimpleCookie: SRF='155......'>

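My fix is to ~~use black magic to~~ patch this regex at the top of the script. It relies on CPython internals, so portability is not guaranteed; if it does not work for you, you will have to find another way.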

    import re
    import http.cookies

    # Replace the default `patt` argument of the name-mangled private method
    # BaseCookie.__parse_string with a copy of the stdlib pattern whose "expires"
    # branch uses \w{3,} instead of \w{3}, so full weekday names are accepted
    http.cookies.BaseCookie._BaseCookie__parse_string.__defaults__ = (
        re.compile(r"""
            \s*                            # Optional whitespace at start of cookie
            (?P<key>                       # Start of group 'key'
            [""" + http.cookies._LegalKeyChars + r"""]+?   # Any word of at least one letter
            )                              # End of group 'key'
            (                              # Optional group: there may not be a value.
            \s*=\s*                          # Equal Sign
            (?P<val>                         # Start of group 'val'
            "(?:[^\\"]|\\.)*"                  # Any doublequoted string
            |                                  # or
            \w{3,},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT  # Special case for "expires" attr
            |                                  # or
            [""" + http.cookies._LegalValueChars + r"""]*      # Any word or empty string
            )                                # End of group 'val'
            )?                             # End of optional value group
            \s*                            # Any number of spaces.
            (\s+|;|$)                      # Ending either at space, semicolon, or EOS.
            """, re.ASCII | re.VERBOSE),   # re.ASCII may be removed if safe.
    )
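If you would rather not touch private names in http.cookies, another option (a different technique from the patch above, not what the original project does) is to shorten the full weekday name before the header text reaches SimpleCookie. This sketch assumes you can get at the raw Set-Cookie strings yourself, for example from the response headers; feeding the result back into your HTTP client's cookie jar depends on the client and is not shown. `shorten_expires_weekday` is a hypothetical name:

```python
import re

# Full weekday name right after "expires=", case-insensitive
_FULL_WEEKDAY = re.compile(
    r'(expires=\s*)(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b',
    re.IGNORECASE,
)


def shorten_expires_weekday(set_cookie: str) -> str:
    # "expires=Sunday, 15-Apr-2029 ..." -> "expires=Sun, 15-Apr-2029 ..."
    return _FULL_WEEKDAY.sub(lambda m: m.group(1) + m.group(2)[:3], set_cookie)
```

The rewritten string then matches the stock `\w{3},...GMT` pattern, so an unpatched SimpleCookie parses it.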

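After the cross-domain broadcast

Once the broadcast finishes, the script redirects to https://passport.weibo.com/wbsso/login, which usually redirects on to https://weibo.com/ajaxlogin.php. The redirect field in that page is the next URL: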

    <html><head><script language='javascript'>parent.sinaSSOController.feedBackUrlCallBack({"result":true,"userinfo":{"uniqueid":"......","userid":null,"displayname":null,"userdomain":"?wvr=5&lf=reg"},"redirect":"https:\/\/weibo.com\/nguide\/interest"});</script></head><body></body></html>

__handle_login_page handles this page as follows:

            # The page calls parent.sinaSSOController.feedBackUrlCallBack to redirect to weibo.com/nguide/interest
            elif url.startswith('https://weibo.com/ajaxlogin.php'):
                res_ = self.__get_jsonp_response(res)
                next_url = res_['redirect']

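After that, a chain of redirects leads to your Weibo home page https://weibo.com/....../home

Restoring the session

The session expires after 24 hours. Visiting the Weibo home page after that redirects to https://login.sina.com.cn/sso/login.php, and from there the normal login flow takes over: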

    async def restore_session(self):
        async with self._session.get('https://weibo.com/') as r:
            return await self.__handle_login_page(str(r.url), await r.text())

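Sometimes it redirects to the Sina visitor system at https://passport.weibo.com/visitor/visitor?a=enter instead. That page's script is also unobfuscated and has Chinese comments. Either way, you eventually end up at https://login.sina.com.cn/sso/login.php.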

            # Sina visitor system, used to restore cookies
            elif url.startswith('https://passport.weibo.com/visitor/visitor'):
                if 'a=enter' in url:
                    next_url = 'https://passport.weibo.com/visitor/visitor?a=restore&cb=restore_back&from=weibo'
                else:
                    res_ = self.__get_jsonp_response(res)
                    next_url = (
                        f'https://login.sina.com.cn/sso/login.php?entry=sso&alt={res_["data"]["alt"]}'
                        f'&returntype=META&gateway=1&savestate={res_["data"]["savestate"]}'
                        f'&url=https%3A%2F%2Fweibo.com%2F%3Fdisplay%3D0%26retcode%3D6102'
                    )
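The f-string above inserts alt and savestate into the query without escaping. If you prefer to let the standard library handle percent-encoding, the same URL can be built with urllib.parse.urlencode; for plain alphanumeric values it produces exactly the string above. A sketch (`build_restore_login_url` is a hypothetical name):

```python
from urllib.parse import urlencode


def build_restore_login_url(alt: str, savestate) -> str:
    # Same parameters, in the same order, as the f-string in the visitor branch
    query = urlencode({
        'entry': 'sso',
        'alt': alt,
        'returntype': 'META',
        'gateway': '1',
        'savestate': savestate,
        # urlencode percent-encodes ':', '/', '?', '=' and '&' in this value
        'url': 'https://weibo.com/?display=0&retcode=6102',
    })
    return 'https://login.sina.com.cn/sso/login.php?' + query
```

Unlike the f-string, this also stays correct if alt ever contains characters such as '+' or '='.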
