Introduction

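Just as I had finished writing a crawler for the old ZhengFang system (old ZhengFang crawler code),
the school rolled out the new version of the ZhengFang academic affairs system.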

I guess there was money left over from installing the air conditioners.
Anyway, let's get started~

What software do you need?

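Basics: Python! (I like the 3.x versions best); an IDE you like! (a comfortable IDE gets twice the work done with half the effort)
Libraries: requests (the crawler staple, everyone knows this one~); BeautifulSoup (great for parsing structured data); re (regular expressions, reasonably handy for HTML); time (the source of the mysterious number in the URLs); datetime (optional; I hooked up a database and use it to record timestamps); subprocess (a reluctant workaround I picked up when I had no other option); sys (to stop cleanly when something fails)
Extras: Fiddler 4! (really handy for working out what the crawler has to imitate)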

Simulating the login

First start Fiddler, then visit the academic affairs system as usual.

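These two entries show up.
Entry 78 is the request for the main page.
Entry 80, however, returns some JSON data. I don't know what it is for yet, so save it for now.
Next, enter the username and password.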

A POST request appears in Fiddler.

Click WebForms to see what data it carries.

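Body	Value
time	the time from the time library~
csrftoken	no idea what this is; it has not shown up before
yhm	the username, sent in plain text (just so you know)
mm	the password that was typed in, clearly encrypted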

What?? It's encrypted?? So how is it encrypted?
Inspecting the elements of the main page turns up the JS files.

Ah~ this code turned up in login.js:

$.getJSON(_path+"/xtgl/login_getPublicKey.html?time="+new Date().getTime(),function(data){
        modulus = data["modulus"];
        exponent = data["exponent"];
});
------ separator ------
var rsaKey = new RSAKey();
rsaKey.setPublic(b64tohex(modulus), b64tohex(exponent));
var enPassword = hex2b64(rsaKey.encrypt($("#mm").val()));
$("#mm").val(enPassword);
$("#hidMm").val(enPassword);

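Roughly translated:
the page fetches the public key (a Base64 modulus and exponent, converted to hex), RSA-encrypts the plaintext password with it, and then Base64-encodes the ciphertext.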
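
As a rough Python counterpart to the JS above, the sketch below performs the same steps with only the standard library. It assumes the JS RSAKey class applies PKCS#1 v1.5 (type 2) padding, as the jsbn library does; the function name rsa_encrypt_b64 is made up for this example, and the result has not been checked against a live ZhengFang server.

```python
import base64
import os


def rsa_encrypt_b64(password, modulus_b64, exponent_b64):
    """Encrypt `password` with an RSA public key given as Base64 strings.

    Hypothetical helper: assumes PKCS#1 v1.5 type-2 padding.
    """
    n = int.from_bytes(base64.b64decode(modulus_b64), "big")
    e = int.from_bytes(base64.b64decode(exponent_b64), "big")
    k = (n.bit_length() + 7) // 8  # key size in bytes

    msg = password.encode("utf-8")
    if len(msg) > k - 11:
        raise ValueError("password too long for this key size")

    # Padded block layout: 0x00 0x02 <non-zero random bytes> 0x00 <message>
    pad_len = k - 3 - len(msg)
    padding = bytes(b % 255 + 1 for b in os.urandom(pad_len))  # values 1..255
    block = b"\x00\x02" + padding + b"\x00" + msg

    # Textbook RSA on the padded block: c = m^e mod n
    c = pow(int.from_bytes(block, "big"), e, n)
    return base64.b64encode(c.to_bytes(k, "big")).decode("ascii")
```

With the JSON returned by login_getPublicKey.html, the call would look like rsa_encrypt_b64(password, data['modulus'], data['exponent']). The padding is random, so the output differs on every call.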

The upgrade was only supposed to drop the CAPTCHA, so why has writing the crawler become more work?

To summarize

Save the public key
Encrypt the password
Get the csrftoken
Log in!

Code implementation (login)

First, prepare a session

import time

import requests

session = requests.Session()
time_now = int(time.time())
# Content-Length is left out: requests computes it for each request body,
# so a fixed value of 0 would be misleading for the login POST
session.headers.update({
    'Accept': 'text/html, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'qjxyjw.hznu.edu.cn',
    'Referer': 'http://qjxyjw.hznu.edu.cn/jwglxt/xtgl/index_initMenu.html?jsdm=&_t=' + str(time_now),
    'Upgrade-Insecure-Requests': '1'
})

As for where these headers come from, see the request details in Fiddler.

# Fetch the public key (JSON with 'modulus' and 'exponent')
url = 'http://qjxyjw.hznu.edu.cn/jwglxt/xtgl/login_getPublicKey.html?time=' + str(time_now)
r = session.get(url)
publickey = r.json()

A word about this csrftoken: after looking all over, I finally found it in the response for the main page.

# Fetch the csrftoken from the hidden input on the login page
url = 'http://qjxyjw.hznu.edu.cn/jwglxt/xtgl/login_slogin.html?language=zh_CN&_t=' + str(time_now)
r = session.get(url)
r.encoding = r.apparent_encoding
soup = BeautifulSoup(r.text, 'html.parser')
csrftoken = soup.find('input', attrs={'id': 'csrftoken'}).attrs['value']
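
If the hidden input is ever missing, soup.find returns None and the lookup above fails with an unhelpful AttributeError. The sketch below is a dependency-free alternative (not the BeautifulSoup approach used in this post) that extracts the same value with the standard library and raises a clear error instead. The names CsrfTokenParser and extract_csrftoken are made up for this example.

```python
from html.parser import HTMLParser


class CsrfTokenParser(HTMLParser):
    """Collect the value of the first <input id="csrftoken"> element."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Self-closing <input ... /> tags are routed here as well
        if tag == "input" and attr_map.get("id") == "csrftoken" and self.token is None:
            self.token = attr_map.get("value")


def extract_csrftoken(html):
    """Return the csrftoken value, or raise ValueError if the input is absent."""
    parser = CsrfTokenParser()
    parser.feed(html)
    if parser.token is None:
        raise ValueError("csrftoken input not found; the login page may have changed")
    return parser.token
```

It would be called as extract_csrftoken(r.text) on the login page response.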

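A word about the encryption: someone on GitHub has ported the RSA algorithm from the JS, but no matter what I tried I could not get it to work.
So I had to use someone else's Java program and pass the password to it from Python through txt files (crude, but I had no other choice).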

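# Hand the inputs to the external encryption helper through code.txt
work_dir = r"C:\Users\Administrator\Desktop\new_jiaowu"
with open(work_dir + r"\code.txt", "w") as f:
    f.write(studentid + '\n')
    f.write(studentpwd + '\n')
    f.write(publickey['modulus'] + '\n')
    f.write(publickey['exponent'] + '\n')
try:
    # run() waits for code.exe to exit, so encode.txt is complete before it is read
    # (the original started it with Popen and slept for one second)
    subprocess.run('code.exe', shell=False, close_fds=True)
except OSError:
    print("Failed to start the encryption program")
    sys.exit()
with open(work_dir + r"\encode.txt", 'r') as f:
    lines = [line.rstrip('\n') for line in f]

# Line 1 echoes the student ID, line 2 holds the encrypted password
echoed_id = lines[0]
rsacode = lines[1]
if echoed_id != studentid:
    print("RSA encryption error... needs debugging")
    sys.exit()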

The corresponding Java code is on my laptop, so I'm leaving a placeholder here =.=

To be updated

Hehehe, everything is ready, so let's try logging in

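# This fragment is the tail of login(), hence the return
try:
    url = 'http://qjxyjw.hznu.edu.cn/jwglxt/xtgl/login_slogin.html'
    # A dict keeps only one 'mm' entry, so the duplicated key is written once;
    # pass a list of (key, value) tuples if the server needs both copies
    data = {
        'csrftoken': csrftoken,
        'mm': rsacode,
        'yhm': studentid
    }
    result = session.post(url, data=data)
    return result.text
except Exception as e:
    print(e)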
If the password is wrong, the page shows an alert box, so a simple check with the in operator is enough to detect it

# The marker has to match the server's wording exactly
if '用戶名或密碼不正確' in result.text:
    return "Incorrect username or password"
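
The check can also be wrapped in a small helper. This is only a sketch: the name login_error is made up, and the second marker is an assumption that the server may answer in Simplified Chinese rather than the spelling shown in this post.

```python
# Markers that indicate a failed login.
LOGIN_ERROR_MARKERS = (
    "用戶名或密碼不正確",  # spelling used in this post
    "用户名或密码不正确",  # Simplified Chinese spelling (assumed, not confirmed)
)


def login_error(page_text):
    """Return an error message if the page reports bad credentials, else None."""
    for marker in LOGIN_ERROR_MARKERS:
        if marker in page_text:
            return "Incorrect username or password"
    return None
```

A caller would run login_error on the text returned by the login POST and treat None as success.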

Finally, wrap it all up and you have the equivalent of the login button on the main page.
Here is the code:

import subprocess
import sys
import time

from bs4 import BeautifulSoup


def login(studentid, studentpwd, session):
    time_now = int(time.time())
    # Fetch the public key (JSON with 'modulus' and 'exponent')
    url = 'http://qjxyjw.hznu.edu.cn/jwglxt/xtgl/login_getPublicKey.html?time=' + str(time_now)
    r = session.get(url)
    publickey = r.json()

    # Fetch the csrftoken from the hidden input on the login page
    url = 'http://qjxyjw.hznu.edu.cn/jwglxt/xtgl/login_slogin.html?language=zh_CN&_t=' + str(time_now)
    r = session.get(url)
    r.encoding = r.apparent_encoding
    soup = BeautifulSoup(r.text, 'html.parser')
    csrftoken = soup.find('input', attrs={'id': 'csrftoken'}).attrs['value']

    # Encrypt the password: hand the inputs to the external helper through code.txt
    work_dir = r"C:\Users\Administrator\Desktop\new_jiaowu"
    with open(work_dir + r"\code.txt", "w") as f:
        f.write(studentid + '\n')
        f.write(studentpwd + '\n')
        f.write(publickey['modulus'] + '\n')
        f.write(publickey['exponent'] + '\n')

    try:
        # run() waits for code.exe to exit, so encode.txt is complete before it is read
        subprocess.run('code.exe', shell=False, close_fds=True)
    except OSError:
        print("Failed to start the encryption program")
        sys.exit()
    with open(work_dir + r"\encode.txt", 'r') as f:
        lines = [line.rstrip('\n') for line in f]

    # Line 1 echoes the student ID, line 2 holds the encrypted password
    echoed_id = lines[0]
    rsacode = lines[1]
    if echoed_id != studentid:
        print("RSA encryption error... needs debugging")
        sys.exit()

    # Click the login button
    try:
        url = 'http://qjxyjw.hznu.edu.cn/jwglxt/xtgl/login_slogin.html'
        # A dict keeps only one 'mm' entry, so the duplicated key is written once
        data = {
            'csrftoken': csrftoken,
            'mm': rsacode,
            'yhm': studentid
        }
        result = session.post(url, data=data)
        return result.text
    except Exception as e:
        print(e)

Simulating the score query

As before, go through the steps in the browser first and watch in Fiddler how the process works

All scores can be queried in one go here, which is a little convenient

Ta-da: Fiddler shows the following data

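Body	Value
xnm	academic year
xqm	term
_search	always false
nd	same as the earlier time: the time library's time() converted to an integer
query*	fixed values, just copy them as they are
time	always 0 (its presence makes me think ZhengFang got something wrong)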

One more note on the term: the first term is sent as 3 and the second term as 12
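
The mapping can be kept in a small lookup table. A minimal sketch; the names TERM_CODES and term_code are made up for this example, and values that are not in the table are passed through unchanged on the assumption that the caller already supplied a server code.

```python
# Term codes observed in Fiddler: first term -> "3", second term -> "12"
TERM_CODES = {"1": "3", "2": "12"}


def term_code(term):
    """Translate a human-readable term ("1" or "2") into the xqm value."""
    term = str(term)
    # Unknown values are returned as they are (assumed to be server codes already)
    return TERM_CODES.get(term, term)
```

For example, term_code("2") gives "12", which goes into the xqm field of the query.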

Code implementation (fetching scores)

import time


def score_page(session, year, term):
    url = 'http://qjxyjw.hznu.edu.cn/jwglxt/cjcx/cjcx_cxDgXscj.html?doType=query&gnmkdm=N305005'
    # Term codes: 3 is the first term, 12 is the second term
    if term == "1":
        term = "3"
    elif term == "2":
        term = "12"

    try:
        data = {'_search': 'false',
                'nd': int(time.time()),
                'queryModel.currentPage': '1',
                'queryModel.showCount': '15',
                'queryModel.sortName': '',
                'queryModel.sortOrder': 'asc',
                'time': '0',
                'xnm': year,
                'xqm': term
                }
        result = session.post(url, data=data)
        # The response body is JSON; the rows are under 'items'
        result = result.json()
        return result
    except Exception:
        return '[Error] Failed to fetch the scores for this term'

Parsing the scores

Now that we have the scores, and conveniently as JSON at that, so... you get the idea?

# Student details are repeated on every row, so read them from the first one
stu_name = result['items'][0]['xm']
sch_stu = result['items'][0]['xslb']
institute = result['items'][0]['jgmc']
stu_class = result['items'][0]['bj']
print('Name:{}\tEducation level:{}\t\tCollege:{}\tClass:{}'.format(stu_name, sch_stu, institute, stu_class))
# dt = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
# chr(12288) is the full-width space, used as the fill character so Chinese text lines up
plt = '{0:{4}<15}\t{1:{4}<6}\t{2:{4}<6}\t{3:{4}<4}'
for i in result['items']:
    # Columns: course name, percentage score, grade point, teacher name
    print(plt.format(i['kcmc'], i['bfzcj'], i['jd'], i['jsxm'], chr(12288)))
    # sql.insert_score(studentid, year, term, i['kcmc'], i['bfzcj'], i['jd'], i['jsxm'], dt)

Since I later added writes to a SQL database, I commented those parts out for this demo
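
To see what the format template does without logging in, here is a small sketch that runs it on made-up data. The sample dictionary only imitates the shape of the JSON used above (its values are invented), and format_rows is a hypothetical helper name.

```python
# Made-up sample with the same keys as the real response; values are invented
sample = {
    "items": [
        {"kcmc": "高等數學", "bfzcj": "90", "jd": "4.0", "jsxm": "張三"},
        {"kcmc": "大學英語", "bfzcj": "85", "jd": "3.5", "jsxm": "李四"},
    ]
}

FULLWIDTH_SPACE = chr(12288)  # U+3000, as wide as one Chinese character
ROW_FORMAT = "{0:{4}<15}\t{1:{4}<6}\t{2:{4}<6}\t{3:{4}<4}"


def format_rows(result):
    """Return one left-aligned, full-width-padded line per score row."""
    return [
        ROW_FORMAT.format(i["kcmc"], i["bfzcj"], i["jd"], i["jsxm"], FULLWIDTH_SPACE)
        for i in result.get("items", [])
    ]


for line in format_rows(sample):
    print(line)
```

The nested {4} inside each format spec supplies the fill character, so every column is padded with full-width spaces instead of ASCII spaces.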

Test (finished screenshot)

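Urp.. because of the environment the password encryption needs, I will paste this in later from my laptop, so here is a placeholder =.=
(image goes here)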

A word from the author

This program really wore me out (and formatting the blog post was even more tiring).
But it is finally done~

I hope it helps everyone who is learning to write crawlers~

Implementing a Crawler for the New ZhengFang Academic Affairs System in Python
