Logging in automatically with selenium to get the cookie + crawling completed code from the online programming site Alpha Coding (阿爾法Coding)

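Approach:
I previously wrote a post about crawling completed code from Alpha Coding. That version only worked after the cookie had been obtained by hand and pasted into the code.
With selenium, however, we can log in automatically, read the cookie automatically, and then carry on crawling with the original code.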
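
The hand-off from the browser to `requests` comes down to turning the list returned by `get_cookies()` into a single `Cookie` header value. Here is a minimal sketch, assuming every cookie should be forwarded. The helper name `cookies_to_header` and the sample cookie names are hypothetical and not taken from the site. Joining all entries avoids depending on the order of the list.

```python
def cookies_to_header(cookie_list):
    """Join Selenium-style cookie dicts into a Cookie header value."""
    # Each entry is a dict with at least 'name' and 'value' keys,
    # which is the shape returned by WebDriver.get_cookies().
    return '; '.join(f"{c['name']}={c['value']}" for c in cookie_list)


# Made-up sample data standing in for wd.get_cookies()
sample = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': 'example.com'},
    {'name': 'token', 'value': 'xyz', 'domain': 'example.com'},
]
print(cookies_to_header(sample))
```

The returned string can be passed as the `cookie` header in the same way as `usefulcookie` in the script.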

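Selenium clicks each element and types into each field in turn (the element locators are XPaths copied straight from the browser's inspect tool).

Finally it logs in to the platform and reads the cookie. The code is below; see the comments for details: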

AllinOne.py

# coding=utf-8
import json
import os
import time

import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def getCookie(_unused=None):  # log in with selenium and return the cookie string (the argument is unused)
    # Selenium 4 style: the driver path goes through Service and elements are located with By.XPATH
    wd = webdriver.Chrome(service=Service(r'D:\chromedriver.exe'))
    wd.implicitly_wait(10)
    wd.get('http://www.alphacoding.cn/login/')
    wd.find_element(
        By.XPATH, '//*[@id="app"]/div/div/div/div[2]/div[1]/div/div[1]/div[1]/div[2]/div[1]/div/span').click()  # open the drop-down list
    wd.find_element(
        By.XPATH, '//*[@id="app"]/div/div/div/div[2]/div[1]/div/div[1]/div[1]/div[2]/div[2]/ul[2]/li[2]').click()  # choose the school
    wd.find_element(By.XPATH, '//*[@id="app"]/div/div/div/div[2]/div[1]/div/div[1]/div[2]/div[2]/input').send_keys('enter your student ID here')  # type the student ID
    wd.find_element(By.XPATH, '//*[@id="app"]/div/div/div/div[2]/div[1]/div/div[1]/div[3]/div[2]/input').send_keys('enter your password here')  # type the password

    wd.find_element(By.XPATH, '//*[@id="app"]/div/div/div/div[2]/div[1]/div/div[2]/button/span').click()  # click the login button
    time.sleep(3)  # give the login request time to finish
    cookie_list = wd.get_cookies()  # read the cookies
    cookie1 = cookie_list[1]['name'] + '=' + cookie_list[1]['value']
    cookie2 = cookie_list[0]['name'] + '=' + cookie_list[0]['value']
    usefulcookie = cookie1 + ';' + cookie2  # join the two cookies as name=value pairs
    # print(usefulcookie)
    wd.quit()  # close the browser once the cookie has been read
    return usefulcookie


def getHTMLText(url, usefulcookie):  # fetch a page and return its text
    try:
        kv = {'cookie': usefulcookie, 'user-agent': 'Mozilla/5.0'}
        # print(kv)
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:  # network error or non-2xx status
        return ""


def grabCode(demo):  # extract the data
    jsonstr = json.loads(demo)
    # print(type(jsonstr))
    lesson = jsonstr['data']['lesson']  # nested dict access
    exercise = lesson['exercises'][0]  # the first exercise holds the description and the answer
    # Join title, requirements and code into the finished text; the next step saves it to a file
    code = "Title".center(40, '*') + "\n" + lesson['title'] + "\n" + \
           "Requirements".center(40, '*') + "\n" + exercise['description']['content'] + "\n" + \
           "Code".center(40, '*') + "\n" + \
           exercise['files'][0]['correctAnswer']
    print(code)  # show the title, requirements and code
    return code


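def saveText(i, enddate, title):  # write the text of lesson i to <root>/<title>.txt
    root = "d:/CrawlAlphaCoding/"
    textname = root + title[i] + ".txt"
    if not os.path.exists(root):  # create the directory if it does not exist
        os.makedirs(root)
    if not os.path.exists(textname):  # create the file if it does not exist
        # without encoding, some files end up with nothing written
        with open(textname, "w", encoding="utf-8") as f:
            f.write(enddate)
    else:
        print("File already exists")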


def main():
    usefulcookie = getCookie(str)
    url = 'http://www.alphacoding.cn/api/courses/v3/79/chapterDetail'  # chapter info, including each lesson's id and title
    jsonstr = json.loads(getHTMLText(url, usefulcookie))  # parse the JSON response text into a dict
    # print(getHTMLText(url, usefulcookie))
    batchUrl = []  # lesson URLs and titles go into parallel lists for use when saving the files
    title = []
    rawdate = jsonstr['data']['chapters']
    for i in rawdate:  # outer loop: chapters
        for lesson in i['lessons']:  # inner loop: lessons; the id is the URL suffix, the title the file name
            batchUrl.append("http://www.alphacoding.cn/api/learning/v3/79/lesson/" + lesson['lessonId'])  # store the lesson URL
            title.append(lesson['title'])

    for i in range(len(batchUrl)):
        try:  # introductory lessons have no code and raise an error, so handle it with try/except
            print(batchUrl[i])
            saveText(i, grabCode(getHTMLText(batchUrl[i], usefulcookie)), title)
        except Exception:
            print("Chapter guide has no code")


main()
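
The try/except in `main()` is what skips lessons that have no code. An alternative sketch checks the parsed JSON explicitly instead, assuming the response shape used in `grabCode` (data, lesson, exercises, files, correctAnswer). The helper name `extract_lesson` and the sample payloads are hypothetical and only illustrate the idea.

```python
def extract_lesson(payload):
    """Return (title, description, code), or None if the lesson has no code."""
    lesson = payload.get('data', {}).get('lesson', {})
    exercises = lesson.get('exercises') or []
    if not exercises:
        return None  # e.g. an introductory lesson without exercises
    first = exercises[0]
    files = first.get('files') or []
    if not files:
        return None  # an exercise without any answer file
    description = first.get('description', {}).get('content', '')
    return lesson.get('title', ''), description, files[0].get('correctAnswer', '')


# Made-up payloads that mimic the shape read in grabCode
with_code = {'data': {'lesson': {
    'title': 'Hello',
    'exercises': [{'description': {'content': 'Print hello'},
                   'files': [{'correctAnswer': "print('hello')"}]}],
}}}
intro_only = {'data': {'lesson': {'title': 'Intro', 'exercises': []}}}

print(extract_lesson(with_code))
print(extract_lesson(intro_only))
```

Returning `None` for lessons without code lets the caller tell a missing answer apart from a real failure such as a network error, which a blanket try/except would hide.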

Result:
