No login is needed to scrape this site, and the data is easy to locate. Most of the work is in analyzing the page.

Reference: https://blog.csdn.net/weixin_44835732/article/details/103350174

Requests: requests. Parsing: xpath.

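Looking at the page, there is a prompt further down to download the client. No need to panic; analyze the URL instead. It contains 1-8888, which suggests there might be 8888 pages in total, but the chart has only 500 songs, so that cannot be right. Let's change the page number in the URL and see what happens.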

Sure enough, testing shows that only the first 23 pages contain data; every page after that is empty.
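The page-probing test described above can be sketched roughly as follows. This is a minimal sketch, not the code used for the original test: `find_last_page` and `kugou_titles` are hypothetical helper names, the shortened User-Agent and the 10-second timeout are assumptions, and the loop assumes that pages with data are contiguous starting from page 1.

```python
def find_last_page(get_titles, max_pages=100):
    """Return the highest page number for which get_titles(page) is non-empty.

    Stops at the first empty page, so it assumes pages with data are
    contiguous starting from page 1. Returns 0 if page 1 is already empty.
    """
    last = 0
    for page in range(1, max_pages + 1):
        if not get_titles(page):
            break
        last = page
    return last


def kugou_titles(page):
    """Fetch one ranking page and return its song titles (needs network access)."""
    # Imported lazily so find_last_page can be tried without these packages.
    import requests
    from lxml import etree

    url = "https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank".format(page)
    # Assumption: a short User-Agent is enough; swap in a full browser string if not.
    headers = {"User-Agent": "Mozilla/5.0", "Referer": "https://www.kugou.com/"}
    body = requests.get(url, headers=headers, timeout=10).content
    return etree.HTML(body).xpath(".//div[@class='pc_temp_songlist ']//li/@title")
```

Calling `find_last_page(kugou_titles)` walks the pages in order and stops at the first one without song titles. If the site still behaves as it did in the test above, the result should match the 23 pages found there.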

The code is simple:

from lxml import etree
import requests

# {} is filled with the page number; the 8888 part of the URL stays unchanged
base_url = "https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank"
data_lst = []  # song titles collected across all pages
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36",
    "Referer": "https://www.kugou.com/"
}


def get_information(url):
    # Download one ranking page and append its song titles to data_lst
    response = requests.get(url, headers=headers).content
    html = etree.HTML(response)
    # @class= is an exact string match, so the trailing space is kept as written
    song_lst = html.xpath(".//div[@class='pc_temp_songlist ']//li/@title")
    data_lst.extend(song_lst)


if __name__ == "__main__":
    # Only pages 1 to 23 return data
    for i in range(1, 24):
        url = base_url.format(i)
        get_information(url)
    print(data_lst)

The output contains exactly 500 entries, with none missing.
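One way to check the count programmatically rather than by eye is sketched below. `summarize` is a hypothetical helper, not part of the original script, and the expected total of 500 is taken from the chart size mentioned earlier.

```python
from collections import Counter


def summarize(titles, expected=500):
    """Report how many titles were collected and whether any repeat."""
    counts = Counter(titles)
    # Titles that appear more than once, sorted for stable output
    duplicates = sorted(t for t, n in counts.items() if n > 1)
    return {
        "total": len(titles),
        "unique": len(counts),
        "duplicates": duplicates,
        "complete": len(titles) == expected,
    }
```

Passing `data_lst` to `summarize` after the crawl gives the total, the number of distinct titles, and any repeated titles, which makes a missing or doubled page easier to spot.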
