Goal: the top ten rows of data
Process:
# -*- coding:utf-8 -*-  # without this declaration the script raises an encoding error
import requests
from bs4 import BeautifulSoup
import re
import csv
import datetime

url = 'http://data.10jqka.com.cn/market/rzrq/'
today = datetime.date.today().strftime('%Y%m%d')  # collection date

res = requests.get(url)
res.encoding = res.apparent_encoding
html = BeautifulSoup(res.text, 'lxml')
data = html.select('#table1 > table > tbody')
data = str(data).replace('-', '')  # strip hyphens so dates match the number regex
datas = re.findall(r'(\d+\.?\d\d)', data)
exc = [datas[i:i + 13] for i in range(0, len(datas), 13)]  # 13 columns per table row

f = open('rzrq.csv', 'w', newline='')
writer = csv.writer(f)
writer.writerow(('交易日期',
                 '本日融資餘額(億元)上海', '本日融資餘額(億元)深圳', '本日融資餘額(億元)滬深合計',
                 '本日融資買入額(億元)上海', '本日融資買入額(億元)深圳', '本日融資買入額(億元)滬深合計',
                 '本日融券餘量餘額(億元)上海', '本日融券餘量餘額(億元)深圳', '本日融券餘量餘額(億元)滬深合計',
                 '本日融資融券餘額(億元)上海', '本日融資融券餘額(億元)深圳', '本日融資融券餘額(億元)滬深合計',
                 '採集日期'))
for line in exc:
    line.append(today)  # append the collection date to every row
    writer.writerow(line)
f.close()  # close the file so buffered rows are flushed to disk
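The hyphen-strip plus regex-and-chunk step above is the core trick: removing hyphens turns dates like 2020-04-23 into plain digit runs the number regex can catch, and slicing the flat match list into fixed-width groups rebuilds the rows. A toy illustration on a synthetic sample string (3 columns instead of the real 13):

```python
import re

# Synthetic sample, not real site output: two rows of date + two values.
sample = "2020-04-23 1234.56 789.01 2020-04-22 1111.11 222.22"
flat = re.findall(r'(\d+\.?\d\d)', sample.replace('-', ''))
# Chunk the flat list into rows of 3 (the real script uses 13).
rows = [flat[i:i + 3] for i in range(0, len(flat), 3)]
print(rows)
# → [['20200423', '1234.56', '789.01'], ['20200422', '1111.11', '222.22']]
```

Note the dot in the pattern is escaped (`\.?`); the unescaped `.` would also match stray characters between digit groups.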
Result:
Extension: the requirement only asks for the top ten rows, but just to be safe I pull them all.
At this point I hit a problem with this pager control: when I clicked it, the URL didn't change at all, so I went to take a look at the page source.
javascript:void(0)? What???? So the real target is hidden behind a #. What now? Keep digging and see where it went.
See the screenshots:
Click through them in order and watch the effect. Hey, there it is!
OK, we have the URL, nearly done! Write it the way you would for any dynamic page; code attached.
# -*- coding:utf-8 -*-  # without this declaration the script raises an encoding error
import requests
from bs4 import BeautifulSoup
import re
import csv
import datetime

today = datetime.date.today().strftime('%Y%m%d')  # collection date
Cookie = "Hm_lvt_60bad21af9c824a4a0530d5dbf4357ca=1587644691; Hm_lvt_f79b64788a4e377c608617fba4c736e2=1587644692; Hm_lvt_78c58f01938e4d85eaf619eae71b4ed1=1587644692; Hm_lpvt_f79b64788a4e377c608617fba4c736e2=1587644737; Hm_lpvt_60bad21af9c824a4a0530d5dbf4357ca=1587644737; Hm_lpvt_78c58f01938e4d85eaf619eae71b4ed1=1587644737; v=AmU_JzrXV8VWALMZXrG3U_-tdCqcohk0Y1b9iGdKIRyrfotcL_IpBPOmDVT0"
url = "http://data.10jqka.com.cn/market/rzrq/board/getRzrqPage/page/{}/ajax/1/"
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36',
    'Cookie': Cookie,
    'Connection': 'keep-alive',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Host': 'data.10jqka.com.cn',
    # 'Referer': 'http://www.sse.com.cn/market/stockdata/overview/weekly/'
}

# Open the file outside the loop; opening it inside would truncate the
# rows written in the previous iteration.
f = open('rzrq.csv', 'w', newline='')
writer = csv.writer(f)
writer.writerow(('交易日期',
                 '本日融資餘額(億元)上海', '本日融資餘額(億元)深圳', '本日融資餘額(億元)滬深合計',
                 '本日融資買入額(億元)上海', '本日融資買入額(億元)深圳', '本日融資買入額(億元)滬深合計',
                 '本日融券餘量餘額(億元)上海', '本日融券餘量餘額(億元)深圳', '本日融券餘量餘額(億元)滬深合計',
                 '本日融資融券餘額(億元)上海', '本日融資融券餘額(億元)深圳', '本日融資融券餘額(億元)滬深合計',
                 '採集日期'))
for i in range(1, 6):  # pages 1-5
    req = requests.get(url.format(i), headers=headers)
    html = BeautifulSoup(req.text, 'lxml')
    data = html.select('#table1 > table > tbody')
    data = str(data).replace('-', '')  # strip hyphens so dates match the number regex
    datas = re.findall(r'(\d+\.?\d\d)', data)
    exc = [datas[j:j + 13] for j in range(0, len(datas), 13)]  # 13 columns per table row
    for line in exc:
        line.append(today)  # append the collection date to every row
        writer.writerow(line)
f.close()
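The pagination in the script above boils down to formatting the page number into the AJAX endpoint discovered in the network panel. A minimal standalone sketch of how the five page URLs are built:

```python
# URL template taken from the script above; only the page number varies.
url = "http://data.10jqka.com.cn/market/rzrq/board/getRzrqPage/page/{}/ajax/1/"
pages = [url.format(i) for i in range(1, 6)]  # pages 1 through 5
print(pages[0])
# → http://data.10jqka.com.cn/market/rzrq/board/getRzrqPage/page/1/ajax/1/
```

Each of these URLs is then fetched with the same headers (Cookie, User-agent, Host), since the site serves the table fragment only to requests that look like the browser's own AJAX calls.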
Result:
Mm, perfect!