2020.01.05

1. Scrapy: turn a str into an HTML selector so it can be queried with XPath

from scrapy.selector import Selector
names = Selector(text=datas).xpath("//div[contains(@class,'jDesc')]/a/text()").extract()

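2. Selenium WebDriver: passing a variable into the find_element_by_xpath() expression (this is Python string formatting, similar to formatted output in C; it is not XPath syntax)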

driver.find_element_by_xpath("//td[contains(text(),'%s')]" % cluster_name)

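Here cluster_name is the variable, and %s is the format specifier (a string in this case; use %d for an integer). The variable must be assigned before the call.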
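The formatting step can be tried on its own, without a browser. This is a minimal sketch: the helper names build_text_xpath and build_row_xpath, the value "cluster-a", and the //tr[%d] expression are illustrative assumptions, not part of the original notes.

```python
def build_text_xpath(cluster_name):
    """Return an XPath matching a <td> whose text contains cluster_name."""
    # %s inserts the value as a string, as in the call above.
    return "//td[contains(text(),'%s')]" % cluster_name


def build_row_xpath(row_index):
    """Return an XPath for the n-th <tr>; %d expects an integer."""
    return "//tr[%d]" % row_index


cluster_name = "cluster-a"  # assumed example value, assigned beforehand
print(build_text_xpath(cluster_name))
print(build_row_xpath(3))
```

Note that a value containing a single quote would break the quoted XPath literal, so this approach suits values you control.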

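3. Reset an auto-increment primary key so that it starts from 1 again (note that truncate also deletes every row in the table)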

truncate table tablename;

4. Douban PyPI mirror: pip install -i https://pypi.doubanio.com/simple/ XXX

5. Clicking through to a Weibo post's detail page:

ac = self.web.find_element_by_xpath(".//div[@class = 'm-container-max']/div/div/div[%s]" % j).find_element_by_xpath(".//footer/div[2]/h4")
self.web.execute_script("arguments[0].click();", ac)  # click via JavaScript

Only self.web.execute_script worked for simulating the click on a Weibo post.
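The JavaScript click can be wrapped in a small helper so it is reusable. This is only a sketch: js_click and FakeDriver are hypothetical names, and FakeDriver is a stand-in that records calls so the example runs without a browser. With a real WebDriver you would pass the driver and the located element instead.

```python
def js_click(driver, element):
    """Click element by running JavaScript in the page instead of element.click()."""
    # arguments[0] is the first extra argument passed to execute_script.
    return driver.execute_script("arguments[0].click();", element)


class FakeDriver:
    """Hypothetical stand-in for a WebDriver; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


driver = FakeDriver()
js_click(driver, "footer-h4-element")  # placeholder for a real WebElement
print(driver.calls)
```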

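6. Clicking QQ login:

After opening the QQ login page:

self.web.page_source does not contain the source for the left-hand panel. That markup lives inside an iframe, so you have to switch into the iframe first.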

# Switch into the iframe; otherwise the source inside it cannot be reached
self.web.switch_to.frame(self.web.find_element_by_xpath(".//iframe[@id = 'ptlogin_iframe']"))
ac = self.web.find_element_by_xpath(".//span[@id = 'img_out_11943809']")  # the id depends on the QQ number
self.web.execute_script("arguments[0].click();", ac)  # click via JavaScript
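After working inside the iframe, the driver stays there until it is switched back. One possible way to keep the two steps paired is a context manager. This is a sketch under assumptions: inside_frame, FakeSwitchTo and FakeFrameDriver are hypothetical names, and the fake classes only log calls so the example runs without a browser. The calls switch_to.frame() and switch_to.default_content() are the ones a real Selenium driver provides.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into frame, then return to the top-level page afterwards."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # default_content() goes back to the main document.
        driver.switch_to.default_content()


class FakeSwitchTo:
    """Hypothetical stand-in for driver.switch_to; it only logs calls."""

    def __init__(self, log):
        self.log = log

    def frame(self, frame):
        self.log.append(("frame", frame))

    def default_content(self):
        self.log.append(("default_content",))


class FakeFrameDriver:
    """Hypothetical stand-in for a WebDriver."""

    def __init__(self):
        self.log = []
        self.switch_to = FakeSwitchTo(self.log)


demo = FakeFrameDriver()
with inside_frame(demo, "ptlogin_iframe"):
    demo.log.append(("work",))  # locate and click elements here
print(demo.log)
```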

7. When the Weibo crawler is not logged in, it can fetch at most 29 pages in a row per run.

8. Starting a spider with scrapyd

Change to the root directory of the crawler project, then run:

1. scrapyd (start the scrapyd server)

2. scrapyd-deploy (deploy the project to the server)

3. curl http://localhost:6800/schedule.json -d project=weibo -d spider=film
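The curl call in the last step can also be issued from Python using only the standard library. This is a minimal sketch: build_schedule_request and send are hypothetical helper names, and the base URL simply mirrors the one in the curl command. Only the request is built here; send() needs a running scrapyd and is not called.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project, spider, base_url="http://localhost:6800"):
    """Build the POST request equivalent to the curl command above."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Supplying data makes urllib send a POST, like curl -d.
    return Request(base_url + "/schedule.json", data=data)


def send(req):
    """Send the request and return the response body (needs a running scrapyd)."""
    with urlopen(req) as resp:
        return resp.read().decode()


req = build_schedule_request("weibo", "film")
print(req.get_method(), req.full_url, req.data)
```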
