The crawling code itself is unremarkable; the part worth attention is how the scraped data gets inserted into the database.
Commit to the database once every 1,000 items:
pipelines.py
def process_item(self, item, spider):
    try:
        page_data = (item["scode"], item["name"], item["gender"], item["age"],
                     item["education"], item["position"], item["in_office_time"],
                     item["introduction"], item["insert_time"],
                     item["hold_count"], item["order_num"])
        self.item_list.append(page_data)
        if len(self.item_list) == 1000:  # commit once per 1,000 rows
            self.cursor.executemany(
                "INSERT INTO ssb_insight_company_team_manager_info"
                "(scode,name,gender,age,education,position,in_office_time,"
                "introduction,insert_time,hold_count,order_num)"
                " VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
                " ON DUPLICATE KEY UPDATE name=VALUES(name),gender=VALUES(gender),"
                "age=VALUES(age),education=VALUES(education),"
                "position=VALUES(position),in_office_time=VALUES(in_office_time),"
                "introduction=VALUES(introduction),hold_count=VALUES(hold_count),"
                "order_num=VALUES(order_num)",
                self.item_list
            )
            # commit the batch
            self.connect.commit()
            del self.item_list[:]
    except Exception as e:
        # log the error and roll back the failed batch
        spider.logger.error("database insert failed: %s", e)
        self.connect.rollback()
    return item
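The snippet references self.connect, self.cursor, and self.item_list without showing where they come from. A plausible setup sketch, assuming the pymysql driver (the class name and connection parameters below are placeholders, not from the original project):

```python
class CompanyTeamManagerPipeline:
    """Hypothetical pipeline skeleton; only the setup shown here is assumed."""

    def __init__(self):
        self.item_list = []  # buffer for rows awaiting a batch insert

    def open_spider(self, spider):
        import pymysql  # assumed driver; imported lazily so the class loads without it
        # Placeholder credentials; replace with real connection settings.
        self.connect = pymysql.connect(
            host="localhost", user="root", password="secret",
            database="ssb", charset="utf8mb4",
        )
        self.cursor = self.connect.cursor()
```

Scrapy calls open_spider once when the spider starts, so the connection and buffer exist before the first process_item call.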
Note: with only the code above there is still a problem. If the final batch holds fewer than 1,000 items, it never reaches the database and that data is lost. The leftover items must therefore be committed one last time when the spider closes:
def close_spider(self, spider):
    try:
        if self.item_list:  # flush the final batch, which may hold fewer than 1,000 rows
            self.cursor.executemany(
                "INSERT INTO ssb_insight_company_team_manager_info"
                "(scode,name,gender,age,education,position,in_office_time,"
                "introduction,insert_time,hold_count,order_num)"
                " VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
                " ON DUPLICATE KEY UPDATE name=VALUES(name),gender=VALUES(gender),"
                "age=VALUES(age),education=VALUES(education),"
                "position=VALUES(position),in_office_time=VALUES(in_office_time),"
                "introduction=VALUES(introduction),hold_count=VALUES(hold_count),"
                "order_num=VALUES(order_num)",
                self.item_list
            )
            # commit the final batch
            self.connect.commit()
    except Exception as e:
        # log the error and roll back the failed batch
        spider.logger.error("database insert failed: %s", e)
        self.connect.rollback()
    finally:
        self.cursor.close()
        self.connect.close()
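The batch-then-flush pattern can be demonstrated without a database. The sketch below uses a stub cursor that only records batch sizes; FakeCursor and BatchWriter are illustrative names, not part of the original project:

```python
class FakeCursor:
    """Stand-in for a DB cursor: records the size of every executemany() call."""
    def __init__(self):
        self.batches = []

    def executemany(self, sql, rows):
        self.batches.append(len(rows))

class BatchWriter:
    """Buffers rows and flushes them in fixed-size batches."""
    def __init__(self, cursor, batch_size=1000):
        self.cursor = cursor
        self.batch_size = batch_size
        self.item_list = []

    def process_item(self, row):
        self.item_list.append(row)
        if len(self.item_list) == self.batch_size:
            self.cursor.executemany("INSERT ...", self.item_list)
            del self.item_list[:]

    def close_spider(self):
        # flush the final, possibly short batch so no rows are lost
        if self.item_list:
            self.cursor.executemany("INSERT ...", self.item_list)
            del self.item_list[:]

cursor = FakeCursor()
writer = BatchWriter(cursor, batch_size=1000)
for i in range(2500):
    writer.process_item((i,))
writer.close_spider()
print(cursor.batches)  # → [1000, 1000, 500]
```

With 2,500 items, two full batches of 1,000 are flushed during processing and the remaining 500 only on close, which is exactly the data-loss case the note above warns about.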
Full code:
Link: https://pan.baidu.com/s/1rH4T-EgUoDSTLBLSjBKCtQ
Extraction code: fje3