[Daily Crawler]: Build Yourself a Cozy Home, Facing the Sea, with Spring Flowers in Bloom

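I. Preface

Crawler practice for 2020-04-07
One small crawler exercise every day. If you are learning web scraping, remember to follow!

Learning to program is like learning to ride a bicycle: for a beginner, the most important thing is persistent practice.
I came across this line in the chapter "Drawing on Groundwater" (《汲取地下水》): "Don't worry that your talent or ability falls short. Practice persistently and your talent will grow." Looking back, that is exactly right.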

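II. Requirements

2.1 Motivation

Many people finish renovating their home and then find that it does not look good. Usually the reason is that the preparation beforehand was not thorough, and my own home is an example. I had some free time, came across some beautiful renovation renderings, and thought: if only I had seen styles like these before we renovated.

2.2 By floor plan:

So I picked a fairly large home-renovation platform and scraped it as hard as I could. By my count, I collected:

One-bedroom renderings: 205 sets
Two-bedroom renderings: 966 sets
Three-bedroom renderings: 2591 sets
Four-bedroom renderings: 979 sets
Duplex renderings: 331 sets
Villa renderings: 694 sets
Other buildings: 7 sets

With renderings in so many styles, I trust there is one in there that suits you.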

2.3 By style:

2.4 Sub-categories within each style:

III. Technology stack

1. requests
2. BeautifulSoup
3. re

For how to use each module, see my free column: Crawler Study Notes (爬蟲學習筆記)
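
To make the division of labor among the three modules concrete, here is a minimal sketch: requests downloads a page, BeautifulSoup locates the tags, and re pulls a value out of an attribute. The HTML snippet, its class names, the URL pattern and the helper names `fetch` and `extract_albums` are all made up for illustration and are not taken from the target site.

```python
import re

import requests
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real site's HTML differs.
SAMPLE_HTML = """
<div class="gallery">
  <a class="item" href="//example.com/album/101.html">Nordic living room</a>
  <a class="item" href="//example.com/album/202.html">Modern bedroom</a>
</div>
"""


def fetch(url):
    """requests: download a page and return its text (not called in this demo)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_albums(html_text):
    """BeautifulSoup finds the links; re pulls the numeric id out of each href."""
    soup = BeautifulSoup(html_text, "html.parser")
    albums = []
    for a in soup.find_all("a", class_="item"):
        match = re.search(r"/album/(\d+)\.html", a["href"])
        if match:
            albums.append((int(match.group(1)), a.get_text(strip=True)))
    return albums


albums = extract_albums(SAMPLE_HTML)
print(albums)
```

The demo parses an inline string so it runs without network access; in a real crawler the text returned by `fetch` would be passed to `extract_albums` instead.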

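IV. Summary

1. A crawler must always include exception handling, so that the program does not crash because of an exception.
2. If you cannot get the code written, work through it step by step:

Once you have the requirements, first work out how to implement them
Write down the approach as you analyze it
Write the implementation
Where you are unsure, it is also fine to analyze, note down your thinking, and write the code as you go.
Keep at it and practice a lot

3. This code still has plenty of room for optimization. I will post a multithreaded version of the source later. If you run into problems while scraping, let me know in the comments.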
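
Points 1 and 3 above can be combined in one small sketch: wrap every request in exception handling so that a single bad page cannot stop the run, then hand the wrapped calls to a thread pool. This is not the author's multithreaded version, only one possible shape for it. The names `safe_call`, `fetch_all` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real downloader so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def safe_call(fetch, url, retries=2):
    """Call fetch(url), retrying on any exception; return None if every attempt fails."""
    for attempt in range(1, retries + 2):  # one initial attempt plus `retries` retries
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: one bad page must not stop the run
            print(f"attempt {attempt} failed for {url}: {exc}")
    return None


def fetch_all(urls, fetch, max_workers=4):
    """Fetch every URL on a thread pool and map each URL to its result (or None)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda u: safe_call(fetch, u), urls)
        return dict(zip(urls, results))


# Stand-in for a real downloader; it always fails for URLs containing "bad".
def fake_fetch(url):
    if "bad" in url:
        raise ConnectionError("simulated network error")
    return f"<html>{url}</html>"


pages = fetch_all(["page-1", "bad-page", "page-2"], fake_fetch)
print(pages)
```

Failed URLs come back as `None` instead of raising, so the caller can decide whether to skip them or queue them for another pass.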

V. Source code

'''
    Scrape to8to (土巴兔) renovation renderings, category by category

    version: 01
    author: 金鞍少年
    Blog: https://jasn67.blog.csdn.net/
    date: 2020-04-08

'''
import requests
import random, os, sys
from bs4 import BeautifulSoup
import re

class House_renderings():

    def __init__(self):
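        # Floor-plan menu
        self.house_lis = '''
                        ------- Select a floor plan ---------
                        1: One bedroom
                        2: Two bedrooms
                        3: Three bedrooms
                        4: Four or more bedrooms
                        5: Duplex
                        6: Villa / luxury home
                        7: Other
                        8: Quit
                        '''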
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
            'Referer': 'https://xiaoguotu.to8to.com/'
        }

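        # Proxy IPs
        # Note: each entry only has an 'http' key, so requests applies these proxies
        # to http:// URLs only; add an 'https' key to proxy https:// requests as well.
        self.all_proxies = [
            {'http': '183.166.20.179:9999'}, {'http': '125.108.124.168:9000'},
            {'http': '182.92.113.148:8118'}, {'http': '163.204.243.51:9999'},
            {'http': '175.42.158.45:9999'}]  # Find your own free proxies; see the examples in my other blog posts

        self.path = './res/'  # Local storage directory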

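    # Request a URL and return the parsed HTML (None on failure)
    def get_html(self, url):
        try:
            result = requests.get(url=url, headers=self.headers,
                                  proxies=random.choice(self.all_proxies),
                                  timeout=10)  # Do not hang forever on a dead proxy
            result.raise_for_status()  # Raise an exception for 4xx/5xx responses
            html = BeautifulSoup(result.text, 'lxml')
            return html
        except requests.RequestException as e:
            print(f'Request failed: {url} ({e})')
            return None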

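    # Yield the URL of every results page
    def get_page_urls(self, url, html):
        try:
            # The second-to-last link in the pager holds the last page number
            pages = list(html.find('div', class_="pages").find_all('a'))[-2].string
            last_page = int(pages)
        except (AttributeError, IndexError, TypeError, ValueError):
            # No usable pager (or the page failed to load): treat it as a single page
            yield url
            return
        for page in range(1, last_page + 1):
            yield url + 'p{}'.format(page)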

    # Yield the detail-page URL of every item on a results page
    def get_detail_urls(self, html):
        a_tag = html.find('div', class_="xmp_container").find_all('a', class_="item_img")
        for a in a_tag:
            detail_urls = 'https:' + a['href']
            yield detail_urls


    # Parse a detail page and save its contents locally
    def Save_detail_page(self, detail_html):
        try:
            house_style = detail_html.find('ul', class_="tag_list xg_tag").find('a').string  # Renovation style
            house_type = detail_html.find('ul', class_="tag_list xg_tag").find_all('a')[1].string  # Floor plan
            atlas_name = detail_html.find('strong', id="fine_n").get_text()   # Album name
            atlas_name = re.sub(r"[\/\\\:\*\?\"\<\>\|]", "_", atlas_name)  # Replace characters that are illegal in Windows file names

            file_path = self.path + house_type + '/' + house_style + '/' + atlas_name + '/'  # Build the storage path

            # Create the directory tree if it does not exist yet
            os.makedirs(file_path, exist_ok=True)

            # Image download (currently disabled):
            # imgs = detail_html.find('div', class_="display-none").find_all('img')
            # for index, img in enumerate(imgs):
            #     jpg = requests.get(img['src'], headers=self.headers, proxies=random.choice(self.all_proxies))
            #     with open(file_path + '%s.jpg' % (index + 1), 'wb') as f:
            #         f.write(jpg.content)
            #         print('Album {}: rendering {} downloaded successfully!'.format(atlas_name, index + 1))
        except Exception as e:
            # Report every failure, not only exceptions that carry a .reason attribute
            print(f'Scrape failed, reason: {getattr(e, "reason", e)}')


    # Choose a floor plan
    def choice_house(self):
        while True:
            print(self.house_lis)
            choice = input("Enter a number to select a floor plan: ").strip()
            if choice == "1":
                return 'https://xiaoguotu.to8to.com/list-h2s7i0'
            elif choice == "2":
                return 'https://xiaoguotu.to8to.com/list-h2s2i0'
            elif choice == "3":
                return 'https://xiaoguotu.to8to.com/list-h2s3i0'
            elif choice == "4":
                return 'https://xiaoguotu.to8to.com/list-h2s4i0'
            elif choice == "5":
                return 'https://xiaoguotu.to8to.com/list-h2s5i0'
            elif choice == "6":
                return 'https://xiaoguotu.to8to.com/list-h2s6i0'
            elif choice == "7":
                return 'https://xiaoguotu.to8to.com/list-h2s8i0'
            elif choice == "8":
                print('Exited successfully!')
                sys.exit()
            else:
                print('Invalid input, please try again!')

    # Main workflow
    def func(self):
        house_classify_url = self.choice_house()  # URL for the chosen floor plan
        house_classify_html = self.get_html(house_classify_url)  # HTML of the first results page
        for page_url in self.get_page_urls(house_classify_url, house_classify_html):  # Each results-page URL
            try:
                page_html = self.get_html(page_url)  # HTML of the results page
                for detail_url in self.get_detail_urls(page_html):  # Each detail-page URL
                    detail_html = self.get_html(detail_url)  # HTML of the detail page
                    self.Save_detail_page(detail_html)  # Parse the album and save it locally
            except Exception as e:
                print(f'Scrape failed, reason: {e}')
                continue  # On an exception, skip this page and keep going

if __name__ == '__main__':
    h = House_renderings()
    h.func()
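
One detail in Save_detail_page worth isolating: only atlas_name is sanitized before it goes into the path, while house_type and house_style are concatenated as they are. The sketch below shows one way to sanitize every path component and join them with pathlib. The helper names `safe_name` and `album_dir` and the sample strings are hypothetical and not part of the original script.

```python
import re
from pathlib import Path

ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|]')  # characters Windows forbids in file names


def safe_name(name):
    """Replace illegal file-name characters with '_' and trim surrounding whitespace."""
    return ILLEGAL_CHARS.sub("_", name).strip()


def album_dir(root, house_type, house_style, album_name):
    """Build root/house_type/house_style/album_name with every part sanitized."""
    return Path(root, safe_name(house_type), safe_name(house_style), safe_name(album_name))


target = album_dir("res", "3-bedroom", "Nordic/Modern", 'Sea view: "spring" ')
print(target.as_posix())
```

Calling `target.mkdir(parents=True, exist_ok=True)` would then create the whole directory tree in one step.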

VI. Results

![](https://img-blog.csdnimg.cn/20200408154205394.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MjQ0NDY5Mw==,size_16,color_FFFFFF,t_70)
