Scraping Liepin job listings with Python

Original

2020-06-13 01:00

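I had some free time and wanted to get a sense of the job market across different industries, so I wrote a simple crawler that scrapes job listings from Liepin (liepin.com).
Without further ado, here is the code.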

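# -*- coding: utf-8 -*-
# Scrape Beijing job listings from Liepin (liepin.com), one thread per job category
import re
import threading
import time

import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent; the default "python-requests/x.y" one is easy to flag
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def get_job_list(job):
    """Walk every result page for one job keyword and print each listing."""
    thread_name = threading.current_thread().name
    print(f'[{thread_name}]:{job}')
    page_num = 0
    while True:
        url = f'https://www.liepin.com/city-bj/zhaopin/pn{page_num}/'
        # Pass the query as params so requests URL-encodes the (usually Chinese) keyword
        params = {
            'key': job,
            'd_sfrom': 'search_city',
            'd_ckId': '757adc2153c6034f3c9d7fc1970e617d',
            'd_curPage': 1,
            'd_pageSize': 40,
            'd_headId': '757adc2153c6034f3c9d7fc1970e617d',
        }
        try:
            resp = requests.get(url, params=params, headers=HEADERS, timeout=10)
            resp.raise_for_status()
        except requests.RequestException as e:
            # Network or HTTP error (e.g. once the IP is blocked): stop this keyword
            print(f'[{thread_name}] request failed: {e}')
            break
        soup = BeautifulSoup(resp.text, 'html.parser')

        for div in soup.find_all('div', class_='sojob-item-main clearfix'):
            try:
                title = div.find('a').get_text(strip=True)
                # next_element is the first text node inside the span
                salary = str(div.find('span', class_='text-warning').next_element).strip()
                area = div.find('a', class_='area').get_text(strip=True)
                edu = str(div.find('span', class_='edu').next_element).strip()
            except AttributeError as e:
                # A field is missing from this listing: skip it, not the whole page
                print(f'[{thread_name}] skipped a listing: {e}')
                continue
            print(title, salary, area, edu)

        # The live site uses simplified Chinese ("下一页", "next page"); match both spellings
        next_link = soup.find('a', string=re.compile('下一[页頁]'))
        if not next_link:
            break
        print(f'[{thread_name}] finished page {page_num}')
        time.sleep(1)  # pause between pages
        page_num += 1


def get_all_job_type():
    """Collect all job-category keywords from the Beijing job index page."""
    url = 'https://www.liepin.com/city-bj/zhaogongzuo/?sfrom=click-pc_homepage-centre_keywordjobs-search_new'
    resp = requests.get(url, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(resp.text, 'html.parser')
    all_list = []
    for dd in soup.find_all('dd'):
        for a in dd.find_all('a'):
            all_list.append(a.get_text(strip=True))
    # Drop the last 21 links, which presumably were not job categories on the
    # page at the time; this offset depends on the page layout
    return all_list[:-21]


if __name__ == '__main__':
    threads = []
    for job in get_all_job_type():
        print(job)
        t = threading.Thread(target=get_job_list, args=(job,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()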
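
To check the selectors without hitting the site at all, the parsing step can be pulled out into a pure function and run against a saved or hand-written page. A minimal sketch follows, assuming the 2020 class names used above: `parse_listings` and `SAMPLE_HTML` are hypothetical, the markup only mimics those class names (Liepin's real pages carry far more structure and may have changed since), and this sketch reads every field with `get_text(strip=True)`.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names the scraper relies on
SAMPLE_HTML = """
<div class="sojob-item-main clearfix">
  <a href="/job/1">Python Developer</a>
  <span class="text-warning">20k-35k</span>
  <a class="area" href="#">Beijing-Chaoyang</a>
  <span class="edu">Bachelor</span>
</div>
<a href="/pn1">下一页</a>
"""


def parse_listings(html):
    """Return (title, salary, area, education) tuples and whether a next page exists."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for div in soup.find_all('div', class_='sojob-item-main clearfix'):
        salary = div.find('span', class_='text-warning')
        area = div.find('a', class_='area')
        edu = div.find('span', class_='edu')
        if not (salary and area and edu):
            continue  # skip listings with a missing field
        rows.append((div.find('a').get_text(strip=True),
                     salary.get_text(strip=True),
                     area.get_text(strip=True),
                     edu.get_text(strip=True)))
    # "Next page" link, in simplified (下一页) or traditional (下一頁) spelling
    has_next = soup.find('a', string=re.compile('下一[页頁]')) is not None
    return rows, has_next


rows, has_next = parse_listings(SAMPLE_HTML)
print(rows)      # [('Python Developer', '20k-35k', 'Beijing-Chaoyang', 'Bachelor')]
print(has_next)  # True
```

Saving one real result page to disk and feeding it to such a function makes it quick to notice when the site's markup changes.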

This uses multithreading: one thread per job category.

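The crawl is very fast, but after it had been running for a while, Liepin banned my IP. I couldn't even use the Liepin app on my phone; the notice said the ban would be lifted automatically after a few days. What a letdown.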

So my advice is to play it safe: use a single thread, crawl slowly, and do the analysis afterwards.
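
A sketch of that slower approach follows: requests go out one at a time with a fixed pause, and a 403, 429 or 5xx response (common signs of rate limiting, not documented Liepin behavior) triggers exponential backoff; if a page is still refused after the last retry, the crawl stops instead of hammering the server. `fetch` is a hypothetical callable that returns only an HTTP status code (for real use, something like `lambda u: requests.get(u, timeout=10).status_code`, plus keeping the body for parsing), and the 3 s pause and 2/4/8/16 s backoff are assumed values, not limits published by Liepin.

```python
import time
from typing import Callable, Iterable, List


def backoff_delays(base: float = 2.0, factor: float = 2.0, retries: int = 4) -> List[float]:
    """Wait before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(retries)]


def fetch_with_backoff(fetch: Callable[[str], int], url: str,
                       sleep: Callable[[float], None] = time.sleep,
                       retries: int = 4) -> bool:
    """Call fetch(url), retrying with growing waits while the server refuses us."""
    for delay in [0.0] + backoff_delays(retries=retries):
        if delay:
            sleep(delay)
        status = fetch(url)
        if status == 200:
            return True
        if status not in (403, 429) and status < 500:
            return False  # e.g. 404: retrying will not help
    return False  # still refused after the last retry


def crawl_sequentially(urls: Iterable[str], fetch: Callable[[str], int],
                       delay: float = 3.0,
                       sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch pages one at a time with a fixed pause; stop at the first hard failure."""
    done = []
    for url in urls:
        if not fetch_with_backoff(fetch, url, sleep=sleep):
            break
        done.append(url)
        sleep(delay)
    return done


# Offline usage: the first request for page 2 is rate-limited, the retry succeeds
calls: List[str] = []


def fake_fetch(url: str) -> int:
    calls.append(url)
    return 429 if url.endswith('2') and calls.count(url) == 1 else 200


waits: List[float] = []
print(crawl_sequentially(['p1', 'p2', 'p3'], fake_fetch, sleep=waits.append))
# ['p1', 'p2', 'p3']
print(waits)  # [3.0, 2.0, 3.0, 3.0]
```

Injecting `sleep` keeps the timing logic testable without actually waiting; in a real run the default `time.sleep` is used.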
