Building an Automated Scrapy Crawling Environment with Docker

1. The scrapy crawling framework

Install the Anaconda environment

Install dependency libraries

Create the project

Write the implementation code
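The steps above can be sketched as the following commands (the project name `spider_car`, the spider name `car`, and the target domain are placeholders, not the author's actual values):

```shell
# Install scrapy into the Anaconda environment
conda install -y scrapy

# Scaffold a new project and generate a spider skeleton inside it
scrapy startproject spider_car
cd spider_car
scrapy genspider car example.com

# Run the spider after filling in its parse logic
scrapy crawl car
```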

2. Distributed crawling with scrapy-redis

Install dependencies

Modify the configuration file

Open multiple terminals to simulate distributed crawling
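A minimal sketch of the `settings.py` changes scrapy-redis needs; the `REDIS_URL` value assumes a local Redis instance and should point at your shared Redis server in a real multi-node setup:

```python
# settings.py additions for scrapy-redis (REDIS_URL is a local-setup assumption)

# Use the scrapy-redis scheduler and dedup filter so the request
# queue and the duplicate filter live in Redis, shared by all workers
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the Redis queues between runs instead of clearing them on exit
SCHEDULER_PERSIST = True

# Address of the shared Redis instance
REDIS_URL = "redis://127.0.0.1:6379"

# Optionally store scraped items in Redis as well
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}
```

With this in place, starting the same spider in several terminals (or containers) makes each process pull requests from the shared Redis queue, which is how the multi-terminal simulation works.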

3. Writing the Dockerfile

# Set Base OS
FROM centos:7.6.1810
# Set maintainer (MAINTAINER is deprecated in favor of LABEL)
LABEL maintainer="xxx <[email protected]>"

# Install basic dependencies in a single layer to keep the image small
RUN yum install -y initscripts crontabs wget bzip2 && yum clean all

# Download Anaconda Python
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh -O ~/anaconda.sh
# Install Anaconda Python
RUN /bin/bash ~/anaconda.sh -b -p /opt/conda
# Remove Anaconda SH File
RUN rm ~/anaconda.sh
# Add conda to PATH (ENV applies to every process, including cron jobs, unlike ~/.bashrc)
ENV PATH=/opt/conda/bin:$PATH

# Install crawler dependencies
RUN /opt/conda/bin/conda install -y scrapy
RUN /opt/conda/bin/pip install scrapy-redis
RUN /opt/conda/bin/conda install -y pymongo

# Set timezone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

# Set crontab task
COPY crontab /var/spool/cron/root

# Set locale
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
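The `crontab` file copied into the image might look like the following; the schedule, project path, and spider name are illustrative assumptions, not values from the original setup:

```shell
# Run the spider every day at 02:00; paths and names are placeholders
0 2 * * * cd /opt/spider_py/spider_car && /opt/conda/bin/scrapy crawl car >> /var/log/spider_car.log 2>&1
```

Note that the full `/opt/conda/bin/scrapy` path is used because cron runs with a minimal environment.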

4. Building the container (no image versioning yet)

#!/bin/bash
# Stop and remove any existing spider_car container and image before rebuilding
result=$(docker ps | grep "spider_car")
if [[ "$result" != "" ]]; then
    echo "stop spider_car"
    docker stop spider_car
fi
result1=$(docker ps -a | grep "spider_car")
if [[ "$result1" != "" ]]; then
    echo "rm spider_car"
    docker rm spider_car
fi
result2=$(docker images | grep "spider_car")
if [[ "$result2" != "" ]]; then
    echo "rmi spider_car"
    docker rmi spider_car
fi

docker build -t spider_car .
# Mount the source directory as a volume so the image does not need to be rebuilt after every code change
docker run -dit --privileged --name spider_car -v /root/spider/spider_py:/opt/spider_py spider_car /usr/sbin/init
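Because the project directory is mounted, updated spider code can be run inside the running container without rebuilding; for example (the project path and spider name are placeholder assumptions):

```shell
# Execute a crawl inside the container against the mounted source tree
docker exec spider_car /bin/bash -c "cd /opt/spider_py/spider_car && /opt/conda/bin/scrapy crawl car"
```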

5. Automated builds with Jenkins
