【Python】Building an Automated scrapy Data-Crawling Environment with Docker

1. The scrapy data-crawling framework

Install the Anaconda environment

Install the dependency libraries

Create the scrapy project (scrapy startproject)

Write the spider code (a minimal sketch follows below)
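A minimal spider sketch, assuming a generic listing page: the spider name, start URL, and CSS selectors below are placeholders for illustration, not the project's real code.

# spiders/car_spider.py -- minimal sketch; URL and selectors are placeholders
import scrapy

class CarSpider(scrapy.Spider):
    name = "car"
    start_urls = ["https://example.com/cars"]  # placeholder URL

    def parse(self, response):
        # Yield one item per listing row; the selectors are illustrative only
        for row in response.css("div.listing"):
            yield {
                "title": row.css("h2::text").get(),
                "price": row.css("span.price::text").get(),
            }
        # Follow pagination until no "next" link is left
        next_page = response.css("a.next::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

Run it from the project directory with scrapy crawl car -o cars.json to dump the scraped items to a file.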

2. Distributed crawling with scrapy-redis

Install the dependency (pip install scrapy-redis)

Modify the configuration file (settings.py; see the sketch after this list)

Open several terminals to simulate distributed crawling
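A minimal scrapy-redis configuration sketch; the setting names are the ones documented by scrapy-redis, while the Redis URL and the spider/key names are assumptions (a local Redis instance on the default port, a hypothetical spider named car):

# settings.py -- scrapy-redis sketch; the Redis URL is an assumption
# Schedule requests through Redis so several processes share one queue
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Deduplicate request fingerprints in Redis, across all instances
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# Keep the queue in Redis when a spider stops, so crawls can resume
SCHEDULER_PERSIST = True
# Shared Redis server (assumed to be a local default instance)
REDIS_URL = "redis://127.0.0.1:6379"

The spider then inherits from RedisSpider and reads its start URLs from a Redis list instead of hard-coding them:

# spiders/car_spider.py -- distributed variant; all names are placeholders
from scrapy_redis.spiders import RedisSpider

class CarSpider(RedisSpider):
    name = "car"
    # Each instance blocks on this Redis list and pops URLs pushed into it
    redis_key = "car:start_urls"

    def parse(self, response):
        ...  # same parsing logic as the single-machine spider

To simulate distribution, run scrapy crawl car in each terminal; every instance idles until a start URL is pushed into Redis, for example with redis-cli lpush car:start_urls <url>.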

3. Writing the Dockerfile

# Base image
FROM centos:7.6.1810
# Maintainer (LABEL replaces the deprecated MAINTAINER instruction)
LABEL maintainer="xxx <[email protected]>"

# Install basic dependencies in a single layer to keep the image small
RUN yum install -y initscripts crontabs wget bzip2 && yum clean all

# Download Anaconda Python
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh -O ~/anaconda.sh
# Install Anaconda Python
RUN /bin/bash ~/anaconda.sh -b -p /opt/conda
# Remove Anaconda SH File
RUN rm ~/anaconda.sh
# Put conda on PATH for all processes, not only interactive shells
ENV PATH=/opt/conda/bin:$PATH

# Install crawler dependencies
RUN /opt/conda/bin/conda install -y scrapy
RUN /opt/conda/bin/pip install scrapy-redis
RUN /opt/conda/bin/conda install -y pymongo

# Set timezone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

# Set crontab task
COPY crontab /var/spool/cron/root

# Set locale
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
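The crontab file copied above is not reproduced here; as an assumption, it would contain a user-crontab entry along these lines, where the schedule, project path (matching the volume mount used later), and spider name are all illustrative:

# crontab -- illustrative entry; schedule, path, and spider name are assumptions
# Crawl every day at 02:00 and append the output to a log file
0 2 * * * cd /opt/spider_py && /opt/conda/bin/scrapy crawl car >> /var/log/spider_cron.log 2>&1

Files under /var/spool/cron/<user> use the five-field format without a user column, since the user is implied by the file name.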

4. Building the container (no image versioning)

#!/bin/bash
# Stop the container if it is currently running
result=$(docker ps | grep "spider_car")
if [[ "$result" != "" ]]; then
    echo "stop spider_car"
    docker stop spider_car
fi
# Remove the container if it exists
result1=$(docker ps -a | grep "spider_car")
if [[ "$result1" != "" ]]; then
    echo "rm spider_car"
    docker rm spider_car
fi
# Remove the old image if it exists
result2=$(docker images | grep "spider_car")
if [[ "$result2" != "" ]]; then
    echo "rmi spider_car"
    docker rmi spider_car
fi

docker build -t spider_car .
# Mount the source directory so the image does not need rebuilding after every code change
# /usr/sbin/init under --privileged boots systemd, so crond can run the scheduled crawls
docker run -dit --privileged --name spider_car -v /root/spider/spider_py:/opt/spider_py spider_car /usr/sbin/init

5. Automated builds with Jenkins
