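Selenium is a browser automation tool for functional testing. It can also run in a terminal-only environment with no display, so it can serve as the JavaScript engine for a crawler.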
Dockerfile:
FROM ubuntu:16.04
MAINTAINER tuweifeg "[email protected]"
# Use && so the build stops if the package index cannot be refreshed
RUN apt update && \
    apt install -y bzip2 \
        unzip \
        vim \
        wget \
        libxss1 \
        libappindicator1 \
        xvfb \
        libindicator7
# gdebi
RUN mkdir -p /home/ubuntu/project && \
    mkdir -p /home/ubuntu/soft && \
    cd /home/ubuntu/soft && \
    # Quote the URL (it contains ? and %) and give the download a clean file name
    wget -O google-chrome-stable_current_amd64.deb 'https://www.slimjet.com/chrome/download-chrome.php?file=files%2F75.0.3770.80%2Fgoogle-chrome-stable_current_amd64.deb' && \
    apt install -y ./*google-chrome*.deb && \
    rm *google-chrome*.deb && \
    # Alternative: gdebi *google-chrome*.deb
    wget https://npm.taobao.org/mirrors/chromedriver/75.0.3770.90/chromedriver_linux64.zip && \
    unzip chromedriver_linux64.zip && \
    rm chromedriver_linux64.zip && \
    wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh && \
    # The Anaconda installer is a bash script; -b runs it in batch mode
    bash Anaconda3-5.0.1-Linux-x86_64.sh -b && \
    rm Anaconda3-5.0.1-Linux-x86_64.sh && \
    echo 'export PATH=/root/anaconda3/bin:$PATH' >> ~/.bashrc
RUN /root/anaconda3/bin/pip install -i https://pypi.tuna.tsinghua.edu.cn/simple jieba==0.39 \
pymysql==0.9.3 \
selenium==3.141.0
ENV LANG C.UTF-8
# EXPOSE takes container ports only; map host ports at run time with docker run -p
EXPOSE 8000
EXPOSE 8001
EXPOSE 8002
EXPOSE 8003
EXPOSE 8004
WORKDIR /home/ubuntu/
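As a rough illustration of how a crawler might use Selenium as its JavaScript engine inside a container built from this image, here is a minimal sketch. It assumes chromedriver sits at /home/ubuntu/soft/chromedriver (where the Dockerfile unzips it) and that Chrome runs as root, which is why --no-sandbox is passed. The function names build_chrome_args and fetch_rendered_html and the chosen Chrome flags are illustrative assumptions, not part of the original setup.

```python
# Path where the Dockerfile above unpacks chromedriver (assumed unchanged).
CHROMEDRIVER_PATH = "/home/ubuntu/soft/chromedriver"


def build_chrome_args(window_size=(1920, 1080)):
    """Return Chrome command-line flags for running without a display."""
    width, height = window_size
    return [
        "--headless",               # no visible window, works in a terminal-only container
        "--no-sandbox",             # Chrome refuses to start as root with the sandbox on
        "--disable-gpu",            # no GPU is available in the container
        "--disable-dev-shm-usage",  # /dev/shm is small in Docker by default
        "--window-size={},{}".format(width, height),
    ]


def fetch_rendered_html(url, driver_path=CHROMEDRIVER_PATH):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args():
        options.add_argument(arg)

    # executable_path and options are the Selenium 3.141.0 keyword arguments.
    driver = webdriver.Chrome(executable_path=driver_path, options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always shut the browser down, even if the page load fails.
        driver.quit()
```

Calling fetch_rendered_html("https://example.com") with /root/anaconda3/bin/python would return the page source after scripts have executed. Since the image also installs xvfb, running a non-headless Chrome under a virtual display is another option; the headless flag is simply the shorter route here.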
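There are three ways to install a *.deb package:
1. dpkg -i [package]: this often fails because dependencies are missing. Run it once and let it fail, then repair the dependencies with apt install -f and run it again.
dpkg -i [package]
apt install -f
dpkg -i [package]
2. gdebi [package]: installs dependencies automatically, but gdebi itself must be installed first.
apt install gdebi
gdebi [package]
3. apt install [package]: supported only on Ubuntu 16.04 and later. Note that the package name must be prefixed with a path (./) even when the file is in the current directory.
apt install ./[package]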
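The three approaches can be sketched as a small Python helper that builds the command sequence for each one. This is only an illustration: the names deb_install_commands and install_deb are hypothetical, and the -y and --non-interactive flags are additions assumed here so the commands do not wait for input.

```python
import subprocess


def deb_install_commands(deb_path, method="apt"):
    """Return the list of commands that installs a local .deb with the given method."""
    if method == "dpkg":
        return [
            ["dpkg", "-i", deb_path],        # may fail on missing dependencies
            ["apt", "install", "-f", "-y"],  # repair the broken dependencies
            ["dpkg", "-i", deb_path],        # retry now that dependencies exist
        ]
    if method == "gdebi":
        return [
            ["apt", "install", "-y", "gdebi"],
            ["gdebi", "--non-interactive", deb_path],
        ]
    if method == "apt":
        # Without a path, apt looks the name up in its repositories instead.
        path = deb_path if "/" in deb_path else "./" + deb_path
        return [["apt", "install", "-y", path]]
    raise ValueError("unknown method: {}".format(method))


def install_deb(deb_path, method="apt"):
    """Run the commands in order; needs root on a Debian or Ubuntu system."""
    for index, command in enumerate(deb_install_commands(deb_path, method)):
        # Only the first dpkg attempt is allowed to fail.
        tolerate_failure = method == "dpkg" and index == 0
        subprocess.run(command, check=not tolerate_failure)
```

For example, deb_install_commands("chrome.deb") yields a single apt command with ./chrome.deb as its argument, which mirrors the third method.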