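Installing the Scrapy Crawler Framework on Lubuntu 14.04 (Ubuntu)

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing. Part of Scrapy's appeal is that it is a framework, so anyone can easily adapt it to their own needs. It also provides base classes for several kinds of spiders, such as BaseSpider and the sitemap spider, and the latest version adds support for crawling Web 2.0 sites.
Prerequisites

Python 2.5, 2.6, 2.7 (3.x is not yet supported). Most Linux distributions install Python 2.7 by default.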

Twisted 2.5.0, 8.0 or above (Windows users: you’ll need to install Zope.Interface and maybe pywin32 because of this Twisted bug)

w3lib

lxml or libxml2 (if using libxml2, version 2.6.28 or above is highly recommended)

simplejson (not required if using Python 2.6 or above)

python-dev (important: without it, installing pyOpenSSL fails with an error that Python.h cannot be found)

pyopenssl (for HTTPS support. Optional, but highly recommended)

---------------------------------------------
Installing Twisted

sudo apt-get install python-twisted python-libxml2 python-simplejson

Once the installation finishes, start python and check that Twisted was installed correctly.
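One way to run that check is sketched below. This is a minimal sketch rather than part of the original instructions: the helper name `module_version` is hypothetical, and it assumes the apt packages installed above provide the importable modules `twisted`, `libxml2`, and `simplejson`. It needs Python 2.7 or later because it relies on `importlib`.

```python
import importlib


def module_version(name):
    """Return the module's version string, or None if it cannot be imported.

    `module_version` is a hypothetical helper, not part of Twisted or Scrapy.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to a placeholder.
    return str(getattr(module, "__version__", "unknown"))


if __name__ == "__main__":
    # Module names assumed to correspond to the apt packages installed above.
    for name in ("twisted", "libxml2", "simplejson"):
        version = module_version(name)
        if version is None:
            print("%s is NOT importable" % name)
        else:
            print("%s is importable, version: %s" % (name, version))
```

If any module is reported as not importable, rerun the apt-get command for the matching package before continuing.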


Installing python-dev

sudo apt-get install python-dev


Installing pyOpenSSL

wget http://pypi.python.org/packages/source/p/pyOpenSSL/pyOpenSSL-0.13.tar.gz#md5=767bca18a71178ca353dff9e10941929

tar -zxvf pyOpenSSL-0.13.tar.gz

cd pyOpenSSL-0.13

sudo python setup.py install
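The download URL above carries an `#md5=` fragment, which can be used to check the tarball before running setup.py. The following is a minimal sketch of such a check, assuming the archive was saved under its original file name in the current directory; the function names `expected_md5`, `file_md5`, and `archive_matches` are hypothetical helpers, not part of any package mentioned here.

```python
import hashlib

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def expected_md5(url):
    """Extract the hex digest from a '#md5=<digest>' URL fragment, or None."""
    fragment = urlparse(url).fragment
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def file_md5(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(url, path):
    """Return True only if the URL has an md5 fragment that matches the file."""
    expected = expected_md5(url)
    return expected is not None and file_md5(path) == expected
```

For the tarball above, this would be called as `archive_matches(url, "pyOpenSSL-0.13.tar.gz")`, with `url` set to the full wget address including the fragment. A mismatch usually indicates a truncated or corrupted download.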


Installing pycrypto

wget http://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.5.tar.gz#md5=783e45d4a1a309e03ab378b00f97b291

tar -zxvf pycrypto-2.5.tar.gz

cd pycrypto-2.5

sudo python setup.py install


Check that the installation succeeded:
$ python
>>> import Crypto
>>> import twisted.conch.ssh.transport
>>> print Crypto.PublicKey.RSA
<module 'Crypto.PublicKey.RSA' from '/usr/python/lib/python2.5/site-packages/Crypto/PublicKey/RSA.pyc'>
>>> import OpenSSL 
>>> import twisted.internet.ssl
>>> twisted.internet.ssl

<module 'twisted.internet.ssl' from '/usr/python/lib/python2.5/site-packages/Twisted-10.1.0-py2.5-linux-i686.egg/twisted/internet/ssl.pyc'>

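If you see output similar to this, the pyOpenSSL module has been installed successfully. Otherwise, go back over the installation steps above (pycrypto must also be installed for the Crypto and twisted.conch imports to work).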


Installing w3lib

First install Python setuptools:

sudo apt-get install python-setuptools

Then:

sudo easy_install -U w3lib

Installing Scrapy
wget http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.3.tar.gz#md5=59f1225f7692f28fa0f78db3d34b3850
tar -zxvf Scrapy-0.14.3.tar.gz
cd Scrapy-0.14.3
sudo python setup.py install


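Verifying the Scrapy installation

With the installation and configuration steps above complete, Scrapy is installed. We can verify this from the command line: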

$ scrapy
Scrapy 0.14.3 - no active project


Usage:
  scrapy <command> [options] [args]


Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy


Use "scrapy <command> -h" to see more info about a command

Scrapy is now successfully installed on Linux.
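The same verification can be scripted, for example as part of a setup check. The sketch below assumes the `scrapy` executable is on the PATH and uses the `version` command listed in the help output above; `command_output` is a hypothetical helper name. It needs Python 2.7 or later for `subprocess.check_output`.

```python
import subprocess


def command_output(argv):
    """Run a command and return its combined output, or None if it cannot be run."""
    try:
        output = subprocess.check_output(argv, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # OSError: executable not found; CalledProcessError: non-zero exit status.
        return None
    return output.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    # "scrapy version" is one of the commands listed in the help output above.
    version = command_output(["scrapy", "version"])
    if version is None:
        print("scrapy could not be run; check the installation steps")
    else:
        print("scrapy reported: %s" % version)
```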

Time to start learning how to write crawlers.

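On Windows I have never managed to get the installation to work, which is frustrating. Every attempt to install pyOpenSSL fails at compile time with a complaint that openssl/aes.h is missing. I tried many fixes found online without success, and OpenSSL itself also fails to compile. If anyone has run into the same problem, I would be glad to compare notes.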