Pyspider & Pycurl
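pyspider is a well-known crawler framework. While installing it I ran into problems related to pycurl, which I am recording here for future reference.
Note: pyspider installs without trouble on Windows 7, but it reports various errors at runtime, so running it on Windows is not recommended.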
Environment
CentOS 7, Python 3.6.5
Analysis
pip install pyspider
The error message was as follows:
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting pycurl
Downloading http://mirrors.aliyun.com/pypi/packages/e8/e4/0dbb8735407189f00b33d84122b9be52c790c7c3b25286826f4e1bdb7bde/pycurl-7.43.0.2.tar.gz (214kB)
100% |████████████████████████████████| 215kB 10.5MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/tmp/pip-install-ramaxa44/pycurl/setup.py", line 223, in configure_unix
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
File "/root/.pyenv/versions/3.6.5/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/root/.pyenv/versions/3.6.5/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-ramaxa44/pycurl/setup.py", line 913, in <module>
ext = get_extension(sys.argv, split_extension_source=split_extension_source)
File "/tmp/pip-install-ramaxa44/pycurl/setup.py", line 582, in get_extension
ext_config = ExtensionConfiguration(argv)
File "/tmp/pip-install-ramaxa44/pycurl/setup.py", line 99, in __init__
self.configure()
File "/tmp/pip-install-ramaxa44/pycurl/setup.py", line 227, in configure_unix
raise ConfigurationError(msg)
__main__.ConfigurationError: Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-ramaxa44/pycurl/
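The traceback shows that the error comes from pycurl: its setup.py could not run the curl-config executable.
So I installed pycurl directly: pip install pycurl
This produced the same error, confirming that the problem lies in installing pycurl.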
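Since the traceback complains that 'curl-config' cannot be found, one quick way to confirm the diagnosis before retrying pip is to check whether that executable is on PATH. The following is only a minimal sketch; the helper name find_curl_config is made up for illustration.

```python
import shutil


def find_curl_config(name="curl-config"):
    """Return the full path of the given executable, or None if it is not on PATH.

    find_curl_config is a hypothetical helper, not part of pycurl or pip.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_curl_config()
    if path is None:
        print("curl-config not found; the libcurl development package is probably missing")
    else:
        print("curl-config found at", path)
```

If this reports that curl-config is missing, the pycurl build will most likely fail in the same way as above.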
2. Installing libcurl-devel
A search online showed that the following library needs to be installed:
yum install libcurl-devel
Once it was installed, I tried installing pycurl again. The original problem was gone, but a new one appeared.
3. A new problem: openssl
The new error message was:
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting pycurl
Downloading http://mirrors.aliyun.com/pypi/packages/e8/e4/0dbb8735407189f00b33d84122b9be52c790c7c3b25286826f4e1bdb7bde/pycurl-7.43.0.2.tar.gz (214kB)
100% |████████████████████████████████| 215kB 6.4MB/s
Complete output from command python setup.py egg_info:
Using curl-config (libcurl 7.29.0)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-apf84sq1/pycurl/setup.py", line 913, in <module>
ext = get_extension(sys.argv, split_extension_source=split_extension_source)
File "/tmp/pip-install-apf84sq1/pycurl/setup.py", line 582, in get_extension
ext_config = ExtensionConfiguration(argv)
File "/tmp/pip-install-apf84sq1/pycurl/setup.py", line 99, in __init__
self.configure()
File "/tmp/pip-install-apf84sq1/pycurl/setup.py", line 316, in configure_unix
specify the SSL backend manually.''')
__main__.ConfigurationError: Curl is configured to use SSL, but we have not been able to determine which SSL backend it is using. Please see PycURL documentation for how to specify the SSL backend manually.
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-apf84sq1/pycurl/
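Taken literally, pycurl's setup.py cannot work out which SSL backend libcurl is using, so this is a problem with the openssl configuration. But how can it be solved?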
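To get a feel for what "determine which SSL backend" means, the sketch below looks at the linker flags printed by curl-config and maps them to a backend name. It is a simplified approximation written for illustration, not pycurl's actual setup.py logic; the flag table and both function names are assumptions.

```python
import subprocess

# Linker flag -> backend name. This mapping is a simplified assumption.
_FLAG_TO_BACKEND = {
    "-lssl": "openssl",
    "-lgnutls": "gnutls",
    "-lssl3": "nss",
}


def guess_ssl_backend(libs_output):
    """Guess the SSL backend from curl-config linker flags, or return None."""
    for token in libs_output.split():
        if token in _FLAG_TO_BACKEND:
            return _FLAG_TO_BACKEND[token]
    return None


def curl_config_libs():
    """Return the combined output of curl-config --libs and --static-libs.

    Returns an empty string if curl-config cannot be run.
    """
    chunks = []
    for option in ("--libs", "--static-libs"):
        try:
            result = subprocess.run(
                ["curl-config", option],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                universal_newlines=True,
            )
        except OSError:
            continue
        if result.returncode == 0:
            chunks.append(result.stdout)
    return " ".join(chunks)


if __name__ == "__main__":
    print("guessed SSL backend:", guess_ssl_backend(curl_config_libs()))
```

When the flags do not name any SSL library, the guess comes back as None. That is the situation in which the backend has to be specified manually, as the error message suggests.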
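4. Solving the SSL problem
I tried all sorts of things, working on the assumption that this was a configuration problem among libcurl, pycurl and openssl. That led me to the following:
yum install libcurl-openssl-devel # did not work
The final solution was: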
pip uninstall pycurl
Add export PYCURL_SSL_LIBRARY=openssl to ~/.bashrc
pip install pycurl
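After doing this, the problem was still there. It turned out that I had not run source on .bashrc, so the variable was not yet set in the current shell.
So I ran the following command: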
source ~/.bashrc
With the variable loaded, installing pycurl again worked and the problem was solved.
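The pitfall here is that PYCURL_SSL_LIBRARY has to be present in the environment of the process that actually runs pip. As a minimal sketch of an alternative to editing ~/.bashrc, the variable can be passed only to the pip process itself. The function names below are made up for illustration, and "openssl" is simply the value used in this post.

```python
import os
import subprocess
import sys


def env_with_ssl_backend(backend, base_env=None):
    """Return a copy of the environment with PYCURL_SSL_LIBRARY set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYCURL_SSL_LIBRARY"] = backend
    return env


def install_pycurl(backend="openssl"):
    """Run 'pip install pycurl' with the variable set for that one process.

    Returns pip's exit code. Hypothetical helper, not part of pip or pycurl.
    """
    command = [sys.executable, "-m", "pip", "install", "pycurl"]
    return subprocess.run(command, env=env_with_ssl_backend(backend)).returncode


if __name__ == "__main__":
    # Only report what the current process sees; call install_pycurl()
    # explicitly to perform the installation.
    print("PYCURL_SSL_LIBRARY =", os.environ.get("PYCURL_SSL_LIBRARY"))
```

If the last line prints None in a fresh Python session, the shell has not picked up the export yet, which is the situation described above.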
5. The Ubuntu solution
sudo apt-get install libssl-dev libcurl4-openssl-dev python-dev
Last of all
pip install pyspider
And with that, the final target was installed correctly as well.
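A successful install can still be followed by an import error if the SSL backend pycurl was compiled for differs from the one libcurl is really linked against (on some distributions libcurl is built against NSS rather than OpenSSL). As a quick sanity check, the sketch below imports pycurl and reads the backend out of pycurl.version. It assumes that string is a space-separated list of name/version tokens; the helper name and the list of known backends are illustrative.

```python
# Backend names looked for in the version string (illustrative list).
KNOWN_BACKENDS = ("openssl", "libressl", "gnutls", "nss")


def ssl_backend_from_version(version_string):
    """Return the lowercased SSL library name found in a version string, or None."""
    for token in version_string.split():
        name = token.split("/", 1)[0].lower()
        if name in KNOWN_BACKENDS:
            return name
    return None


if __name__ == "__main__":
    try:
        import pycurl
    except ImportError as exc:
        print("pycurl could not be imported:", exc)
    else:
        print("pycurl reports SSL backend:", ssl_backend_from_version(pycurl.version))
```

If the import fails with a message about link-time and compile-time SSL backends being different, the value of PYCURL_SSL_LIBRARY should be changed to match the backend libcurl actually uses, and pycurl reinstalled.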
Summary
Problems keep getting solved, and new ones keep turning up along the way :-)