Running scrapy shell 'http://quotes.toscrape.com' fails with ValueError: invalid hostname: 'http

Original

2019-02-27 21:33

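If you are also learning Python's Scrapy framework on Windows 10, you may hit this when you open cmd and enter the following command:

scrapy shell 'http://quotes.toscrape.com/page/1'

The command fails with ValueError: invalid hostname: 'http
The full error output is: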

2019-02-27 16:34:13 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: tutorial)
2019-02-27 16:34:13 [scrapy.utils.log] INFO: Versions: lxml 3.7.2.0, libxml2 2.9.4, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1a  20 Nov 2018), cryptography 2.5, Platform Windows-10-10.0.17134-SP0
2019-02-27 16:34:13 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'LOGSTATS_INTERVAL': 0, 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2019-02-27 16:34:13 [scrapy.extensions.telnet] INFO: Telnet Password: 2e94bb235c11d72a
2019-02-27 16:34:13 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole']
2019-02-27 16:34:15 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-02-27 16:34:15 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-02-27 16:34:15 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-02-27 16:34:15 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-02-27 16:34:15 [scrapy.core.engine] INFO: Spider opened
2019-02-27 16:34:15 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://'http/robots.txt>: invalid hostname: 'http
Traceback (most recent call last):
  File "f:\program files\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
ValueError: invalid hostname: 'http
Traceback (most recent call last):
  File "f:\program files\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "f:\program files\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "F:\Program Files\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "f:\program files\lib\site-packages\scrapy\cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "f:\program files\lib\site-packages\scrapy\cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "f:\program files\lib\site-packages\scrapy\cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "f:\program files\lib\site-packages\scrapy\commands\shell.py", line 74, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "f:\program files\lib\site-packages\scrapy\shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "f:\program files\lib\site-packages\scrapy\shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "f:\program files\lib\site-packages\twisted\internet\threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "f:\program files\lib\site-packages\twisted\python\failure.py", line 467, in raiseException
    raise self.value.with_traceback(self.tb)
ValueError: invalid hostname: 'http

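The fix is a one-line change: wrap the URL in double quotes instead of single quotes. The Windows cmd prompt does not treat single quotes as quoting characters, so the leading ' reaches Scrapy as part of the URL, which is why the reported hostname is 'http. The corrected command is: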

scrapy shell "http://quotes.toscrape.com/page/1/"
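To see where the odd hostname 'http comes from, here is a minimal sketch using only Python's standard library. It does not call Scrapy. The helper name hostname_seen is hypothetical, and the step that prepends "http://" to an argument without a recognisable scheme is an assumption, chosen because it matches the request URL in the log (<GET http://'http/robots.txt>).

```python
from urllib.parse import urlsplit


def hostname_seen(arg):
    """Return the hostname a URL parser extracts from a command-line argument.

    Assumption: an argument that does not start with a known scheme gets
    "http://" prepended, mirroring the request URL shown in the log.
    """
    if arg.startswith(("http://", "https://")):
        url = arg
    else:
        url = "http://" + arg
    return urlsplit(url).hostname


# Single quotes in cmd: the program receives the quotes as part of the argument.
print(hostname_seen("'http://quotes.toscrape.com/page/1'"))  # 'http

# Double quotes in cmd: the program receives the bare URL.
print(hostname_seen("http://quotes.toscrape.com/page/1/"))  # quotes.toscrape.com
```

With the quote character left in place, the text up to the first colon after "http://" is taken as the host, so the parser reports 'http. In a POSIX shell such as bash, single quotes are removed before the program starts, which is why the same single-quoted command works on Linux and macOS.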
