Problems while setting up a Python crawler framework
Hello everyone: I'm a beginner doing my best to learn Python, and there are a few things I can't figure out that I'd like to ask you about:
I've been teaching myself how to install and use PyCharm, and while running a Scrapy project that crawls images I ran into the problem below. I simply can't work out what is going on.
Console output:
C:\Python27\python.exe C:/first/main.py
2016-10-09 23:19:48 [scrapy] INFO: Scrapy 1.2.0 started (bot: first)
2016-10-09 23:19:48 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'first.spiders', 'SPIDER_MODULES': ['first.spiders'], 'ROBOTSTXT_OBEY': True, 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0', 'BOT_NAME': 'first'}
2016-10-09 23:19:48 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-10-09 23:19:49 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-10-09 23:19:49 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-10-09 23:19:49 [scrapy] INFO: Enabled item pipelines: []
2016-10-09 23:19:49 [scrapy] INFO: Spider opened
2016-10-09 23:19:49 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-10-09 23:19:49 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-10-09 23:19:49 [scrapy] ERROR: Error downloading <GET https:///robots.txt>: Empty domain
Traceback (most recent call last):
  File "C:\Python27\Lib\site-packages\twisted\internet\defer.py", line 1105, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\Python27\Lib\site-packages\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "C:\Python27\Lib\site-packages\twisted\web\client.py", line 1470, in request
    parsedURI.port)
  File "C:\Python27\Lib\site-packages\twisted\web\client.py", line 1450, in _getEndpoint
    tlsPolicy = self._policyForHTTPS.creatorForNetloc(host, port)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\contextfactory.py", line 57, in creatorForNetloc
    return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext())
  File "C:\Python27\Lib\site-packages\twisted\internet\_sslverify.py", line 1059, in __init__
    self._hostnameBytes = _idnaBytes(hostname)
  File "C:\Python27\Lib\site-packages\twisted\internet\_sslverify.py", line 86, in _idnaBytes
    return idna.encode(text).encode("ascii")
  File "C:\Python27\Lib\site-packages\idna\core.py", line 350, in encode
    raise IDNAError('Empty domain')
IDNAError: Empty domain
2016-10-09 23:19:49 [scrapy] ERROR: Error downloading <GET https:///%20//%20www.
Traceback (most recent call last):
  File "C:\Python27\Lib\site-packages\twisted\internet\defer.py", line 1105, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\Python27\Lib\site-packages\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "C:\Python27\Lib\site-packages\twisted\web\client.py", line 1470, in request
    parsedURI.port)
  File "C:\Python27\Lib\site-packages\twisted\web\client.py", line 1450, in _getEndpoint
    tlsPolicy = self._policyForHTTPS.creatorForNetloc(host, port)
  File "C:\Python27\lib\site-packages\scrapy-1.2.0-py2.7.egg\scrapy\core\downloader\contextfactory.py", line 57, in creatorForNetloc
    return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext())
  File "C:\Python27\Lib\site-packages\twisted\internet\_sslverify.py", line 1059, in __init__
    self._hostnameBytes = _idnaBytes(hostname)
  File "C:\Python27\Lib\site-packages\twisted\internet\_sslverify.py", line 86, in _idnaBytes
    return idna.encode(text).encode("ascii")
  File "C:\Python27\Lib\site-packages\idna\core.py", line 350, in encode
    raise IDNAError('Empty domain')
IDNAError: Empty domain
2016-10-09 23:19:49 [scrapy] INFO: Closing spider (finished)
2016-10-09 23:19:49 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 2,
 'downloader/exception_type_count/idna.core.IDNAError': 2,
 'downloader/request_bytes': 539,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 10, 9, 15, 19, 49, 706000),
 'log_count/DEBUG': 1,
 'log_count/ERROR': 2,
 'log_count/INFO': 7,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2016, 10, 9, 15, 19, 49, 256000)}
2016-10-09 23:19:49 [scrapy] INFO: Spider closed (finished)

What exactly does this output in PyCharm mean? I've been at it for two days and still haven't figured it out; it's driving me crazy. Could you please help me out?
The code is linked at http://www.
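For what it's worth, the traceback shows requests with an empty host (<GET https:///robots.txt> and <GET https:///%20//%20www.), which in Scrapy usually means an entry in the spider's start_urls is malformed, e.g. it contains spaces or is missing the host right after https://. Below is a minimal sketch of a well-formed spider, assuming the project is named first; the spider name, file name, example.com URL, and the img::attr(src) selector are placeholders I made up, not taken from the original code.

# -*- coding: utf-8 -*-
# first/spiders/img_spider.py -- hypothetical file name, minimal sketch only
import scrapy


class ImgSpider(scrapy.Spider):
    # Spider name is a placeholder; the original spider name is not shown in the post.
    name = 'img'
    # Each start URL must be one unbroken string: scheme + host, with no spaces.
    # A value like 'https:// // www. ...' produces requests such as <GET https:///robots.txt>,
    # which is exactly the "IDNAError: Empty domain" seen in the log above.
    start_urls = ['https://www.example.com/']  # example.com is a placeholder domain

    def parse(self, response):
        # Placeholder parse: collect image URLs from the page.
        for src in response.css('img::attr(src)').extract():
            yield {'image_url': response.urljoin(src)}

If the start_urls (or the URL passed in main.py) in the original project looks like the garbled URL in the second ERROR line, removing the stray spaces and keeping the full https://www.<domain> prefix should make the empty-domain errors go away.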