Fetching HTTPS resources on SAE fails. I have been debugging for two days without success; could someone please take a look?

  leilux · 2015-11-03 12:52:27 +08:00 · 2219 views

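    I have spent two days debugging this without getting anywhere, so I am posting here in the hope that someone can spot what the problem might be.

    Description: on SAE I use tornado.simple_httpclient.SimpleAsyncHTTPClient to fetch HTTPS pages. The same code works fine when tested locally.

    URL to reproduce the error: http://droprest.sinaapp.com/article?url=https%3A%2F%2Fpress.taobao.com%2Fdetail.html%3Fspm%3Da21bo.7724922.8439-0.1.K2HoLf%26postId%3D1723845&next=true

    Environment: Python 2.7.9, Tornado 2.1.1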
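Since the same code works locally but fails on SAE, it may help to compare the runtime on both sides. The snippet below is a minimal sketch and not part of the original post: `environment_report` is a hypothetical helper name, and it relies only on `sys.version_info`, `ssl.OPENSSL_VERSION`, and `tornado.version`. Running it locally and on SAE would show whether the Python, OpenSSL, or Tornado versions differ.

```python
import ssl
import sys


def environment_report():
    """Collect the versions most relevant to an HTTPS fetch problem."""
    try:
        import tornado
        tornado_version = tornado.version
    except ImportError:
        # Tornado is optional here; report its absence instead of failing.
        tornado_version = None
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "openssl": ssl.OPENSSL_VERSION,
        "tornado": tornado_version,
    }


if __name__ == "__main__":
    for name, value in sorted(environment_report().items()):
        print("%s: %s" % (name, value))
```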

    Core code:

    import logging
    import os

    from tornado import httpclient
    from tornado import httpserver
    from tornado.ioloop import IOLoop
    from tornado import web
    
    class Application(web.Application):
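        def __init__(self, handlers=None, **kwargs):
            # Copy into a fresh list so a shared default list is never mutated.
            handlers = list(handlers or [])
            handlers.extend([
                (r"/article", Handler),
            ])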
    
            settings = dict({
                'template_path': os.path.join(os.path.dirname(__file__), 'templates'),
                "debug": False,
            }, **kwargs)
    
            super(Application, self).__init__(handlers, **settings)
    
    
    class Handler(web.RequestHandler):
        @web.asynchronous
        def get(self):
            self.url = self.get_argument('url', u'')
    
            headers = {
                'Accept-Encoding':'gzip',
                'Accept-Language': 'zh-CN,zh;q=0.8',
                "Accept-Charset": "UTF-8,*;q=0.5",
                "User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17",
                "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            }
            # asynchronously fetch web page
            httpclient.AsyncHTTPClient(max_clients=20).fetch(
                httpclient.HTTPRequest(
                    method='GET',
                    url=self.url,
                    headers=headers,
                    follow_redirects=True),
                self.on_fetch,
            )
    
    
        def on_fetch(self, response):
            response.rethrow()
    
            # Default to '' so a missing header does not break the "in" checks below.
            content_type = response.headers.get('Content-Type', '')
            if 'text/html' not in content_type and 'application/xhtml' not in content_type:
                raise TypeError('not html or xhtml file')
    
            html = response.body
    
            # get content
            content = {u'content': html, 'url': self.url}
    
            self.finish()
    
    
    if __name__ == '__main__':
        from tornado.options import parse_command_line
        parse_command_line()
        application = Application(debug=True)
    
        logging.info('Server running on http://localhost:8080')
        http_server = httpserver.HTTPServer(application)
        http_server.listen(8080)
        IOLoop.instance().start()
    

    Details:

    - [2015/10/29 18:52:12] - ERROR:root:Exception in I/O handler for fd 10
    Traceback (most recent call last):
      File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/ioloop.py", line 309, in start
        self._handlers[fd](fd, events)
      File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
        self._handle_write()
      File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
        self._do_ssl_handshake()
      File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
        self.socket.do_handshake()
      File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
        self._sslobj.do_handshake()
    SSLError: socket write not completed (_ssl.c:562) yq34 
    
    
    - [2015/10/29 18:52:12] - ERROR:root:Uncaught exception, closing connection.
    Traceback (most recent call last):
      File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
        self._handle_write()
      File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
        self._do_ssl_handshake()
      File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
        self.socket.do_handshake()
      File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
        self._sslobj.do_handshake()
    SSLError: socket write not completed (_ssl.c:562) yq34
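
Both tracebacks end in `do_handshake()` on a socket that Tornado's IOStream drives in non-blocking mode. With the standard `ssl` module, a non-blocking handshake that only needs to wait reports `ssl.SSL_ERROR_WANT_READ` or `ssl.SSL_ERROR_WANT_WRITE` in `args[0]` of the `SSLError`. The sketch below separates those retryable codes from everything else. `classify_ssl_error` is a hypothetical helper, not something from the original post or from Tornado, and whether SAE's Python build attaches such a code to "socket write not completed" is an assumption that would have to be checked by logging `exc.args` there.

```python
import ssl

# Codes that mean "retry once the socket is ready" rather than a real failure.
_RETRYABLE = {
    ssl.SSL_ERROR_WANT_READ: "want read",
    ssl.SSL_ERROR_WANT_WRITE: "want write",
}


def classify_ssl_error(exc):
    """Return ("retryable" or "fatal", code) for an ssl.SSLError."""
    code = exc.args[0] if exc.args else None
    if code in _RETRYABLE:
        return "retryable", code
    # Anything else, including an error that carries only a message,
    # is treated as fatal.
    return "fatal", code
```

If the error logged on SAE turns out to carry only a message and no numeric code, a caller that checks `args[0]` against the retryable codes would not recognise it and would re-raise, which would be consistent with the "Uncaught exception" entry in the log.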
    
    2 replies · 2015-11-03 13:27:25 +08:00
    yinxingren · #1 · 2015-11-03 13:00:53 +08:00 via iPhone
    Your IP has probably been blocked by Taobao.
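
One way to test the blocked-IP guess independently of Tornado is a single blocking TLS handshake using only the standard library. This is a sketch under assumptions: `probe_tls_handshake` is a hypothetical helper, it needs `ssl.create_default_context` (available from Python 2.7.9), and it assumes SAE permits outbound sockets of this kind.

```python
import socket
import ssl


def probe_tls_handshake(host, port=443, timeout=10.0):
    """Attempt one blocking TLS handshake and return (ok, detail)."""
    context = ssl.create_default_context()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout) as exc:
        return False, "TCP connect failed: %r" % (exc,)
    try:
        # wrap_socket performs the handshake immediately on a blocking socket.
        tls = context.wrap_socket(sock, server_hostname=host)
    except (ssl.SSLError, ssl.CertificateError, socket.error) as exc:
        sock.close()
        return False, "TLS handshake failed: %r" % (exc,)
    try:
        return True, "handshake ok, protocol %s" % (tls.version(),)
    finally:
        tls.close()
```

Calling `probe_tls_handshake("press.taobao.com")` both locally and on SAE gives two results to compare. If the blocking handshake also fails on SAE, a network-level restriction such as a blocked IP becomes more plausible; if it succeeds there, the non-blocking handshake path is the more likely place to look.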
    leilux (OP) · #2 · 2015-11-03 13:27:25 +08:00