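While running a crawler today, gevent raised "AssertionError: Impossible to call blocking function in the event loop callback".
That was odd. Could patch_socket be the culprit? Everything worked fine before I started using patch_socket. A simplified version of the code (Python 2.7) follows: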
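import urllib
import gevent
from gevent.monkey import patch_socket
from gevent.hub import get_hub

def f():
    # Blocking network call; after patch_socket() it goes through gevent's socket
    r = urllib.urlopen("http://www.baidu.com/").read()
    print r[:10]

def timer(after, repeat, f):
    # Register f directly as a callback on the hub's event loop
    t = get_hub().loop.timer(after, repeat)
    t.start(f)
    return t

def run():
    patch_socket()
    timer(1, 5, f)       # first call after 1 second, then every 5 seconds
    gevent.sleep(100)    # keep the main greenlet waiting so the loop can run

run()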
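This code calls f every 5 seconds, and f simply prints the first 10 characters of the Baidu home page. Before I reveal the answer, take a moment to think about why it fails.
I have also pasted the traceback below, since it helps with the analysis: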
  File "C:\Python27\lib\httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\site-packages\gevent\socket.py", line 570, in create_connection
    for res in getaddrinfo(host, port, 0 if has_ipv6 else AF_INET, SOCK_STREAM):
  File "C:\Python27\lib\site-packages\gevent\socket.py", line 621, in getaddrinfo
    return get_hub().resolver.getaddrinfo(host, port, family, socktype, proto, flags)
  File "C:\Python27\lib\site-packages\gevent\resolver_thread.py", line 34, in getaddrinfo
    return self.pool.apply_e(self.expected_errors, _socket.getaddrinfo, args, kwargs)
  File "C:\Python27\lib\site-packages\gevent\threadpool.py", line 222, in apply_e
    success, result = self.spawn(wrap_errors, expected_errors, function, args, kwargs).get()
  File "C:\Python27\lib\site-packages\gevent\event.py", line 226, in get
    result = self.hub.switch()
  File "C:\Python27\lib\site-packages\gevent\hub.py", line 330, in switch
    switch_out()
  File "C:\Python27\lib\site-packages\gevent\hub.py", line 334, in switch_out
    raise AssertionError('Impossible to call blocking function in the event loop callback')
AssertionError: Impossible to call blocking function in the event loop callback
<timer at 0x2652ed0 callback=<function f at 0x026B0070> args=()> failed with AssertionError
The traceback shows that the failure comes from the call to the hub's switch_out:
def switch(self):
    switch_out = getattr(getcurrent(), 'switch_out', None)
    if switch_out is not None:
        switch_out()
    return greenlet.switch(self)

def switch_out(self):
    raise AssertionError('Impossible to call blocking function in the event loop callback')
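As I mentioned in an earlier post, gevent provides a switch_out hook that is called when the current greenlet is about to be switched out. But wait, why is it the hub's switch_out that gets called?
Normally it should be some other greenlet's switch_out. No wonder things break: if the hub itself were switched out, who would do the scheduling?
That is exactly where the problem lies. Notice that in the code above nothing runs outside the hub: the main greenlet is just parked in gevent.sleep, and no other greenlet was ever created to run f.
Let's walk through the logic. First we register f as a timer callback on the event loop. When the timer fires, the loop calls f inside the hub greenlet. The patched socket code in f then has to wait (in the traceback, for the DNS lookup), so it tries to switch to the hub, and before switching it calls switch_out on the current greenlet. Unfortunately the current greenlet is the hub itself, so the assertion fires.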
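The check can be illustrated with a small toy model. This is only a sketch, not gevent's real code: the Greenlet and Hub classes below are hypothetical stand-ins, and the current greenlet is passed in explicitly as `current` instead of being looked up with getcurrent().

```python
# Toy model (NOT gevent's implementation): shows why the guard fires
# only when the "current greenlet" is the hub itself.

class Greenlet(object):
    """Hypothetical stand-in for an ordinary greenlet (no switch_out hook)."""
    def __init__(self, name):
        self.name = name


class Hub(Greenlet):
    """Hypothetical stand-in for the hub, which must never be switched out."""

    def switch_out(self):
        raise AssertionError(
            'Impossible to call blocking function in the event loop callback')

    def switch(self, current):
        # Same shape as the check quoted above: ask the *current* greenlet
        # for a switch_out hook before handing control to the hub.
        switch_out = getattr(current, 'switch_out', None)
        if switch_out is not None:
            switch_out()
        return 'switched from %s to hub' % current.name


hub = Hub('hub')
worker = Greenlet('worker')

# A blocking call made from a separate greenlet: allowed.
ok = hub.switch(current=worker)

# A blocking call made from a timer callback: the current greenlet is the hub.
message = None
try:
    hub.switch(current=hub)
    failed = False
except AssertionError as exc:
    failed = True
    message = str(exc)

print(ok)
print(failed, message)
```

In this model the only difference between the two calls is who is "current", which is the same distinction the real traceback points to.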
Once the cause is clear, the fix is easy: wrap f in a greenlet. The code is as follows:
import urllib
import gevent
from gevent.monkey import patch_socket
from gevent.hub import get_hub

def patch_greenlet(f):
    # The timer callback now only spawns a greenlet; the blocking work
    # runs in that greenlet instead of inside the hub
    def inner(*args, **kwargs):
        return gevent.spawn(f, *args, **kwargs)
    return inner

@patch_greenlet
def f():
    r = urllib.urlopen("http://www.baidu.com/").read()
    print r[:10]

def timer(after, repeat, f):
    t = get_hub().loop.timer(after, repeat)
    t.start(f)
    return t

def run():
    patch_socket()
    timer(1, 0, f)       # repeat=0 fires only once; pass 5 to repeat every 5 seconds as before
    gevent.sleep(100)

run()
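To check the explanation without touching the network, here is a minimal sketch. It assumes gevent is installed; the names `seen`, `work` and `on_timer` are made up for illustration, and gevent.sleep stands in for the blocking urlopen call. It records which greenlet each function runs in.

```python
import gevent
from gevent import getcurrent
from gevent.hub import get_hub

seen = {}  # hypothetical record of what each function observed

def work():
    # Runs in its own greenlet, so yielding to the hub is allowed here.
    seen['work_in_hub'] = getcurrent() is get_hub()
    gevent.sleep(0.01)  # stand-in for the blocking urlopen call
    seen['work_done'] = True

def on_timer():
    # Timer callbacks are invoked by the event loop, i.e. inside the hub.
    seen['callback_in_hub'] = getcurrent() is get_hub()
    gevent.spawn(work)  # only schedules the greenlet; does not block

t = get_hub().loop.timer(0.01, 0)  # one-shot timer
t.start(on_timer)
gevent.sleep(0.2)  # main greenlet waits while the hub runs the timer
t.stop()

print(seen)
```

The callback itself stays non-blocking because gevent.spawn returns immediately, and the part that waits runs in a greenlet that is free to switch to the hub.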
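I have to admit that you run into plenty of problems when using gevent. Perhaps that is part of what makes coroutines so addictive: you learn to enjoy the "self-inflicted pain", and the more you enjoy it, the better you master it.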