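Background: Python's multithreaded performance has long been held back by the Global Interpreter Lock (GIL). asyncio joined the standard library in Python 3.4, and Python 3.5 added the async/await syntax, giving Python coroutine-based concurrency as a complement to threads.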
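Thread definition: Suppose you need to complete a task, namely summing the numbers from 1 to 100; this task can be seen as a process. You then bring in 10 schoolchildren to compute it in batches, each adding up 10 of the numbers. Each child's share of the work can be seen as a thread, and their calculations do not interfere with one another.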
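The analogy above can be sketched in a few lines. This is only an illustration: `sum_batch` and `parallel_sum` are hypothetical names, and in standard CPython the GIL means threads will not actually speed up CPU-bound arithmetic like this; the point is just how the work is split and merged.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_batch(start, stop):
    """Add up the integers start..stop inclusive (one pupil's share)."""
    return sum(range(start, stop + 1))


def parallel_sum(n=100, batch=10):
    # Batches are 1..10, 11..20, ..., 91..100.
    starts = range(1, n + 1, batch)
    stops = range(batch, n + 1, batch)
    with ThreadPoolExecutor(max_workers=10) as pool:
        # Each batch is handed to a worker thread; map() keeps the order.
        partials = list(pool.map(sum_batch, starts, stops))
    # Merge the partial sums into the final answer.
    return partials, sum(partials)


partials, total = parallel_sum()
print(partials)
print(total)
```

Each thread works only on its own range, which is why the partial sums can be computed independently and combined at the end.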
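Coroutine definition: A coroutine is not a process or a thread; its execution is more like a subroutine, or a function call without a return value (from Baidu Baike). Note that in Python a coroutine declared with `async def` can return a value to whoever awaits it.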
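Difference: thread concurrency is handled at the operating-system level, whereas coroutine concurrency is handled at the user level. Example: to visit 100 web pages, with threads each thread can take 10 pages and the results are merged at the end; with coroutines, each page visit can be treated as a subroutine that performs the visit and keeps its result.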
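A rough sketch of the two approaches to the 100-page example. Everything here is an assumption made for illustration: the URLs are made up, and `time.sleep` / `asyncio.sleep` stand in for the network wait so the snippet runs without any real requests.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical page addresses; nothing is actually fetched.
URLS = [f"https://example.com/page/{n}" for n in range(100)]


def visit_blocking(url):
    time.sleep(0.01)  # stand-in for a blocking network call
    return f"visited {url}"


def visit_batch(batch):
    # One thread handles its whole batch of 10 pages.
    return [visit_blocking(url) for url in batch]


def with_threads():
    batches = [URLS[k:k + 10] for k in range(0, len(URLS), 10)]
    merged = []
    with ThreadPoolExecutor(max_workers=10) as pool:
        for part in pool.map(visit_batch, batches):
            merged.extend(part)  # combine the per-thread results
    return merged


async def visit_async(url):
    await asyncio.sleep(0.01)  # stand-in for a non-blocking network call
    return f"visited {url}"


async def with_coroutines():
    # One coroutine per page; gather() returns results in input order.
    return await asyncio.gather(*(visit_async(url) for url in URLS))


thread_results = with_threads()
coroutine_results = asyncio.run(with_coroutines())
print(len(thread_results), len(coroutine_results))
```

In the thread version the operating system decides when each thread runs; in the coroutine version a single thread switches between page visits whenever one of them reaches an `await`.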
Any form of concurrency involves switching, caching intermediate results, and pausing.
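These three ingredients can be seen in a tiny asyncio sketch. The names `worker` and `log` are hypothetical; `await asyncio.sleep(0)` is used only as an explicit pause point that hands control back to the event loop.

```python
import asyncio

log = []  # records the order in which the workers actually run


async def worker(name, steps):
    total = 0  # intermediate result, preserved across every pause
    for step in range(steps):
        total += step
        log.append(f"{name}:{step}")
        # Pause here: the event loop may switch to another coroutine.
        await asyncio.sleep(0)
    return total


async def main():
    return await asyncio.gather(worker("A", 3), worker("B", 3))


totals = asyncio.run(main())
print(log)
print(totals)
```

The log shows the two workers taking turns rather than running one after the other, while each worker's `total` survives the pauses in between.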
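The above is my personal understanding; Python asyncio follows the same mechanism. Let's look at an example that fetches API data with multiple coroutines, where each iteration of the loop can be seen as one coroutine.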
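PS: I'd like to recommend a Python debugging package that is very friendly to beginners: better_exceptions. Experienced users are still better off reading the logs! For detailed usage, see the tutorials available online; personally I find it quite good.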
# install
pip install better_exceptions
# set the BETTER_EXCEPTIONS environment variable to any value
export BETTER_EXCEPTIONS=1 # Linux / OSX
setx BETTER_EXCEPTIONS 1 # Windows
import asyncio
import json
import math
import urllib.parse

import nest_asyncio
import pandas as pd
import requests

# nest_asyncio lets run_until_complete() be called inside an already running loop (e.g. Jupyter)
nest_asyncio.apply()

source_data = pd.read_csv("/Users/hzp/Desktop/City_poi_concat.txt", sep=" ")

API_URL = 'https://restapi.amap.com/v3/place/text?'
PAGE_SIZE = 30


def fetch_page(city, subclass, page):
    # Request one page of POI results; return the decoded JSON, or None on failure.
    parameter = {
        'types': subclass,
        'city': city,
        'output': 'Json',
        'offset': PAGE_SIZE,
        'page': str(page),
        'key': 'f00fffsfsfsffsd',
        'extensions': 'all',
        'citylimit': 'true',
        'children': 1
    }
    # python3.x: urllib.parse.urlencode
    url = API_URL + urllib.parse.urlencode(parameter)
    try:
        response = requests.request("GET", url, timeout=3)
        return json.loads(response.text)
    except (requests.RequestException, ValueError):
        return None


def parse_pois(result, i, subclass):
    # Turn one API response into a list of row dicts, one per POI.
    rows = []
    for poi in result.get('pois', []):
        rows.append({
            'province_list': source_data['Province'][i],
            'urban_list': source_data['City'][i],
            'alias': source_data['Alias'][i],
            'keyword_list': subclass,
            'poi_id': poi['id'],
            'poi_address': poi['address'],
            'poi_location': poi['location'],
            'poi_name': poi['name'],
            # empty or missing values are stored as pd.NaT
            'poi_alias_name': poi.get('alias') or pd.NaT,
            'poi_type': poi['type'],
            'poi_typecode': poi['typecode'],
            'business_area': poi.get('business_area') or pd.NaT,
            'poi_adname': poi['adname'],
            'poi_adcode': poi['adcode']
        })
    return rows


async def download(i):
    print("Out Layer Loop is: " + str(i))
    city = source_data['City'][i]
    subclass = source_data['Subclass'][i]

    # requests blocks while it waits, so this coroutine runs from start to
    # finish without yielding to the event loop.
    # The first page also reports the total number of matches.
    result = fetch_page(city, subclass, 1)
    if result is None:
        return pd.DataFrame()
    rows = parse_pois(result, i, subclass)

    # Fetch the remaining pages: ceil(count / PAGE_SIZE) pages in total.
    total_pages = math.ceil(int(result.get('count', 0)) / PAGE_SIZE)
    for k in range(2, total_pages + 1):
        result = fetch_page(city, subclass, k)
        if result is not None:
            rows.extend(parse_pois(result, i, subclass))

    # Return the rows for this city/subclass pair as a DataFrame.
    return pd.DataFrame(rows)


def run():
    frames = []
    for i in range(976, source_data.shape[0]):
        frames.append(loop.run_until_complete(download(i)))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


loop = asyncio.get_event_loop()

if __name__ == '__main__':
    df_all = run()
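In the listing above, `run()` drives one `download(i)` to completion before starting the next, and `requests` blocks while it waits, so the requests never overlap in time. Below is a minimal sketch of one way to let them overlap, assuming Python 3.9+ for `asyncio.to_thread`. The names `fetch_blocking`, `download_one` and `run_all` are hypothetical, and `time.sleep` stands in for the HTTP call so the snippet runs without network access or an API key.

```python
import asyncio
import time


def fetch_blocking(i):
    """Hypothetical stand-in for a blocking requests call."""
    time.sleep(0.05)
    return {"index": i, "count": i * 2}


async def download_one(i, limit):
    # The semaphore caps how many requests are in flight at once.
    async with limit:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(fetch_blocking, i)


async def run_all(indices, max_concurrent=5):
    limit = asyncio.Semaphore(max_concurrent)
    # Schedule every download at once; results come back in input order.
    return await asyncio.gather(*(download_one(i, limit) for i in indices))


results = asyncio.run(run_all(range(20)))
print(len(results))
```

Capping the number of simultaneous requests is a deliberate choice here, since public APIs commonly enforce rate limits; the value 5 is arbitrary. Another option, not shown, would be to replace `requests` with an asynchronous HTTP client.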