Using groupby to fix slow DataFrame filtering when splitting data into groups

Original code:

import os
import time

for name in list_valid_perfor_inventory:
    time_stamp = time.time()
    # Boolean-mask filter: with 1.7 million rows, this statement takes about 2 s per name
    df_tmp1 = df_all_performance[df_all_performance['res_ins_id'] == name]
    if df_tmp1.empty:
        continue
    del df_tmp1['res_ins_id']
    print('filter time:', time.time() - time_stamp)
    time_stamp = time.time()
    df_tmp1.to_csv(path_or_buf=os.path.join(cs.max_avg_busy_dir, str(name) + '.csv'))
    print('write time:', time.time() - time_stamp)

Optimized code:

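import os

groups = df_all_performance.groupby('res_ins_id')  # group once, up front
for name in list_valid_perfor_inventory:
    # get_group raises KeyError for a missing key instead of returning an
    # empty DataFrame, so skip names that have no rows
    if name not in groups.groups:
        continue

    # fetch the rows of this group as a DataFrame and drop the key column
    df_tmp1 = groups.get_group(name).drop(columns='res_ins_id')
    df_tmp1.to_csv(path_or_buf=os.path.join(cs.max_avg_busy_dir, str(name) + '.csv'))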
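
The speedup comes from doing the grouping work once rather than comparing the whole `res_ins_id` column against every name. As a rough sketch of the same idea on toy data, the block below splits a small DataFrame both ways and checks that the results agree. The sample frame, the `wanted` list, the two helper functions and the temporary output directory are all hypothetical stand-ins for `df_all_performance`, `list_valid_perfor_inventory` and `cs.max_avg_busy_dir`. It also iterates over the groupby object directly instead of calling `get_group`, which is one possible variant and not what the code above does.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for df_all_performance.
df = pd.DataFrame({
    "res_ins_id": ["a", "b", "a", "c", "b"],
    "busy": [10, 20, 30, 40, 50],
})
# Stands in for list_valid_perfor_inventory; "missing" has no rows.
wanted = ["a", "c", "missing"]


def split_by_mask(frame, names):
    """Original approach: one full-column comparison per name."""
    out = {}
    for name in names:
        part = frame[frame["res_ins_id"] == name]
        if not part.empty:
            out[name] = part.drop(columns="res_ins_id")
    return out


def split_by_groupby(frame, names):
    """Group once, then keep only the requested names."""
    wanted_set = set(names)
    out = {}
    for name, part in frame.groupby("res_ins_id", sort=False):
        if name in wanted_set:
            out[name] = part.drop(columns="res_ins_id")
    return out


by_mask = split_by_mask(df, wanted)
by_group = split_by_groupby(df, wanted)

# Write one CSV per group into a throwaway directory.
out_dir = tempfile.mkdtemp()
for name, part in by_group.items():
    part.to_csv(os.path.join(out_dir, f"{name}.csv"))
written = sorted(os.listdir(out_dir))
```

Iterating over the groupby object avoids the membership check, because only keys that actually occur in the data are produced. Whether it beats `get_group` in practice depends on how many of the groups are actually wanted, so it is worth timing on the real data.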


