Aggregating DataFrame data by week, month, and quarter

Every column of a DataFrame is a Series object. By default, the index of that Series starts at 0 and increases in steps of 1.
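As a quick check of that default index, here is a minimal sketch. The DataFrame contents are invented for illustration; only the column names `day_tm` and `offline_ratio` come from this post.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
df = pd.DataFrame(
    {
        "day_tm": ["2019-01-01", "2019-01-02", "2019-01-03"],
        "offline_ratio": [0.30, 0.28, 0.31],
    }
)

col = df["offline_ratio"]
print(type(col).__name__)  # Series
print(col.index)           # RangeIndex(start=0, stop=3, step=1)
```

Because this index carries no date information, time-based aggregation has nothing to work with until the index is replaced by dates.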

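If your DataFrame has a date column and you want to aggregate the other columns by that date, you need to replace the index of the Series being aggregated with the values of the date column, parsed as datetimes.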

The key operation is:

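# Copy the column so that changing its index does not touch the DataFrame
offline_ratio = custs_to_orders['offline_ratio'].copy()
offline_ratio.index = pd.to_datetime(custs_to_orders['day_tm'], format='%Y-%m-%d')
# 'M' means month end; pandas 2.2 and later spell this alias 'ME'
offline_ratio_month = offline_ratio.resample('M').mean()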
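  1. Set the index of the Series to the dates.
  2. The resample function takes a frequency argument; changing it aggregates by month (M), by week (W), by quarter (Q), and so on. In pandas 2.2 and later, the month-end and quarter-end aliases are written ME and QE.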
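The two steps above can be tried without a database. The sketch below uses synthetic daily data (invented here, not taken from the post's table) and passes pandas offset objects to `resample` instead of string aliases. That substitution is deliberate: the accepted alias spelling differs between pandas versions, while `pd.offsets.Week`, `pd.offsets.MonthEnd`, and `pd.offsets.QuarterEnd` should behave the same across them.

```python
import pandas as pd

# Synthetic daily values for the first half of 2019, invented for illustration
days = pd.date_range("2019-01-01", "2019-06-30", freq="D")
ratio = pd.Series(range(len(days)), index=days, dtype="float64", name="offline_ratio")

# Weekly bins ending on Sunday (same bins as the 'W' alias)
weekly = ratio.resample(pd.offsets.Week(weekday=6)).mean()

# Calendar-month bins labelled by month end
monthly = ratio.resample(pd.offsets.MonthEnd()).mean()

# Calendar-quarter bins labelled by quarter end (quarters ending Mar, Jun, Sep, Dec)
quarterly = ratio.resample(pd.offsets.QuarterEnd(startingMonth=12)).mean()

print(weekly.head())
print(monthly)
print(quarterly)
```

Each result is a new Series whose index holds the last day of the corresponding week, month, or quarter.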

The complete script:

import pandas as pd
import matplotlib.pyplot as plt
from impala.dbapi import connect
from impala.util import as_pandas
from datetime import datetime

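# Replace the placeholder values with your own connection details
conn = connect(host='your_server_ip', port=your_port, user='your_username', password='your_password', auth_mechanism='PLAIN')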
curs = conn.cursor()

curs.execute("""
                select substring(created_time,1,10) as day_tm,
                       sum(case when date_from in (0,3) 
                           then 1 
                           else 0 
                           end) as offline_custs,
                       sum(case when date_from in (0,3) 
                                     and so_no is not null
                                     and substring(created_time,1,10)= substring(so_date,1,10)
                           then 1 
                           else 0 
                           end) as offline_orders,
                       sum(case when date_from not in (0,3) 
                           then 1 
                           else 0 
                           end) as online_custs,
                       sum(case when date_from not in (0,3) 
                                     and so_no is not null
                                     and substring(created_time,1,10)= substring(so_date,1,10)
                           then 1 
                           else 0 
                           end) as online_orders
                from your_table_name c
                where substring(created_time,1,4) >= '2019'                     
                group by substring(created_time,1,10)
                order by substring(created_time,1,10) asc
             """)
custs_to_orders = as_pandas(curs)

custs_to_orders['offline_ratio'] = custs_to_orders['offline_orders']/custs_to_orders['offline_custs']
custs_to_orders['online_ratio'] = custs_to_orders['online_orders']/custs_to_orders['online_custs']

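# Copy the column so that changing its index does not touch the DataFrame
offline_ratio = custs_to_orders['offline_ratio'].copy()
offline_ratio.index = pd.to_datetime(custs_to_orders['day_tm'], format='%Y-%m-%d')
# 'M' means month end; pandas 2.2 and later spell this alias 'ME'
offline_ratio_month = offline_ratio.resample('M').mean()
print(offline_ratio_month)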

day_tm
2019-01-31 0.301876
2019-02-28 0.183390
2019-03-31 0.178983
2019-04-30 0.183437
2019-05-31 0.202010
2019-06-30 0.242368
2019-07-31 0.177942
2019-08-31 0.173683
2019-09-30 0.179291
2019-10-31 0.186196
2019-11-30 0.183292
2019-12-31 0.221013
2020-01-31 0.256396
2020-02-29 0.260454
2020-03-31 0.162729
2020-04-30 0.117873
2020-05-31 0.103655
Freq: M, Name: offline_ratio, dtype: float64
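
One design point about the figures above: they are the mean of the daily ratios, so every day counts equally regardless of how many customers it had. If a ratio of monthly totals is wanted instead, a possible variant is to resample the counts first and divide afterwards. The sketch below uses a tiny invented dataset with the column names from the query and offset objects rather than string aliases; it is an illustration, not the method used in this post.

```python
import pandas as pd

# Hypothetical daily counts, invented for illustration
daily = pd.DataFrame(
    {
        "day_tm": ["2019-01-30", "2019-01-31", "2019-02-01", "2019-02-02"],
        "offline_custs": [10, 90, 20, 20],
        "offline_orders": [5, 9, 4, 6],
    }
)
daily["day_tm"] = pd.to_datetime(daily["day_tm"], format="%Y-%m-%d")
daily = daily.set_index("day_tm")

month_end = pd.offsets.MonthEnd()

# Ratio of monthly totals: sum the counts per month, then divide
monthly_counts = daily.resample(month_end).sum()
weighted = monthly_counts["offline_orders"] / monthly_counts["offline_custs"]

# Mean of daily ratios, as in the script above
unweighted = (daily["offline_orders"] / daily["offline_custs"]).resample(month_end).mean()

print(weighted)
print(unweighted)
```

For January in this toy data the two approaches differ (14 orders from 100 customers versus the average of 0.5 and 0.1), while for February they agree because both days have the same number of customers.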

Follow the WeChat official account: 數據分析師手記