Summary: pandas time series


Timestamp

import pandas as pd

pd.Timestamp(2020, 4, 1)
pd.Timestamp(2020, 4, 1, 0, 0, 10)

from datetime import datetime
pd.Timestamp(datetime(2020, 4, 1))

Period

pd.Period('2020-4')  # freq is inferred as monthly ('M')

pd.Period('2020-4', freq='D')
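
A Period stands for a whole span of time rather than a single instant. The following is a minimal sketch of what the two periods above cover, using only the standard Period attributes start_time and end_time; the variable names are made up for illustration.

```python
import pandas as pd

month = pd.Period('2020-4')            # frequency inferred as monthly
day = pd.Period('2020-4', freq='D')    # the first day of that month, as a daily period

# A Period covers a span; start_time and end_time give its bounds.
month_start = month.start_time
month_end = month.end_time             # last nanosecond of 2020-04-30

# Adding an integer moves by whole units of the period's own frequency.
next_month = month + 1
next_day = day + 1
```

Note that the same string yields a one-month span or a one-day span depending on freq, which matters once periods are used as index labels.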

Series whose elements are datetimes

dates = ['2020-4-1', '2020-4-2', '2020-4-3']
pd.to_datetime(dates)

seri = pd.Series(['2020-4-1', '2020-4-2'])
pd.to_datetime(seri)

df = pd.DataFrame({'year': [2020, 2021], 'month': [4, 5], 'day': [1, 2], 'hour': [10, 10]})
pd.to_datetime(df)
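
Real input is often messier than the clean strings above. As a hedged sketch (the sample values are invented), the format and errors parameters of pd.to_datetime can make parsing explicit and tolerant of bad entries:

```python
import pandas as pd

raw = pd.Series(['2020-04-01', '2020-04-02', 'not a date'])

# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

# An explicit format removes the day-first / month-first ambiguity.
day_first = pd.to_datetime('01/04/2020', format='%d/%m/%Y')
```

Without errors='coerce' the third entry would raise an exception, which is usually what you want while exploring data and less so in a batch job.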

DatetimeIndex

dates = ['2020-4-1', '2020-4-2', '2020-4-3']
pd.DatetimeIndex(dates)

pd.date_range('2020-4-1', '2020-4-3', freq='D')  # freq='M' would give month ends

pd.bdate_range('2020-4-1', periods=100)  # business days only

pd.period_range('2020-4-1', periods=100)

Series indexed by time

import numpy as np

dates = [pd.Timestamp(2020, 4, 1), pd.Timestamp(2020, 4, 2), pd.Timestamp(2020, 4, 3)]
series = pd.Series(np.random.randn(len(dates)), dates)

dates = [pd.Period('2020-4'), pd.Period('2020-5'), pd.Period('2020-6')]
series = pd.Series(np.random.randn(len(dates)), dates)

prng = pd.period_range('2020Q1', '2022Q4', freq='Q-NOV')
ps = pd.DataFrame(np.random.randn(len(prng)), columns=['A'], index=prng)
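
The 'Q-NOV' suffix in the period range above marks quarters of a fiscal year that ends in November. A small sketch of what that means for a single quarter, using only standard Period attributes (variable names are illustrative):

```python
import pandas as pd

# Fiscal year 2020 ends in November 2020, so its first quarter
# starts in December 2019.
q1 = pd.Period('2020Q1', freq='Q-NOV')

q1_start = q1.start_time
q1_end = q1.end_time        # last nanosecond of the quarter

# The fourth quarter closes the fiscal year.
q4 = pd.Period('2020Q4', freq='Q-NOV')
q4_end = q4.end_time
```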

Working with time-indexed objects

# Sample series: 100 daily values starting 2020-04-01
ts = pd.Series(np.random.randn(100), pd.date_range('2020-4-1', periods=100, freq='D'))

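# Lookup
ts[:5]
ts[::2]
ts['2020-7-2']
ts.iloc[[1, 3, 5]]  # by position; plain ts[[1, 3, 5]] is deprecated on a DatetimeIndex
ts['2020-4']  # every row in April
ts.truncate(before='2020-4-1', after='2020-4-10')   # slice by date
ts['2020-4-1' : '2020-4-10']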

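# Shifting with shift
ts.shift(1)  # values move down one row; the index is unchanged
ts.shift(1, freq='D')  # index labels move one day later; values are unchanged (also try freq='W')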

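# Resampling with resample
# Downsampling: longer interval between rows, fewer records, coarser time granularity
ts.resample('W').sum()  # weekly
ts.resample('M').sum()  # monthly
ts.resample('W').mean()
ts.resample('W').ohlc()  # open, high, low, close of the values in each bin

# Upsampling: shorter interval between rows, more records, finer time granularity
ts.resample('12H').asfreq()
ts.resample('12H').ffill()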
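
Because ts holds random numbers, the effect of shift and resample is hard to read off the output. Here is a minimal sketch on a tiny series with known values (the data and variable names are made up) that makes each operation checkable by eye:

```python
import pandas as pd

idx = pd.date_range('2020-04-01', periods=4, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# shift(1): values move one row down, the index stays put,
# and the first value becomes NaN.
by_position = s.shift(1)

# shift(1, freq='D'): values stay attached to their labels,
# and every label moves one day later.
by_label = s.shift(1, freq='D')

# Downsampling: 2-day bins, each reduced to one number.
two_day_sum = s.resample('2D').sum()

# Upsampling: a sparse series placed on a daily grid.
sparse = s.iloc[::2]                        # 2020-04-01 and 2020-04-03 only
with_gaps = sparse.resample('D').asfreq()   # the new row 2020-04-02 is NaN
filled = sparse.resample('D').ffill()       # the new row copies the previous value
```

asfreq only inserts the new rows, while ffill also decides what goes in them, so which one to use depends on whether a missing observation should stay visible.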

Time arithmetic

Date offset classes in pandas

These are commonly used to shift timestamps and time indexes.

from pandas.tseries.offsets import DateOffset
d = pd.Timestamp(2020, 4, 1, 0, 0, 10)

d
Out[49]: Timestamp('2020-04-01 00:00:10')

d + DateOffset()
Out[50]: Timestamp('2020-04-02 00:00:10')

d + DateOffset(months=1, days=1)
Out[53]: Timestamp('2020-05-02 00:00:10')


from pandas.tseries.offsets import BDay, BMonthEnd
d + BDay()
Out[52]: Timestamp('2020-04-02 00:00:10')

d + 10 * BDay()
Out[54]: Timestamp('2020-04-15 00:00:10')

d + BMonthEnd() * 2
Out[57]: Timestamp('2020-05-29 00:00:10')


from pandas.tseries.offsets import BYearEnd
d + BYearEnd()
Out[66]: Timestamp('2020-12-31 00:00:10')

d + BYearEnd() * 2
Out[67]: Timestamp('2021-12-31 00:00:10')

d + BYearEnd(month=1)
Out[71]: Timestamp('2021-01-29 00:00:10')



from pandas.tseries.offsets import Week
d - Week()
Out[73]: Timestamp('2020-03-25 00:00:10')

d - Week(weekday=3)  # roll back to the previous Thursday (weekday=3)
Out[76]: Timestamp('2020-03-26 00:00:10')


from pandas.tseries.offsets import Minute
d + Minute(10)
Out[104]: Timestamp('2020-04-01 00:10:10')
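
Adding an offset always moves at least one step. When the goal is only to snap a date onto a valid one, the offset methods rollforward and rollback are an alternative. A hedged sketch, with dates chosen for illustration:

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

saturday = pd.Timestamp(2020, 4, 4)

# rollforward / rollback snap to the nearest valid date
# in the given direction.
next_bday = BDay().rollforward(saturday)    # the following Monday
prev_bday = BDay().rollback(saturday)       # the preceding Friday

# A date already on the offset is left alone by rollforward,
# but "+" still moves it a full step.
month_end = pd.Timestamp(2020, 4, 30)
unchanged = MonthEnd().rollforward(month_end)
moved = month_end + MonthEnd()
```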

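Common frequency aliases

The spellings below match the pandas version used for the outputs in this post; pandas 2.2 and later deprecate several of them in favor of new names such as 'ME' for 'M', 'h' for 'H' and 'min' for 'T'.

Alias     Description
B         business day
C         custom business day
D         calendar day
W         weekly
M         month end
SM        semi-month end (15th and end of month)
BM        business month end
CBM       custom business month end
MS        month start
SMS       semi-month start (1st and 15th)
BMS       business month start
CBMS      custom business month start
Q         quarter end
BQ        business quarter end
QS        quarter start
BQS       business quarter start
A         year end
BA        business year end
AS        year start
BAS       business year start
BH        business hour
H         hourly
T, min    minutely
S         secondly
L, ms     millisecond
U, us     microsecond
N         nanosecond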


pd.date_range('2020-4-1', periods=10, freq='B')
Out[106]: 
DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-06',
               '2020-04-07', '2020-04-08', '2020-04-09', '2020-04-10',
               '2020-04-13', '2020-04-14'],
              dtype='datetime64[ns]', freq='B')

pd.date_range('2020-4-1', periods=10, freq='D')
Out[108]: 
DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
               '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
               '2020-04-09', '2020-04-10'],
              dtype='datetime64[ns]', freq='D')

pd.date_range('2020-4-1', periods=10, freq='W')
Out[109]: 
DatetimeIndex(['2020-04-05', '2020-04-12', '2020-04-19', '2020-04-26',
               '2020-05-03', '2020-05-10', '2020-05-17', '2020-05-24',
               '2020-05-31', '2020-06-07'],
              dtype='datetime64[ns]', freq='W-SUN')

pd.date_range('2020-4-1', periods=10, freq='M')
Out[110]: 
DatetimeIndex(['2020-04-30', '2020-05-31', '2020-06-30', '2020-07-31',
               '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30',
               '2020-12-31', '2021-01-31'],
              dtype='datetime64[ns]', freq='M')

pd.date_range('2020-4-1', periods=10, freq='SM')
Out[111]: 
DatetimeIndex(['2020-04-15', '2020-04-30', '2020-05-15', '2020-05-31',
               '2020-06-15', '2020-06-30', '2020-07-15', '2020-07-31',
               '2020-08-15', '2020-08-31'],
              dtype='datetime64[ns]', freq='SM-15')

pd.date_range('2020-4-1', periods=10, freq='BM')
Out[112]: 
DatetimeIndex(['2020-04-30', '2020-05-29', '2020-06-30', '2020-07-31',
               '2020-08-31', '2020-09-30', '2020-10-30', '2020-11-30',
               '2020-12-31', '2021-01-29'],
              dtype='datetime64[ns]', freq='BM')

pd.date_range('2020-4-1', periods=10, freq='MS')
Out[113]: 
DatetimeIndex(['2020-04-01', '2020-05-01', '2020-06-01', '2020-07-01',
               '2020-08-01', '2020-09-01', '2020-10-01', '2020-11-01',
               '2020-12-01', '2021-01-01'],
              dtype='datetime64[ns]', freq='MS')

pd.date_range('2020-4-1', periods=10, freq='Q')
Out[114]: 
DatetimeIndex(['2020-06-30', '2020-09-30', '2020-12-31', '2021-03-31',
               '2021-06-30', '2021-09-30', '2021-12-31', '2022-03-31',
               '2022-06-30', '2022-09-30'],
              dtype='datetime64[ns]', freq='Q-DEC')

pd.date_range('2020-4-1', periods=10, freq='QS')
Out[115]: 
DatetimeIndex(['2020-04-01', '2020-07-01', '2020-10-01', '2021-01-01',
               '2021-04-01', '2021-07-01', '2021-10-01', '2022-01-01',
               '2022-04-01', '2022-07-01'],
              dtype='datetime64[ns]', freq='QS-JAN')

pd.date_range('2020-4-1', periods=10, freq='BQ')
Out[116]: 
DatetimeIndex(['2020-06-30', '2020-09-30', '2020-12-31', '2021-03-31',
               '2021-06-30', '2021-09-30', '2021-12-31', '2022-03-31',
               '2022-06-30', '2022-09-30'],
              dtype='datetime64[ns]', freq='BQ-DEC')

pd.date_range('2020-4-1', periods=10, freq='BH')
Out[117]: 
DatetimeIndex(['2020-04-01 09:00:00', '2020-04-01 10:00:00',
               '2020-04-01 11:00:00', '2020-04-01 12:00:00',
               '2020-04-01 13:00:00', '2020-04-01 14:00:00',
               '2020-04-01 15:00:00', '2020-04-01 16:00:00',
               '2020-04-02 09:00:00', '2020-04-02 10:00:00'],
              dtype='datetime64[ns]', freq='BH')

pd.date_range('2020-4-1', periods=10, freq='T')
Out[118]: 
DatetimeIndex(['2020-04-01 00:00:00', '2020-04-01 00:01:00',
               '2020-04-01 00:02:00', '2020-04-01 00:03:00',
               '2020-04-01 00:04:00', '2020-04-01 00:05:00',
               '2020-04-01 00:06:00', '2020-04-01 00:07:00',
               '2020-04-01 00:08:00', '2020-04-01 00:09:00'],
              dtype='datetime64[ns]', freq='T')

pd.date_range('2020-4-1', periods=10, freq='L')
Out[120]: 
DatetimeIndex([       '2020-04-01 00:00:00', '2020-04-01 00:00:00.001000',
               '2020-04-01 00:00:00.002000', '2020-04-01 00:00:00.003000',
               '2020-04-01 00:00:00.004000', '2020-04-01 00:00:00.005000',
               '2020-04-01 00:00:00.006000', '2020-04-01 00:00:00.007000',
               '2020-04-01 00:00:00.008000', '2020-04-01 00:00:00.009000'],
              dtype='datetime64[ns]', freq='L')

pd.date_range('2020-4-1', periods=10, freq='S')
Out[121]: 
DatetimeIndex(['2020-04-01 00:00:00', '2020-04-01 00:00:01',
               '2020-04-01 00:00:02', '2020-04-01 00:00:03',
               '2020-04-01 00:00:04', '2020-04-01 00:00:05',
               '2020-04-01 00:00:06', '2020-04-01 00:00:07',
               '2020-04-01 00:00:08', '2020-04-01 00:00:09'],
              dtype='datetime64[ns]', freq='S')

pd.date_range('2020-4-1', periods=10, freq='N')
Out[122]: 
DatetimeIndex([          '2020-04-01 00:00:00',
               '2020-04-01 00:00:00.000000001',
               '2020-04-01 00:00:00.000000002',
               '2020-04-01 00:00:00.000000003',
               '2020-04-01 00:00:00.000000004',
               '2020-04-01 00:00:00.000000005',
               '2020-04-01 00:00:00.000000006',
               '2020-04-01 00:00:00.000000007',
               '2020-04-01 00:00:00.000000008',
               '2020-04-01 00:00:00.000000009'],
              dtype='datetime64[ns]', freq='N')

pd.date_range('2020-4-1', periods=10, freq='1D1H10T10U')
Out[125]: 
DatetimeIndex([       '2020-04-01 00:00:00', '2020-04-02 01:10:00.000010',
               '2020-04-03 02:20:00.000020', '2020-04-04 03:30:00.000030',
               '2020-04-05 04:40:00.000040', '2020-04-06 05:50:00.000050',
               '2020-04-07 07:00:00.000060', '2020-04-08 08:10:00.000070',
               '2020-04-09 09:20:00.000080', '2020-04-10 10:30:00.000090'],
              dtype='datetime64[ns]', freq='90600000010U')

Append an anchor suffix to change the default anchor point


pd.date_range('2020-4-1', periods=10, freq='W-Wed')
Out[126]: 
DatetimeIndex(['2020-04-01', '2020-04-08', '2020-04-15', '2020-04-22',
               '2020-04-29', '2020-05-06', '2020-05-13', '2020-05-20',
               '2020-05-27', '2020-06-03'],
              dtype='datetime64[ns]', freq='W-WED')
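
The same suffix idea applies to other anchored frequencies. A short sketch, assuming the 'W-FRI' and 'QS-FEB' aliases, which anchor weeks on Friday and quarter starts on February, May, August and November:

```python
import pandas as pd

# 'W-FRI' anchors weekly dates on Fridays instead of the default Sundays.
fridays = pd.date_range('2020-04-01', periods=3, freq='W-FRI')

# 'QS-FEB' starts quarters in February, May, August and November.
quarter_starts = pd.date_range('2020-04-01', periods=3, freq='QS-FEB')
```

In both cases the first date returned is the first anchor on or after the start date, so the start date itself appears only if it already falls on an anchor.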

Resampling with aggregation

ts.resample('M').sum()
Out[129]: 
2020-04-30   -0.247197
2020-05-31    1.055703
2020-06-30   -0.221805
2020-07-31    1.433503
Freq: M, dtype: float64

ts.resample('M').agg(['sum', 'mean'])
Out[132]: 
                 sum      mean
2020-04-30 -0.247197 -0.008240
2020-05-31  1.055703  0.034055
2020-06-30 -0.221805 -0.007394
2020-07-31  1.433503  0.159278
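
agg is not limited to built-in reductions. The sketch below uses a small series with known values (invented for illustration) to show several named reductions at once and a custom callable per bin:

```python
import pandas as pd

idx = pd.date_range('2020-04-01', periods=6, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# String names select built-in reductions; one output column per name.
stats = s.resample('3D').agg(['sum', 'mean', 'max'])

# Any callable that maps a bin to a scalar works too
# (here: the spread between largest and smallest value).
value_range = s.resample('3D').agg(lambda g: g.max() - g.min())
```

With a list of functions the result is a DataFrame, and with a single function it stays a Series.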
