Timestamp
import pandas as pd
pd.Timestamp(2020, 4, 1)
pd.Timestamp(2020, 4, 1, 0, 0, 10)
from datetime import datetime
pd.Timestamp(datetime(2020, 4, 1))
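The constructors above all produce the same kind of object. In this minimal sketch, the string form `'2020-04-01'` is an extra illustration not in the original. It checks that equal inputs compare equal and expose calendar fields:

```python
from datetime import datetime

import pandas as pd

# Three ways to build midnight on 2020-04-01
a = pd.Timestamp(2020, 4, 1)
b = pd.Timestamp(datetime(2020, 4, 1))
c = pd.Timestamp('2020-04-01')  # string parsing works as well

print(a == b == c)     # True: all three are the same instant
print(a.day_name())    # Wednesday
print(a.dayofweek)     # 2 (Monday=0)
```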
Period
pd.Period('2020-4') # default 'M'
pd.Period('2020-4', freq='D')
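A Period represents a whole span of time rather than a single instant. This sketch uses the standard `start_time`/`end_time` attributes and integer arithmetic to show the span boundaries and how to step between periods:

```python
import pandas as pd

p = pd.Period('2020-4')            # monthly period, freq inferred as 'M'
print(p.start_time)                # 2020-04-01 00:00:00
print(p.end_time)                  # last nanosecond of 2020-04-30
print(p + 1)                       # 2020-05: adding 1 moves one period forward

d = pd.Period('2020-4', freq='D')  # daily period for 2020-04-01
print(d.start_time == p.start_time)  # True: both start at midnight on April 1
```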
Series with datetime values
dates = ['2020-4-1', '2020-4-2', '2020-4-3']
pd.to_datetime(dates)
seri = pd.Series(['2020-4-1', '2020-4-2'])
pd.to_datetime(seri)
df = pd.DataFrame({'year': [2020, 2021], 'month': [4, 5], 'day': [1, 2], 'hour': [10, 10]})
pd.to_datetime(df)
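`to_datetime` also accepts an explicit `format` and an `errors` policy, both part of its documented signature. The sample strings here are made up for illustration:

```python
import pandas as pd

# Day-first strings are ambiguous; an explicit format removes the guesswork
parsed = pd.to_datetime(['01/04/2020', '02/04/2020'], format='%d/%m/%Y')
print(parsed[0])               # 2020-04-01 00:00:00

# errors='coerce' turns unparseable values into NaT instead of raising
mixed = pd.to_datetime(pd.Series(['2020-04-01', 'not a date']), errors='coerce')
print(mixed.isna().tolist())   # [False, True]
```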
DatetimeIndex
dates = ['2020-4-1', '2020-4-2', '2020-4-3']
pd.DatetimeIndex(dates)
pd.date_range('2020-4-1', '2020-4-3', freq='D') # with freq='M' the dates would be month ends
pd.bdate_range('2020-4-1', periods=100) # business days (Mon-Fri)
pd.period_range('2020-4-1', periods=100)
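The three range constructors differ in what they generate. This sketch compares them, using `periods=5` instead of 100 only for readability:

```python
import pandas as pd

days = pd.date_range('2020-4-1', '2020-4-3', freq='D')  # both endpoints included
bdays = pd.bdate_range('2020-4-1', periods=5)            # Saturday/Sunday skipped
periods = pd.period_range('2020-4-1', periods=5)         # daily PeriodIndex

print(len(days))                 # 3
print(bdays[-1])                 # 2020-04-07: April 4-5, 2020 is a weekend
print(type(periods).__name__)    # PeriodIndex
```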
Series indexed by time
import numpy as np
dates = [pd.Timestamp(2020, 4, 1), pd.Timestamp(2020, 4, 2), pd.Timestamp(2020, 4, 3)]
series = pd.Series(np.random.randn(len(dates)), dates)
dates = [pd.Period('2020-4'), pd.Period('2020-5'), pd.Period('2020-6')]
series = pd.Series(np.random.randn(len(dates)), dates)
prng = pd.period_range('2020Q1', '2022Q4', freq='Q-NOV')
ps = pd.DataFrame(np.random.randn(len(prng)), columns=['A'], index=prng)
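`freq='Q-NOV'` describes a fiscal year that ends in November, so quarter 1 of fiscal 2020 starts in December 2019. This sketch checks that reading with `start_time`/`end_time` and converts the quarters to start timestamps:

```python
import pandas as pd

prng = pd.period_range('2020Q1', '2022Q4', freq='Q-NOV')

q1 = prng[0]                    # fiscal 2020, quarter 1
print(len(prng))                # 12: three fiscal years x four quarters
print(q1.start_time)            # 2019-12-01 00:00:00
print(q1.end_time.normalize())  # 2020-02-29 00:00:00 (2020 is a leap year)

# Quarter start dates as an ordinary DatetimeIndex
starts = prng.to_timestamp(how='start')
print(starts[1])                # 2020-03-01 00:00:00
```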
Working with time-indexed objects
# Create a series of 100 daily values
ts = pd.Series(np.random.randn(100), pd.date_range('2020-4-1', periods=100, freq='D'))
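# Selection
ts[:5]
ts[::2]
ts['2020-7-2']
ts.iloc[[1, 3, 5]] # positional selection; plain ts[[1, 3, 5]] is deprecated on a DatetimeIndex
ts['2020-4'] # all of April
ts.truncate(before='2020-4-1', after='2020-4-10') # slice by date range, both ends included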
ts['2020-4-1' : '2020-4-10']
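The selection rules are easy to check on a deterministic series. This sketch uses `np.arange` instead of random values, purely an assumption so the results are predictable:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(100.0), index=pd.date_range('2020-4-1', periods=100, freq='D'))

april = ts.loc['2020-4']                  # partial-string indexing: every day in April
window = ts.loc['2020-4-1':'2020-4-10']   # label slices include BOTH endpoints
same = ts.truncate(before='2020-4-1', after='2020-4-10')
picked = ts.iloc[[1, 3, 5]]               # explicit positional selection

print(len(april), len(window))            # 30 10
print(window.equals(same))                # True
print(picked.tolist())                    # [1.0, 3.0, 5.0]
```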
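# Shifting with shift
ts.shift(1) # move the data down by 1 position; the index stays put
ts.shift(1, freq='D') # move the index forward by 1 day; the data stays put (try freq='W')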
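The difference between the two `shift` calls shows clearly on three values. The day-over-day change at the end is one common use of `shift(1)`, added here for illustration:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=pd.date_range('2020-4-1', periods=3, freq='D'))

data_shifted = s.shift(1)             # index fixed, values move down, first becomes NaN
index_shifted = s.shift(1, freq='D')  # values fixed, every timestamp moves +1 day

print(data_shifted.tolist())          # [nan, 1.0, 2.0]
print(index_shifted.index[0])         # 2020-04-02 00:00:00

# Day-over-day relative change
change = s / s.shift(1) - 1
print(change.tolist())                # [nan, 1.0, 0.5]
```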
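# Resampling with resample
# Downsampling: larger time interval, fewer records, coarser granularity
ts.resample('W').sum() # weekly
ts.resample('M').sum() # monthly
ts.resample('W').mean()
ts.resample('W').ohlc() # open/high/low/close of the values in each bin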
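This is a deterministic sketch of weekly downsampling. It relies on the pandas default for `'W'`: bins are anchored on Sunday and are closed and labelled on the right edge. As a result, 2020-04-01 (a Wednesday) falls in the bin labelled 2020-04-05:

```python
import numpy as np
import pandas as pd

# 14 daily values 0..13 starting Wednesday 2020-04-01
ts = pd.Series(np.arange(14.0), index=pd.date_range('2020-4-1', periods=14, freq='D'))

weekly_sum = ts.resample('W').sum()
print(weekly_sum.tolist())        # [10.0, 56.0, 25.0]
print(weekly_sum.index[0])        # 2020-04-05 00:00:00 (Sunday label)

bars = ts.resample('W').ohlc()    # first, max, min, last value per bin
print(bars.iloc[0].tolist())      # [0.0, 4.0, 0.0, 4.0]
```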
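# Upsampling: smaller time interval, more records, finer granularity
ts.resample('12H').asfreq() # new 12-hour slots are NaN
ts.resample('12H').ffill() # new slots are forward-filled from the previous value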
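Upsampling two daily points makes the fill strategies visible. This sketch writes the rule as `'12h'`, the spelling that recent pandas versions prefer over `'12H'`. Linear `interpolate()` is shown as an alternative fill that is not in the notes above:

```python
import pandas as pd

s = pd.Series([1.0, 3.0], index=pd.date_range('2020-4-1', periods=2, freq='D'))

# One new 12-hour slot appears between the two days
gaps = s.resample('12h').asfreq()     # new slot is NaN
filled = s.resample('12h').ffill()    # new slot repeats the previous value
smooth = gaps.interpolate()           # new slot is linearly interpolated

print(gaps.tolist())     # [1.0, nan, 3.0]
print(filled.tolist())   # [1.0, 1.0, 3.0]
print(smooth.tolist())   # [1.0, 2.0, 3.0]
```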
Date arithmetic
Time offset classes in pandas
Commonly used to shift timestamps and time indexes.
from pandas.tseries.offsets import DateOffset
d = pd.Timestamp(2020, 4, 1, 0, 0, 10)
d
Out[49]: Timestamp('2020-04-01 00:00:10')
d + DateOffset()
Out[50]: Timestamp('2020-04-02 00:00:10')
d + DateOffset(months=1, days=1)
Out[53]: Timestamp('2020-05-02 00:00:10')
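`DateOffset` follows calendar rules, whereas `pd.Timedelta` is a fixed duration, and the two diverge around month ends. This sketch uses January 31, chosen only for illustration:

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

jan31 = pd.Timestamp(2020, 1, 31)

calendar_step = jan31 + DateOffset(months=1)  # clipped to the last day of February
fixed_step = jan31 + pd.Timedelta(days=30)    # exactly 30 x 24 hours later

print(calendar_step)  # 2020-02-29 00:00:00
print(fixed_step)     # 2020-03-01 00:00:00
```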
from pandas.tseries.offsets import BDay
d + BDay()
Out[52]: Timestamp('2020-04-02 00:00:10')
d + 10 * BDay()
Out[54]: Timestamp('2020-04-15 00:00:10')
from pandas.tseries.offsets import BMonthEnd
d + BMonthEnd() * 2
Out[57]: Timestamp('2020-05-29 00:00:10')
from pandas.tseries.offsets import BYearEnd
d + BYearEnd()
Out[66]: Timestamp('2020-12-31 00:00:10')
d + BYearEnd() * 2
Out[67]: Timestamp('2021-12-31 00:00:10')
d + BYearEnd(month=1)
Out[71]: Timestamp('2021-01-29 00:00:10')
from pandas.tseries.offsets import Week
d - Week()
Out[73]: Timestamp('2020-03-25 00:00:10')
d - Week(weekday=3) # move back to the previous Thursday (weekday: Monday=0)
Out[76]: Timestamp('2020-03-26 00:00:10')
from pandas.tseries.offsets import Minute
d + Minute(10)
Out[104]: Timestamp('2020-04-01 00:10:10')
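Anchored offsets such as `BDay` and `BMonthEnd` also provide `rollforward`, `rollback` and `is_on_offset`. These snap a date onto the offset without moving dates that are already on it. The dates in this sketch were picked to land on a weekend:

```python
import pandas as pd
from pandas.tseries.offsets import BDay, BMonthEnd

fri = pd.Timestamp(2020, 4, 3)             # a Friday
next_bday = fri + BDay()                   # the weekend is skipped
print(next_bday)                           # 2020-04-06 00:00:00

bme = BMonthEnd()
sat = pd.Timestamp(2020, 5, 30)            # Saturday; May 31, 2020 is a Sunday
print(bme.is_on_offset(sat))               # False
back = bme.rollback(sat)                   # last business day of May
fwd = bme.rollforward(sat)                 # last business day of June
kept = bme.rollback(pd.Timestamp(2020, 5, 29))  # already on offset: unchanged
print(back, fwd, kept)
```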
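Common frequency aliases (spellings used in these notes; pandas 2.2 deprecates several of them, e.g. M → ME, H → h, T → min)
Alias | Description |
---|---|
B | business day |
C | custom business day |
D | calendar day |
W | weekly |
M | month end |
SM | semi-month end (15th and end of month) |
BM | business month end |
CBM | custom business month end |
MS | month start |
SMS | semi-month start (1st and 15th) |
BMS | business month start |
CBMS | custom business month start |
Q | quarter end |
BQ | business quarter end |
QS | quarter start |
BQS | business quarter start |
A | year end |
BA | business year end |
AS | year start |
BAS | business year start |
BH | business hour |
H | hourly |
T, min | minutely |
S | secondly |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |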
pd.date_range('2020-4-1', periods=10, freq='B')
Out[106]:
DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-06',
'2020-04-07', '2020-04-08', '2020-04-09', '2020-04-10',
'2020-04-13', '2020-04-14'],
dtype='datetime64[ns]', freq='B')
pd.date_range('2020-4-1', periods=10, freq='D')
Out[108]:
DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
'2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
'2020-04-09', '2020-04-10'],
dtype='datetime64[ns]', freq='D')
pd.date_range('2020-4-1', periods=10, freq='W')
Out[109]:
DatetimeIndex(['2020-04-05', '2020-04-12', '2020-04-19', '2020-04-26',
'2020-05-03', '2020-05-10', '2020-05-17', '2020-05-24',
'2020-05-31', '2020-06-07'],
dtype='datetime64[ns]', freq='W-SUN')
pd.date_range('2020-4-1', periods=10, freq='M')
Out[110]:
DatetimeIndex(['2020-04-30', '2020-05-31', '2020-06-30', '2020-07-31',
'2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30',
'2020-12-31', '2021-01-31'],
dtype='datetime64[ns]', freq='M')
pd.date_range('2020-4-1', periods=10, freq='SM')
Out[111]:
DatetimeIndex(['2020-04-15', '2020-04-30', '2020-05-15', '2020-05-31',
'2020-06-15', '2020-06-30', '2020-07-15', '2020-07-31',
'2020-08-15', '2020-08-31'],
dtype='datetime64[ns]', freq='SM-15')
pd.date_range('2020-4-1', periods=10, freq='BM')
Out[112]:
DatetimeIndex(['2020-04-30', '2020-05-29', '2020-06-30', '2020-07-31',
'2020-08-31', '2020-09-30', '2020-10-30', '2020-11-30',
'2020-12-31', '2021-01-29'],
dtype='datetime64[ns]', freq='BM')
pd.date_range('2020-4-1', periods=10, freq='MS')
Out[113]:
DatetimeIndex(['2020-04-01', '2020-05-01', '2020-06-01', '2020-07-01',
'2020-08-01', '2020-09-01', '2020-10-01', '2020-11-01',
'2020-12-01', '2021-01-01'],
dtype='datetime64[ns]', freq='MS')
pd.date_range('2020-4-1', periods=10, freq='Q')
Out[114]:
DatetimeIndex(['2020-06-30', '2020-09-30', '2020-12-31', '2021-03-31',
'2021-06-30', '2021-09-30', '2021-12-31', '2022-03-31',
'2022-06-30', '2022-09-30'],
dtype='datetime64[ns]', freq='Q-DEC')
pd.date_range('2020-4-1', periods=10, freq='QS')
Out[115]:
DatetimeIndex(['2020-04-01', '2020-07-01', '2020-10-01', '2021-01-01',
'2021-04-01', '2021-07-01', '2021-10-01', '2022-01-01',
'2022-04-01', '2022-07-01'],
dtype='datetime64[ns]', freq='QS-JAN')
pd.date_range('2020-4-1', periods=10, freq='BQ')
Out[116]:
DatetimeIndex(['2020-06-30', '2020-09-30', '2020-12-31', '2021-03-31',
'2021-06-30', '2021-09-30', '2021-12-31', '2022-03-31',
'2022-06-30', '2022-09-30'],
dtype='datetime64[ns]', freq='BQ-DEC')
pd.date_range('2020-4-1', periods=10, freq='BH')
Out[117]:
DatetimeIndex(['2020-04-01 09:00:00', '2020-04-01 10:00:00',
'2020-04-01 11:00:00', '2020-04-01 12:00:00',
'2020-04-01 13:00:00', '2020-04-01 14:00:00',
'2020-04-01 15:00:00', '2020-04-01 16:00:00',
'2020-04-02 09:00:00', '2020-04-02 10:00:00'],
dtype='datetime64[ns]', freq='BH')
pd.date_range('2020-4-1', periods=10, freq='T')
Out[118]:
DatetimeIndex(['2020-04-01 00:00:00', '2020-04-01 00:01:00',
'2020-04-01 00:02:00', '2020-04-01 00:03:00',
'2020-04-01 00:04:00', '2020-04-01 00:05:00',
'2020-04-01 00:06:00', '2020-04-01 00:07:00',
'2020-04-01 00:08:00', '2020-04-01 00:09:00'],
dtype='datetime64[ns]', freq='T')
pd.date_range('2020-4-1', periods=10, freq='L')
Out[120]:
DatetimeIndex([ '2020-04-01 00:00:00', '2020-04-01 00:00:00.001000',
'2020-04-01 00:00:00.002000', '2020-04-01 00:00:00.003000',
'2020-04-01 00:00:00.004000', '2020-04-01 00:00:00.005000',
'2020-04-01 00:00:00.006000', '2020-04-01 00:00:00.007000',
'2020-04-01 00:00:00.008000', '2020-04-01 00:00:00.009000'],
dtype='datetime64[ns]', freq='L')
pd.date_range('2020-4-1', periods=10, freq='S')
Out[121]:
DatetimeIndex(['2020-04-01 00:00:00', '2020-04-01 00:00:01',
'2020-04-01 00:00:02', '2020-04-01 00:00:03',
'2020-04-01 00:00:04', '2020-04-01 00:00:05',
'2020-04-01 00:00:06', '2020-04-01 00:00:07',
'2020-04-01 00:00:08', '2020-04-01 00:00:09'],
dtype='datetime64[ns]', freq='S')
pd.date_range('2020-4-1', periods=10, freq='N')
Out[122]:
DatetimeIndex([ '2020-04-01 00:00:00',
'2020-04-01 00:00:00.000000001',
'2020-04-01 00:00:00.000000002',
'2020-04-01 00:00:00.000000003',
'2020-04-01 00:00:00.000000004',
'2020-04-01 00:00:00.000000005',
'2020-04-01 00:00:00.000000006',
'2020-04-01 00:00:00.000000007',
'2020-04-01 00:00:00.000000008',
'2020-04-01 00:00:00.000000009'],
dtype='datetime64[ns]', freq='N')
pd.date_range('2020-4-1', periods=10, freq='1D1H10T10U')
Out[125]:
DatetimeIndex([ '2020-04-01 00:00:00', '2020-04-02 01:10:00.000010',
'2020-04-03 02:20:00.000020', '2020-04-04 03:30:00.000030',
'2020-04-05 04:40:00.000040', '2020-04-06 05:50:00.000050',
'2020-04-07 07:00:00.000060', '2020-04-08 08:10:00.000070',
'2020-04-09 09:20:00.000080', '2020-04-10 10:30:00.000090'],
dtype='datetime64[ns]', freq='90600000010U')
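The compound string is simply the sum of its parts, 1 day + 1 hour + 10 minutes + 10 microseconds. This sketch builds the same index from an equivalent `pd.Timedelta`, which `date_range` accepts as `freq`:

```python
import pandas as pd

step = pd.Timedelta(days=1, hours=1, minutes=10, microseconds=10)
rng = pd.date_range('2020-4-1', periods=10, freq=step)

print(rng[1] - rng[0] == step)  # True: constant step between neighbours
print(rng[6])                   # 2020-04-07 07:00:00.000060 (6 steps: 6 h + 60 min)
```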
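Append an anchor suffix to change the default anchor point (here weekly on Wednesday instead of the default Sunday):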
pd.date_range('2020-4-1', periods=10, freq='W-Wed')
Out[126]:
DatetimeIndex(['2020-04-01', '2020-04-08', '2020-04-15', '2020-04-22',
'2020-04-29', '2020-05-06', '2020-05-13', '2020-05-20',
'2020-05-27', '2020-06-03'],
dtype='datetime64[ns]', freq='W-WED')
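Anchor suffixes exist for the other anchored frequencies too. For example, `'QS-FEB'` starts quarters in February, May, August and November. This sketch compares the default and anchored variants:

```python
import pandas as pd

default_weekly = pd.date_range('2020-4-1', periods=3, freq='W')   # same as 'W-SUN'
wed_weekly = pd.date_range('2020-4-1', periods=3, freq='W-WED')
feb_quarters = pd.date_range('2020-4-1', periods=3, freq='QS-FEB')

print(default_weekly[0].day_name())  # Sunday (2020-04-05)
print(wed_weekly[0])                 # 2020-04-01 00:00:00, already a Wednesday
print(feb_quarters.month.tolist())   # [5, 8, 11]
```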
Aggregation after resampling
ts.resample('M').sum()
Out[129]:
2020-04-30 -0.247197
2020-05-31 1.055703
2020-06-30 -0.221805
2020-07-31 1.433503
Freq: M, dtype: float64
ts.resample('M').agg(['sum', 'mean'])
Out[132]:
sum mean
2020-04-30 -0.247197 -0.008240
2020-05-31 1.055703 0.034055
2020-06-30 -0.221805 -0.007394
2020-07-31 1.433503 0.159278
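`agg` also accepts a dict, so each column of a DataFrame can get its own aggregation. This is a deterministic sketch with hypothetical columns `a` and `b`. It uses `'MS'`, which labels each month by its first day and is unaffected by the `'M'`/`'ME'` renaming:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2020-4-1', periods=61, freq='D')   # all of April and May 2020
df = pd.DataFrame({'a': np.ones(61), 'b': np.arange(61.0)}, index=idx)

monthly = df.resample('MS').agg({'a': 'sum', 'b': 'mean'})
print(monthly['a'].tolist())  # [30.0, 31.0]: number of days per month
print(monthly['b'].tolist())  # [14.5, 45.0]: mean of 0..29 and of 30..60
```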