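Dates and times
- Timestamp: a fixed instant in time -> pd.Timestamp
- Period: a fixed span of time, such as March 2016, or the year 2015 when reporting annual sales -> pd.Period
- Interval: defined by a start time and an end time; a period is a special case of an interval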
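The three concepts can be compared side by side. The sketch below assumes a pandas version that has pd.Interval (0.20 or later). The timestamp and the interval endpoints are made-up values chosen for illustration. It shows that a timestamp is a point, that an interval is a pair of endpoints, and that a monthly period covers the same span as a half-open interval.

```python
import pandas as pd

# A timestamp is a single instant
ts = pd.Timestamp('2016-03-15 08:30')

# An interval is defined by its two endpoints; closed='left' means [start, end)
march = pd.Interval(pd.Timestamp('2016-03-01'), pd.Timestamp('2016-04-01'), closed='left')

# A period is a fixed span such as "March 2016"
p = pd.Period('2016-03', freq='M')

print(ts in march)                           # the instant falls inside the interval
print(p.start_time == march.left)            # the period starts where the interval starts
print(pd.Timestamp('2016-04-01') in march)   # the open right end is excluded
print(p.end_time)                            # last nanosecond of March 2016
```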
Uses of dates and times in pandas
- Analyzing financial data, such as stock trading data
- Analyzing server logs
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import timedelta
Time deltas (timedelta)
date1 = datetime(2016, 3, 20)
date2 = datetime(2016, 3, 16)
delta = date1 - date2
delta
[Out:]
datetime.timedelta(4)
delta.days
[Out:]
4
delta.total_seconds()
[Out:]
345600.0
date2 + delta
[Out:]
datetime.datetime(2016, 3, 20, 0, 0)
date2 + timedelta(4.5)
[Out:]
datetime.datetime(2016, 3, 20, 12, 0)
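Here is a short stdlib-only sketch of two timedelta details from the example above. The first positional argument of timedelta is a number of days, and fractions carry over into seconds. A negative delta is normalized so that days is negative while seconds stays between 0 and 86399.

```python
from datetime import datetime, timedelta

# The first positional argument is days; 4.5 days becomes 4 days + 12 hours
step = timedelta(4.5)
assert step == timedelta(days=4, hours=12)

start = datetime(2016, 3, 16)
end = start + step          # 2016-03-20 12:00
gap = end - start           # subtracting two datetimes gives a timedelta
back = start - end          # -4.5 days, stored as days=-5 plus 43200 positive seconds

print(end)
print(gap.days, gap.seconds, gap.total_seconds())
print(back.days, back.seconds, back.total_seconds())
```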
Converting between strings and datetime
For the datetime format codes, see the official Python documentation.
date = datetime(2016, 3, 20, 8, 30)
date
[Out:]
datetime.datetime(2016, 3, 20, 8, 30)
str(date)
[Out:]
'2016-03-20 08:30:00'
date.strftime('%Y-%m-%d %H:%M:%S')
[Out:]
'2016-03-20 08:30:00'
datetime.strptime('2016-03-20 09:30', '%Y-%m-%d %H:%M')
[Out:]
datetime.datetime(2016, 3, 20, 9, 30)
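strftime and strptime are inverses when they share a format string. The sketch below uses only the standard library; the helper name roundtrip is hypothetical. It parses a string, formats it back, and re-renders the value with a different format. strptime is strict, so a string that does not match the format, such as '2016/03/20 09:30' against '%Y-%m-%d %H:%M', raises ValueError.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M'

def roundtrip(text, fmt=FMT):
    """Hypothetical helper: parse text with fmt, then format the result with the same fmt."""
    parsed = datetime.strptime(text, fmt)
    return parsed, parsed.strftime(fmt)

parsed, text = roundtrip('2016-03-20 09:30')
print(repr(parsed))                   # the parsed datetime object
print(text)                           # identical to the input string
print(parsed.strftime('%d/%m/%Y'))    # the same moment rendered day-first
print(parsed.isoformat())             # ISO 8601 form with a 'T' separator
```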
Time series in pandas
pandas uses Timestamp to represent points in time
dates = [datetime(2016, 3, 1), datetime(2016, 3, 2), datetime(2016, 3, 3), datetime(2016, 3, 4)]
s = pd.Series(np.random.randn(4), index=dates)
s
[Out:]
2016-03-01 1.650889
2016-03-02 -0.328463
2016-03-03 1.674872
2016-03-04 -0.310849
dtype: float64
type(s.index)
[Out:]
pandas.tseries.index.DatetimeIndex
type(s.index[0])
[Out:]
pandas.tslib.Timestamp
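A DatetimeIndex also supports selection by date. In this sketch, deterministic values replace np.random.randn so the results are predictable. It shows an exact lookup by Timestamp, a string slice (both ends included), and a partial-string selection that returns a whole month.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2016, 3, day) for day in range(1, 5)]
s = pd.Series(np.arange(4.0), index=dates)   # 0.0, 1.0, 2.0, 3.0

one_day = s.loc[pd.Timestamp('2016-03-02')]  # exact lookup
window = s.loc['2016-03-02':'2016-03-03']    # label slice, inclusive on both ends
whole_month = s.loc['2016-03']               # partial string: every row in March 2016

print(one_day)
print(window)
print(len(whole_month))
```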
Date ranges
Generating a date range
pd.date_range('20160320', '20160331')
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
'2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
'2016-03-28', '2016-03-29', '2016-03-30', '2016-03-31'],
dtype='datetime64[ns]', freq='D')
pd.date_range(start='20160320', periods=10)
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
'2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
'2016-03-28', '2016-03-29'],
dtype='datetime64[ns]', freq='D')
## Normalize timestamps to midnight
pd.date_range(start='2016-03-20 16:23:32', periods=10, normalize=True)
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
'2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
'2016-03-28', '2016-03-29'],
dtype='datetime64[ns]', freq='D')
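To see what normalize=True changes, compare the same range generated with and without it. This sketch uses three periods instead of ten. It also shows Timestamp.normalize(), which applies the same midnight truncation to a single value.

```python
import pandas as pd

start = '2016-03-20 16:23:32'

raw = pd.date_range(start=start, periods=3, freq='D')                  # keeps 16:23:32
normalized = pd.date_range(start=start, periods=3, freq='D', normalize=True)

print(raw[0])
print(normalized[0])
print(pd.Timestamp(start).normalize())   # single-value equivalent
```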
Time frequencies
## Weekly (W is an alias for W-SUN, weeks ending on Sunday)
pd.date_range(start='20160320', periods=10, freq='W')
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-27', '2016-04-03', '2016-04-10',
'2016-04-17', '2016-04-24', '2016-05-01', '2016-05-08',
'2016-05-15', '2016-05-22'],
dtype='datetime64[ns]', freq='W-SUN')
# Month end
pd.date_range(start='20160320', periods=10, freq='M')
[Out:]
DatetimeIndex(['2016-03-31', '2016-04-30', '2016-05-31', '2016-06-30',
'2016-07-31', '2016-08-31', '2016-09-30', '2016-10-31',
'2016-11-30', '2016-12-31'],
dtype='datetime64[ns]', freq='M')
## Index made up of the last business day of each month
pd.date_range(start='20160320', periods=10, freq='BM')
[Out:]
DatetimeIndex(['2016-03-31', '2016-04-29', '2016-05-31', '2016-06-30',
'2016-07-29', '2016-08-31', '2016-09-30', '2016-10-31',
'2016-11-30', '2016-12-30'],
dtype='datetime64[ns]', freq='BM')
# Every 4 hours
pd.date_range(start='20160320', periods=10, freq='4H')
[Out:]
DatetimeIndex(['2016-03-20 00:00:00', '2016-03-20 04:00:00',
'2016-03-20 08:00:00', '2016-03-20 12:00:00',
'2016-03-20 16:00:00', '2016-03-20 20:00:00',
'2016-03-21 00:00:00', '2016-03-21 04:00:00',
'2016-03-21 08:00:00', '2016-03-21 12:00:00'],
dtype='datetime64[ns]', freq='4H')
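The 'BM' output above can be reproduced by hand. This stdlib-only sketch (last_business_day is a hypothetical helper) steps back from the last calendar day of the month until it reaches a Monday to Friday date. Like plain 'BM', it ignores public holidays. It yields the same ten dates as the DatetimeIndex above.

```python
import calendar
from datetime import date, timedelta

def last_business_day(year, month):
    """Hypothetical helper: last Monday-to-Friday date of a month (holidays ignored)."""
    day = date(year, month, calendar.monthrange(year, month)[1])  # last calendar day
    while day.weekday() >= 5:   # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day

for month in range(3, 13):
    print(last_business_day(2016, month))
```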
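Periods and period arithmetic
- pd.Period represents a span of time, such as a few days, a month, or several months. To tally sales per month, for example, a month-long period is the natural unit.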
p1 = pd.Period(2010)
p1
[Out:]
Period('2010', 'A-DEC')
p2 = p1 + 2
p2
[Out:]
Period('2012', 'A-DEC')
p2 - p1
[Out:]
2L
p1 = pd.Period(2016, freq='M')
p1
[Out:]
Period('2016-01', 'M')
p1 + 3
[Out:]
Period('2016-04', 'M')
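Period arithmetic works because each period is stored as an integer ordinal at its frequency, so adding n moves n steps. The sketch below compares ordinals directly instead of subtracting two periods. That subtraction returns a plain integer in old pandas (the 2L above) but a DateOffset in newer releases. It also reads days_in_month, which depends on the calendar (2016 is a leap year).

```python
import pandas as pd

p = pd.Period('2016-01', freq='M')
later = p + 3                          # three monthly steps forward

print(later)                           # 2016-04
print(later.ordinal - p.ordinal)       # distance in months, as a plain int
print(p.days_in_month)                 # January
print(pd.Period('2016-02', freq='M').days_in_month)   # leap-year February
```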
Period ranges
pd.period_range(start='2016-01', periods=12, freq='M')
[Out:]
PeriodIndex(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06',
'2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12'],
dtype='int64', freq='M')
pd.period_range(start='2016-01', end='2016-10', freq='M')
[Out:]
PeriodIndex(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06',
'2016-07', '2016-08', '2016-09', '2016-10'],
dtype='int64', freq='M')
# Build directly from strings
index = pd.PeriodIndex(['2016Q1', '2016Q2', '2016Q3'], freq='Q-DEC')
index
[Out:]
PeriodIndex(['2016Q1', '2016Q2', '2016Q3'], dtype='int64', freq='Q-DEC')
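A PeriodIndex built from quarter strings exposes calendar fields directly. This sketch reads the quarter number and the start of each quarter. It then converts each quarter to its last month with asfreq, which is covered in the next part.

```python
import pandas as pd

index = pd.PeriodIndex(['2016Q1', '2016Q2', '2016Q3'], freq='Q-DEC')

print(list(index.quarter))           # quarter numbers
print(index.start_time[1])           # first instant of 2016Q2
last_months = index.asfreq('M', how='end')
print([str(p) for p in last_months])
```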
Converting period frequencies
asfreq
- A-DEC: annual periods ending in December
- A-NOV: annual periods ending in November
- Q-DEC: quarterly periods in a year ending in December
p = pd.Period('2016', freq='A-DEC')
p
[Out:]
Period('2016', 'A-DEC')
p.asfreq('M', how='start')
[Out:]
Period('2016-01', 'M')
p.asfreq('M', how='end')
[Out:]
Period('2016-12', 'M')
p = pd.Period('2016-04', freq='M')
p
[Out:]
Period('2016-04', 'M')
p.asfreq('A-DEC')
[Out:]
Period('2016', 'A-DEC')
# Annual periods whose year ends in March (a fiscal year)
p.asfreq('A-MAR')
[Out:]
Period('2017', 'A-MAR')
Quarterly frequencies
- pandas supports 12 quarterly frequencies, Q-JAN through Q-DEC, one for each month in which the fiscal year can end
p = pd.Period('2016Q4', 'Q-JAN')
p
[Out:]
Period('2016Q4', 'Q-JAN')
# With the fiscal year ending in January, 2016Q4 runs from 2015-11-01 to 2016-01-31
p.asfreq('D', how='start'), p.asfreq('D', how='end')
[Out:]
(Period('2015-11-01', 'D'), Period('2016-01-31', 'D'))
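The Q-JAN result follows from simple month arithmetic. This stdlib-only sketch (fiscal_quarter_bounds is a hypothetical helper) assumes the convention shown above: fiscal year N ends in year_end_month of calendar year N, and quarter 4 ends in that month. It counts months as year * 12 + zero-based month and steps back three months per quarter.

```python
import calendar
from datetime import date

def fiscal_quarter_bounds(fiscal_year, quarter, year_end_month=12):
    """Hypothetical helper: first and last day of a fiscal quarter (1-4)."""
    # Absolute month counter of the quarter's last month
    last = fiscal_year * 12 + (year_end_month - 1) - 3 * (4 - quarter)
    first = last - 2
    start_year, start_month0 = divmod(first, 12)
    end_year, end_month0 = divmod(last, 12)
    end_day = calendar.monthrange(end_year, end_month0 + 1)[1]
    return date(start_year, start_month0 + 1, 1), date(end_year, end_month0 + 1, end_day)

print(fiscal_quarter_bounds(2016, 4, year_end_month=1))    # 2016Q4 under Q-JAN
print(fiscal_quarter_bounds(2016, 1, year_end_month=12))   # 2016Q1 under Q-DEC
```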
# Timestamp for 4 pm on the second-to-last business day of the quarter
p4pm = (p.asfreq('B', how='end') - 1).asfreq('T', 'start') + 16 * 60
p4pm
[Out:]
Period('2016-01-28 16:00', 'T')
# Convert to a Timestamp
p4pm.to_timestamp()
[Out:]
Timestamp('2016-01-28 16:00:00')
Converting between Timestamp and Period
ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-01-01', periods=5, freq='M'))
ts
[Out:]
2016-01-31 -0.773323
2016-02-29 0.215953
2016-03-31 1.301631
2016-04-30 -0.066134
2016-05-31 1.651792
Freq: M, dtype: float64
ts.to_period()
[Out:]
2016-01 -0.773323
2016-02 0.215953
2016-03 1.301631
2016-04 -0.066134
2016-05 1.651792
Freq: M, dtype: float64
ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-12-29', periods=5, freq='D'))
ts
[Out:]
2016-12-29 -0.110462
2016-12-30 -0.265792
2016-12-31 -0.382456
2017-01-01 -0.036111
2017-01-02 -1.029658
Freq: D, dtype: float64
pts = ts.to_period(freq='M')
pts
[Out:]
2016-12 -0.110462
2016-12 -0.265792
2016-12 -0.382456
2017-01 -0.036111
2017-01 -1.029658
Freq: M, dtype: float64
pts.groupby(level=0).sum()
[Out:]
2016-12 -0.758711
2017-01 -1.065769
Freq: M, dtype: float64
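Grouping on to_period('M') effectively keys each row by its (year, month) pair. This stdlib-only sketch (sum_by_month is a hypothetical helper) does the same with a dictionary, using the five daily values printed above. The totals match the groupby output up to the rounding of those printed values.

```python
from collections import defaultdict
from datetime import date

def sum_by_month(points):
    """Hypothetical helper: sum (date, value) pairs per (year, month), in sorted order."""
    totals = defaultdict(float)
    for day, value in points:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))

points = [
    (date(2016, 12, 29), -0.110462),
    (date(2016, 12, 30), -0.265792),
    (date(2016, 12, 31), -0.382456),
    (date(2017, 1, 1), -0.036111),
    (date(2017, 1, 2), -1.029658),
]
monthly = sum_by_month(points)
for (year, month), total in monthly.items():
    print(f'{year}-{month:02d} {total:.6f}')
```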
# Converting back to timestamps loses the day-level detail: all rows in a month map to the same timestamp
pts.to_timestamp(how='end')
[Out:]
2016-12-31 -0.110462
2016-12-31 -0.265792
2016-12-31 -0.382456
2017-01-31 -0.036111
2017-01-31 -1.029658
dtype: float64
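Resampling
- High frequency -> low frequency: downsampling, e.g. turning 5-minute stock trading data into daily data
- Low frequency -> high frequency: upsampling
- Other resampling: e.g. converting weekly-on-Wednesday (W-WED) data to weekly-on-Friday (W-FRI)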
ts = pd.Series(np.random.randint(0, 50, 60), index=pd.date_range('2016-04-25 09:30', periods=60, freq='T'))
ts
[Out:]
2016-04-25 09:30:00 18
2016-04-25 09:31:00 41
2016-04-25 09:32:00 49
2016-04-25 09:33:00 26
2016-04-25 09:34:00 5
2016-04-25 09:35:00 12
2016-04-25 09:36:00 6
2016-04-25 09:37:00 47
2016-04-25 09:38:00 16
2016-04-25 09:39:00 37
2016-04-25 09:40:00 44
2016-04-25 09:41:00 8
2016-04-25 09:42:00 22
2016-04-25 09:43:00 24
2016-04-25 09:44:00 12
2016-04-25 09:45:00 26
2016-04-25 09:46:00 30
2016-04-25 09:47:00 38
2016-04-25 09:48:00 5
2016-04-25 09:49:00 26
2016-04-25 09:50:00 39
2016-04-25 09:51:00 7
2016-04-25 09:52:00 6
2016-04-25 09:53:00 12
2016-04-25 09:54:00 24
2016-04-25 09:55:00 0
2016-04-25 09:56:00 12
2016-04-25 09:57:00 27
2016-04-25 09:58:00 10
2016-04-25 09:59:00 26
2016-04-25 10:00:00 27
2016-04-25 10:01:00 18
2016-04-25 10:02:00 27
2016-04-25 10:03:00 25
2016-04-25 10:04:00 25
2016-04-25 10:05:00 35
2016-04-25 10:06:00 28
2016-04-25 10:07:00 3
2016-04-25 10:08:00 20
2016-04-25 10:09:00 48
2016-04-25 10:10:00 5
2016-04-25 10:11:00 48
2016-04-25 10:12:00 30
2016-04-25 10:13:00 2
2016-04-25 10:14:00 11
2016-04-25 10:15:00 18
2016-04-25 10:16:00 21
2016-04-25 10:17:00 32
2016-04-25 10:18:00 43
2016-04-25 10:19:00 10
2016-04-25 10:20:00 5
2016-04-25 10:21:00 45
2016-04-25 10:22:00 3
2016-04-25 10:23:00 30
2016-04-25 10:24:00 3
2016-04-25 10:25:00 24
2016-04-25 10:26:00 46
2016-04-25 10:27:00 2
2016-04-25 10:28:00 33
2016-04-25 10:29:00 25
Freq: T, dtype: int32
# Minutes 0-4 form the first bin; each bin is labeled by its left edge
ts.resample('5min').sum()
[Out:]
2016-04-25 09:30:00 139
2016-04-25 09:35:00 118
2016-04-25 09:40:00 110
2016-04-25 09:45:00 125
2016-04-25 09:50:00 88
2016-04-25 09:55:00 75
2016-04-25 10:00:00 122
2016-04-25 10:05:00 134
2016-04-25 10:10:00 96
2016-04-25 10:15:00 124
2016-04-25 10:20:00 86
2016-04-25 10:25:00 130
Freq: 5T, dtype: int32
# Same bins, but each is labeled by its right edge
ts.resample('5min', label='right').sum()
[Out:]
2016-04-25 09:35:00 139
2016-04-25 09:40:00 118
2016-04-25 09:45:00 110
2016-04-25 09:50:00 125
2016-04-25 09:55:00 88
2016-04-25 10:00:00 75
2016-04-25 10:05:00 122
2016-04-25 10:10:00 134
2016-04-25 10:15:00 96
2016-04-25 10:20:00 124
2016-04-25 10:25:00 86
2016-04-25 10:30:00 130
Freq: 5T, dtype: int32
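The binning rule is easy to reproduce by hand. This stdlib-only sketch (resample_sum is a hypothetical helper) aligns 5-minute bins to midnight and closes them on the left, like the examples above. The label argument picks the left or right edge. Unlike pandas, it does not emit empty bins. It is applied to the first ten minutes of the series above and gives 139 and 118, the first two sums in the output.

```python
from datetime import datetime, timedelta

def resample_sum(points, width=timedelta(minutes=5), label='left'):
    """Hypothetical helper: sum (datetime, value) pairs into left-closed fixed-width bins."""
    totals = {}
    for ts, value in points:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + (ts - midnight) // width * width   # bin start, aligned to midnight
        key = start if label == 'left' else start + width
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())

first_ten = [18, 41, 49, 26, 5, 12, 6, 47, 16, 37]   # 09:30 to 09:39 from the series above
t0 = datetime(2016, 4, 25, 9, 30)
points = [(t0 + timedelta(minutes=i), v) for i, v in enumerate(first_ten)]

for label in ('left', 'right'):
    print(label, resample_sum(points, label=label))
```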
OHLC resampling
Specific to financial data: Open/High/Low/Close
ts.resample('5min').ohlc()
[Out:]
open high low close
2016-04-25 09:30:00 18 49 5 5
2016-04-25 09:35:00 12 47 6 37
2016-04-25 09:40:00 44 44 8 12
2016-04-25 09:45:00 26 38 5 26
2016-04-25 09:50:00 39 39 6 24
2016-04-25 09:55:00 0 27 0 26
2016-04-25 10:00:00 27 27 18 25
2016-04-25 10:05:00 35 48 3 48
2016-04-25 10:10:00 5 48 2 11
2016-04-25 10:15:00 18 43 10 10
2016-04-25 10:20:00 5 45 3 3
2016-04-25 10:25:00 24 46 2 25
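Each OHLC bar is the first, maximum, minimum and last value of its bin. This sketch checks that on the first ten minutes of the series. It builds the same bars with ohlc() and with an explicit agg(['first', 'max', 'min', 'last']).

```python
import pandas as pd

idx = pd.date_range('2016-04-25 09:30', periods=10, freq='min')
ts = pd.Series([18, 41, 49, 26, 5, 12, 6, 47, 16, 37], index=idx)

ohlc = ts.resample('5min').ohlc()

# The same bars from ordinary aggregations
manual = ts.resample('5min').agg(['first', 'max', 'min', 'last'])
manual.columns = ['open', 'high', 'low', 'close']

print(ohlc)
print(manual)
```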
Resampling with groupby
ts = pd.Series(np.random.randint(0, 50, 100), index=pd.date_range('2016-03-01', periods=100, freq='D'))
ts
[Out:]
2016-03-01 13
2016-03-02 21
2016-03-03 26
2016-03-04 3
2016-03-05 31
2016-03-06 29
2016-03-07 42
2016-03-08 24
2016-03-09 10
2016-03-10 42
2016-03-11 42
2016-03-12 7
2016-03-13 10
2016-03-14 48
2016-03-15 12
2016-03-16 15
2016-03-17 16
2016-03-18 34
2016-03-19 45
2016-03-20 40
2016-03-21 45
2016-03-22 46
2016-03-23 21
2016-03-24 27
2016-03-25 10
2016-03-26 47
2016-03-27 8
2016-03-28 9
2016-03-29 0
2016-03-30 20
..
2016-05-10 38
2016-05-11 46
2016-05-12 8
2016-05-13 15
2016-05-14 13
2016-05-15 30
2016-05-16 25
2016-05-17 15
2016-05-18 3
2016-05-19 5
2016-05-20 21
2016-05-21 18
2016-05-22 11
2016-05-23 47
2016-05-24 14
2016-05-25 33
2016-05-26 37
2016-05-27 40
2016-05-28 5
2016-05-29 27
2016-05-30 2
2016-05-31 31
2016-06-01 31
2016-06-02 41
2016-06-03 28
2016-06-04 2
2016-06-05 21
2016-06-06 10
2016-06-07 21
2016-06-08 18
Freq: D, dtype: int32
ts.groupby(lambda x: x.month).sum()
[Out:]
3 759
4 648
5 748
6 172
dtype: int32
ts.groupby(ts.index.to_period('M')).sum()
[Out:]
2016-03 759
2016-04 648
2016-05 748
2016-06 172
Freq: M, dtype: int32
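The two groupby keys above give the same result here only because the data covers a single year. Grouping by x.month merges the same month across different years, while to_period('M') keeps each year-month separate. The three values in this sketch are made up to show the difference.

```python
import pandas as pd

s = pd.Series([1, 2, 4], index=pd.to_datetime(['2015-01-15', '2016-01-15', '2016-02-15']))

by_month = s.groupby(s.index.month).sum()              # January 2015 and 2016 merged
by_period = s.groupby(s.index.to_period('M')).sum()    # each year-month kept separate

print(by_month)
print(by_period)
```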
Upsampling and interpolation
# Weekly data, sampled every Friday
df = pd.DataFrame(np.random.randint(1, 50, 2), index=pd.date_range('2016-04-22', periods=2, freq='W-FRI'))
df
[Out:]
0
2016-04-22 10
2016-04-29 6
df.resample('D').asfreq()
[Out:]
0
2016-04-22 10
2016-04-23 NaN
2016-04-24 NaN
2016-04-25 NaN
2016-04-26 NaN
2016-04-27 NaN
2016-04-28 NaN
2016-04-29 6
df.resample('D').ffill(limit=3)
[Out:]
0
2016-04-22 10
2016-04-23 10
2016-04-24 10
2016-04-25 10
2016-04-26 NaN
2016-04-27 NaN
2016-04-28 NaN
2016-04-29 6
# Resample to weekly, sampled every Monday
df.resample('W-MON').ffill()
[Out:]
0
2016-04-25 10
2016-05-02 6
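The limit argument caps how many consecutive gaps are filled after each real value. This stdlib-only sketch (ffill is a hypothetical helper) uses None for missing values. Applied to the daily layout of the weekly data above, it gives the same pattern as the ffill(limit=3) output: three filled days, then NaN until the next observation.

```python
def ffill(values, limit=None):
    """Hypothetical helper: forward-fill None gaps, at most `limit` in a row."""
    filled = []
    last = None
    run = 0  # gaps filled since the last real value
    for value in values:
        if value is not None:
            last, run = value, 0
            filled.append(value)
        elif last is not None and (limit is None or run < limit):
            run += 1
            filled.append(last)
        else:
            filled.append(None)
    return filled

daily = [10, None, None, None, None, None, None, 6]   # 2016-04-22 to 2016-04-29
print(ffill(daily, limit=3))
print(ffill(daily))
```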
Resampling periods
df = pd.DataFrame(np.random.randint(2, 30, (24, 4)),
index=pd.period_range('2015-01', '2016-12', freq='M'),
columns=list('ABCD'))
df
[Out:]
A B C D
2015-01 20 7 22 18
2015-02 2 28 21 19
2015-03 13 17 12 7
2015-04 24 17 20 14
2015-05 15 13 15 20
2015-06 19 28 2 22
2015-07 20 7 2 27
2015-08 10 18 2 16
2015-09 17 24 11 9
2015-10 23 2 21 25
2015-11 24 3 19 8
2015-12 7 16 6 12
2016-01 18 13 8 15
2016-02 17 14 2 21
2016-03 17 6 5 24
2016-04 24 14 22 14
2016-05 16 14 20 14
2016-06 26 29 14 15
2016-07 2 11 11 2
2016-08 12 11 17 18
2016-09 19 21 4 16
2016-10 21 16 11 7
2016-11 16 23 2 22
2016-12 21 9 27 11
adf = df.resample('A-DEC').mean()
adf
[Out:]
A B C D
2015 16.166667 15.000000 12.750000 16.416667
2016 17.416667 15.083333 11.916667 14.916667
df.resample('A-MAY').mean()
[Out:]
A B C D
2015 14.800000 16.400000 18.000000 15.60
2016 17.666667 13.250000 10.000000 17.25
2017 16.714286 17.142857 12.285714 13.00
# Upsampling
adf.resample('Q-DEC').asfreq()
[Out:]
A B C D
2015Q1 16.166667 15.000000 12.750000 16.416667
2015Q2 NaN NaN NaN NaN
2015Q3 NaN NaN NaN NaN
2015Q4 NaN NaN NaN NaN
2016Q1 17.416667 15.083333 11.916667 14.916667
2016Q2 NaN NaN NaN NaN
2016Q3 NaN NaN NaN NaN
2016Q4 NaN NaN NaN NaN
adf.resample('Q-DEC').ffill()
[Out:]
A B C D
2015Q1 16.166667 15.000000 12.750000 16.416667
2015Q2 16.166667 15.000000 12.750000 16.416667
2015Q3 16.166667 15.000000 12.750000 16.416667
2015Q4 16.166667 15.000000 12.750000 16.416667
2016Q1 17.416667 15.083333 11.916667 14.916667
2016Q2 17.416667 15.083333 11.916667 14.916667
2016Q3 17.416667 15.083333 11.916667 14.916667
2016Q4 17.416667 15.083333 11.916667 14.916667
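The A-MAY result puts each month in the fiscal year that ends in May. So January to May 2015 form FY2015, June 2015 to May 2016 form FY2016, and June to December 2016 form FY2017. This stdlib-only sketch (fiscal_year and annual_mean are hypothetical helpers) applies that rule to column A of the monthly table. It reproduces column A of both the A-DEC and A-MAY outputs.

```python
from collections import defaultdict

def fiscal_year(year, month, year_end_month=12):
    """Hypothetical helper: label a month with the calendar year in which its fiscal year ends."""
    return year if month <= year_end_month else year + 1

def annual_mean(monthly, year_end_month=12):
    """Hypothetical helper: average {(year, month): value} per fiscal year."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (year, month), value in monthly.items():
        fy = fiscal_year(year, month, year_end_month)
        sums[fy] += value
        counts[fy] += 1
    return {fy: sums[fy] / counts[fy] for fy in sorted(sums)}

# Column A of the monthly DataFrame above, 2015-01 through 2016-12
column_a = [20, 2, 13, 24, 15, 19, 20, 10, 17, 23, 24, 7,
            18, 17, 17, 24, 16, 26, 2, 12, 19, 21, 16, 21]
months = [(2015 + i // 12, i % 12 + 1) for i in range(24)]
monthly = dict(zip(months, column_a))

dec = annual_mean(monthly, year_end_month=12)   # like 'A-DEC'
may = annual_mean(monthly, year_end_month=5)    # like 'A-MAY'
print(dec)
print(may)
```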
Performance
n = 1000000
ts = pd.Series(np.random.randn(n),
index=pd.date_range('2000-01-01', periods=n, freq='10ms'))
len(ts)
[Out:]
1000000
%timeit ts.resample('10min').ohlc()
# out=> 10 loops, best of 3: 21.9 ms per loop
ts.resample('D').ohlc()
[Out:]
open high low close
2000-01-01 1.161091 4.551988 -4.660681 -0.406231
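Reading date series from a file
Data: the practice data download. Using it is optional; generating some data yourself in a loop works just as well.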
df = pd.read_csv('data/002001.csv', index_col='Date')
df
[Out:]
Open High Low Close Volume Adj Close
Date
2015-12-22 16.86 17.13 16.48 16.95 13519900 16.95
2015-12-21 16.31 17.00 16.20 16.85 14132200 16.85
2015-12-18 16.59 16.70 16.21 16.31 10524300 16.31
2015-12-17 16.28 16.75 16.16 16.60 12326500 16.60
2015-12-16 16.23 16.42 16.05 16.28 8026000 16.28
2015-12-15 16.06 16.31 15.95 16.18 6647500 16.18
2015-12-14 15.60 16.06 15.45 16.06 8355200 16.06
2015-12-11 15.50 15.80 15.41 15.62 7243500 15.62
2015-12-10 15.99 16.05 15.51 15.56 7654900 15.56
2015-12-09 16.00 16.19 15.80 15.83 7926900 15.83
2015-12-08 16.54 16.55 16.01 16.05 7640100 16.05
2015-12-07 16.50 17.04 16.48 16.63 11917200 16.63
2015-12-04 16.13 16.85 16.01 16.62 14011100 16.62
2015-12-03 15.97 16.34 15.88 16.21 9504000 16.21
2015-12-02 15.89 16.04 15.50 15.88 11229600 15.88
2015-12-01 15.67 15.96 15.50 15.85 7192200 15.85
2015-11-30 15.54 15.90 15.05 15.70 11615200 15.70
2015-11-27 16.61 16.99 15.10 15.54 15177000 15.54
2015-11-26 16.98 17.22 16.62 16.78 13196300 16.78
2015-11-25 16.15 17.04 16.03 16.94 18600100 16.94
2015-11-24 15.90 16.20 15.70 16.15 8561200 16.15
2015-11-23 16.09 16.32 16.00 16.05 9441700 16.05
2015-11-20 15.96 16.17 15.81 16.08 8022200 16.08
2015-11-19 15.75 16.05 15.71 16.02 5193300 16.02
2015-11-18 16.26 16.30 15.72 15.75 7318500 15.75
2015-11-17 16.41 16.47 16.11 16.22 11479800 16.22
2015-11-16 15.70 16.22 15.61 16.21 9083200 16.21
2015-11-13 16.36 16.47 15.90 15.95 12924400 15.95
2015-11-12 16.23 16.92 16.00 16.59 16492800 16.59
2015-11-11 16.16 16.28 15.81 16.22 15661900 16.22
2015-11-10 16.29 16.69 16.04 16.15 21457600 16.15
2015-11-09 15.70 16.29 15.56 16.02 20842600 16.02
2015-11-06 15.53 16.01 15.41 15.86 17735800 15.86
2015-11-05 15.33 15.79 15.21 15.52 19051400 15.52
2015-11-04 14.65 15.35 14.65 15.33 14578200 15.33
2015-11-03 14.84 14.96 14.44 14.62 6576300 14.62
2015-11-02 14.91 15.18 14.74 14.74 9487800 14.74
2015-10-30 15.25 15.52 14.81 15.22 12908500 15.22
2015-10-29 15.01 15.36 14.96 15.30 11177100 15.30
2015-10-28 15.14 15.50 14.96 15.02 11373200 15.02
2015-10-27 15.10 15.17 14.51 15.15 12950400 15.15
2015-10-26 15.41 15.55 14.87 15.18 15844500 15.18
2015-10-23 14.80 15.23 14.75 15.20 14769000 15.20
2015-10-22 14.28 14.82 14.25 14.73 10428900 14.73
2015-10-21 15.24 15.70 14.08 14.26 21113500 14.26
2015-10-20 14.99 15.24 14.89 15.22 11935800 15.22
2015-10-19 15.27 15.35 14.85 15.03 11601300 15.03
2015-10-16 15.23 15.35 14.82 15.25 14168700 15.25
2015-10-15 14.73 15.15 14.60 15.12 11177700 15.12
2015-10-14 14.99 15.12 14.72 14.73 10368900 14.73
2015-10-13 15.02 15.19 14.85 15.07 13408200 15.07
2015-10-12 14.63 15.43 14.41 15.30 24110800 15.30
2015-10-09 14.50 14.79 14.11 14.62 23818500 14.62
2015-10-08 14.75 14.75 14.65 14.75 18317200 14.75
2015-10-07 13.41 13.41 13.41 13.41 0 13.41
2015-10-06 13.41 13.41 13.41 13.41 0 13.41
2015-10-05 13.41 13.41 13.41 13.41 0 13.41
2015-10-02 13.41 13.41 13.41 13.41 0 13.41
2015-10-01 13.41 13.41 13.41 13.41 0 13.41
df.index
[Out:]
Index([u'2015-12-22', u'2015-12-21', u'2015-12-18', u'2015-12-17',
u'2015-12-16', u'2015-12-15', u'2015-12-14', u'2015-12-11',
u'2015-12-10', u'2015-12-09', u'2015-12-08', u'2015-12-07',
u'2015-12-04', u'2015-12-03', u'2015-12-02', u'2015-12-01',
u'2015-11-30', u'2015-11-27', u'2015-11-26', u'2015-11-25',
u'2015-11-24', u'2015-11-23', u'2015-11-20', u'2015-11-19',
u'2015-11-18', u'2015-11-17', u'2015-11-16', u'2015-11-13',
u'2015-11-12', u'2015-11-11', u'2015-11-10', u'2015-11-09',
u'2015-11-06', u'2015-11-05', u'2015-11-04', u'2015-11-03',
u'2015-11-02', u'2015-10-30', u'2015-10-29', u'2015-10-28',
u'2015-10-27', u'2015-10-26', u'2015-10-23', u'2015-10-22',
u'2015-10-21', u'2015-10-20', u'2015-10-19', u'2015-10-16',
u'2015-10-15', u'2015-10-14', u'2015-10-13', u'2015-10-12',
u'2015-10-09', u'2015-10-08', u'2015-10-07', u'2015-10-06',
u'2015-10-05', u'2015-10-02', u'2015-10-01'],
dtype='object', name=u'Date')
df = pd.read_csv('data/002001.csv', index_col='Date', parse_dates=True)
df.index
[Out:]
DatetimeIndex(['2015-12-22', '2015-12-21', '2015-12-18', '2015-12-17',
'2015-12-16', '2015-12-15', '2015-12-14', '2015-12-11',
'2015-12-10', '2015-12-09', '2015-12-08', '2015-12-07',
'2015-12-04', '2015-12-03', '2015-12-02', '2015-12-01',
'2015-11-30', '2015-11-27', '2015-11-26', '2015-11-25',
'2015-11-24', '2015-11-23', '2015-11-20', '2015-11-19',
'2015-11-18', '2015-11-17', '2015-11-16', '2015-11-13',
'2015-11-12', '2015-11-11', '2015-11-10', '2015-11-09',
'2015-11-06', '2015-11-05', '2015-11-04', '2015-11-03',
'2015-11-02', '2015-10-30', '2015-10-29', '2015-10-28',
'2015-10-27', '2015-10-26', '2015-10-23', '2015-10-22',
'2015-10-21', '2015-10-20', '2015-10-19', '2015-10-16',
'2015-10-15', '2015-10-14', '2015-10-13', '2015-10-12',
'2015-10-09', '2015-10-08', '2015-10-07', '2015-10-06',
'2015-10-05', '2015-10-02', '2015-10-01'],
dtype='datetime64[ns]', name=u'Date', freq=None)
wdf = df['Adj Close'].resample('W-FRI').ohlc()
wdf
[Out:]
open high low close
Date
2015-10-02 13.41 13.41 13.41 13.41
2015-10-09 13.41 14.75 13.41 14.62
2015-10-16 15.30 15.30 14.73 15.25
2015-10-23 15.03 15.22 14.26 15.20
2015-10-30 15.18 15.30 15.02 15.22
2015-11-06 14.74 15.86 14.62 15.86
2015-11-13 16.02 16.59 15.95 15.95
2015-11-20 16.21 16.22 15.75 16.08
2015-11-27 16.05 16.94 15.54 15.54
2015-12-04 15.70 16.62 15.70 16.62
2015-12-11 16.63 16.63 15.56 15.62
2015-12-18 16.06 16.60 16.06 16.31
2015-12-25 16.85 16.95 16.85 16.95
wdf['Volume'] = df['Volume'].resample('W-FRI').sum()
wdf
[Out:]
open high low close Volume
Date
2015-10-02 13.41 13.41 13.41 13.41 0
2015-10-09 13.41 14.75 13.41 14.62 42135700
2015-10-16 15.30 15.30 14.73 15.25 73234300
2015-10-23 15.03 15.22 14.26 15.20 69848500
2015-10-30 15.18 15.30 15.02 15.22 64253700
2015-11-06 14.74 15.86 14.62 15.86 67429500
2015-11-13 16.02 16.59 15.95 15.95 87379300
2015-11-20 16.21 16.22 15.75 16.08 41097000
2015-11-27 16.05 16.94 15.54 15.54 64976300
2015-12-04 15.70 16.62 15.70 16.62 53552100
2015-12-11 16.63 16.63 15.56 15.62 42382600
2015-12-18 16.06 16.60 16.06 16.31 45879500
2015-12-25 16.85 16.95 16.85 16.95 27652100
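The weekly bars above are the OHLC of the daily Adj Close only. An alternative, sketched here, builds weekly bars from the daily Open/High/Low/Close columns with one aggregation per column. The five rows are copied from the first week of the 002001.csv listing and kept newest-first as in the file, so the frame is sorted before resampling. The resulting weekly high (14.79) comes from the High column, whereas the Adj Close-based bar above shows 14.75. The volume matches the 42135700 shown for 2015-10-09.

```python
import pandas as pd

daily = pd.DataFrame(
    {
        'Open': [14.50, 14.75, 13.41, 13.41, 13.41],
        'High': [14.79, 14.75, 13.41, 13.41, 13.41],
        'Low': [14.11, 14.65, 13.41, 13.41, 13.41],
        'Close': [14.62, 14.75, 13.41, 13.41, 13.41],
        'Volume': [23818500, 18317200, 0, 0, 0],
    },
    index=pd.to_datetime(['2015-10-09', '2015-10-08', '2015-10-07', '2015-10-06', '2015-10-05']),
)

# first open, highest high, lowest low, last close, total volume per Friday-ending week
weekly = daily.sort_index().resample('W-FRI').agg(
    {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}
)
print(weekly)
```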
Custom date parsing function
def date_parser(s):
s = '2016/' + s
d = datetime.strptime(s, '%Y/%m/%d')
return d
df = pd.read_csv('data/custom_date.csv', parse_dates=True, index_col='Date', date_parser=date_parser)
df
[Out:]
Price
Date
2016-01-01 10.2
2016-01-02 10.4
2016-01-03 10.5
2016-01-04 10.8
2016-01-05 11.2
2016-01-06 10.6
df.index
[Out:]
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06'],
dtype='datetime64[ns]', name=u'Date', freq=None)
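Recent pandas releases deprecate the date_parser argument of read_csv in favour of date_format. A version-independent alternative is to read the column as strings and convert it afterwards with pd.to_datetime and an explicit format. The contents of data/custom_date.csv are not shown, so this sketch assumes the Date column holds month/day strings such as 1/1 (which is what the '2016/' prefix in date_parser suggests). An in-memory io.StringIO stands in for the file.

```python
import io

import pandas as pd

# Assumed stand-in for data/custom_date.csv: month/day strings without a year
raw = io.StringIO('Date,Price\n1/1,10.2\n1/2,10.4\n1/3,10.5\n')

df = pd.read_csv(raw, index_col='Date')
# Prepend the missing year, then parse with an explicit format
df.index = pd.to_datetime(['2016/' + s for s in df.index], format='%Y/%m/%d').rename('Date')
print(df)
```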