pandas基礎(4.Datetime, Period, 重採樣)


時間日期

  • 時間戳 tiimestamp:固定的時刻 -> pd.Timestamp
  • 固定時期 period:比如 2016年3月份,再如2015年銷售額 -> pd.Period
  • 時間間隔 interval:由起始時間和結束時間來表示,固定時期是時間間隔的一個特殊

時間日期在 Pandas 裏的作用

  • 分析金融數據,如股票交易數據
  • 分析服務器日誌
import pandas as pd
import numpy as np

from datetime import datetime
from datetime import timedelta

時間差

date1 = datetime(2016, 3, 20)
date2 = datetime(2016, 3, 16)
delta = date1 - date2
delta

[Out:]
datetime.timedelta(4)

delta.days

[Out:]
4

delta.total_seconds()

[Out:]
345600.0

date2 + delta

[Out:]
datetime.datetime(2016, 3, 20, 0, 0)

date2 + timedelta(4.5)

[Out:]
datetime.datetime(2016, 3, 20, 12, 0)

字符串和 datetime 轉換

關於 datetime 格式定義,可以參閱 python 官方文檔

date = datetime(2016, 3, 20, 8, 30)
date

[Out:]
datetime.datetime(2016, 3, 20, 8, 30)

str(date)

[Out:]
'2016-03-20 08:30:00'

date.strftime('%Y-%m-%d %H:%M:%S')

[Out:]
'2016-03-20 08:30:00'

datetime.strptime('2016-03-20 09:30', '%Y-%m-%d %H:%M')

[Out:]
datetime.datetime(2016, 3, 20, 9, 30)

Pandas 裏的時間序列

Pandas 裏使用 Timestamp 來表達時間

dates = [datetime(2016, 3, 1), datetime(2016, 3, 2), datetime(2016, 3, 3), datetime(2016, 3, 4)]
s = pd.Series(np.random.randn(4), index=dates)
s


[Out:]
2016-03-01    1.650889
2016-03-02   -0.328463
2016-03-03    1.674872
2016-03-04   -0.310849
dtype: float64

type(s.index)

[Out:]
pandas.tseries.index.DatetimeIndex

type(s.index[0])

[Out:]
pandas.tslib.Timestamp


日期範圍

生成日期範圍

pd.date_range('20160320', '20160331')

[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
               '2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
               '2016-03-28', '2016-03-29', '2016-03-30', '2016-03-31'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start='20160320', periods=10)

[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
               '2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
               '2016-03-28', '2016-03-29'],
              dtype='datetime64[ns]', freq='D')

## 規則化時間戳
pd.date_range(start='2016-03-20 16:23:32', periods=10, normalize=True)

[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
               '2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
               '2016-03-28', '2016-03-29'],
              dtype='datetime64[ns]', freq='D')

時間頻率

## 星期
pd.date_range(start='20160320', periods=10, freq='W')

[Out:]
DatetimeIndex(['2016-03-20', '2016-03-27', '2016-04-03', '2016-04-10',
               '2016-04-17', '2016-04-24', '2016-05-01', '2016-05-08',
               '2016-05-15', '2016-05-22'],
              dtype='datetime64[ns]', freq='W-SUN')

# 月
pd.date_range(start='20160320', periods=10, freq='M')

[Out:]
DatetimeIndex(['2016-03-31', '2016-04-30', '2016-05-31', '2016-06-30',
               '2016-07-31', '2016-08-31', '2016-09-30', '2016-10-31',
               '2016-11-30', '2016-12-31'],
              dtype='datetime64[ns]', freq='M')

## 每個月最後一個工作日組成的索引
pd.date_range(start='20160320', periods=10, freq='BM')

[Out:]
DatetimeIndex(['2016-03-31', '2016-04-29', '2016-05-31', '2016-06-30',
               '2016-07-29', '2016-08-31', '2016-09-30', '2016-10-31',
               '2016-11-30', '2016-12-30'],
              dtype='datetime64[ns]', freq='BM')

# 小時
pd.date_range(start='20160320', periods=10, freq='4H')

[Out:]
DatetimeIndex(['2016-03-20 00:00:00', '2016-03-20 04:00:00',
               '2016-03-20 08:00:00', '2016-03-20 12:00:00',
               '2016-03-20 16:00:00', '2016-03-20 20:00:00',
               '2016-03-21 00:00:00', '2016-03-21 04:00:00',
               '2016-03-21 08:00:00', '2016-03-21 12:00:00'],
              dtype='datetime64[ns]', freq='4H')

時期及算術運算

  • pd.Period 表示時期,比如幾日,月或幾個月等。比如用來統計每個月的銷售額,就可以用時期作爲單位。
p1 = pd.Period(2010)
p1

[Out:]
Period('2010', 'A-DEC')

p2 = p1 + 2
p2

[Out:]
Period('2012', 'A-DEC')

p2 - p1

[Out:]
2L

p1 = pd.Period(2016, freq='M')
p1

[Out:]
Period('2016-01', 'M')

p1 + 3

[Out:]
Period('2016-04', 'M')

時期序列

pd.period_range(start='2016-01', periods=12, freq='M')

[Out:]
PeriodIndex(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06',
             '2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12'],
            dtype='int64', freq='M')

pd.period_range(start='2016-01', end='2016-10', freq='M')

[Out:]
PeriodIndex(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06',
             '2016-07', '2016-08', '2016-09', '2016-10'],
            dtype='int64', freq='M')

# 直接用字符串
index = pd.PeriodIndex(['2016Q1', '2016Q2', '2016Q3'], freq='Q-DEC')
index

[Out:]
PeriodIndex(['2016Q1', '2016Q2', '2016Q3'], dtype='int64', freq='Q-DEC')

時期的頻率轉換

asfreq

  • A-DEC: 以 12 月份作爲結束的年時期
  • A-NOV: 以 11 月份作爲結束的年時期
  • Q-DEC: 以 12 月份作爲結束的季度時期
    在這裏插入圖片描述
p = pd.Period('2016', freq='A-DEC')
p

[Out:]
Period('2016', 'A-DEC')

p.asfreq('M', how='start')

[Out:]
Period('2016-01', 'M')

p.asfreq('M', how='end')

[Out:]
Period('2016-12', 'M')

p = pd.Period('2016-04', freq='M')
p

[Out:]
Period('2016-04', 'M')

p.asfreq('A-DEC')

[Out:]
Period('2016', 'A-DEC')

# 以年爲週期,以一年中的 3 月份作爲年的結束(財年)
p.asfreq('A-MAR')

[Out:]
Period('2017', 'A-MAR')

季度時間頻率

  • Pandas 支持 12 種季度型頻率,從 Q-JAN 到 Q-DEC
p = pd.Period('2016Q4', 'Q-JAN')
p

[Out:]
Period('2016Q4', 'Q-JAN')

# 以 1 月份結束的財年中,2016Q4 的時期是指 2015-11-1 到 2016-1-31
p.asfreq('D', how='start'), p.asfreq('D', how='end')

[Out:]
(Period('2015-11-01', 'D'), Period('2016-01-31', 'D'))

# 獲取該季度倒數第二個工作日下午4點的時間戳
p4pm = (p.asfreq('B', how='end') - 1).asfreq('T', 'start') + 16 * 60
p4pm

[Out:]
Period('2016-01-28 16:00', 'T')

# 轉換爲 timestamp
p4pm.to_timestamp()

[Out:]
Timestamp('2016-01-28 16:00:00')

Timestamp 和 Period 相互轉換

ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-01-01', periods=5, freq='M'))
ts

[Out:]
2016-01-31   -0.773323
2016-02-29    0.215953
2016-03-31    1.301631
2016-04-30   -0.066134
2016-05-31    1.651792
Freq: M, dtype: float64

ts.to_period()

[Out:]
2016-01   -0.773323
2016-02    0.215953
2016-03    1.301631
2016-04   -0.066134
2016-05    1.651792
Freq: M, dtype: float64

ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-12-29', periods=5, freq='D'))
ts

[Out:]
2016-12-29   -0.110462
2016-12-30   -0.265792
2016-12-31   -0.382456
2017-01-01   -0.036111
2017-01-02   -1.029658
Freq: D, dtype: float64

pts = ts.to_period(freq='M')
pts

[Out:]
2016-12   -0.110462
2016-12   -0.265792
2016-12   -0.382456
2017-01   -0.036111
2017-01   -1.029658
Freq: M, dtype: float64

pts.groupby(level=0).sum()

[Out:]
2016-12   -0.758711
2017-01   -1.065769
Freq: M, dtype: float64

# 轉換爲時間戳時,細部時間會丟失
pts.to_timestamp(how='end')

[Out:]
2016-12-31   -0.110462
2016-12-31   -0.265792
2016-12-31   -0.382456
2017-01-31   -0.036111
2017-01-31   -1.029658
dtype: float64

重採樣

  • 高頻率 -> 低頻率 -> 降採樣:5 分鐘股票交易數據轉換爲日交易數據
  • 低頻率 -> 高頻率 -> 升採樣
  • 其他重採樣:每週三 (W-WED) 轉換爲每週五 (W-FRI)
ts = pd.Series(np.random.randint(0, 50, 60), index=pd.date_range('2016-04-25 09:30', periods=60, freq='T'))
ts

[Out:]
2016-04-25 09:30:00    18
2016-04-25 09:31:00    41
2016-04-25 09:32:00    49
2016-04-25 09:33:00    26
2016-04-25 09:34:00     5
2016-04-25 09:35:00    12
2016-04-25 09:36:00     6
2016-04-25 09:37:00    47
2016-04-25 09:38:00    16
2016-04-25 09:39:00    37
2016-04-25 09:40:00    44
2016-04-25 09:41:00     8
2016-04-25 09:42:00    22
2016-04-25 09:43:00    24
2016-04-25 09:44:00    12
2016-04-25 09:45:00    26
2016-04-25 09:46:00    30
2016-04-25 09:47:00    38
2016-04-25 09:48:00     5
2016-04-25 09:49:00    26
2016-04-25 09:50:00    39
2016-04-25 09:51:00     7
2016-04-25 09:52:00     6
2016-04-25 09:53:00    12
2016-04-25 09:54:00    24
2016-04-25 09:55:00     0
2016-04-25 09:56:00    12
2016-04-25 09:57:00    27
2016-04-25 09:58:00    10
2016-04-25 09:59:00    26
2016-04-25 10:00:00    27
2016-04-25 10:01:00    18
2016-04-25 10:02:00    27
2016-04-25 10:03:00    25
2016-04-25 10:04:00    25
2016-04-25 10:05:00    35
2016-04-25 10:06:00    28
2016-04-25 10:07:00     3
2016-04-25 10:08:00    20
2016-04-25 10:09:00    48
2016-04-25 10:10:00     5
2016-04-25 10:11:00    48
2016-04-25 10:12:00    30
2016-04-25 10:13:00     2
2016-04-25 10:14:00    11
2016-04-25 10:15:00    18
2016-04-25 10:16:00    21
2016-04-25 10:17:00    32
2016-04-25 10:18:00    43
2016-04-25 10:19:00    10
2016-04-25 10:20:00     5
2016-04-25 10:21:00    45
2016-04-25 10:22:00     3
2016-04-25 10:23:00    30
2016-04-25 10:24:00     3
2016-04-25 10:25:00    24
2016-04-25 10:26:00    46
2016-04-25 10:27:00     2
2016-04-25 10:28:00    33
2016-04-25 10:29:00    25
Freq: T, dtype: int32


# 0-4 分鐘爲第一組
ts.resample('5min', how='sum')

[Out:]
2016-04-25 09:30:00    139
2016-04-25 09:35:00    118
2016-04-25 09:40:00    110
2016-04-25 09:45:00    125
2016-04-25 09:50:00     88
2016-04-25 09:55:00     75
2016-04-25 10:00:00    122
2016-04-25 10:05:00    134
2016-04-25 10:10:00     96
2016-04-25 10:15:00    124
2016-04-25 10:20:00     86
2016-04-25 10:25:00    130
Freq: 5T, dtype: int32

# 0-4 分鐘爲第一組
ts.resample('5min', how='sum', label='right'

[Out:]
2016-04-25 09:35:00    139
2016-04-25 09:40:00    118
2016-04-25 09:45:00    110
2016-04-25 09:50:00    125
2016-04-25 09:55:00     88
2016-04-25 10:00:00     75
2016-04-25 10:05:00    122
2016-04-25 10:10:00    134
2016-04-25 10:15:00     96
2016-04-25 10:20:00    124
2016-04-25 10:25:00     86
2016-04-25 10:30:00    130
Freq: 5T, dtype: int32

OHLC 重採樣

金融數據專用:Open/High/Low/Close

ts.resample('5min', how='ohlc')

[Out:]
			open	high	low	close
2016-04-25 09:30:00	18	49	5	5
2016-04-25 09:35:00	12	47	6	37
2016-04-25 09:40:00	44	44	8	12
2016-04-25 09:45:00	26	38	5	26
2016-04-25 09:50:00	39	39	6	24
2016-04-25 09:55:00	0	27	0	26
2016-04-25 10:00:00	27	27	18	25
2016-04-25 10:05:00	35	48	3	48
2016-04-25 10:10:00	5	48	2	11
2016-04-25 10:15:00	18	43	10	10
2016-04-25 10:20:00	5	45	3	3
2016-04-25 10:25:00	24	46	2	25

### 通過 groupby 重採樣
ts = pd.Series(np.random.randint(0, 50, 100), index=pd.date_range('2016-03-01', periods=100, freq='D'))
ts

[Out:]
2016-03-01    13
2016-03-02    21
2016-03-03    26
2016-03-04     3
2016-03-05    31
2016-03-06    29
2016-03-07    42
2016-03-08    24
2016-03-09    10
2016-03-10    42
2016-03-11    42
2016-03-12     7
2016-03-13    10
2016-03-14    48
2016-03-15    12
2016-03-16    15
2016-03-17    16
2016-03-18    34
2016-03-19    45
2016-03-20    40
2016-03-21    45
2016-03-22    46
2016-03-23    21
2016-03-24    27
2016-03-25    10
2016-03-26    47
2016-03-27     8
2016-03-28     9
2016-03-29     0
2016-03-30    20
              ..
2016-05-10    38
2016-05-11    46
2016-05-12     8
2016-05-13    15
2016-05-14    13
2016-05-15    30
2016-05-16    25
2016-05-17    15
2016-05-18     3
2016-05-19     5
2016-05-20    21
2016-05-21    18
2016-05-22    11
2016-05-23    47
2016-05-24    14
2016-05-25    33
2016-05-26    37
2016-05-27    40
2016-05-28     5
2016-05-29    27
2016-05-30     2
2016-05-31    31
2016-06-01    31
2016-06-02    41
2016-06-03    28
2016-06-04     2
2016-06-05    21
2016-06-06    10
2016-06-07    21
2016-06-08    18
Freq: D, dtype: int32

ts.groupby(lambda x: x.month).sum()

[Out:]
3    759
4    648
5    748
6    172
dtype: int32

ts.groupby(ts.index.to_period('M')).sum()

[Out:]
2016-03    759
2016-04    648
2016-05    748
2016-06    172
Freq: M, dtype: int32

升採樣和插值

# 以周爲單位,每週五采樣
df = pd.DataFrame(np.random.randint(1, 50, 2), index=pd.date_range('2016-04-22', periods=2, freq='W-FRI'))
df

[Out:]
	0
2016-04-22	10
2016-04-29	6

df.resample('D')

[Out:]
	0
2016-04-22	10
2016-04-23	NaN
2016-04-24	NaN
2016-04-25	NaN
2016-04-26	NaN
2016-04-27	NaN
2016-04-28	NaN
2016-04-29	6


df.resample('D', fill_method='ffill', limit=3)

[Out:]
0
2016-04-22	10
2016-04-23	10
2016-04-24	10
2016-04-25	10
2016-04-26	NaN
2016-04-27	NaN
2016-04-28	NaN
2016-04-29	6

# 以周爲單位,每週一採樣
df.resample('W-MON', fill_method='ffill')

[Out:]
	0
2016-04-25	10
2016-05-02	6

時期重採樣

df = pd.DataFrame(np.random.randint(2, 30, (24, 4)), 
                  index=pd.period_range('2015-01', '2016-12', freq='M'),
                  columns=list('ABCD'))
df

[Out:]
	A	B	C	D
2015-01	20	7	22	18
2015-02	2	28	21	19
2015-03	13	17	12	7
2015-04	24	17	20	14
2015-05	15	13	15	20
2015-06	19	28	2	22
2015-07	20	7	2	27
2015-08	10	18	2	16
2015-09	17	24	11	9
2015-10	23	2	21	25
2015-11	24	3	19	8
2015-12	7	16	6	12
2016-01	18	13	8	15
2016-02	17	14	2	21
2016-03	17	6	5	24
2016-04	24	14	22	14
2016-05	16	14	20	14
2016-06	26	29	14	15
2016-07	2	11	11	2
2016-08	12	11	17	18
2016-09	19	21	4	16
2016-10	21	16	11	7
2016-11	16	23	2	22
2016-12	21	9	27	11


adf = df.resample('A-DEC', how='mean')
adf

[Out:]
		A	B	C	D
2015	16.166667	15.000000	12.750000	16.416667
2016	17.416667	15.083333	11.916667	14.916667


df.resample('A-MAY', how='mean')

[Out:]
A	B	C	D
2015	14.800000	16.400000	18.000000	15.60
2016	17.666667	13.250000	10.000000	17.25
2017	16.714286	17.142857	12.285714	13.00

# 升採樣
adf.resample('Q-DEC')

[Out:]
		A	B	C	D
2015Q1	16.166667	15.000000	12.750000	16.416667
2015Q2	NaN	NaN	NaN	NaN
2015Q3	NaN	NaN	NaN	NaN
2015Q4	NaN	NaN	NaN	NaN
2016Q1	17.416667	15.083333	11.916667	14.916667
2016Q2	NaN	NaN	NaN	NaN
2016Q3	NaN	NaN	NaN	NaN
2016Q4	NaN	NaN	NaN	NaN


adf.resample('Q-DEC', fill_method='ffill')

[Out:]
A	B	C	D
2015Q1	16.166667	15.000000	12.750000	16.416667
2015Q2	16.166667	15.000000	12.750000	16.416667
2015Q3	16.166667	15.000000	12.750000	16.416667
2015Q4	16.166667	15.000000	12.750000	16.416667
2016Q1	17.416667	15.083333	11.916667	14.916667
2016Q2	17.416667	15.083333	11.916667	14.916667
2016Q3	17.416667	15.083333	11.916667	14.916667
2016Q4	17.416667	15.083333	11.916667	14.916667

性能

n = 1000000
ts = pd.Series(np.random.randn(n), 
               index=pd.date_range('2000-01-01', periods=n, freq='10ms'))
len(ts)

[Out:]
1000000

%timeit ts.resample('10min', how='ohlc')
# out=> 10 loops, best of 3: 21.9 ms per loop

ts.resample('D', how='ohlc')

[Out:]
open	high	low	close
2000-01-01	1.161091	4.551988	-4.660681	-0.406231

從文件中讀取日期序列

數據在這:練習數據下載,也可以不用,自己用循環生成一些就好

df = pd.read_csv('data/002001.csv', index_col='Date')
df

[Out:]
			Open	High	Low	Close	Volume	Adj Close
Date						
2015-12-22	16.86	17.13	16.48	16.95	13519900	16.95
2015-12-21	16.31	17.00	16.20	16.85	14132200	16.85
2015-12-18	16.59	16.70	16.21	16.31	10524300	16.31
2015-12-17	16.28	16.75	16.16	16.60	12326500	16.60
2015-12-16	16.23	16.42	16.05	16.28	8026000	16.28
2015-12-15	16.06	16.31	15.95	16.18	6647500	16.18
2015-12-14	15.60	16.06	15.45	16.06	8355200	16.06
2015-12-11	15.50	15.80	15.41	15.62	7243500	15.62
2015-12-10	15.99	16.05	15.51	15.56	7654900	15.56
2015-12-09	16.00	16.19	15.80	15.83	7926900	15.83
2015-12-08	16.54	16.55	16.01	16.05	7640100	16.05
2015-12-07	16.50	17.04	16.48	16.63	11917200	16.63
2015-12-04	16.13	16.85	16.01	16.62	14011100	16.62
2015-12-03	15.97	16.34	15.88	16.21	9504000	16.21
2015-12-02	15.89	16.04	15.50	15.88	11229600	15.88
2015-12-01	15.67	15.96	15.50	15.85	7192200	15.85
2015-11-30	15.54	15.90	15.05	15.70	11615200	15.70
2015-11-27	16.61	16.99	15.10	15.54	15177000	15.54
2015-11-26	16.98	17.22	16.62	16.78	13196300	16.78
2015-11-25	16.15	17.04	16.03	16.94	18600100	16.94
2015-11-24	15.90	16.20	15.70	16.15	8561200	16.15
2015-11-23	16.09	16.32	16.00	16.05	9441700	16.05
2015-11-20	15.96	16.17	15.81	16.08	8022200	16.08
2015-11-19	15.75	16.05	15.71	16.02	5193300	16.02
2015-11-18	16.26	16.30	15.72	15.75	7318500	15.75
2015-11-17	16.41	16.47	16.11	16.22	11479800	16.22
2015-11-16	15.70	16.22	15.61	16.21	9083200	16.21
2015-11-13	16.36	16.47	15.90	15.95	12924400	15.95
2015-11-12	16.23	16.92	16.00	16.59	16492800	16.59
2015-11-11	16.16	16.28	15.81	16.22	15661900	16.22
2015-11-10	16.29	16.69	16.04	16.15	21457600	16.15
2015-11-09	15.70	16.29	15.56	16.02	20842600	16.02
2015-11-06	15.53	16.01	15.41	15.86	17735800	15.86
2015-11-05	15.33	15.79	15.21	15.52	19051400	15.52
2015-11-04	14.65	15.35	14.65	15.33	14578200	15.33
2015-11-03	14.84	14.96	14.44	14.62	6576300	14.62
2015-11-02	14.91	15.18	14.74	14.74	9487800	14.74
2015-10-30	15.25	15.52	14.81	15.22	12908500	15.22
2015-10-29	15.01	15.36	14.96	15.30	11177100	15.30
2015-10-28	15.14	15.50	14.96	15.02	11373200	15.02
2015-10-27	15.10	15.17	14.51	15.15	12950400	15.15
2015-10-26	15.41	15.55	14.87	15.18	15844500	15.18
2015-10-23	14.80	15.23	14.75	15.20	14769000	15.20
2015-10-22	14.28	14.82	14.25	14.73	10428900	14.73
2015-10-21	15.24	15.70	14.08	14.26	21113500	14.26
2015-10-20	14.99	15.24	14.89	15.22	11935800	15.22
2015-10-19	15.27	15.35	14.85	15.03	11601300	15.03
2015-10-16	15.23	15.35	14.82	15.25	14168700	15.25
2015-10-15	14.73	15.15	14.60	15.12	11177700	15.12
2015-10-14	14.99	15.12	14.72	14.73	10368900	14.73
2015-10-13	15.02	15.19	14.85	15.07	13408200	15.07
2015-10-12	14.63	15.43	14.41	15.30	24110800	15.30
2015-10-09	14.50	14.79	14.11	14.62	23818500	14.62
2015-10-08	14.75	14.75	14.65	14.75	18317200	14.75
2015-10-07	13.41	13.41	13.41	13.41	0	13.41
2015-10-06	13.41	13.41	13.41	13.41	0	13.41
2015-10-05	13.41	13.41	13.41	13.41	0	13.41
2015-10-02	13.41	13.41	13.41	13.41	0	13.41
2015-10-01	13.41	13.41	13.41	13.41	0	13.41

df.index

[Out:]
Index([u'2015-12-22', u'2015-12-21', u'2015-12-18', u'2015-12-17',
       u'2015-12-16', u'2015-12-15', u'2015-12-14', u'2015-12-11',
       u'2015-12-10', u'2015-12-09', u'2015-12-08', u'2015-12-07',
       u'2015-12-04', u'2015-12-03', u'2015-12-02', u'2015-12-01',
       u'2015-11-30', u'2015-11-27', u'2015-11-26', u'2015-11-25',
       u'2015-11-24', u'2015-11-23', u'2015-11-20', u'2015-11-19',
       u'2015-11-18', u'2015-11-17', u'2015-11-16', u'2015-11-13',
       u'2015-11-12', u'2015-11-11', u'2015-11-10', u'2015-11-09',
       u'2015-11-06', u'2015-11-05', u'2015-11-04', u'2015-11-03',
       u'2015-11-02', u'2015-10-30', u'2015-10-29', u'2015-10-28',
       u'2015-10-27', u'2015-10-26', u'2015-10-23', u'2015-10-22',
       u'2015-10-21', u'2015-10-20', u'2015-10-19', u'2015-10-16',
       u'2015-10-15', u'2015-10-14', u'2015-10-13', u'2015-10-12',
       u'2015-10-09', u'2015-10-08', u'2015-10-07', u'2015-10-06',
       u'2015-10-05', u'2015-10-02', u'2015-10-01'],
      dtype='object', name=u'Date')

df = pd.read_csv('data/002001.csv', index_col='Date', parse_dates=True)
df.index

[Out:]
DatetimeIndex(['2015-12-22', '2015-12-21', '2015-12-18', '2015-12-17',
               '2015-12-16', '2015-12-15', '2015-12-14', '2015-12-11',
               '2015-12-10', '2015-12-09', '2015-12-08', '2015-12-07',
               '2015-12-04', '2015-12-03', '2015-12-02', '2015-12-01',
               '2015-11-30', '2015-11-27', '2015-11-26', '2015-11-25',
               '2015-11-24', '2015-11-23', '2015-11-20', '2015-11-19',
               '2015-11-18', '2015-11-17', '2015-11-16', '2015-11-13',
               '2015-11-12', '2015-11-11', '2015-11-10', '2015-11-09',
               '2015-11-06', '2015-11-05', '2015-11-04', '2015-11-03',
               '2015-11-02', '2015-10-30', '2015-10-29', '2015-10-28',
               '2015-10-27', '2015-10-26', '2015-10-23', '2015-10-22',
               '2015-10-21', '2015-10-20', '2015-10-19', '2015-10-16',
               '2015-10-15', '2015-10-14', '2015-10-13', '2015-10-12',
               '2015-10-09', '2015-10-08', '2015-10-07', '2015-10-06',
               '2015-10-05', '2015-10-02', '2015-10-01'],
              dtype='datetime64[ns]', name=u'Date', freq=None)

wdf = df['Adj Close'].resample('W-FRI', how='ohlc')
wdf

[Out:]
			open	high	low	close
Date				
2015-10-02	13.41	13.41	13.41	13.41
2015-10-09	13.41	14.75	13.41	14.62
2015-10-16	15.30	15.30	14.73	15.25
2015-10-23	15.03	15.22	14.26	15.20
2015-10-30	15.18	15.30	15.02	15.22
2015-11-06	14.74	15.86	14.62	15.86
2015-11-13	16.02	16.59	15.95	15.95
2015-11-20	16.21	16.22	15.75	16.08
2015-11-27	16.05	16.94	15.54	15.54
2015-12-04	15.70	16.62	15.70	16.62
2015-12-11	16.63	16.63	15.56	15.62
2015-12-18	16.06	16.60	16.06	16.31
2015-12-25	16.85	16.95	16.85	16.95

wdf['Volume'] = df['Volume'].resample('W-FRI', how='sum')
wdf

[Out:]
			open	high	low	close	Volume
Date					
2015-10-02	13.41	13.41	13.41	13.41	0
2015-10-09	13.41	14.75	13.41	14.62	42135700
2015-10-16	15.30	15.30	14.73	15.25	73234300
2015-10-23	15.03	15.22	14.26	15.20	69848500
2015-10-30	15.18	15.30	15.02	15.22	64253700
2015-11-06	14.74	15.86	14.62	15.86	67429500
2015-11-13	16.02	16.59	15.95	15.95	87379300
2015-11-20	16.21	16.22	15.75	16.08	41097000
2015-11-27	16.05	16.94	15.54	15.54	64976300
2015-12-04	15.70	16.62	15.70	16.62	53552100
2015-12-11	16.63	16.63	15.56	15.62	42382600
2015-12-18	16.06	16.60	16.06	16.31	45879500
2015-12-25	16.85	16.95	16.85	16.95	27652100

自定義時間日期解析函數

def date_parser(s):
    s = '2016/' + s
    d = datetime.strptime(s, '%Y/%m/%d')
    return d

df = pd.read_csv('data/custom_date.csv', parse_dates=True, index_col='Date', date_parser=date_parser)
df

[Out:]
Price
Date	
2016-01-01	10.2
2016-01-02	10.4
2016-01-03	10.5
2016-01-04	10.8
2016-01-05	11.2
2016-01-06	10.6

df.index

[Out:]
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', name=u'Date', freq=None)
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章