Time series forecasting in Python: prophet

Original post

2020-03-06 10:22

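Installing prophet

prophet is an open-source time series forecasting toolkit from Facebook. It can be installed directly with conda under the package name fbprophet (releases from 1.0 onward are published as prophet).

Official prophet site: https://facebook.github.io/prophet/

The name "prophet" refers to someone who foretells the future (先知 in Chinese).

prophet's input generally has two columns: ds and y.

The ds (datestamp) column should be in a date format that Pandas can recognise: YYYY-MM-DD for dates and YYYY-MM-DD HH:MM:SS for timestamps.

The y column must be numeric.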
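As a minimal sketch of that layout, the snippet below builds a toy frame in the shape prophet expects. The timestamps and values are invented for illustration; only the column names ds and y come from the description above.

```python
import pandas as pd

# Toy hourly readings; the values are invented for illustration only.
raw = pd.DataFrame({
    "date_time": ["2018-03-01 00:00:00", "2018-03-01 01:00:00", "2018-03-01 02:00:00"],
    "traffic_volume": ["508", "323", "274"],  # numbers stored as text
})

# prophet looks for columns named ds and y.
frame = raw.rename(columns={"date_time": "ds", "traffic_volume": "y"})
frame["ds"] = pd.to_datetime(frame["ds"])  # parse the strings into datetimes
frame["y"] = pd.to_numeric(frame["y"])     # make sure y is numeric

print(frame.dtypes)
```

Converting both columns explicitly makes a badly formatted date or a non-numeric value fail at this step rather than later during fitting.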

Downloading the dataset

Metro Interstate Traffic Volume Data Set

prophet in practice

Importing packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import mean_squared_error, mean_absolute_error

%matplotlib inline
plt.rcParams['font.sans-serif'] = 'SimHei'  # font that can render Chinese characters
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
plt.rcParams['figure.dpi'] = 200
plt.rcParams['text.color'] = 'black'
plt.rcParams['font.size'] = 20
plt.style.use('ggplot')
print(plt.style.available)
# ['bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'seaborn', 'Solarize_Light2', 'tableau-colorblind10', '_classic_test']

Reading the CSV data with pandas

csv_files = 'Metro_Interstate_Traffic_Volume.csv'
df = pd.read_csv(csv_files)
df.set_index('date_time',inplace=True)
df.index = pd.to_datetime(df.index)
df.head()

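A quick look at the table shows factors such as holidays, temperature, rainfall, snowfall and weather type. The dependent variable is the traffic volume, traffic_volume.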

df.info()
'''
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 48204 entries, 2012-10-02 09:00:00 to 2018-09-30 23:00:00
Data columns (total 8 columns):
holiday                48204 non-null object
temp                   48204 non-null float64
rain_1h                48204 non-null float64
snow_1h                48204 non-null float64
clouds_all             48204 non-null int64
weather_main           48204 non-null object
weather_description    48204 non-null object
traffic_volume         48204 non-null int64
dtypes: float64(3), int64(2), object(3)
memory usage: 3.3+ MB
'''

df.describe()

Plotting the data

It turns out that a stretch of data is missing, but the impact is small.

traffic = df[['traffic_volume']]
traffic[:].plot(style='--', figsize=(15,5), title='traffic_volume')
plt.show()

Splitting the dataset

Key point: filtering by date in pandas

traffic_train = traffic.loc[(traffic.index >='2017-01') & (traffic.index <= '2018-03')].copy()
traffic_test = traffic.loc[traffic.index > '2018-03'].copy()
_ = traffic_test.rename(columns={'traffic_volume': 'TEST SET'})\
    .join(traffic_train.rename(columns={'traffic_volume': 'TRAINING SET'}),how='outer') \
    .plot(figsize=(20,5), title='traffic_volume', style='.')

Because the data is recorded hourly, selecting only about two years of it already gives plenty of observations.
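The date filter used for the split can be tried on a small synthetic series. This is only a sketch: the five timestamps and their values are made up, and the point is to show where the string '2018-03' places the boundary.

```python
import pandas as pd

# Synthetic hourly series straddling the split date (values are arbitrary).
idx = pd.to_datetime([
    "2018-02-28 22:00", "2018-02-28 23:00",
    "2018-03-01 00:00", "2018-03-01 01:00", "2018-03-01 02:00",
])
toy = pd.DataFrame({"traffic_volume": [10, 20, 30, 40, 50]}, index=idx)

# Comparing a DatetimeIndex with the string '2018-03' parses the string
# as the timestamp 2018-03-01 00:00:00.
train = toy.loc[(toy.index >= "2017-01") & (toy.index <= "2018-03")]
test = toy.loc[toy.index > "2018-03"]

print(len(train), len(test))
```

The row stamped exactly 2018-03-01 00:00 falls in the training part, and everything later goes to the test part, so the two sets do not overlap.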

Extracting features from the date

Although prophet does not require us to extract features by hand, we can still try it ourselves.

def create_features(df, label=None):
    """
    Creates time series features from datetime index.
    """
    df = df.copy()
    df['date'] = df.index
    df['hour'] = df['date'].dt.hour
    df['dayofweek'] = df['date'].dt.dayofweek
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['dayofyear'] = df['date'].dt.dayofyear
    df['dayofmonth'] = df['date'].dt.day
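    # Series.dt.weekofyear was removed in pandas 2.0; isocalendar() needs pandas >= 1.1
    df['weekofyear'] = df['date'].dt.isocalendar().week.astype(int)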
    
    X = df[['hour','dayofweek','quarter','month','year',
           'dayofyear','dayofmonth','weekofyear']]
    if label:
        y = df[label]
        return X, y
    return X

X, y = create_features(traffic, label='traffic_volume')
features_and_target = pd.concat([X, y], axis=1)
features_and_target.head()

Get a feel for how the different features affect the variable being predicted.

sns.pairplot(features_and_target.dropna(),
             hue='hour',
             x_vars=['hour','dayofweek',
                     'year','weekofyear'],
             y_vars='traffic_volume',
             height=5,
             plot_kws={'alpha':0.15, 'linewidth':0}
            )
plt.suptitle('Traffic Volume by Hour, Day of Week, Year and Week of Year')
plt.show()

Training and forecasting with prophet

from fbprophet import Prophet

# Setup and train model and fit
model = Prophet()

model.fit(traffic_train.reset_index().rename(columns={'date_time':'ds','traffic_volume':'y'}))

traffic_test_pred = model.predict(df=traffic_test.reset_index() \
                                   .rename(columns={'date_time':'ds'}))

Plotting the forecast

f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
ax.scatter(traffic_test.index, traffic_test['traffic_volume'], color='r')
fig = model.plot(traffic_test_pred, ax=ax)

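The behaviour seen in the plot arises because:

There is too much training data, so the model fails to capture the most recent trend
The forecast horizon is too long, so the error grows over time

Interested readers can experiment with this on their own.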
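One way to make such experiments comparable is to put a number on the forecast error. The sketch below computes the mean absolute error and root mean squared error in plain Python. The helper names mae and rmse are hypothetical, and the values are made up to stand in for observed and forecast traffic volumes.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Made-up values standing in for observed and forecast traffic volumes.
actual = [500, 3000, 5000, 4000]
predicted = [700, 2600, 5000, 4400]

print(mae(actual, predicted), rmse(actual, predicted))
```

In the notebook itself, the mean_absolute_error and mean_squared_error functions imported from sklearn.metrics at the top could presumably be applied to traffic_test['traffic_volume'] and traffic_test_pred['yhat'], assuming the two are aligned row by row.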

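What prophet learned

The figure below shows:

Overall trend: downward
Weekly pattern: traffic is heavy on weekdays and light at weekends
Daily pattern: morning and evening rush hours, so each day's traffic roughly follows an M-shaped curve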

fig = model.plot_components(traffic_test_pred)
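The weekly and daily patterns can also be cross-checked without a model by averaging the raw values per weekday and per hour. The following is a sketch on synthetic data only: the toy_volume function and its numbers are invented so that weekdays and two rush hours are busier, and nothing here reproduces the real dataset.

```python
import pandas as pd

# One synthetic week of hourly timestamps, starting on Monday 2018-03-05.
start = pd.Timestamp("2018-03-05")
idx = start + pd.to_timedelta(list(range(7 * 24)), unit="h")

def toy_volume(ts):
    """Invented traffic level: busier on weekdays, with peaks at 8:00 and 17:00."""
    weekday = ts.dayofweek < 5  # Monday=0 ... Sunday=6
    base = 3000 if weekday else 1500
    rush = 2000 if weekday and ts.hour in (8, 17) else 0
    return base + rush

toy = pd.DataFrame({"traffic_volume": [toy_volume(ts) for ts in idx]}, index=idx)

# Average by day of week and by hour of day.
by_weekday = toy.groupby(toy.index.dayofweek)["traffic_volume"].mean()
by_hour = toy.groupby(toy.index.hour)["traffic_volume"].mean()

print(by_weekday)
print(by_hour)
```

Applied to the real traffic frame, the same two groupby calls would give a rough, model-free view to compare against the weekly and daily panels drawn by plot_components.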

Zooming in

Let's see how the model's forecast does on the first month of the test set:

# Plot the forecast with the actuals
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
plt.plot(traffic_test.index, traffic_test['traffic_volume'], color='r')
fig = model.plot(traffic_test_pred, ax=ax)
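# Pass real timestamps; date strings are not reliably accepted as axis bounds
ax.set_xbound(lower=pd.Timestamp('2018-03-01'),
              upper=pd.Timestamp('2018-04-01'))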
ax.set_ylim(-1000, 8000)
plot = plt.suptitle('Forecast vs Actuals')

Looks fairly convincing, doesn't it? 😉
