【DCRNN】Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Paper: https://arxiv.org/abs/1707.01926
Citations: 304 (as of 06/14/2020)
Venue: ICLR 2018



Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting


ABSTRACT

Spatiotemporal forecasting has various applications in the neuroscience, climate and transportation domains. Traffic forecasting is one canonical example of such a learning task. The task is challenging due to (1) complex spatial dependency on road networks, (2) non-linear temporal dynamics with changing road conditions and (3) the inherent difficulty of long-term forecasting. To address these challenges, we propose to model the traffic flow as a diffusion process on a directed graph and introduce the Diffusion Convolutional Recurrent Neural Network (DCRNN), a deep learning framework for traffic forecasting that incorporates both spatial and temporal dependency in the traffic flow. Specifically, DCRNN captures the spatial dependency using bidirectional random walks on the graph, and the temporal dependency using an encoder-decoder architecture with scheduled sampling. We evaluate the framework on two real-world large-scale road network traffic datasets and observe consistent improvement of 12%-15% over state-of-the-art baselines.

1 INTRODUCTION

Spatiotemporal forecasting is a crucial task for a learning system that operates in a dynamic environment. It has a wide range of applications, from autonomous vehicle operations, to energy and smart grid optimization, to logistics and supply chain management. In this paper, we study one important task: traffic forecasting on road networks, a core component of intelligent transportation systems. The goal of traffic forecasting is to predict the future traffic speeds of a sensor network given historic traffic speeds and the underlying road networks.
This task is challenging mainly due to the complex spatiotemporal dependencies and the inherent difficulty of long-term forecasting. On the one hand, traffic time series demonstrate strong temporal dynamics. Recurring incidents such as rush hours or accidents can cause non-stationarity, making long-term forecasting difficult. On the other hand, sensors on the road network contain complex yet unique spatial correlations. Figure 1 illustrates an example. Road 1 and road 2 are correlated, while road 1 and road 3 are not. Although road 1 and road 3 are close in the Euclidean space, they demonstrate very different behaviors. Moreover, the future traffic speed is influenced more by the downstream traffic than the upstream one. This means that the spatial structure in traffic is non-Euclidean and directional.
Traffic forecasting has been studied for decades, falling into two main categories: knowledge-driven approaches and data-driven approaches. In transportation and operational research, knowledge-driven methods usually apply queuing theory and simulate user behaviors in traffic (Cascetta, 2013). In the time series community, data-driven methods such as the Auto-Regressive Integrated Moving Average (ARIMA) model and Kalman filtering remain popular (Liu et al., 2011; Lippi et al., 2013). However, simple time series models usually rely on the stationarity assumption, which is often violated by traffic data. Most recently, deep learning models for traffic forecasting have been developed in Lv et al. (2015); Yu et al. (2017b), but without considering the spatial structure. Wu & Tan (2016) and Ma et al. (2017) model the spatial correlation with Convolutional Neural Networks (CNN), but the spatial structure is in the Euclidean space (e.g., 2D images). Bruna et al. (2014) and Defferrard et al. (2016) studied graph convolution, but only for undirected graphs.
In this work, we represent the pair-wise spatial correlations between traffic sensors using a directed graph whose nodes are sensors and whose edge weights denote proximity between the sensor pairs measured by the road network distance. We model the dynamics of the traffic flow as a diffusion process and propose the diffusion convolution operation to capture the spatial dependency. We further propose the Diffusion Convolutional Recurrent Neural Network (DCRNN) that integrates diffusion convolution, the sequence-to-sequence architecture and the scheduled sampling technique. When evaluated on real-world traffic datasets, DCRNN consistently outperforms state-of-the-art traffic forecasting baselines by a large margin. In summary:
• We study the traffic forecasting problem and model the spatial dependency of traffic as a diffusion process on a directed graph. We propose diffusion convolution, which has an intuitive interpretation and can be computed efficiently.
• We propose the Diffusion Convolutional Recurrent Neural Network (DCRNN), a holistic approach that captures both spatial and temporal dependencies among time series using diffusion convolution and the sequence-to-sequence learning framework together with scheduled sampling. DCRNN is not limited to transportation and is readily applicable to other spatiotemporal forecasting tasks.
• We conducted extensive experiments on two large-scale real-world datasets, and the proposed approach obtains significant improvement over state-of-the-art baseline methods.


Figure 1: Spatial correlation is dominated by road network structure. (1) Traffic speeds on road 1 are similar to those on road 2 since they are located on the same highway. (2) Road 1 and road 3 lie in opposite directions of the highway. Though close to each other in the Euclidean space, their road network distance is large, and their traffic speeds differ significantly.

2 METHODOLOGY

We formalize the learning problem of spatiotemporal traffic forecasting and describe how to model the dependency structures using the diffusion convolutional recurrent neural network.

2.1 TRAFFIC FORECASTING PROBLEM

The goal of traffic forecasting is to predict the future traffic speed given previously observed traffic flow from $N$ correlated sensors on the road network. We can represent the sensor network as a weighted directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$, where $\mathcal{V}$ is a set of nodes with $|\mathcal{V}| = N$, $\mathcal{E}$ is a set of edges, and $W \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix representing the nodes' proximity (e.g., a function of their road network distance). Denote the traffic flow observed on $\mathcal{G}$ as a graph signal $X \in \mathbb{R}^{N \times P}$, where $P$ is the number of features of each node (e.g., velocity, volume). Let $X^{(t)}$ represent the graph signal observed at time $t$. The traffic forecasting problem aims to learn a function $h(\cdot)$ that maps $T'$ historical graph signals to $T$ future graph signals, given a graph $\mathcal{G}$:

$$
\bigl[X^{(t-T'+1)}, \ldots, X^{(t)}; \mathcal{G}\bigr] \xrightarrow{\;h(\cdot)\;} \bigl[X^{(t+1)}, \ldots, X^{(t+T)}\bigr]
$$

2.2 SPATIAL DEPENDENCY MODELING

We model the spatial dependency by relating traffic flow to a diffusion process, which explicitly captures the stochastic nature of traffic dynamics. This diffusion process is characterized by a random walk on $\mathcal{G}$ with restart probability $\alpha \in [0, 1]$ and a state transition matrix $D_O^{-1} W$. Here $D_O = \mathrm{diag}(W\mathbf{1})$ is the out-degree diagonal matrix, and $\mathbf{1} \in \mathbb{R}^N$ denotes the all-one vector. After many time steps, such a Markov process converges to a stationary distribution $\mathcal{P} \in \mathbb{R}^{N \times N}$ whose $i$-th row $\mathcal{P}_{i,:} \in \mathbb{R}^N$ represents the likelihood of diffusion from node $v_i \in \mathcal{V}$, hence the proximity with respect to node $v_i$. The following lemma provides a closed-form solution for the stationary distribution.
Lemma 2.1. (Teng et al., 2016) The stationary distribution of the diffusion process can be represented as a weighted combination of infinite random walks on the graph, and can be calculated in closed form:

$$
\mathcal{P} = \sum_{k=0}^{\infty} \alpha \, (1 - \alpha)^k \bigl(D_O^{-1} W\bigr)^k
$$

where $k$ is the diffusion step. In practice, we use a finite $K$-step truncation of the diffusion process and assign a trainable weight to each step. We also include the reversed-direction diffusion process, such that the bidirectional diffusion offers the model more flexibility to capture the influence from both the upstream and the downstream traffic.
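To make Lemma 2.1 and the $K$-step truncation concrete, below is a minimal NumPy sketch (not from the paper's code; the function name and toy weight matrix are illustrative) that approximates the stationary distribution with the first $K$ terms of the series.

```python
import numpy as np

def truncated_stationary_distribution(W, alpha=0.1, K=3):
    """First K terms of the series in Lemma 2.1:
    P ≈ sum_{k=0}^{K-1} alpha * (1 - alpha)^k * (D_O^{-1} W)^k.
    Assumes every node has at least one outgoing edge."""
    T = np.diag(1.0 / W.sum(axis=1)) @ W          # random-walk transition matrix D_O^{-1} W
    P = np.zeros_like(W, dtype=float)
    T_k = np.eye(W.shape[0])                      # (D_O^{-1} W)^0
    for k in range(K):
        P += alpha * (1.0 - alpha) ** k * T_k
        T_k = T_k @ T                             # next matrix power
    return P

# Toy 3-sensor directed graph; the weights are illustrative proximities.
W = np.array([[0.0, 1.0, 0.5],
              [0.2, 0.0, 1.0],
              [1.0, 0.3, 0.0]])
print(truncated_stationary_distribution(W, alpha=0.1, K=3))
```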

Diffusion Convolution

The resulting diffusion convolution operation over a graph signal $X \in \mathbb{R}^{N \times P}$ and a filter $f_\theta$ is defined as:

$$
X_{:,p} \star_{\mathcal{G}} f_{\theta} = \sum_{k=0}^{K-1} \Bigl(\theta_{k,1}\bigl(D_O^{-1} W\bigr)^k + \theta_{k,2}\bigl(D_I^{-1} W^{T}\bigr)^k\Bigr) X_{:,p} \quad \text{for } p \in \{1, \ldots, P\} \tag{2}
$$

where $\theta \in \mathbb{R}^{K \times 2}$ are the parameters for the filter, and $D_O^{-1} W$ and $D_I^{-1} W^{T}$ represent the transition matrices of the diffusion process and the reverse one, respectively. In general, computing the convolution can be expensive. However, if $\mathcal{G}$ is sparse, Equation 2 can be calculated efficiently using $O(K)$ recursive sparse-dense matrix multiplications with total time complexity $O(K|\mathcal{E}|) \ll O(N^2)$. See Appendix B for more detail.
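The following NumPy/SciPy sketch implements Equation 2 with the recursive sparse-dense multiplications described above. It is a simplified reference implementation, not the paper's TensorFlow code, and it assumes every sensor has nonzero in- and out-degree.

```python
import numpy as np
import scipy.sparse as sp

def diffusion_convolution(X, W, theta):
    """Diffusion convolution (Equation 2) applied to every feature column of X.

    X:     (N, P) graph signal
    W:     (N, N) weighted adjacency matrix of the directed sensor graph
    theta: (K, 2) filter parameters; column 0 weights the forward walk
           D_O^{-1} W, column 1 weights the reverse walk D_I^{-1} W^T.
    """
    K = theta.shape[0]
    W = sp.csr_matrix(W)
    d_out = np.asarray(W.sum(axis=1)).ravel()   # out-degrees
    d_in = np.asarray(W.sum(axis=0)).ravel()    # in-degrees
    T_fwd = sp.diags(1.0 / d_out) @ W           # D_O^{-1} W
    T_rev = sp.diags(1.0 / d_in) @ W.T          # D_I^{-1} W^T

    out = np.zeros(X.shape)
    for T, col in ((T_fwd, 0), (T_rev, 1)):
        X_k = np.asarray(X, dtype=float)        # k = 0 term: T^0 X = X
        out += theta[0, col] * X_k
        for k in range(1, K):
            X_k = T @ X_k                       # recursive sparse-dense product: T^k X
            out += theta[k, col] * X_k
    return out
```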

Diffusion Convolutional Layer

With the convolution operation defined in Equation 2, we can build a diffusion convolutional layer that maps $P$-dimensional features to $Q$-dimensional outputs. Denote the parameter tensor as $\Theta \in \mathbb{R}^{Q \times P \times K \times 2} = [\theta]_{q,p}$, where $\Theta_{q,p,:,:} \in \mathbb{R}^{K \times 2}$ parameterizes the convolutional filter for the $p$-th input and the $q$-th output. The diffusion convolutional layer is thus:

$$
H_{:,q} = a\Bigl(\sum_{p=1}^{P} X_{:,p} \star_{\mathcal{G}} f_{\Theta_{q,p,:,:}}\Bigr) \quad \text{for } q \in \{1, \ldots, Q\}
$$

where $X \in \mathbb{R}^{N \times P}$ is the input, $H \in \mathbb{R}^{N \times Q}$ is the output, $\{f_{\Theta_{q,p,:,:}}\}$ are the filters, and $a$ is the activation function (e.g., ReLU, sigmoid). The diffusion convolutional layer learns representations for graph-structured data, and we can train it using stochastic gradient-based methods.
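Continuing the sketch above (and reusing the `diffusion_convolution` function defined there), a diffusion convolutional layer that maps $P$ input features to $Q$ output features can be written as follows; the loop structure is for clarity, not efficiency.

```python
import numpy as np

def diffusion_conv_layer(X, W, Theta, activation=np.tanh):
    """Diffusion convolutional layer: maps an (N, P) input to an (N, Q) output.

    Theta: (Q, P, K, 2) parameter tensor; Theta[q, p] is the (K, 2) filter
    applied to input feature p when producing output feature q.
    """
    N, P = X.shape
    Q = Theta.shape[0]
    H = np.zeros((N, Q))
    for q in range(Q):
        acc = np.zeros(N)
        for p in range(P):
            # Filter the p-th input column with the (q, p)-th diffusion filter.
            acc += diffusion_convolution(X[:, [p]], W, Theta[q, p])[:, 0]
        H[:, q] = activation(acc)
    return H
```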

Relation with Spectral Graph Convolution

Diffusion convolution is defined on both directed and undirected graphs. When applied to undirected graphs, we show that many existing graph-structured convolutional operations, including the popular spectral graph convolution, i.e., ChebNet (Defferrard et al., 2016), can be considered as a special case of diffusion convolution (up to a similarity transformation). Let $D$ denote the degree matrix, and let $L = D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}}$ be the normalized graph Laplacian; the following proposition demonstrates the connection.
Proposition 2.2. The spectral graph convolution defined as

$$
X_{:,p} \star_{\mathcal{G}} f_{\theta} = \Phi \, F(\theta) \, \Phi^{T} X_{:,p}
$$

with eigenvalue decomposition $L = \Phi \Lambda \Phi^{T}$ and $F(\theta) = \sum_{k=0}^{K-1} \theta_k \Lambda^{k}$, is equivalent to graph diffusion convolution up to a similarity transformation when the graph $\mathcal{G}$ is undirected.
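A short calculation (a sketch of the idea, not the paper's full proof) shows where the similarity transformation comes from. For an undirected graph, $W = W^{T}$ and $D_O = D_I = D$, and since $L = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$,

$$
D^{-1} W = D^{-\frac{1}{2}} \bigl(D^{-\frac{1}{2}} W D^{-\frac{1}{2}}\bigr) D^{\frac{1}{2}} = D^{-\frac{1}{2}} (I - L) \, D^{\frac{1}{2}},
\qquad
\bigl(D^{-1} W\bigr)^{k} = D^{-\frac{1}{2}} (I - L)^{k} D^{\frac{1}{2}}.
$$

Hence any polynomial filter in the random-walk matrix is similar to the same polynomial in $L$, and after the eigendecomposition $L = \Phi \Lambda \Phi^{T}$ it matches the spectral form $\Phi F(\theta) \Phi^{T}$ up to a reparameterization of $\theta$.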

2.3 TEMPORAL DYNAMICS MODELING

We leverage recurrent neural networks (RNNs) to model the temporal dependency. In particular, we use the Gated Recurrent Unit (GRU) (Chung et al., 2014), which is a simple yet powerful variant of RNNs. We replace the matrix multiplications in GRU with the diffusion convolution, which leads to our proposed Diffusion Convolutional Gated Recurrent Unit (DCGRU):

$$
\begin{aligned}
r^{(t)} &= \sigma\bigl(\Theta_r \star_{\mathcal{G}} [X^{(t)}, H^{(t-1)}] + b_r\bigr) \\
u^{(t)} &= \sigma\bigl(\Theta_u \star_{\mathcal{G}} [X^{(t)}, H^{(t-1)}] + b_u\bigr) \\
C^{(t)} &= \tanh\bigl(\Theta_C \star_{\mathcal{G}} [X^{(t)}, (r^{(t)} \odot H^{(t-1)})] + b_c\bigr) \\
H^{(t)} &= u^{(t)} \odot H^{(t-1)} + (1 - u^{(t)}) \odot C^{(t)}
\end{aligned}
$$

where $X^{(t)}$ and $H^{(t)}$ denote the input and output at time $t$, and $r^{(t)}$ and $u^{(t)}$ are the reset gate and update gate at time $t$, respectively. $\star_{\mathcal{G}}$ denotes the diffusion convolution defined in Equation 2, and $\Theta_r, \Theta_u, \Theta_C$ are parameters for the corresponding filters. Similar to GRU, DCGRU can be used to build recurrent neural network layers and can be trained using backpropagation through time.
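As an illustration, one DCGRU step can be sketched by reusing the `diffusion_conv_layer` function from Section 2.2 for every gate. This is a simplified NumPy sketch, not the paper's implementation; the bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dcgru_step(X_t, H_prev, W, Theta_r, Theta_u, Theta_C):
    """One DCGRU step: a GRU cell whose matrix multiplications are replaced
    by diffusion convolutions over the sensor graph (biases omitted).

    X_t:     (N, P) input graph signal at time t
    H_prev:  (N, Q) previous hidden state
    Theta_*: (Q, P+Q, K, 2) diffusion filter parameters for each gate
    """
    XH = np.concatenate([X_t, H_prev], axis=1)                      # [X^(t), H^(t-1)]
    r = diffusion_conv_layer(XH, W, Theta_r, activation=sigmoid)    # reset gate r^(t)
    u = diffusion_conv_layer(XH, W, Theta_u, activation=sigmoid)    # update gate u^(t)
    XrH = np.concatenate([X_t, r * H_prev], axis=1)
    C = diffusion_conv_layer(XrH, W, Theta_C, activation=np.tanh)   # candidate state C^(t)
    return u * H_prev + (1.0 - u) * C                               # new hidden state H^(t)
```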
In multiple-step-ahead forecasting, we employ the sequence-to-sequence architecture (Sutskever et al., 2014). Both the encoder and the decoder are recurrent neural networks with DCGRU. During training, we feed the historical time series into the encoder and use its final states to initialize the decoder. The decoder generates predictions given previous ground truth observations. At testing time, ground truth observations are replaced by predictions generated by the model itself. The discrepancy between the input distributions of training and testing can cause degraded performance. To mitigate this issue, we integrate scheduled sampling (Bengio et al., 2015) into the model, where at the $i$-th iteration we feed the model either the ground truth observation with probability $\epsilon_i$ or the model's own prediction with probability $1 - \epsilon_i$. During the training process, $\epsilon_i$ gradually decreases to 0 to allow the model to learn the testing distribution.
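A minimal sketch of the scheduled sampling decision is given below. The inverse-sigmoid decay and the constant `tau` are illustrative choices for how $\epsilon_i$ can be annealed toward 0; the exact schedule is a training detail not restated in this section.

```python
import numpy as np

def sampling_probability(i, tau=3000.0):
    """epsilon_i: probability of feeding the ground truth to the decoder at
    training iteration i. Inverse-sigmoid decay is one common schedule; tau
    controls how quickly epsilon_i decays toward 0."""
    return tau / (tau + np.exp(i / tau))

def decoder_input(ground_truth, prediction, i, rng=np.random):
    """Scheduled sampling: with probability epsilon_i use the ground truth,
    otherwise feed back the model's own prediction."""
    return ground_truth if rng.random() < sampling_probability(i) else prediction
```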
With both spatial and temporal modeling, we build the Diffusion Convolutional Recurrent Neural Network (DCRNN). The model architecture of DCRNN is shown in Figure 2. The entire network is trained by maximizing the likelihood of generating the target future time series using backpropagation through time. DCRNN is able to capture spatiotemporal dependencies among time series and can be applied to various spatiotemporal forecasting problems.


Figure 2: System architecture of the Diffusion Convolutional Recurrent Neural Network designed for spatiotemporal traffic forecasting. The historical time series are fed into an encoder whose final states are used to initialize the decoder. The decoder makes predictions based on either previous ground truth or the model output.

3 RELATED WORK

Traffic forecasting is a classic problem in transportation and operational research, which is primarily based on queuing theory and simulations (Drew, 1968). Data-driven approaches for traffic forecasting have received considerable attention, and more details can be found in a recent survey paper (Vlahogianni et al., 2014) and the references therein. However, existing machine learning models either impose strong stationarity assumptions on the data (e.g., auto-regressive models) or fail to account for highly non-linear temporal dependency (e.g., latent space models, Yu et al. (2016); Deng et al. (2016)). Deep learning models deliver new promise for the time series forecasting problem. For example, in Yu et al. (2017b) and Laptev et al. (2017), the authors study time series forecasting using deep Recurrent Neural Networks (RNN). Convolutional Neural Networks (CNN) have also been applied to traffic forecasting. Zhang et al. (2016; 2017) convert the road network to a regular 2-D grid and apply traditional CNN to predict crowd flow. Cheng et al. (2017) propose DeepTransport, which models the spatial dependency by explicitly collecting upstream and downstream neighborhood roads for each individual road and then conducting convolution on these neighborhoods respectively.
Recently, CNN has been generalized to arbitrary graphs based on spectral graph theory. Graph convolutional neural networks (GCN) were first introduced in Bruna et al. (2014), which bridges spectral graph theory and deep neural networks. Defferrard et al. (2016) propose ChebNet, which improves GCN with fast localized convolution filters. Kipf & Welling (2017) simplify ChebNet and achieve state-of-the-art performance in semi-supervised classification tasks. Seo et al. (2016) combine ChebNet with Recurrent Neural Networks (RNN) for structured sequence modeling. Yu et al. (2017a) model the sensor network as an undirected graph and apply ChebNet and the convolutional sequence model (Gehring et al., 2017) to do forecasting. One limitation of the mentioned spectral-based convolutions is that they generally require the graph to be undirected to calculate a meaningful spectral decomposition. Going from the spectral domain to the vertex domain, Atwood & Towsley (2016) propose the diffusion-convolutional neural network (DCNN), which defines convolution as a diffusion process across each node in a graph-structured input. Hechtlinger et al. (2017) propose GraphCNN to generalize convolution to graphs by convolving every node with its p nearest neighbors. However, neither of these methods considers the temporal dynamics; they mainly deal with static graph settings.
Our approach is different from all those methods due to both the problem setting and the formulation of the convolution on the graph. We model the sensor network as a weighted directed graph, which is more realistic than a grid or an undirected graph. Besides, the proposed convolution is defined using bidirectional graph random walks and is further integrated with the sequence-to-sequence learning framework as well as scheduled sampling to model the long-term temporal dependency.
Table 1: Performance comparison of different approaches for traffic speed forecasting. DCRNN achieves the best performance with all three metrics for all forecasting horizons, and the advantage becomes more evident with the increase of the forecasting horizon.


4 EXPERIMENTS

We conduct experiments on two real-world large-scale datasets: (1) METR-LA: This traffic dataset contains traffic information collected from loop detectors on the highways of Los Angeles County (Jagadish et al., 2014). We select 207 sensors and collect 4 months of data ranging from Mar 1st 2012 to Jun 30th 2012 for the experiment. (2) PEMS-BAY: This traffic dataset is collected by the California Transportation Agencies (CalTrans) Performance Measurement System (PeMS). We select 325 sensors in the Bay Area and collect 6 months of data ranging from Jan 1st 2017 to May 31st 2017 for the experiment. The sensor distributions of both datasets are visualized in Figure 8 in the Appendix.
In both of those datasets, we aggregate traffic speed readings into 5-minute windows and apply Z-score normalization. 70% of the data is used for training, 20% for testing, and the remaining 10% for validation. To construct the sensor graph, we compute the pairwise road network distances between sensors and build the adjacency matrix using a thresholded Gaussian kernel (Shuman et al., 2013):
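A small sketch of this preprocessing is below; a chronological split and normalization statistics taken from the training portion only are assumptions on my part, as the text does not spell them out.

```python
import numpy as np

def zscore_and_split(speeds, train_frac=0.7, test_frac=0.2):
    """Z-score normalize 5-minute speed readings and split them 70/20/10
    into train/test/validation sets.

    speeds: (num_timesteps, num_sensors) array of aggregated readings.
    """
    n = speeds.shape[0]
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = speeds[:n_train]
    mean, std = train.mean(), train.std()      # statistics from the training portion
    z = lambda x: (x - mean) / std
    test = speeds[n_train:n_train + n_test]
    val = speeds[n_train + n_test:]
    return z(train), z(test), z(val)
```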
$W_{ij} = \exp\bigl(-\frac{\mathrm{dist}(v_i, v_j)^2}{\sigma^2}\bigr)$ if $\mathrm{dist}(v_i, v_j) \le \kappa$, otherwise $0$, where $W_{ij}$ represents the edge weight between sensor $v_i$ and sensor $v_j$, $\mathrm{dist}(v_i, v_j)$ denotes the road network distance from sensor $v_i$ to sensor $v_j$, $\sigma$ is the standard deviation of the distances, and $\kappa$ is the threshold.
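The kernel above translates directly into a few lines of NumPy (a sketch; the distance matrix `dist` would come from road network shortest paths):

```python
import numpy as np

def build_adjacency(dist, kappa):
    """Thresholded Gaussian kernel adjacency matrix.

    dist:  (N, N) road network distances (directed, generally asymmetric)
    kappa: distance threshold; entries with dist > kappa are set to 0.
    sigma is the standard deviation of the distances, as in the text.
    """
    sigma = dist.std()
    W = np.exp(-np.square(dist) / sigma ** 2)
    W[dist > kappa] = 0.0
    return W
```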
Figure 3: Learning curves for DCRNN and DCRNN without diffusion convolution. Removing diffusion convolution results in much higher validation error. Moreover, DCRNN with bidirectional random walk achieves the lowest validation error.
Figure 4: Effects of K and the number of units in each layer of DCRNN. K corresponds to the reception field width of the filter, and the number of units corresponds to the number of filters.


4.1 EXPERIMENTAL SETTINGS

Baselines We compare DCRNN with widely used time series regression models, including (1) HA: Historical Average, which models the traffic flow as a seasonal process and uses the weighted average of previous seasons as the prediction; (2) ARIMAkal: Auto-Regressive Integrated Moving Average model with Kalman filter, which is widely used in time series prediction; (3) VAR: Vector Auto-Regression (Hamilton, 1994); (4) SVR: Support Vector Regression, which uses a linear support vector machine for the regression task. The following deep neural network based approaches are also included: (5) Feed-forward Neural Network (FNN): a feed-forward neural network with two hidden layers and L2 regularization; (6) Recurrent Neural Network with fully connected LSTM hidden units (FC-LSTM) (Sutskever et al., 2014).
All neural network based approaches are implemented using TensorFlow (Abadi et al., 2016) and trained using the Adam optimizer with learning rate annealing. The best hyperparameters are chosen using the Tree-structured Parzen Estimator (TPE) (Bergstra et al., 2011) on the validation dataset. Detailed parameter settings for DCRNN as well as baselines are available in Appendix E.

4.2 TRAFFIC FORECASTING PERFORMANCE COMPARISON

Table 1 shows the comparison of different approaches for 15-minute, 30-minute and 1-hour ahead forecasting on both datasets. These methods are evaluated based on three commonly used metrics in traffic forecasting, including (1) Mean Absolute Error (MAE), (2) Mean Absolute Percentage Error (MAPE), and (3) Root Mean Squared Error (RMSE). Missing values are excluded in calculating these metrics. Detailed formulations of these metrics are provided in Appendix E.2. We observe the following phenomena in both of these datasets. (1) RNN-based methods, including FC-LSTM and DCRNN, generally outperform other baselines, which emphasizes the importance of modeling the temporal dependency. (2) DCRNN achieves the best performance regarding all the metrics for all forecasting horizons, which suggests the effectiveness of spatiotemporal dependency modeling. (3) Deep neural network based methods, including FNN, FC-LSTM and DCRNN, tend to have better performance than linear baselines for long-term forecasting, e.g., 1 hour ahead. This is because the temporal dependency becomes increasingly non-linear with the growth of the horizon. Besides, as the historical average method does not depend on short-term data, its performance is invariant to small increases in the forecasting horizon.
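For reference, the three metrics with missing values excluded can be sketched as below; treating a reading of 0 as missing is an assumption for illustration, since the exact masking convention is a dataset detail given in the appendix.

```python
import numpy as np

def masked_metrics(y_true, y_pred, null_val=0.0):
    """MAE, MAPE and RMSE computed only over non-missing entries.
    Encoding missing readings as `null_val` (here 0) is an assumption."""
    mask = y_true != null_val
    err = y_pred[mask] - y_true[mask]
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / np.abs(y_true[mask]))
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mape, rmse
```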
Note that traffic forecasting on the METR-LA dataset (Los Angeles, which is known for its complicated traffic conditions) is more challenging than on the PEMS-BAY (Bay Area) dataset. Thus we use METR-LA as the default dataset for the following experiments.

4.3 EFFECT OF SPATIAL DEPENDENCY MODELING

To further investigate the effect of spatial dependency modeling, we compare DCRNN with the following variants: (1) DCRNN-NoConv, which ignores spatial dependency by replacing the transition matrices in the diffusion convolution (Equation 2) with identity matrices. This essentially means the forecasting of a sensor can only be inferred from its own historical readings; (2) DCRNN-UniConv, which only uses the forward random walk transition matrix for diffusion convolution. Figure 3 shows the learning curves of these three models with roughly the same number of parameters. Without diffusion convolution, DCRNN-NoConv has much higher validation error. Moreover, DCRNN achieves the lowest validation error, which shows the effectiveness of using bidirectional random walks. The intuition is that the bidirectional random walk gives the model the ability and flexibility to capture the influence from both the upstream and the downstream traffic.
To investigate the effect of graph construction, we construct an undirected graph by setting $\hat{W}_{ij} = \max(W_{ij}, W_{ji})$, where $\hat{W}$ is the new symmetric weight matrix. Then we develop a variant of DCRNN denoted GCRNN, which uses the sequence-to-sequence learning framework with ChebNet graph convolution (Equation 5) and roughly the same amount of parameters. Table 2 shows the comparison between DCRNN and GCRNN on the METR-LA dataset. DCRNN consistently outperforms GCRNN. The intuition is that the directed graph better captures the asymmetric correlation between traffic sensors. Figure 4 shows the effects of different parameters. $K$ roughly corresponds to the size of the filters' reception fields, while the number of units corresponds to the number of filters. Larger $K$ enables the model to capture broader spatial dependency at the cost of increasing learning complexity. We observe that with the increase of $K$, the error on the validation dataset first quickly decreases and then slightly increases. Similar behavior is observed for varying the number of units.

Figure 5: Performance comparison of different DCRNN variants. DCRNN, with the sequence-to-sequence framework and scheduled sampling, achieves the lowest MAE on the validation dataset. The advantage becomes more evident with the increase of the forecasting horizon.
Figure 6: Visualization of traffic time series forecasting. DCRNN generates smooth predictions and is usually better at predicting the start and end of peak hours.


4.4 EFFECT OF TEMPORAL DEPENDENCY MODELING

To evaluate the effect of temporal modeling, including the sequence-to-sequence framework as well as the scheduled sampling mechanism, we further design three variants of DCRNN: (1) DCNN: in which we concatenate the historical observations as a fixed-length vector and feed it into stacked diffusion convolutional layers to predict the future time series. We train a single model for one-step-ahead prediction and feed the previous prediction into the model as input to perform multiple-step-ahead prediction. (2) DCRNN-SEQ: which uses the encoder-decoder sequence-to-sequence learning framework to perform multiple-step-ahead forecasting. (3) DCRNN: similar to DCRNN-SEQ except for adding scheduled sampling.
Figure 5 shows the comparison of those methods with regard to MAE for different forecasting horizons. We observe that: (1) DCRNN-SEQ outperforms DCNN by a large margin, which confirms the importance of modeling temporal dependency. (2) DCRNN achieves the best result, and its superiority becomes more evident with the increase of the forecasting horizon. This is mainly because the model is trained to deal with its mistakes during multiple-step-ahead prediction and thus suffers less from the problem of error propagation. We also train a model that is always fed its own output as input for multiple-step-ahead prediction. However, its performance is much worse than all three variants, which emphasizes the importance of scheduled sampling.

4.5 MODEL INTERPRETATION

To better understand the model, we visualize forecasting results as well as learned filters. Figure 6 shows the visualization of 1-hour-ahead forecasting. We have the following observations: (1) DCRNN generates a smooth prediction of the mean when small oscillations exist in the traffic speeds (Figure 6(a)). This reflects the robustness of the model. (2) DCRNN is more likely to accurately predict abrupt changes in the traffic speed than baseline methods (e.g., FC-LSTM). As shown in Figure 6(b), DCRNN predicts the start and the end of the peak hours. This is because DCRNN captures the spatial dependency and is able to utilize the speed changes in neighborhood sensors for more accurate forecasting. Figure 7 visualizes examples of learned filters centered at different nodes. The star denotes the center, and colors denote the weights. We can observe that (1) weights are well localized around the center, and (2) the weights diffuse based on road network distance. More visualizations are provided in Appendix F.
Figure 7: Visualization of learned localized filters centered at different nodes with K = 3 on the METR-LA dataset. The star denotes the center, and the colors represent the weights. We observe that weights are localized around the center and diffuse alongside the road network.



5 CONCLUSION

In this paper, we formulated traffic prediction on road networks as a spatiotemporal forecasting problem and proposed the diffusion convolutional recurrent neural network that captures the spatiotemporal dependencies. Specifically, we use bidirectional graph random walks to model spatial dependency and recurrent neural networks to capture the temporal dynamics. We further integrated the encoder-decoder architecture and the scheduled sampling technique to improve the performance for long-term forecasting. When evaluated on two large-scale real-world traffic datasets, our approach obtained significantly better prediction than baselines. For future work, we will investigate the following two aspects: (1) applying the proposed model to other spatiotemporal forecasting tasks; (2) modeling the spatiotemporal dependency when the underlying graph structure is evolving, e.g., the K nearest neighbor graph for moving objects.