【DCRNN】Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Paper: https://arxiv.org/abs/1707.01926
Citations: 304 (as of 06/14/2020)
Venue: ICLR 2018



Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting


ABSTRACT

Spatiotemporal forecasting has various applications in the neuroscience, climate and transportation domains. Traffic forecasting is one canonical example of such a learning task. The task is challenging due to (1) complex spatial dependency on road networks, (2) non-linear temporal dynamics with changing road conditions and (3) the inherent difficulty of long-term forecasting. To address these challenges, we propose to model the traffic flow as a diffusion process on a directed graph and introduce Diffusion Convolutional Recurrent Neural Network (DCRNN), a deep learning framework for traffic forecasting that incorporates both spatial and temporal dependency in the traffic flow. Specifically, DCRNN captures the spatial dependency using bidirectional random walks on the graph, and the temporal dependency using the encoder-decoder architecture with scheduled sampling. We evaluate the framework on two real-world large-scale road network traffic datasets and observe consistent improvements of 12%-15% over state-of-the-art baselines.

1 INTRODUCTION

Spatiotemporal forecasting is a crucial task for a learning system that operates in a dynamic environment. It has a wide range of applications, from autonomous vehicle operations, to energy and smart grid optimization, to logistics and supply chain management. In this paper, we study one important task: traffic forecasting on road networks, the core component of intelligent transportation systems. The goal of traffic forecasting is to predict the future traffic speeds of a sensor network given historic traffic speeds and the underlying road networks.
This task is challenging mainly due to the complex spatiotemporal dependencies and the inherent difficulty of long-term forecasting. On the one hand, traffic time series demonstrate strong temporal dynamics. Recurring incidents such as rush hours or accidents can cause nonstationarity, making long-term forecasting difficult. On the other hand, sensors on the road network exhibit complex yet unique spatial correlations. Figure 1 illustrates an example. Road 1 and road 2 are correlated, while road 1 and road 3 are not. Although road 1 and road 3 are close in Euclidean space, they demonstrate very different behaviors. Moreover, the future traffic speed is influenced more by the downstream traffic than the upstream one. This means that the spatial structure in traffic is non-Euclidean and directional.
Traffic forecasting has been studied for decades, falling into two main categories: the knowledge-driven approach and the data-driven approach. In transportation and operational research, knowledge-driven methods usually apply queuing theory and simulate user behaviors in traffic (Cascetta, 2013). In the time series community, data-driven methods such as the Auto-Regressive Integrated Moving Average (ARIMA) model and Kalman filtering remain popular (Liu et al., 2011; Lippi et al., 2013). However, simple time series models usually rely on the stationarity assumption, which is often violated by traffic data. Most recently, deep learning models for traffic forecasting have been developed in Lv et al. (2015) and Yu et al. (2017b), but without considering the spatial structure. Wu & Tan (2016) and Ma et al. (2017) model the spatial correlation with Convolutional Neural Networks (CNN), but the spatial structure is in the Euclidean space (e.g., 2D images). Bruna et al. (2014) and Defferrard et al. (2016) studied graph convolution, but only for undirected graphs.
In this work, we represent the pair-wise spatial correlations between traffic sensors using a directed graph whose nodes are sensors and whose edge weights denote proximity between sensor pairs measured by road network distance. We model the dynamics of the traffic flow as a diffusion process and propose the diffusion convolution operation to capture the spatial dependency. We further propose Diffusion Convolutional Recurrent Neural Network (DCRNN), which integrates diffusion convolution, the sequence-to-sequence architecture and the scheduled sampling technique. When evaluated on real-world traffic datasets, DCRNN consistently outperforms state-of-the-art traffic forecasting baselines by a large margin. In summary:
• We study the traffic forecasting problem and model the spatial dependency of traffic as a diffusion process on a directed graph. We propose diffusion convolution, which has an intuitive interpretation and can be computed efficiently.
• We propose Diffusion Convolutional Recurrent Neural Network (DCRNN), a holistic approach that captures both spatial and temporal dependencies among time series using diffusion convolution and the sequence-to-sequence learning framework together with scheduled sampling. DCRNN is not limited to transportation and is readily applicable to other spatiotemporal forecasting tasks.
• We conducted extensive experiments on two large-scale real-world datasets, and the proposed approach obtains significant improvement over state-of-the-art baseline methods.


Figure 1: Spatial correlation is dominated by road network structure. (1) Traffic speed on road 1 is similar to that on road 2, as they are located on the same highway. (2) Road 1 and road 3 are located in opposite directions of the highway. Though close to each other in Euclidean space, their road network distance is large, and their traffic speeds differ significantly.

2 METHODOLOGY

We formalize the learning problem of spatiotemporal traffic forecasting and describe how to model the dependency structures using the diffusion convolutional recurrent neural network.

2.1 TRAFFIC FORECASTING PROBLEM

The goal of traffic forecasting is to predict the future traffic speed given previously observed traffic flow from $N$ correlated sensors on the road network. We can represent the sensor network as a weighted directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$, where $\mathcal{V}$ is a set of nodes with $|\mathcal{V}| = N$, $\mathcal{E}$ is a set of edges and $W \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix representing the nodes' proximity (e.g., a function of their road network distance). Denote the traffic flow observed on $\mathcal{G}$ as a graph signal $X \in \mathbb{R}^{N \times P}$, where $P$ is the number of features of each node (e.g., velocity, volume). Let $X^{(t)}$ represent the graph signal observed at time $t$. The traffic forecasting problem aims to learn a function $h(\cdot)$ that maps $T'$ historical graph signals to $T$ future graph signals, given a graph $\mathcal{G}$:

$$\left[X^{(t-T'+1)}, \ldots, X^{(t)}; \mathcal{G}\right] \xrightarrow{h(\cdot)} \left[X^{(t+1)}, \ldots, X^{(t+T)}\right] \tag{1}$$
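To make the input/output format concrete, below is a minimal sketch (not from the paper) of how $(T', T)$ training pairs could be sliced from a sensor-reading array; the function name `make_windows` and the array shapes are illustrative assumptions.

```python
import numpy as np

def make_windows(speeds, T_in=12, T_out=12):
    """Slice a (num_timesteps, N, P) series into (input, target) pairs.

    speeds : historical graph signals, one (N, P) matrix per time step.
    T_in   : number of historical steps T' fed to the model.
    T_out  : number of future steps T to predict.
    """
    xs, ys = [], []
    for t in range(T_in, speeds.shape[0] - T_out + 1):
        xs.append(speeds[t - T_in:t])      # X^(t-T'+1), ..., X^(t)
        ys.append(speeds[t:t + T_out])     # X^(t+1), ..., X^(t+T)
    return np.stack(xs), np.stack(ys)

# Example: 207 sensors (as in METR-LA), speed as the single feature.
speeds = np.random.rand(2000, 207, 1)
X, Y = make_windows(speeds)
print(X.shape, Y.shape)  # (1977, 12, 207, 1) (1977, 12, 207, 1)
```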

2.2 SPATIAL DEPENDENCY MODELING

We model the spatial dependency by relating traffic flow to a diffusion process, which explicitly captures the stochastic nature of traffic dynamics. This diffusion process is characterized by a random walk on $\mathcal{G}$ with restart probability $\alpha \in [0, 1]$ and a state transition matrix $D_O^{-1} W$. Here $D_O = \mathrm{diag}(W \mathbf{1})$ is the out-degree diagonal matrix, and $\mathbf{1} \in \mathbb{R}^N$ denotes the all-ones vector. After many time steps, such a Markov process converges to a stationary distribution $\mathcal{P} \in \mathbb{R}^{N \times N}$ whose $i$th row $\mathcal{P}_{i,:} \in \mathbb{R}^N$ represents the likelihood of diffusion from node $v_i \in \mathcal{V}$, hence the proximity w.r.t. node $v_i$. The following lemma provides a closed-form solution for the stationary distribution.
Lemma 2.1. (Teng et al., 2016) The stationary distribution of the diffusion process can be represented as a weighted combination of infinite random walks on the graph, and can be calculated in closed form:

$$\mathcal{P} = \sum_{k=0}^{\infty} \alpha (1 - \alpha)^k \left(D_O^{-1} W\right)^k$$

where $k$ is the diffusion step. In practice, we use a finite $K$-step truncation of the diffusion process and assign a trainable weight to each step. We also include the reversed-direction diffusion process, such that the bidirectional diffusion offers the model more flexibility to capture the influence from both the upstream and the downstream traffic.
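As a sanity check on the definitions above, the following sketch (an illustration, not the paper's code) builds the forward and reverse random-walk transition matrices from a weighted adjacency matrix and approximates the stationary distribution by truncating the weighted sum of random walks at $K$ terms; all names are hypothetical.

```python
import numpy as np

def transition_matrices(W):
    """Forward (D_O^{-1} W) and reverse (D_I^{-1} W^T) random-walk matrices."""
    d_out = W.sum(axis=1)                            # out-degrees
    d_in = W.sum(axis=0)                             # in-degrees
    P_fwd = W / np.maximum(d_out, 1e-10)[:, None]
    P_bwd = W.T / np.maximum(d_in, 1e-10)[:, None]
    return P_fwd, P_bwd

def truncated_stationary(W, alpha=0.1, K=30):
    """Approximate P = sum_k alpha (1-alpha)^k (D_O^{-1} W)^k with K terms."""
    P_fwd, _ = transition_matrices(W)
    term = np.eye(W.shape[0])                        # (D_O^{-1} W)^0
    acc = np.zeros_like(W, dtype=float)
    for k in range(K):
        acc += alpha * (1 - alpha) ** k * term
        term = term @ P_fwd                          # next power of the walk
    return acc
```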

Diffusion Convolution

The resulting diffusion convolution operation over a graph signal $X \in \mathbb{R}^{N \times P}$ and a filter $f_\theta$ is defined as:

$$X_{:,p} \star_{\mathcal{G}} f_\theta = \sum_{k=0}^{K-1} \left( \theta_{k,1} \left(D_O^{-1} W\right)^k + \theta_{k,2} \left(D_I^{-1} W^{T}\right)^k \right) X_{:,p} \quad \text{for } p \in \{1, \ldots, P\} \tag{2}$$

where $\theta \in \mathbb{R}^{K \times 2}$ are the parameters for the filter and $D_O^{-1} W$, $D_I^{-1} W^{T}$ represent the transition matrices of the diffusion process and the reverse one, respectively. In general, computing the convolution can be expensive. However, if $\mathcal{G}$ is sparse, Equation 2 can be calculated efficiently using $O(K)$ recursive sparse-dense matrix multiplications with total time complexity $O(K|\mathcal{E}|) \ll O(N^2)$. See Appendix B for more detail.
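A possible implementation of Equation 2 for a single feature column, using the recursion $x_k = T x_{k-1}$ so that each diffusion step costs one sparse-dense product (hence the $O(K|\mathcal{E}|)$ total). The use of SciPy sparse matrices and the helper name are implementation choices of this sketch, not prescribed by the paper.

```python
import numpy as np
import scipy.sparse as sp

def diffusion_conv(x, W_sparse, theta):
    """Diffusion convolution of a single graph signal x (shape (N,)).

    W_sparse : sparse weighted adjacency matrix (N x N).
    theta    : array of shape (K, 2); column 0 weights the forward walk
               D_O^{-1} W, column 1 the reverse walk D_I^{-1} W^T.
    """
    K = theta.shape[0]
    d_out = np.asarray(W_sparse.sum(axis=1)).ravel()
    d_in = np.asarray(W_sparse.sum(axis=0)).ravel()
    T_fwd = sp.diags(1.0 / np.maximum(d_out, 1e-10)) @ W_sparse
    T_bwd = sp.diags(1.0 / np.maximum(d_in, 1e-10)) @ W_sparse.T
    out = np.zeros_like(x, dtype=float)
    for T_mat, col in ((T_fwd, 0), (T_bwd, 1)):
        x_k = x.copy()                       # k = 0 term
        out += theta[0, col] * x_k
        for k in range(1, K):
            x_k = T_mat @ x_k                # one sparse-dense product per step
            out += theta[k, col] * x_k
    return out
```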

Diffusion Convolutional Layer

With the convolution operation defined in Equation 2, we can build a diffusion convolutional layer that maps $P$-dimensional features to $Q$-dimensional outputs. Denote the parameter tensor as $\Theta \in \mathbb{R}^{Q \times P \times K \times 2} = [\theta]_{q,p}$, where $\Theta_{q,p,:,:} \in \mathbb{R}^{K \times 2}$ parameterizes the convolutional filter for the $p$th input and the $q$th output. The diffusion convolutional layer is thus:

$$H_{:,q} = a\left( \sum_{p=1}^{P} X_{:,p} \star_{\mathcal{G}} f_{\Theta_{q,p,:,:}} \right) \quad \text{for } q \in \{1, \ldots, Q\} \tag{3}$$

where $X \in \mathbb{R}^{N \times P}$ is the input, $H \in \mathbb{R}^{N \times Q}$ is the output, $\{ f_{\Theta_{q,p,:,:}} \}$ are the filters and $a$ is the activation function (e.g., ReLU, Sigmoid). The diffusion convolutional layer learns representations for graph-structured data and can be trained using stochastic gradient based methods.
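Building on the `diffusion_conv` sketch above, the layer in Equation 3 could be written as follows; the loop over input and output features is for clarity rather than efficiency, and the ReLU default is just one of the activations mentioned in the text.

```python
import numpy as np

def diffusion_conv_layer(X, W_sparse, Theta, activation=lambda z: np.maximum(z, 0.0)):
    """Map X in R^{N x P} to H in R^{N x Q} following Equation 3.

    Theta : parameter tensor of shape (Q, P, K, 2); Theta[q, p] is the
            (K, 2) filter connecting input feature p to output feature q.
    """
    N, P = X.shape
    Q = Theta.shape[0]
    H = np.zeros((N, Q))
    for q in range(Q):
        for p in range(P):
            # reuses diffusion_conv from the sketch above
            H[:, q] += diffusion_conv(X[:, p], W_sparse, Theta[q, p])
        H[:, q] = activation(H[:, q])        # activation after summing over p
    return H
```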

Relation with Spectral Graph Convolution

Diffusion convolution is defined on both directed and undirected graphs. When applied to undirected graphs, we show that many existing graph-structured convolutional operations, including the popular spectral graph convolution, i.e., ChebNet (Defferrard et al., 2016), can be considered as special cases of diffusion convolution (up to a similarity transformation). Let $D$ denote the degree matrix, and let $L = D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}}$ be the normalized graph Laplacian; the following proposition demonstrates the connection.
Proposition 2.2. The spectral graph convolution defined as

$$X_{:,p} \star_{\mathcal{G}} f_\theta = \Phi \, F(\theta) \, \Phi^{T} X_{:,p}$$

with eigenvalue decomposition $L = \Phi \Lambda \Phi^{T}$ and $F(\theta) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$, is equivalent to graph diffusion convolution up to a similarity transformation, when the graph $\mathcal{G}$ is undirected.
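A brief sketch of the intuition (not the paper's full proof): for an undirected graph, $D_O = D_I = D$ and the two walk directions coincide, so the diffusion filter reduces to a single polynomial $\sum_k \theta_k (D^{-1}W)^k$ with $\theta_k = \theta_{k,1} + \theta_{k,2}$, and the random-walk matrix is similar to $I - L$:

$$
\begin{aligned}
D^{-1}W &= D^{-\frac{1}{2}}\left(D^{-\frac{1}{2}} W D^{-\frac{1}{2}}\right)D^{\frac{1}{2}} = D^{-\frac{1}{2}}\,(I - L)\,D^{\frac{1}{2}}, \\
\sum_{k=0}^{K-1} \theta_k \left(D^{-1}W\right)^{k} &= D^{-\frac{1}{2}} \left(\sum_{k=0}^{K-1} \theta_k (I - L)^{k}\right) D^{\frac{1}{2}} = D^{-\frac{1}{2}}\, \Phi \left(\sum_{k=0}^{K-1} \theta_k (I - \Lambda)^{k}\right) \Phi^{T} D^{\frac{1}{2}}.
\end{aligned}
$$

Expanding the polynomial in $(I - \Lambda)$ and re-parameterizing its coefficients gives exactly the form $F(\theta)$ above, so the two convolutions agree up to the similarity transformation $D^{-\frac{1}{2}}$.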

2.3 TEMPORAL DYNAMICS MODELING

We leverage recurrent neural networks (RNNs) to model the temporal dependency. In particular, we use Gated Recurrent Units (GRU) (Chung et al., 2014), a simple yet powerful variant of RNNs. We replace the matrix multiplications in GRU with the diffusion convolution, which leads to our proposed Diffusion Convolutional Gated Recurrent Unit (DCGRU):

$$
\begin{aligned}
r^{(t)} &= \sigma\left(\Theta_r \star_{\mathcal{G}} \left[X^{(t)}, H^{(t-1)}\right] + b_r\right) \\
u^{(t)} &= \sigma\left(\Theta_u \star_{\mathcal{G}} \left[X^{(t)}, H^{(t-1)}\right] + b_u\right) \\
C^{(t)} &= \tanh\left(\Theta_C \star_{\mathcal{G}} \left[X^{(t)}, \left(r^{(t)} \odot H^{(t-1)}\right)\right] + b_c\right) \\
H^{(t)} &= u^{(t)} \odot H^{(t-1)} + \left(1 - u^{(t)}\right) \odot C^{(t)}
\end{aligned}
$$

where $X^{(t)}, H^{(t)}$ denote the input and output at time $t$, $r^{(t)}, u^{(t)}$ are the reset gate and update gate at time $t$, respectively, and $\odot$ denotes the element-wise product. $\star_{\mathcal{G}}$ denotes the diffusion convolution defined in Equation 2 and $\Theta_r, \Theta_u, \Theta_C$ are parameters for the corresponding filters. Similar to GRU, DCGRU can be used to build recurrent neural network layers and be trained using backpropagation through time.
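A compact sketch of one DCGRU step, with the GRU matrix multiplications replaced by the `diffusion_conv_layer` sketch above; the omitted bias terms and the parameter shapes are assumptions of this illustration, not the released implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dcgru_step(X_t, H_prev, W_sparse, Theta_r, Theta_u, Theta_C):
    """One DCGRU step: X_t is (N, P), H_prev is (N, Q); returns H_t of shape (N, Q).

    Each Theta_* has shape (Q, P + Q, K, 2) so it can act on [X_t, H_prev].
    """
    XH = np.concatenate([X_t, H_prev], axis=1)             # [X^(t), H^(t-1)]
    r = diffusion_conv_layer(XH, W_sparse, Theta_r, activation=sigmoid)
    u = diffusion_conv_layer(XH, W_sparse, Theta_u, activation=sigmoid)
    XrH = np.concatenate([X_t, r * H_prev], axis=1)        # [X^(t), r ⊙ H^(t-1)]
    C = diffusion_conv_layer(XrH, W_sparse, Theta_C, activation=np.tanh)
    return u * H_prev + (1.0 - u) * C
```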
In multiple-step-ahead forecasting, we employ the sequence-to-sequence architecture (Sutskever et al., 2014). Both the encoder and the decoder are recurrent neural networks with DCGRU. During training, we feed the historical time series into the encoder and use its final states to initialize the decoder. The decoder generates predictions given previous ground truth observations. At testing time, ground truth observations are replaced by predictions generated by the model itself. The discrepancy between the input distributions of training and testing can cause degraded performance. To mitigate this issue, we integrate scheduled sampling (Bengio et al., 2015) into the model, where at the $i$th iteration we feed the model either the ground truth observation with probability $\epsilon_i$ or the prediction by the model with probability $1 - \epsilon_i$. During the training process, $\epsilon_i$ gradually decreases to 0 to allow the model to learn the testing distribution.
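The text above only requires that $\epsilon_i$ decay from 1 towards 0; one common choice, following Bengio et al. (2015), is an inverse sigmoid decay, sketched below with an assumed decay constant `tau`.

```python
import numpy as np

def teacher_forcing_prob(i, tau=3000.0):
    """Inverse sigmoid decay of epsilon_i: starts near 1 and decays towards 0.

    i   : global training iteration.
    tau : assumed hyperparameter controlling how fast the decoder switches
          from ground truth inputs to its own predictions.
    """
    return tau / (tau + np.exp(i / tau))

# During training, the decoder input at each step is the ground truth with
# probability epsilon_i, and the model's previous prediction otherwise.
rng = np.random.default_rng(0)
for i in (0, 1000, 10000, 30000):
    eps = teacher_forcing_prob(i)
    use_ground_truth = rng.random() < eps
    print(i, round(eps, 3), use_ground_truth)
```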
With both spatial and temporal modeling, we build a Diffusion Convolutional Recurrent Neural Network (DCRNN). The model architecture of DCRNN is shown in Figure 2. The entire network is trained by maximizing the likelihood of generating the target future time series using backpropagation through time. DCRNN is able to capture spatiotemporal dependencies among time series and can be applied to various spatiotemporal forecasting problems.


Figure 2: System architecture for the Diffusion Convolutional Recurrent Neural Network designed for spatiotemporal traffic forecasting. The historical time series are fed into an encoder whose final states are used to initialize the decoder. The decoder makes predictions based on either previous ground truth or the model output.

3 RELATED WORK

Traffic forecasting is a classic problem in transportation and operational research, which is primarily based on queuing theory and simulations (Drew, 1968). Data-driven approaches to traffic forecasting have received considerable attention, and more details can be found in a recent survey paper (Vlahogianni et al., 2014) and the references therein. However, existing machine learning models either impose strong stationarity assumptions on the data (e.g., the auto-regressive model) or fail to account for highly non-linear temporal dependency (e.g., latent space models; Yu et al. (2016); Deng et al. (2016)). Deep learning models deliver new promise for the time series forecasting problem. For example, Yu et al. (2017b) and Laptev et al. (2017) study time series forecasting using deep Recurrent Neural Networks (RNN). Convolutional Neural Networks (CNN) have also been applied to traffic forecasting. Zhang et al. (2016; 2017) convert the road network to a regular 2-D grid and apply traditional CNN to predict crowd flow. Cheng et al. (2017) propose DeepTransport, which models the spatial dependency by explicitly collecting upstream and downstream neighborhood roads for each individual road and then conducting convolution on these neighborhoods respectively.
Recently, CNN has been generalized to arbitrary graphs based on spectral graph theory. Graph convolutional neural networks (GCN) were first introduced in Bruna et al. (2014), which bridges spectral graph theory and deep neural networks. Defferrard et al. (2016) propose ChebNet, which improves GCN with fast localized convolution filters. Kipf & Welling (2017) simplify ChebNet and achieve state-of-the-art performance in semi-supervised classification tasks. Seo et al. (2016) combine ChebNet with Recurrent Neural Networks (RNN) for structured sequence modeling. Yu et al. (2017a) model the sensor network as an undirected graph and apply ChebNet and the convolutional sequence model (Gehring et al., 2017) to do forecasting. One limitation of the mentioned spectral-based convolutions is that they generally require the graph to be undirected to calculate a meaningful spectral decomposition. Going from the spectral domain to the vertex domain, Atwood & Towsley (2016) propose the diffusion-convolutional neural network (DCNN), which defines convolution as a diffusion process across each node in a graph-structured input. Hechtlinger et al. (2017) propose GraphCNN to generalize convolution to graphs by convolving every node with its p nearest neighbors. However, neither of these methods considers the temporal dynamics; they mainly deal with static graph settings.
Our approach is different from all those methods due to both the problem settings and the formulation of the convolution on the graph. We model the sensor network as a weighted directed graph, which is more realistic than a grid or an undirected graph. Besides, the proposed convolution is defined using bidirectional graph random walks and is further integrated with the sequence-to-sequence learning framework as well as scheduled sampling to model the long-term temporal dependency.
Table 1: Performance comparison of different approaches for traffic speed forecasting. DCRNN achieves the best performance with all three metrics for all forecasting horizons, and the advantage becomes more evident with the increase of the forecasting horizon.


4 EXPERIMENTS

We conduct experiments on two real-world large-scale datasets: (1) METR-LA: this traffic dataset contains traffic information collected from loop detectors on the highways of Los Angeles County (Jagadish et al., 2014). We select 207 sensors and collect 4 months of data ranging from Mar 1st 2012 to Jun 30th 2012 for the experiment. (2) PEMS-BAY: this traffic dataset is collected by the California Transportation Agencies (CalTrans) Performance Measurement System (PeMS). We select 325 sensors in the Bay Area and collect 6 months of data ranging from Jan 1st 2017 to May 31st 2017 for the experiment. The sensor distributions of both datasets are visualized in Figure 8 in the Appendix.
In both of these datasets, we aggregate traffic speed readings into 5-minute windows and apply Z-score normalization. 70% of the data is used for training, 20% for testing, and the remaining 10% for validation. To construct the sensor graph, we compute the pairwise road network distances between sensors and build the adjacency matrix using a thresholded Gaussian kernel (Shuman et al., 2013):
$$W_{ij} = \exp\left(-\frac{\mathrm{dist}(v_i, v_j)^2}{\sigma^2}\right) \text{ if } \mathrm{dist}(v_i, v_j) \le \kappa, \text{ otherwise } 0,$$
where $W_{ij}$ represents the edge weight between sensor $v_i$ and sensor $v_j$, $\mathrm{dist}(v_i, v_j)$ denotes the road network distance from sensor $v_i$ to sensor $v_j$, $\sigma$ is the standard deviation of the distances, and $\kappa$ is the threshold.
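A sketch of this graph construction, assuming a precomputed matrix of pairwise road-network distances; the choice of $\kappa$ and computing $\sigma$ over all entries are assumptions of this illustration.

```python
import numpy as np

def build_adjacency(dist, kappa):
    """Thresholded Gaussian kernel on pairwise road-network distances.

    dist  : (N, N) matrix; dist[i, j] is the road network distance from
            sensor v_i to sensor v_j (directed, so dist need not be symmetric).
    kappa : threshold beyond which the edge weight is set to zero.
    """
    sigma = dist.std()                               # std of the distances
    W = np.exp(-np.square(dist) / np.square(sigma))  # Gaussian kernel
    W[dist > kappa] = 0.0                            # sparsify the graph
    return W
```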
Figure 3: Learning curves of DCRNN and DCRNN without diffusion convolution. Removing diffusion convolution results in much higher validation error. Moreover, DCRNN with bidirectional random walk achieves the lowest validation error.

Figure 4: Effects of K and the number of units in each layer of DCRNN. K corresponds to the receptive field width of the filter, and the number of units corresponds to the number of filters.


4.1 EXPERIMENTAL SETTINGS

Baselines. We compare DCRNN with widely used time series regression models, including (1) HA: Historical Average, which models the traffic flow as a seasonal process and uses the weighted average of previous seasons as the prediction; (2) ARIMA$_{kal}$: Auto-Regressive Integrated Moving Average model with Kalman filter, which is widely used in time series prediction; (3) VAR: Vector Auto-Regression (Hamilton, 1994); (4) SVR: Support Vector Regression, which uses a linear support vector machine for the regression task. The following deep neural network based approaches are also included: (5) FNN: feed-forward neural network with two hidden layers and L2 regularization; (6) FC-LSTM: recurrent neural network with fully connected LSTM hidden units (Sutskever et al., 2014).
All neural network based approaches are implemented using TensorFlow (Abadi et al., 2016) and trained using the Adam optimizer with learning rate annealing. The best hyperparameters are chosen using the Tree-structured Parzen Estimator (TPE) (Bergstra et al., 2011) on the validation dataset. Detailed parameter settings for DCRNN as well as the baselines are available in Appendix E.

4.2 TRAFFIC FORECASTING PERFORMANCE COMPARISON

Table 1 shows the comparison of different approaches for 15-minute, 30-minute and 1-hour-ahead forecasting on both datasets. These methods are evaluated based on three commonly used metrics in traffic forecasting: (1) Mean Absolute Error (MAE), (2) Mean Absolute Percentage Error (MAPE), and (3) Root Mean Squared Error (RMSE). Missing values are excluded when calculating these metrics. Detailed formulations of these metrics are provided in Appendix E.2. We observe the following phenomena in both datasets. (1) RNN-based methods, including FC-LSTM and DCRNN, generally outperform the other baselines, which emphasizes the importance of modeling the temporal dependency. (2) DCRNN achieves the best performance regarding all the metrics for all forecasting horizons, which suggests the effectiveness of spatiotemporal dependency modeling. (3) Deep neural network based methods, including FNN, FC-LSTM and DCRNN, tend to have better performance than linear baselines for long-term forecasting, e.g., 1 hour ahead. This is because the temporal dependency becomes increasingly non-linear with the growth of the horizon. Besides, as the historical average method does not depend on short-term data, its performance is invariant to small increases in the forecasting horizon.
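For reference, a sketch of the three metrics with missing observations masked out, as described above; treating a sentinel `null_value` (0 by default here) as "missing" is an assumption of this sketch and may differ from the datasets' actual encoding.

```python
import numpy as np

def masked_metrics(preds, labels, null_value=0.0):
    """MAE, RMSE and MAPE with missing observations excluded.

    null_value : sentinel marking missing labels (an assumption of this
                 sketch; the datasets' actual encoding may differ).
    """
    mask = labels != null_value
    diff = preds[mask] - labels[mask]
    mae = np.mean(np.abs(diff))
    rmse = np.sqrt(np.mean(np.square(diff)))
    mape = np.mean(np.abs(diff) / np.abs(labels[mask]))
    return mae, rmse, mape
```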
Note that traffic forecasting on the METR-LA (Los Angeles, which is known for its complicated traffic conditions) dataset is more challenging than on the PEMS-BAY (Bay Area) dataset. Thus we use METR-LA as the default dataset for the following experiments.

4.3 EFFECT OF SPATIAL DEPENDENCY MODELING

To further investigate the effect of spatial dependency modeling, we compare DCRNN with the following variants: (1) DCRNN-NoConv, which ignores spatial dependency by replacing the transition matrices in the diffusion convolution (Equation 2) with identity matrices; this essentially means the forecast for a sensor can only be inferred from its own historical readings; (2) DCRNN-UniConv, which only uses the forward random walk transition matrix for diffusion convolution. Figure 3 shows the learning curves of these three models with roughly the same number of parameters. Without diffusion convolution, DCRNN-NoConv has much higher validation error. Moreover, DCRNN achieves the lowest validation error, which shows the effectiveness of using the bidirectional random walk. The intuition is that the bidirectional random walk gives the model the ability and flexibility to capture the influence from both the upstream and the downstream traffic.
To investigate the effect of graph construction, we construct an undirected graph by setting $\hat{W}_{ij} = \max(W_{ij}, W_{ji})$, where $\hat{W}$ is the new symmetric weight matrix. Then we develop a variant of DCRNN, denoted GCRNN, which uses sequence-to-sequence learning with ChebNet graph convolution (Equation 5) and roughly the same number of parameters. Table 2 shows the comparison between DCRNN and GCRNN on the METR-LA dataset. DCRNN consistently outperforms GCRNN. The intuition is that the directed graph better captures the asymmetric correlation between traffic sensors. Figure 4 shows the effects of different parameters. $K$ roughly corresponds to the size of the filters' receptive fields, while the number of units corresponds to the number of filters. Larger $K$ enables the model to capture broader spatial dependency at the cost of increased learning complexity. We observe that with the increase of $K$, the error on the validation dataset first quickly decreases and then slightly increases. Similar behavior is observed when varying the number of units.
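To make the GCRNN comparison concrete, below is a sketch of the symmetrized graph and a ChebNet-style convolution (Defferrard et al., 2016) on it; the Chebyshev recursion is the standard one, while the eigenvalue rescaling and the variable names are assumptions of this illustration rather than the paper's Equation 5.

```python
import numpy as np

def chebnet_conv(x, W, theta):
    """ChebNet-style graph convolution of a signal x (shape (N,)) on the symmetrized graph.

    W     : directed weight matrix; symmetrized as W_hat = max(W, W^T).
    theta : (K,) Chebyshev coefficients for a single filter.
    """
    W_hat = np.maximum(W, W.T)                            # undirected variant
    d = W_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-10)))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W_hat @ D_inv_sqrt
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(W.shape[0])      # rescale spectrum to [-1, 1]
    T_prev, T_curr = x, L_tilde @ x                       # Chebyshev T_0(x), T_1(x)
    out = theta[0] * T_prev + (theta[1] * T_curr if len(theta) > 1 else 0.0)
    for k in range(2, len(theta)):
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out = out + theta[k] * T_curr
    return out
```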

Figure 5: Performance comparison of different DCRNN variants. DCRNN, with the sequence-to-sequence framework and scheduled sampling, achieves the lowest MAE on the validation dataset. The advantage becomes more evident with the increase of the forecasting horizon.

Figure 6: Visualization of traffic time series forecasting. DCRNN generates smooth predictions and is usually better at predicting the start and the end of peak hours.


4.4 EFFECT OF TEMPORAL DEPENDENCY MODELING

To evaluate the effect of temporal modeling, including the sequence-to-sequence framework as well as the scheduled sampling mechanism, we further design three variants of DCRNN: (1) DCNN: we concatenate the historical observations into a fixed-length vector and feed it into stacked diffusion convolutional layers to predict the future time series; we train a single model for one-step-ahead prediction and feed the previous prediction back into the model as input to perform multiple-steps-ahead prediction. (2) DCRNN-SEQ: uses the encoder-decoder sequence-to-sequence learning framework to perform multiple-steps-ahead forecasting. (3) DCRNN: similar to DCRNN-SEQ except for adding scheduled sampling.
Figure 5 shows the comparison of those four methods with regard to MAE for different forecasting horizons. We observe that: (1) DCRNN-SEQ outperforms DCNN by a large margin, which confirms the importance of modeling the temporal dependency. (2) DCRNN achieves the best result, and its superiority becomes more evident with the increase of the forecasting horizon. This is mainly because the model is trained to deal with its own mistakes during multiple-steps-ahead prediction and thus suffers less from the problem of error propagation. We also train a model that is always fed its own output as input for multiple-steps-ahead prediction. However, its performance is much worse than all three variants, which emphasizes the importance of scheduled sampling.

4.5 MODEL INTERPRETATION

To better understand the model, we visualize the forecasting results as well as the learned filters. Figure 6 shows the visualization of 1-hour-ahead forecasting. We have the following observations: (1) DCRNN generates a smooth prediction of the mean when small oscillations exist in the traffic speeds (Figure 6(a)). This reflects the robustness of the model. (2) DCRNN is more likely to accurately predict abrupt changes in the traffic speed than baseline methods (e.g., FC-LSTM). As shown in Figure 6(b), DCRNN predicts the start and the end of the peak hours. This is because DCRNN captures the spatial dependency and is able to utilize the speed changes in neighborhood sensors for more accurate forecasting. Figure 7 visualizes examples of learned filters centered at different nodes. The star denotes the center, and the colors denote the weights. We can observe that (1) the weights are well localized around the center, and (2) the weights diffuse based on road network distance. More visualizations are provided in Appendix F.
Figure 7: Visualization of learned localized filters centered at different nodes with K = 3 on the METR-LA dataset. The star denotes the center, and the colors represent the weights. We observe that weights are localized around the center and diffuse alongside the road network.



5 CONCLUSION

In this paper, we formulated traffic prediction on road networks as a spatiotemporal forecasting problem and proposed the diffusion convolutional recurrent neural network that captures the spatiotemporal dependencies. Specifically, we use a bidirectional graph random walk to model spatial dependency and a recurrent neural network to capture the temporal dynamics. We further integrated the encoder-decoder architecture and the scheduled sampling technique to improve the performance for long-term forecasting. When evaluated on two large-scale real-world traffic datasets, our approach obtained significantly better prediction than baselines. For future work, we will investigate the following two aspects: (1) applying the proposed model to other spatiotemporal forecasting tasks; (2) modeling the spatiotemporal dependency when the underlying graph structure is evolving, e.g., the K nearest neighbor graph for moving objects.