Skim Notes on Recent Papers on Time Series and Missing Data

Contents

1 Transformer-based Multivariate Time Series Representation Learning (KDD 2021)

2 A Transfer Learning Classification Model for Univariate Time Series (ICML 2021)

3 Classifying Incomplete Data with Heterogeneous Graph Neural Networks (WWW 2021 Best Paper Runner-Up)

4 A Graph-Neural-Network-Based Multivariate Time Series Imputation Model (arXiv 2021)

5 MiniRocket: A Fast Time Series Classification Model (KDD 2021)

 


1 Transformer-based Multivariate Time Series Representation Learning (KDD 2021)

Original paper link

Motivation

Pre-trained models can be potentially used for downstream tasks such as regression and classification, forecasting and missing value imputation.

Meanwhile, the availability of labeled multivariate time series  data in particular is far more limited: extensive data labeling is often prohibitively expensive or impractical, as it may require much time and effort, special infrastructure or domain expertise.

Therefore, it is worth exploring using only a limited amount of labeled data or leveraging the existing plethora of unlabeled data for time series data modeling.

Contribution

In this work, we investigate, for the first time, the use of a transformer encoder for unsupervised representation learning of multivariate time series, as well as for the tasks of time series regression and classification.

Experimental results indicated that transformer models can convincingly outperform all current state-of-the-art modeling approaches, even when only having access to a very limited amount of training data samples (on the order of hundreds of samples), an unprecedented success for deep learning models.

Importantly, we also demonstrate that our models, using at most hundreds of thousands of parameters, can be practically trained even on CPUs; training them on GPUs allows them to be trained as fast as even the fastest and most accurate non-deep learning based approaches.

Model

My thoughts

        This paper successfully applies the Transformer model to multivariate time series modeling. Through a specially designed data-encoding module, the model can learn good representations from multivariate time series that lack labels, which benefits downstream regression, classification, and imputation tasks.

        The paper's biggest highlight is the model's pretraining ability, which reduces its dependence on labeled data. However, the paper says little about the interpretability of the learned representations, so some caution is warranted when applying it.
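As a rough illustration of the masked-reconstruction pretraining idea behind such models, the sketch below corrupts random entries of a multivariate series and scores a reconstruction only on the masked positions. This is a hypothetical simplification (the paper masks contiguous spans per variable and trains a transformer encoder; here a trivial stand-in replaces the network):

```python
import numpy as np

def masked_mse_pretraining_step(x, mask_ratio=0.15, rng=None):
    """Toy masked-reconstruction objective for multivariate time series.

    x: array of shape (seq_len, n_vars)
    Returns the corrupted input, the boolean mask, and a toy
    reconstruction loss against a stand-in "prediction".
    """
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(x.shape) < mask_ratio  # True = hidden from the model
    x_corrupt = np.where(mask, 0.0, x)       # masked values zeroed out
    # Stand-in for the transformer's output: here just the corrupted
    # input, so the loss reflects the magnitude of the masked values.
    prediction = x_corrupt
    loss = ((prediction - x)[mask] ** 2).mean() if mask.any() else 0.0
    return x_corrupt, mask, loss
```

In the real model, `prediction` would come from a transformer encoder with a learnable input projection and positional encoding, trained to minimize this masked MSE.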

        GitHub: https://github.com/gzerveas/mvts_transformer (37 stars at the time of writing)

 


2 A Transfer Learning Classification Model for Univariate Time Series (ICML 2021)

Original paper link

Motivation

Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation.

Contribution

Motivated by the advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification, through input transformation learning and output label mapping.

Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%.

Model

 

My thoughts

This paper is likely the first to apply a large-scale pretrained model to time series classification, and it achieves strong results on 30 univariate time series datasets, although the number of classes had to be kept below 10 in the experiments. In addition, since speech data are univariate, the model only applies to univariate series. That said, the paper's visualization analysis is quite thorough, and its approach is worth borrowing.
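The reprogramming pipeline described above (input transformation learning plus output label mapping) can be sketched as follows. This is a simplified illustration under assumed shapes, not the actual V2S implementation; `frozen_model` stands in for the pretrained acoustic network:

```python
import numpy as np

def reprogram_forward(x, theta, frozen_model, label_map):
    """Illustrative Voice2Series-style forward pass (hypothetical names).

    x:            target time series, shape (t,)
    theta:        trainable perturbation, shape (T,) with T >= t
    frozen_model: pretrained acoustic model, maps (T,) -> source logits
    label_map:    list of lists; label_map[k] holds the source-class
                  indices aggregated into target class k
    """
    T = theta.shape[0]
    x_pad = np.zeros(T)
    x_pad[: x.shape[0]] = x          # zero-pad the short target series
    z = x_pad + theta                # learned input transformation
    source_logits = frozen_model(z)  # frozen pretrained network
    # Many-to-one output label mapping: average the mapped source logits.
    return np.array([source_logits[idx].mean() for idx in label_map])
```

Only `theta` is trained; the acoustic model's weights stay frozen, which is what makes the approach cheap on small labeled datasets.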

GitHub: https://github.com/huckiyang/Voice2Series-Reprogramming (28 stars at the time of writing)

 


3 Classifying Incomplete Data with Heterogeneous Graph Neural Networks (WWW 2021 Best Paper Runner-Up)

Original paper link

Motivation

Heterogeneous information networks (HINs), also called heterogeneous graphs, are composed of multiple types of nodes and edges, and contain comprehensive information and rich semantics.

Graph neural networks (GNNs) based heterogeneous models can not be trained with some nodes with no attributes.

Previous studies take some handcrafted methods to solve this problem, which separate the attribute completion from the graph learning process and, in turn, result in poor performance.

Contribution

In this paper, we hold that missing attributes can be acquired by a learnable manner, and propose an end-to-end framework for Heterogeneous Graph Neural Network via Attribute Completion (HGNN-AC), including pre-learning of topological embedding and attribute completion with attention mechanism.

HGNN-AC first uses existing HIN-Embedding methods to obtain node topological embedding.

Then it uses the topological relationship between nodes as guidance to complete attributes for no-attribute nodes by weighted aggregation of the attributes from these attributed nodes.

Model

 

My thoughts

This paper is likely the first to combine graph neural networks with missing-data imputation into an end-to-end classification model, achieving a clear improvement in classification performance over existing two-stage methods.

A minor shortcoming is that the paper does not analyze the imputation itself in depth: is the performance gain because the model imputes better than the latest dedicated imputation methods, or does it come from something else?
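The core completion step quoted in the Contribution section, topology-guided weighted aggregation of attributes from attributed nodes, can be sketched roughly as below. This is a simplified stand-in (plain dot-product attention, no learnable projections or multi-head attention, which HGNN-AC does use):

```python
import numpy as np

def complete_attributes(topo_emb, attrs, has_attr):
    """Toy attribute completion in the spirit of HGNN-AC.

    topo_emb: (n, d_topo) topological embeddings for all nodes
    attrs:    (n, d_attr) attributes; rows of no-attribute nodes ignored
    has_attr: boolean mask of length n, True where attributes exist
    Returns a completed attribute matrix.
    """
    out = attrs.copy()
    src = np.where(has_attr)[0]               # attributed source nodes
    for v in np.where(~has_attr)[0]:
        scores = topo_emb[src] @ topo_emb[v]  # topology-guided attention
        w = np.exp(scores - scores.max())
        w /= w.sum()                          # softmax weights
        out[v] = w @ attrs[src]               # weighted aggregation
    return out
```

Because the weights come from topological embeddings, a no-attribute node borrows most heavily from nodes that occupy a similar position in the graph.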

GitHub: https://github.com/search?q=Heterogeneous+Graph+Neural+Network+via+Attribute+Completion (11 stars at the time of writing)

 


4 A Graph-Neural-Network-Based Multivariate Time Series Imputation Model (arXiv 2021)

Original paper link

Motivation

Dealing with missing values and incomplete multivariate time series is a labor-intensive and time-consuming inevitable task when handling data coming from real-world applications.

Standard methods fall short in capturing the nonlinear time and space dependencies existing within networks of interconnected sensors and do not take full advantage of the available – and often strong – relational information.

Notably, most of state-of-the-art imputation methods based on deep learning do not explicitly model relational aspects and, in any case, do not exploit processing frameworks able to adequately represent structured spatio-temporal data.

Contribution

In this work, we present the first assessment of graph neural networks in the context of multivariate time series imputation. In particular, we introduce a novel graph neural network architecture, named GRIL, which aims at reconstructing missing data in the different channels of a multivariate time series by learning spatial-temporal representations through message passing.

Model

 

My thoughts

The paper's biggest highlight is being the first to apply graph neural networks to multivariate time series modeling, but it does not discuss in detail how the dependencies between graph nodes are constructed.

Also, the experiments only measure imputation performance; there are no convincing visualization experiments or analyses targeting the characteristics of the graph neural network itself.
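As a rough intuition for message-passing imputation over a sensor network, the sketch below fills a missing reading with the adjacency-weighted mean of observed neighbors. This is a toy stand-in, not GRIL's learned, recurrent message passing:

```python
import numpy as np

def impute_step(x, mask, adj):
    """One simplified spatial message-passing imputation step.

    x:    (n_sensors,) readings at one time step
    mask: boolean, True where observed
    adj:  (n, n) nonnegative adjacency weights between sensors
    Missing entries become the adjacency-weighted mean of observed
    neighbors; observed entries are kept as-is.
    """
    out = x.copy()
    for i in np.where(~mask)[0]:
        w = adj[i] * mask                  # aggregate observed neighbors only
        if w.sum() > 0:
            out[i] = (w * x).sum() / w.sum()
    return out
```

GRIL replaces the fixed weighted mean with learned messages and combines this spatial step with a recurrent temporal state, but the locality principle is the same.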

GitHub: none yet; the paper appears to be under submission to NIPS.

 


5 MiniRocket: A Fast Time Series Classification Model (KDD 2021)

原文鏈接

Motivation       

Until recently, the most accurate methods for time series classification were limited by high computational complexity.

While there have been considerable advances in recent years, computational complexity and a lack of scalability remain persistent problems.

Contribution

We reformulate Rocket into a new method, MiniRocket. MiniRocket is up to 75 times faster than Rocket on larger datasets, and almost deterministic (and optionally, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in under 10 minutes.
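MiniRocket's speed comes from convolving each series with a small, fixed set of kernels and keeping only the proportion of positive values (PPV) past each bias threshold, then training a simple linear classifier on those features. The sketch below shows the PPV feature computation only, with arbitrary kernels rather than MiniRocket's fixed {-1, 2} weights, 84 kernel patterns, dilations, and quantile-derived biases:

```python
import numpy as np

def ppv_features(x, kernels, biases):
    """Toy MiniRocket-style features: one PPV value per (kernel, bias).

    x:       (t,) univariate series
    kernels: (k, 9) kernel weights
    biases:  (k, b) per-kernel bias thresholds
    Returns a flat feature vector of length k * b.
    """
    feats = []
    for w, bs in zip(kernels, biases):
        # Reversing w makes np.convolve compute a cross-correlation.
        conv = np.convolve(x, w[::-1], mode="valid")
        for b in bs:
            feats.append((conv > b).mean())  # proportion of positive values
    return np.array(feats)
```

The resulting features would typically feed a ridge classifier; since the kernels are fixed, the transform is deterministic and embarrassingly parallel, which is where the reported speedup over Rocket comes from.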

GitHub: https://github.com/angus924/minirocket (87 stars at the time of writing)

 
