Continuously updated collection
1. BERT Series
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019)
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (arXiv 2019)
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (arXiv 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv 2019)
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (arXiv 2019)
- Multi-Task Deep Neural Networks for Natural Language Understanding (arXiv 2019)
- What does BERT learn about the structure of language? (ACL2019)
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL2019) [github]
- Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
- Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
- What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
- Do Attention Heads in BERT Track Syntactic Dependencies?
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
- Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
- A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
- Visualizing and Measuring the Geometry of BERT
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
- Are Sixteen Heads Really Better than One? (NeurIPS2019)
- On the Validity of Self-Attention as Explanation in Transformer Models
- Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
- Attention Interpretability Across NLP Tasks
- Revealing the Dark Secrets of BERT (EMNLP2019)
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
- A Primer in BERTology: What we know about how BERT works
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
- Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering
- What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
- Calibration of Pre-trained Transformers
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models [github]
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices [github]
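- The 12 Most Cutting-Edge NLP Pre-trained Models
- NLP Pre-trained Models: From Transformer to ALBERT
- XLNet: How It Works and How It Compares with BERT
- Innovation in the BERT Era (Applications): Progress in Applying BERT Across NLP Fields
2. Transformer Series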
- Attention Is All You Need (arXiv 2017)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv 2019)
- Universal Transformers (ICLR 2019)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv 2019)
- Reformer: The Efficient Transformer (ICLR 2020)
- Adaptive Attention Span in Transformers (ACL2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]
- Generating Long Sequences with Sparse Transformers
- Adaptively Sparse Transformers (EMNLP2019)
- Compressive Transformers for Long-Range Sequence Modelling
- The Evolved Transformer (ICML2019)
- Reformer: The Efficient Transformer (ICLR2020) [github]
- GRET: Global Representation Enhanced Transformer (AAAI2020)
- Transformer on a Diet [github]
- Efficient Content-Based Sparse Attention with Routing Transformers
- BP-Transformer: Modelling Long-Range Context via Binary Partitioning
- Recipes for building an open-domain chatbot
- Longformer: The Long-Document Transformer
- UnifiedQA: Crossing Format Boundaries With a Single QA System [github]
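- A Brief Reading of "Attention Is All You Need" (Introduction + Code)
- Transformer Explained in Plain Terms
- Abandon Illusions and Fully Embrace the Transformer: A Comparison of the Three Major Feature Extractors in NLP (CNN/RNN/TF)
3. Transfer Learning Series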
- Deep contextualized word representations (NAACL 2018)
- Universal Language Model Fine-tuning for Text Classification (ACL 2018)
- Improving Language Understanding by Generative Pre-Training (Alec Radford)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019)
- Cloze-driven Pretraining of Self-attention Networks (arXiv 2019)
- Unified Language Model Pre-training for Natural Language Understanding and Generation (arXiv 2019)
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019)
- MPNet: Masked and Permuted Pre-training for Language Understanding [github]
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [github]
4. Text Summarization Series
- Positional Encoding to Control Output Sequence Length - Sho Takase (2019)
- Fine-tune BERT for Extractive Summarization - Yang Liu (2019)
- Language Models are Unsupervised Multitask Learners - Alec Radford (2019)
- A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss - Wan-Ting Hsu (2018)
- A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents - Arman Cohan (2018)
- Generating Wikipedia by Summarizing Long Sequences - Peter J. Liu (2018)
- Get To The Point: Summarization with Pointer-Generator Networks - Abigail See (2017)
- A Neural Attention Model for Sentence Summarization - Alexander M. Rush (2015)
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)
- Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression
- Discourse-Aware Neural Extractive Model for Text Summarization
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization [github]
- Discourse-Aware Neural Extractive Text Summarization [github]
5. Sentiment Analysis Series
- Multi-Task Deep Neural Networks for Natural Language Understanding - Xiaodong Liu (2019)
- Aspect-level Sentiment Analysis using AS-Capsules - Yequan Wang (2019)
- On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis - Jose Camacho-Collados (2018)
- Learned in Translation: Contextualized Word Vectors - Bryan McCann (2018)
- Universal Language Model Fine-tuning for Text Classification - Jeremy Howard (2018)
- Convolutional Neural Networks with Recurrent Neural Filters - Yi Yang (2018)
- Information Aggregation via Dynamic Routing for Sequence Encoding - Jingjing Gong (2018)
- Learning to Generate Reviews and Discovering Sentiment - Alec Radford (2017)
- A Structured Self-attentive Sentence Embedding - Zhouhan Lin (2017)
- Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)
- BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)
- Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)
- Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification
- An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)
- "Mask and Infill": Applying Masked Language Model to Sentiment Transfer
- Adversarial Training for Aspect-Based Sentiment Analysis with BERT
- Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference
6. Question Answering, Reading Comprehension, and Dialogue Systems Series
- Language Models are Unsupervised Multitask Learners - Alec Radford (2019)
- Improving Language Understanding by Generative Pre-Training - Alec Radford (2018)
- Bidirectional Attention Flow for Machine Comprehension - Minjoon Seo (2018)
- Reinforced Mnemonic Reader for Machine Reading Comprehension - Minghao Hu (2017)
- Neural Variational Inference for Text Processing - Yishu Miao (2015)
- A BERT Baseline for the Natural Questions
- MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)
- Unsupervised Domain Adaptation on Reading Comprehension
- BERTQA -- Attention on Steroids
- A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)
- SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
- Multi-hop Question Answering via Reasoning Chains
- Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents
- Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering (EMNLP2019 WS)
- End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)
- Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)
- Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (ICLR2020)
- Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)
- Unsupervised Question Answering by Cloze Translation (ACL2019)
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
- A Recurrent BERT-based Model for Question Generation (EMNLP2019 WS)
- Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
- Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)
- Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)
- SG-Net: Syntax-Guided Machine Reading Comprehension
- MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (EMNLP2019)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR2020)
- Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization
- BAS: An Answer Selection Method Using BERT Language Model
- Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
- A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension (ACL2019 WS)
- FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)
- BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)
- GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension (ICML2019 WS)
- Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)
- XQA: A Cross-lingual Open-domain Question Answering Dataset (ACL2019)
- Cross-Lingual Machine Reading Comprehension (EMNLP2019)
- Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
- Multilingual Question Answering from Formatted Text applied to Conversational Agents
- BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)
- MLQA: Evaluating Cross-lingual Extractive Question Answering
- Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension (TACL)
- SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis
- Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)
- BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer (Interspeech2019)
- Dialog State Tracking: A Neural Reading Comprehension Approach
- A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems (ICASSP2020)
- Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking
- Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker
- Domain Adaptive Training BERT for Response Selection
- BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding
7. Machine Translation
8. Surveys
- Evolution of transfer learning in natural language processing
- Pre-trained Models for Natural Language Processing: A Survey
- A Survey on Contextual Embeddings
9. Slot Filling
- BERT for Joint Intent Classification and Slot Filling
- Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
- A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)
10. Entity Recognition
- BERT Meets Chinese Word Segmentation
- Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
- Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT
- Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding
- Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)
- Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?
- Parsing as Pretraining (AAAI2020)
- Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
- Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement
- Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)
- A Unified MRC Framework for Named Entity Recognition
- Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
- Robust Named Entity Recognition with Truecasing Pretraining (AAAI2020)
- LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition
- MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
- Portuguese Named Entity Recognition using BERT-CRF
- Towards Lingua Franca Named Entity Recognition with BERT
11. Relation Extraction
- Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)
- BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)
- Enriching Pre-trained Language Model with Entity Information for Relation Classification
- Span-based Joint Entity and Relation Extraction with Transformer Pre-training
- Fine-tune Bert for DocRED with Two-step Process
- Entity, Relation, and Event Extraction with Contextualized Span Representations (EMNLP2019)
12. Knowledge Bases
- KG-BERT: BERT for Knowledge Graph Completion
- Language Models as Knowledge Bases? (EMNLP2019) [github]
- BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA
- Inducing Relational Knowledge from BERT (AAAI2020)
- Latent Relation Language Models (AAAI2020)
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model (ICLR2020)
- Zero-shot Entity Linking with Dense Entity Retrieval
- Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL2019)
- Improving Entity Linking by Modeling Latent Entity Type Information (AAAI2020)
- PEL-BERT: A Joint Model for Protocol Entity Linking
- How Can We Know What Language Models Know?
- REALM: Retrieval-Augmented Language Model Pre-Training
13. Text Classification
- How to Fine-Tune BERT for Text Classification?
- X-BERT: eXtreme Multi-label Text Classification with BERT
- DocBERT: BERT for Document Classification
- Enriching BERT with Knowledge Graph Embeddings for Document Classification
- Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)
- BERT for Evidence Retrieval and Claim Verification
- Stacked DeBERT: All Attention in Incomplete Data for Text Classification
- Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data
14. Text Generation
- BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)
- Pretraining-Based Natural Language Generation for Text Summarization
- Text Summarization with Pretrained Encoders (EMNLP2019) [github (original)] [github (huggingface)]
- Multi-stage Pretraining for Abstractive Summarization
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github], [github]
- Unified Language Model Pre-training for Natural Language Understanding and Generation (NeurIPS2019) [github]
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [github]
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
- Towards Making the Most of BERT in Neural Machine Translation
- Improving Neural Machine Translation with Pre-trained Representation
- On the use of BERT for Neural Machine Translation (EMNLP2019 WS)
- Incorporating BERT into Neural Machine Translation (ICLR2020)
- Recycling a Pre-trained BERT Encoder for Neural Machine Translation
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
- Cross-Lingual Natural Language Generation via Pre-Training (AAAI2020) [github]
- Multilingual Denoising Pre-training for Neural Machine Translation
- PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
- Unsupervised Pre-training for Natural Language Generation: A Literature Review
15. Error Correction (Multi-task, Masking Strategies, etc.)
- Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)
- The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)
- Unifying Question Answering and Text Classification via Span Extraction
- ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)
- ERNIE: Enhanced Representation through Knowledge Integration
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (AAAI2020)
- Pre-Training with Whole Word Masking for Chinese BERT
- SpanBERT: Improving Pre-training by Representing and Predicting Spans [github]
- Blank Language Models
- Efficient Training of BERT by Progressively Stacking (ICML2019) [github]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR2020)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR2020) [github] [blog]
- FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR2020)
- KERMIT: Generative Insertion-Based Modeling for Sequences
- DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR2020)
- Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
- SenseBERT: Driving Some Sense into BERT
- Semantics-aware BERT for Language Understanding (AAAI2020)
- K-BERT: Enabling Language Representation with Knowledge Graph
- Knowledge Enhanced Contextual Word Representations (EMNLP2019)
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019)
- SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
- Universal Text Representation from BERT: An Empirical Study
- Symmetric Regularization based BERT for Pair-wise Semantic Reasoning
- Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)
- Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)
- SesameBERT: Attention for Anywhere
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [github]
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
16. Multimodal Series
- VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- Selfie: Self-supervised Pretraining for Image Embedding
- ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
- Contrastive Bidirectional Transformer for Temporal Representation Learning
- M-BERT: Injecting Multimodal Information in the BERT Structure
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)
- Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)
- BERT representations for Video Question Answering (WACV2020)
- Unified Vision-Language Pre-Training for Image Captioning and VQA [github]
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations (ICLR2020)
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
- UNITER: Learning UNiversal Image-TExt Representations
- Supervised Multimodal Bitransformers for Classifying Images and Text
- Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
- BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations
- BERT for Large-scale Video Segment Classification with Test-time Augmentation (ICCV2019WS)
- SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
- Effectiveness of self-supervised pre-training for speech recognition
- Understanding Semantics from Speech Through Pre-training
- Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models
17. Model Compression
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)
- Small and Practical BERT Models for Sequence Labeling (EMNLP2019)
- Pruning a BERT-based Question Answering Model
- TinyBERT: Distilling BERT for Natural Language Understanding [github]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]
- Knowledge Distillation from Internal Representations (AAAI2020)
- PoWER-BERT: Accelerating BERT inference for Classification Tasks
- WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
- Extreme Language Model Compression with Optimal Subwords and Shared Projections
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
- MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- Q8BERT: Quantized 8Bit BERT (NeurIPS2019 WS)