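New Papers This Week | Week 13, 2020 | Natural Language Processing

The "New Papers This Week" series, week 13 of 2020: natural language processing

Highlights this week: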

Google: [38], [40]
Microsoft: [13]
Facebook: [2]
Others: [1]

March 27, 2020

[1]. TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation
Link | https://arxiv.org/abs/2003.11963
Authors | Shaojie Jiang, Thomas Wolf, Christof Monz, Maarten de Rijke
Affiliations | University of Amsterdam; Hugging Face

[2]. Rat big, cat eaten! Ideas for a useful deep-agent protolanguage
Link | https://arxiv.org/abs/2003.11922
Authors | Marco Baroni
Affiliations | Facebook AI Research

[3]. Common-Knowledge Concept Recognition for SEVA
Link | https://arxiv.org/abs/2003.11687
Authors | Jitin Krishnan, Patrick Coronado, Hemant Purohit, Huzefa Rangwala

[4]. Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks
Link | https://arxiv.org/abs/2003.11645
Authors | Tosin P. Adewumi, Foteini Liwicki, Marcus Liwicki

[5]. Multi-Label Text Classification using Attention-based Graph Neural Network
Link | https://arxiv.org/abs/2003.11644
Authors | Ankit Pal, Muru Selvakumar, Malaikannan Sankarasubbu

[6]. Sentiment Analysis in Drug Reviews using Supervised Machine Learning Algorithms
Link | https://arxiv.org/abs/2003.11643
Authors | Sairamvinay Vijayaraghavan, Debraj Basu

[7]. Author2Vec: A Framework for Generating User Embedding
Link | https://arxiv.org/abs/2003.11627
Authors | Xiaodong Wu, Weizhe Lin, Zhilin Wang, Elena Rastorgueva
Affiliations | University of Cambridge

[8]. Predicting Unplanned Readmissions with Highly Unstructured Data
Link | https://arxiv.org/abs/2003.11622
Authors | Constanza Fierro, Jorge Pérez, Javier Mora

[9]. Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data
Link | https://arxiv.org/abs/2003.11563
Authors | Harish Tayyar Madabushi, Elena Kochkina, Michael Castelle
Affiliations | University of Birmingham; Alan Turing Institute
Notes | NLP4IF 2019

[10]. Finnish Language Modeling with Deep Transformer Models
Link | https://arxiv.org/abs/2003.11562
Authors | Abhilash Jain

[11]. Predicting Legal Proceedings Status: an Approach Based on Sequential Text Data
Link | https://arxiv.org/abs/2003.11561
Authors | Felipe Maia Polo, Itamar Ciochetti, Emerson Bertolo

[12]. Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features
Link | https://arxiv.org/abs/2003.11545
Authors | Nicole Mariah Sharon Belvisi, Naveed Muhammad, Fernando Alonso-Fernandez
Notes | Accepted for publication at the 8th International Workshop on Biometrics and Forensics (IWBF 2020)

[13]. VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
Link | https://arxiv.org/abs/2003.11618
Authors | Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu
Affiliations | Carnegie Mellon University; University of California, Santa Barbara; Microsoft
Notes | Accepted to CVPR 2020

[14]. Heavy-tailed Representations, Text Polarity Classification & Data Augmentation
Link | https://arxiv.org/abs/2003.11593
Authors | Hamid Jalalzai, Pierre Colombo, Chloé Clavel, Eric Gaussier, Giovanna Varni, Emmanuel Vignon, Anne Sabourin

March 26, 2020

[15]. The Medical Scribe: Corpus Development and Model Performance Analyses
Link | https://arxiv.org/abs/2003.11531
Authors | Izhak Shafran, Nan Du, Linh Tran, Amanda Perry, Lauren Keyes, Mark Knichel, Ashley Domin, Lei Huang, Yuhui Chen, Gang Li, Mingqiu Wang, Laurent El Shafey, Hagen Soltau, Justin S. Paul
Affiliations | Google
Notes | Extended version of the paper accepted at LREC 2020

[16]. Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation
Link | https://arxiv.org/abs/2003.11530
Authors | Haiyan Yin, Dingcheng Li, Xu Li, Ping Li
Affiliations | Baidu Research

[17]. Masakhane – Machine Translation For Africa
Link | https://arxiv.org/abs/2003.11529
Authors | Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, Musie Meressa, Espoir Murhabazi, Orevaoghene Ahia, Elan van Biljon, Arshath Ramkilowan, Adewale Akinfaderin, Alp Öktem, Wole Akin, Ghollah Kioko, Kevin Degila, Herman Kamper, Bonaventure Dossou, Chris Emezue, Kelechi Ogueji, Abdallah Bashir
Notes | Accepted for the AfricaNLP Workshop, ICLR 2020

[18]. Generating Major Types of Chinese Classical Poetry in a Uniformed Framework
Link | https://arxiv.org/abs/2003.11528
Authors | Jinyi Hu, Maosong Sun
Affiliations | Tsinghua University

[19]. Tigrinya Neural Machine Translation with Transfer Learning for Humanitarian Response
Link | https://arxiv.org/abs/2003.11523
Authors | Alp Öktem, Mirko Plitt, Grace Tang
Affiliations | University of Oxford; DeepMind; University College London; The Alan Turing Institute
Notes | Preprint accepted to the AfricaNLP workshop held at the Eighth International Conference on Learning Representations (ICLR 2020)

[20]. Matching Text with Deep Mutual Information Estimation
Link | https://arxiv.org/abs/2003.11521
Authors | Xixi Zhou, Chengxi Li, Jiajun Bu, Chengwei Yao, Keyue Shi, Zhi Yu, Zhou Yu
Affiliations | Zhejiang University; University of California, Davis

[21]. Joint Multiclass Debiasing of Word Embeddings
Link | https://arxiv.org/abs/2003.11520
Authors | Radomir Popović, Florian Lemmerich, Markus Strohmaier

[22]. Vector logic and counterfactuals
Link | https://arxiv.org/abs/2003.11519
Authors | Eduardo Mizraji

[23]. Hybrid Attention-Based Transformer Block Model for Distant Supervision Relation Extraction
Link | https://arxiv.org/abs/2003.11518
Authors | Yan Xiao, Yaochu Jin, Ran Cheng, Kuangrong Hao

[24]. From Algebraic Word Problem to Program: A Formalized Approach
Link | https://arxiv.org/abs/2003.11517
Authors | Adam Wiemerslage, Shafiuddin Rehan Ahmed
Affiliations | University of Colorado, Boulder
Notes | 9 pages, 6 figures; course project for Programming Languages

[25]. Keyword-Attentive Deep Semantic Matching
Link | https://arxiv.org/abs/2003.11516
Authors | Changyu Miao, Zhen Cao, Yik-Cheung Tam
Affiliations | WeChat AI

[26]. Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings
Link | https://arxiv.org/abs/2003.11515
Authors | Haoran Zhang, Amy X. Lu, Mohamed Abdalla, Matthew McDermott, Marzyeh Ghassemi
Affiliations | University of Toronto; Vector Institute; MIT
Notes | Accepted at ACM CHIL 2020 (Spotlight)

[27]. BaitWatcher: A lightweight web interface for the detection of incongruent news headlines
Link | https://arxiv.org/abs/2003.11459
Authors | Kunwoo Park, Taegyun Kim, Seunghyun Yoon, Meeyoung Cha, Kyomin Jung

[28]. Adversarial Multi-Binary Neural Network for Multi-class Classification
Link | https://arxiv.org/abs/2003.11184
Authors | Haiyang Xu, Junwen Chen, Kun Han, Xiangang Li
Affiliations | Didi

[29]. Learning Syntactic and Dynamic Selective Encoding for Document Summarization
Link | https://arxiv.org/abs/2003.11173
Authors | Haiyang Xu, Yahao He, Kun Han, Junwen Chen, Xiangang Li
Affiliations | Didi

[30]. Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!
Link | https://arxiv.org/abs/2003.11082
Authors | Claudia Schulz, Damir Juric

[31]. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Link | https://arxiv.org/abs/2003.11080
Authors | Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson
Affiliations | Carnegie Mellon University; DeepMind; Google Research

[32]. Utilizing Deep Learning to Identify Drug Use on Twitter Data
Link | https://arxiv.org/abs/2003.11522
Authors | Joseph Tassone, Peizhi Yan, Mackenzie Simpson, Chetan Mendhe, Vijay Mago, Salimur Choudhury

[33]. COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis
Link | https://arxiv.org/abs/2003.11117
Authors | Björn W. Schuller, Dagmar M. Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, Xiao Li

[34]. EQL – an extremely easy to learn knowledge graph query language, achieving highspeed and precise search
Link | https://arxiv.org/abs/2003.11105
Authors | Han Liu, Shantao Liu

March 25, 2020

[35]. Cross-Lingual Adaptation Using Universal Dependencies
Link | https://arxiv.org/abs/2003.10816
Authors | Nasrin Taghizadeh, Heshaam Faili

[36]. Generating Chinese Poetry from Images via Concrete and Abstract Information
Link | https://arxiv.org/abs/2003.10773
Authors | Yusen Liu, Dayiheng Liu, Jiancheng Lv, Yongsheng Sang
Affiliations | Sichuan University
Notes | Accepted by the 2020 International Joint Conference on Neural Networks (IJCNN 2020)

[37]. Towards Neural Machine Translation for Edoid Languages
Link | https://arxiv.org/abs/2003.10704
Authors | Iroro Orife

[38]. Felix: Flexible Text Editing Through Tagging and Insertion
Link | https://arxiv.org/abs/2003.10687
Authors | Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido
Affiliations | University of Edinburgh; Google Research

[39]. Improving Yorùbá Diacritic Restoration
Link | https://arxiv.org/abs/2003.10564
Authors | Iroro Orife, David I. Adelani, Timi Fasubaa, Victor Williamson, Wuraola Fisayo Oyewusi, Olamilekan Wahab, Kola Tubosun
Notes | Accepted to ICLR 2020 AfricaNLP workshop

[40]. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Link | https://arxiv.org/abs/2003.10555
Authors | Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
Affiliations | Stanford University; Google Brain
Notes | ICLR 2020

[41]. Learning Compact Reward for Image Captioning
Link | https://arxiv.org/abs/2003.10925
Authors | Nannan Li, Zhenzhong Chen
Affiliations | Wuhan University

[42]. Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach
Link | https://arxiv.org/abs/2003.10715
Authors | David Schindler, Benjamin Zapilko, Frank Krüger
Notes | 16 pages, 4 figures; preprint of a full paper at Extended Semantic Web Conference (ESWC 2020)

[43]. Video Object Grounding using Semantic Roles in Language Description
Link | https://arxiv.org/abs/2003.10606
Authors | Arka Sadhu, Kan Chen, Ram Nevatia
Affiliations | University of Southern California; Facebook

[44]. ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation
Link | https://arxiv.org/abs/2003.10557
Authors | Sharon Fogel, Hadar Averbuch-Elor, Sarel Cohen, Shai Mazor, Roee Litman
Affiliations | Amazon; Cornell University
Notes | In CVPR 2020

[45]. Data-driven models and computational tools for neurolinguistics: a language technology perspective
Link | https://arxiv.org/abs/2003.10540
Authors | Ekaterina Artemova, Amir Bakarov, Aleksey Artemov, Evgeny Burnaev, Maxim Sharaev

March 24, 2020

[46]. Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
Link | https://arxiv.org/abs/2003.10388
Authors | Yankun Ren, Jianbin Lin, Siliang Tang, Jun Zhou, Shuang Yang, Yuan Qi, Xiang Ren
Affiliations | Ant Financial Services Group; Zhejiang University; University of Southern California

[47]. Adaptive Name Entity Recognition under Highly Unbalanced Data
Link | https://arxiv.org/abs/2003.10296
Authors | Thong Nguyen, Duy Nguyen, Pramod Rao

[48]. PathVQA: 30000+ Questions for Medical Visual Question Answering
Link | https://arxiv.org/abs/2003.10286
Authors | Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie
Affiliations | University of California San Diego; Beijing University of Technology; Carnegie Mellon University

[49]. Fast Cross-domain Data Augmentation through Neural Sentence Editing
Link | https://arxiv.org/abs/2003.10254
Authors | Guillaume Raille, Sandra Djambazovska, Claudiu Musat

[50]. Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings
Link | https://arxiv.org/abs/2003.10224
Authors | Christos Xypolopoulos, Antoine J.-P. Tixier, Michalis Vazirgiannis

[51]. E2EET: From Pipeline to End-to-end Entity Typing via Transformer-Based Embeddings
Link | https://arxiv.org/abs/2003.10097
Authors | Michael Stewart, Wei Liu

[52]. Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments
Link | https://arxiv.org/abs/2003.10066
Authors | Koichiro Yoshino, Kohei Wakimoto, Yuta Nishimura, Satoshi Nakamura
Notes | To appear at IWSDS 2020

[53]. SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Link | https://arxiv.org/abs/2003.09833
Authors | Xiaoya Li, Yuxian Meng, Qinghong Han, Fei Wu, Jiwei Li
Affiliations | Shannon.AI; Zhejiang University

[54]. Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding
Link | https://arxiv.org/abs/2003.07962
Authors | Su Zhu, Zijian Zhao, Rao Ma, Kai Yu
Affiliations | Shanghai Jiao Tong University
Notes | 11 pages, 6 figures; accepted for IEEE/ACM Transactions on Audio, Speech, and Language Processing

[55]. A Joint Approach to Compound Splitting and Idiomatic Compound Detection
Link | https://arxiv.org/abs/2003.09606
Authors | Irina Krotova, Sergey Aksenov, Ekaterina Artemova
Notes | 8 pages, 5 tables, 1 figure; accepted at LREC 2020

[56]. Analyzing Word Translation of Transformer Layers
Link | https://arxiv.org/abs/2003.09586
Authors | Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu

[57]. A Framework for Generating Explanations from Temporal Personal Health Data
Link | https://arxiv.org/abs/2003.09530
Authors | Jonathan J. Harris, Ching-Hua Chen, Mohammed J. Zaki
Affiliations | Rensselaer Polytechnic Institute; IBM Research

[58]. TArC: Incrementally and Semi-Automatically Collecting a Tunisian Arabish Corpus
Link | https://arxiv.org/abs/2003.08529
Authors | Elisa Gugliotta, Marco Dinarelli
Notes | Paper accepted at the Language Resources and Evaluation Conference (LREC) 2020

[59]. A Better Variant of Self-Critical Sequence Training
Link | https://arxiv.org/abs/2003.09971
Authors | Ruotian Luo
Affiliations | TTI-Chicago

[60]. Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
Link | https://arxiv.org/abs/2003.09881
Authors | Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp
Notes | Accepted at ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020)

[61]. Invariant Rationalization
Link | https://arxiv.org/abs/2003.09772
Authors | Shiyu Chang, Yang Zhang, Mo Yu, Tommi S. Jaakkola
Affiliations | MIT; IBM

March 23, 2020

[62]. FedNER: Privacy-preserving Medical Named Entity Recognition with Federated Learning
Link | https://arxiv.org/abs/2003.09288
Authors | Suyu Ge, Fangzhao Wu, Chuhan Wu, Tao Qi, Yongfeng Huang, Xing Xie
Affiliations | Tsinghua University; Microsoft Research Asia

[63]. Language Technology Programme for Icelandic 2019-2023
Link | https://arxiv.org/abs/2003.08717
Authors | Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson
Notes | Accepted at LREC 2020

[64]. Parallel Intent and Slot Prediction using MLB Fusion
Link | https://arxiv.org/abs/2003.09211
Authors | Anmol Bhasin, Bharatram Natarajan, Gaurav Mathur, Himanshu Mangla

[65]. TNT-KID: Transformer-based Neural Tagger for Keyword Identification
Link | https://arxiv.org/abs/2003.09166
Authors | Matej Martinc, Blaž Škrlj, Senja Pollak
Notes | Submitted to Natural Language Engineering journal

[66]. NSURL-2019 Task 7: Named Entity Recognition (NER) in Farsi
Link | https://arxiv.org/abs/2003.09029
Authors | Nasrin Taghizadeh, Zeinab Borhanifard, Melika GolestaniPour, Heshaam Faili

[67]. Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems
Link | https://arxiv.org/abs/2003.09024
Authors | Nikolay Malkovsky, Vladimir Bataev, Dmitrii Sviridkin, Natalia Kizhaeva, Aleksandr Laptev, Ildar Valiev, Oleg Petrov
Notes | Submitted to Interspeech 2020

[68]. Learning to Encode Position for Transformer with Continuous Dynamical Model
Link | https://arxiv.org/abs/2003.09229
Authors | Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
Affiliations | UCLA; UT Austin; Amazon

[69]. Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking
Link | https://arxiv.org/abs/2003.09180
Authors | Yoonjae Jeong, Hoon-Young Cho
Notes | Accepted by ICASSP 2020

[70]. Automatic Identification of Types of Alterations in Historical Manuscripts
Link | https://arxiv.org/abs/2003.09136
Authors | David Lassner, Anne Baillot, Sergej Dogadov, Klaus-Robert Müller, Shinichi Nakajima

[71]. The value of text for small business default prediction: A deep learning approach
Link | https://arxiv.org/abs/2003.08964
Authors | Matthew Stevenson, Christophe Mues, Cristián Bravo

