Top Conference Express | ICLR 2020 Accepted Papers: Reinforcement Learning

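I took some time to compile the latest reinforcement learning papers accepted at ICLR 2020, a top AI conference. If you are interested, bookmark the list and start reading!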

Dynamics-Aware Unsupervised Skill Discovery
Link | https://openreview.net/pdf?id=HJgLZR4KvH
Authors | Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman
Affiliation | Google Brain

Contrastive Learning of Structured World Models
Link | https://openreview.net/pdf?id=H1gax6VtDB
Authors | Thomas Kipf, Elise van der Pol, Max Welling
Affiliation | University of Amsterdam

Implementation Matters in Deep RL: A Case Study on PPO and TRPO
Link | https://openreview.net/pdf?id=r1etN1rtPB
Authors | Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

GenDICE: Generalized Offline Estimation of Stationary Values
Link | https://openreview.net/pdf?id=HkxlcnVFwB
Authors | Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans
Affiliation | Duke University; Google Brain

Causal Discovery with Reinforcement Learning
Link | https://openreview.net/pdf?id=S1g2skStPB
Authors | Shengyu Zhu, Ignavier Ng, Zhitang Chen
Affiliation | Huawei Noah’s Ark Lab; University of Toronto

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Link | https://openreview.net/pdf?id=r1genAVKPB
Authors | Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang
Affiliation | University of Washington; Carnegie Mellon University; University of California, Los Angeles

Harnessing Structures for Value-Based Planning and Reinforcement Learning
Link | https://openreview.net/pdf?id=rklHqRVKvH
Authors | Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi
Affiliation | MIT

Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency
Link | https://openreview.net/pdf?id=SJgzLkBKPB
Authors | Piyush Gupta, Nikaash Puri, Sukriti Verma, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh
Affiliation | Adobe

Meta-Q-Learning
Link | https://openreview.net/pdf?id=SJeD3CEFPH
Authors | Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola
Affiliation | Amazon; University of Pennsylvania

Discriminative Particle Filter Reinforcement Learning for Complex Partial observations
Link | https://openreview.net/pdf?id=HJl8_eHYvS
Authors | Xiao Ma, Peter Karkus, David Hsu, Wee Sun Lee, Nan Ye
Affiliation | National University of Singapore; The University of Queensland

Disagreement-Regularized Imitation Learning
Link | https://openreview.net/pdf?id=rkgbYyHtwB
Authors | Kiante Brantley, Wen Sun, Mikael Henaff
Affiliation | University of Maryland; Microsoft Research

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Link | https://openreview.net/pdf?id=S1glGANtDr
Authors | Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu
Affiliation | The University of Texas at Austin; Google Research

SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
Link | https://openreview.net/pdf?id=rkgvXlrKwH
Authors | Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski
Affiliation | Google Research

The Ingredients of Real World Robotic Reinforcement Learning
Link | https://openreview.net/pdf?id=rJe2syrtvS
Authors | Henry Zhu, Justin Yu, Abhishek Gupta, Dhruv Shah, Kristian Hartikainen, Avi Singh, Vikash Kumar, Sergey Levine

Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search
Link | https://openreview.net/pdf?id=BJlQtJSKDB
Authors | Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu
Affiliation | Tencent AI Lab

Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization
Link | https://openreview.net/pdf?id=ryeYpJSKwr
Authors | Michael Volpp, Lukas P. Fröhlich, Kirsten Fischer, Andreas Doerr, Stefan Falkner, Frank Hutter, Christian Daniel

A Closer Look at Deep Policy Gradients
Link | https://openreview.net/pdf?id=ryxdEkHtPS
Authors | Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Fast Task Inference with Variational Intrinsic Successor Features
Link | https://openreview.net/pdf?id=BJeAHkrYDS
Authors | Steven Hansen, Will Dabney, Andre Barreto, David Warde-Farley, Tom Van de Wiele, Volodymyr Mnih
Affiliation | DeepMind

Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees
Link | https://openreview.net/pdf?id=rJgJDAVKvB
Authors | Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song
Affiliation | Georgia Institute of Technology; Google Research; Northwestern University

Dream to Control: Learning Behaviors by Latent Imagination
Link | https://openreview.net/pdf?id=S1lOTC4tDS
Authors | Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Affiliation | University of Toronto; DeepMind; Google Brain

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
Link | https://openreview.net/pdf?id=SygKyeHKDH
Authors | Caglar Gulcehre, Tom Le Paine, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team
Affiliation | DeepMind

Intrinsic Motivation for Encouraging Synergistic Behavior
Link | https://openreview.net/pdf?id=SJleNCNtDH
Authors | Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta
Affiliation | MIT; Facebook AI Research

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
Link | https://openreview.net/pdf?id=S1xKd24twB
Authors | Siddharth Reddy, Anca D. Dragan, Sergey Levine
Affiliation | UC Berkeley

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
Link | https://openreview.net/pdf?id=ryxgJTEYDr
Authors | Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio

Multi-Agent Interactions Modeling with Correlated Policies
Link | https://openreview.net/pdf?id=B1gZV1HYvS
Authors | Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu
Affiliation | Shanghai Jiao Tong University; Huawei Noah’s Ark Lab

Influence-Based Multi-Agent Exploration
Link | https://openreview.net/pdf?id=BJgy96EYvr
Authors | Tonghan Wang, Jianhao Wang, Yi Wu, Chongjie Zhang
Affiliation | Tsinghua University

Learning the Arrow of Time for Problems in Reinforcement Learning
Link | https://openreview.net/pdf?id=rylJkpEtwS
Authors | Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio
Affiliation | MILA

AMRL: Aggregated Memory For Reinforcement Learning
Link | https://openreview.net/pdf?id=Bkl7bREtDr
Authors | Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann
Affiliation | Microsoft Research

Model Based Reinforcement Learning for Atari
Link | https://openreview.net/pdf?id=S1xCPJHtDB
Authors | Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błażej Osiński, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski
Affiliation | Google Brain

Variational Recurrent Models for Solving Partially Observable Control Tasks
Link | https://openreview.net/pdf?id=r1lL4a4tDB
Authors | Dongqi Han, Kenji Doya, Jun Tani

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Link | https://openreview.net/pdf?id=HJlxIJBFDr
Authors | Pan Xu, Felicia Gao, Quanquan Gu
Affiliation | University of California, Los Angeles

Exploring Model-based Planning with Policy Networks
Link | https://openreview.net/pdf?id=H1exf64KwH
Authors | Tingwu Wang, Jimmy Ba
Affiliation | University of Toronto; Vector Institute

Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
Link | https://openreview.net/pdf?id=HygnDhEtvr
Authors | Yu Chen, Lingfei Wu, Mohammed J. Zaki
Affiliation | Rensselaer Polytechnic Institute; IBM Research

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
Link | https://openreview.net/pdf?id=rkg-TJBFPB
Authors | Roberta Raileanu, Tim Rocktäschel
Affiliation | New York University; University College London

Learning Expensive Coordination: An Event-Based Deep RL Approach
Link | https://openreview.net/pdf?id=ryeG924twB
Authors | Zhenyu Shi, Runsheng Yu, Xinrun Wang, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An
Affiliation | Nanyang Technological University; Sun Yat-sen University

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
Link | https://openreview.net/pdf?id=SJxbHkrKDH
Authors | Qian Long, Zihan Zhou, Abhinav Gupta, Fei Fang, Yi Wu, Xiaolong Wang
Affiliation | CMU; OpenAI; Facebook AI Research; SJTU; UCSD

Making Sense of Reinforcement Learning and Probabilistic Inference
Link | https://openreview.net/pdf?id=S1xitgHtvS
Authors | Brendan O’Donoghue, Ian Osband, Catalin Ionescu

Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs
Link | https://openreview.net/pdf?id=rkxDoJBYPB
Authors | Aditya Paliwal, Felix Gimeno, Vinod Nair, Yujia Li, Miles Lubin, Pushmeet Kohli, Oriol Vinyals
Affiliation | Google Research; DeepMind

Never Give Up: Learning Directed Exploration Strategies
Link | https://openreview.net/pdf?id=Sye57xStvB
Authors | Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martin Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell
Affiliation | DeepMind

Robust Reinforcement Learning for Continuous Control with Model Misspecification
Link | https://openreview.net/pdf?id=HJgC60EtwB
Authors | Daniel J. Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Yuanyuan Shi, Jackie Kay, Todd Hester, Timothy Mann, Martin Riedmiller
Affiliation | DeepMind

Synthesizing Programmatic Policies that Inductively Generalize
Link | https://openreview.net/pdf?id=S1l8oANFDH
Authors | Jeevana Priya Inala, Osbert Bastani, Zenna Tavares, Armando Solar-Lezama
Affiliation | MIT; University of Pennsylvania

Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Link | https://openreview.net/pdf?id=r1lOgyrKDS
Authors | Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou
Affiliation | University of Texas at Austin; Microsoft Research; Columbia University

Improving Generalization in Meta Reinforcement Learning using Learned Objectives
Link | https://openreview.net/pdf?id=S1evHerYPr
Authors | Louis Kirsch, Sjoerd van Steenkiste, Juergen Schmidhuber

Single Episode Policy Transfer in Reinforcement Learning
Link | https://openreview.net/pdf?id=rJeQoCNYDS
Authors | Jiachen Yang, Brenden Petersen, Hongyuan Zha, Daniel Faissol
Affiliation | Georgia Institute of Technology

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
Link | https://openreview.net/pdf?id=H1gX8C4YPr
Authors | Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
Affiliation | Georgia Institute of Technology; Facebook AI Research

Geometric Insights into the Convergence of Nonlinear TD Learning
Link | https://openreview.net/pdf?id=SJezGp4YPr
Authors | David Brandfonbrener, Joan Bruna
Affiliation | New York University

Dynamics-Aware Embeddings
Link | https://openreview.net/pdf?id=BJgZGeHFPH
Authors | William Whitney, Rajat Agarwal, Kyunghyun Cho, Abhinav Gupta
Affiliation | New York University; Carnegie Mellon University; Facebook AI Research

Reanalysis of Variance Reduced Temporal Difference Learning
Link | https://openreview.net/pdf?id=S1ly10EKDS
Authors | Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang
Affiliation | Ohio State University; University of Utah

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
Link | https://openreview.net/pdf?id=BkglSTNFDB
Authors | Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang
Affiliation | Tsinghua University; Peking University

Automated curriculum generation through setter-solver interactions
Link | https://openreview.net/pdf?id=H1e0Wp4KvH
Authors | Sebastien Racaniere, Andrew Lampinen, Adam Santoro, David Reichert, Vlad Firoiu, Timothy Lillicrap
Affiliation | DeepMind

Optimistic Exploration even with a Pessimistic Initialisation
Link | https://openreview.net/pdf?id=r1xGP6VYwH
Authors | Tabish Rashid, Bei Peng, Wendelin Boehmer, Shimon Whiteson
Affiliation | University of Oxford

Multi-agent Reinforcement Learning for Networked System Control
Link | https://openreview.net/pdf?id=Syx7A3NFvH
Authors | Tianshu Chu, Sandeep Chinchali, Sachin Katti
Affiliation | Stanford University

A Learning-based Iterative Method for Solving Vehicle Routing Problems
Link | https://openreview.net/pdf?id=BJe1334YDH
Authors | Hao Lu, Xingwen Zhang, Shuang Yang
Affiliation | Princeton University

Sharing Knowledge in Multi-Task Deep Reinforcement Learning
Link | https://openreview.net/pdf?id=rkgpv2VFvr
Authors | Carlo D’Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters

RTFM: Generalising to New Environment Dynamics via Reading
Link | https://openreview.net/pdf?id=SJgob6NKvH
Authors | Victor Zhong, Tim Rocktäschel, Edward Grefenstette
Affiliation | University of Washington; University College London; Facebook AI Research

Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies
Link | https://openreview.net/pdf?id=HkgsWxrtPB
Authors | Sungryull Sohn, Hyunjae Woo, Jongwook Choi, Honglak Lee
Affiliation | University of Michigan; Google Brain

Projection-Based Constrained Policy Optimization
Link | https://openreview.net/pdf?id=rke3TJrtPS
Authors | Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge
Affiliation | Princeton University

Graph Constrained Reinforcement Learning for Natural Language Action Spaces
Link | https://openreview.net/pdf?id=B1x6w0EtwH
Authors | Prithviraj Ammanabrolu, Matthew Hausknecht
Affiliation | Georgia Institute of Technology; Microsoft Research

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Link | https://openreview.net/pdf?id=SylOlp4FvH
Authors | H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick
Affiliation | DeepMind

Thinking While Moving: Deep Reinforcement Learning with Concurrent Control
Link | https://openreview.net/pdf?id=Hke0V1rKPS
Authors | Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz, Karol Hausman, Alexander Herzog
Affiliation | Google Brain; UC Berkeley; X

Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
Link | https://openreview.net/pdf?id=rke7geHtwH
Authors | Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller
Affiliation | DeepMind

Imitation Learning via Off-Policy Distribution Matching
Link | https://openreview.net/pdf?id=Hyg-JC4FDr
Authors | Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
Affiliation | Google Research

Adversarial AutoAugment
Link | https://openreview.net/pdf?id=ByxdUySKvS
Authors | Xinyu Zhang, Qiang Wang, Jian Zhang, Zhao Zhong

Option Discovery using Deep Skill Chaining
Link | https://openreview.net/pdf?id=B1gqipNYwH
Authors | Akhil Bagaria, George Konidaris
Affiliation | Brown University

State-only Imitation with Transition Dynamics Mismatch
Link | https://openreview.net/pdf?id=HJgLLyrYwB
Authors | Tanmay Gangwani, Jian Peng
Affiliation | University of Illinois, Urbana-Champaign

The Gambler’s Problem and Beyond
Link | https://openreview.net/pdf?id=HyxnMyBKwB
Authors | Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan
Affiliation | Chinese University of Hong Kong; Shanghai Jiao Tong University

Structured Object-Aware Physics Prediction for Video Modeling and Planning
Link | https://openreview.net/pdf?id=B1e-kxSKDH
Authors | Jannik Kossen, Karl Stelzner, Marcel Hussing, Claas Voelcker, Kristian Kersting

Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
Link | https://openreview.net/pdf?id=H1lmhaVtvr
Authors | Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine

Exploration in Reinforcement Learning with Deep Covering Options
Link | https://openreview.net/pdf?id=SkeIyaVtwB
Authors | Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Konidaris
Affiliation | Brown University; Google Brain

CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning
Link | https://openreview.net/pdf?id=S1lEX04tPr
Authors | Jiachen Yang, Alireza Nakhaei, David Isele, Kikuo Fujimura, Hongyuan Zha
Affiliation | Georgia Institute of Technology

Learning to Coordinate Manipulation Skills via Skill Behavior Diversification
Link | https://openreview.net/pdf?id=ryxB2lBtvH
Authors | Youngwoon Lee, Jingyun Yang, Joseph J. Lim
Affiliation | University of Southern California

Composing Task-Agnostic Policies with Deep Reinforcement Learning
Link | https://openreview.net/pdf?id=H1ezFREtwH
Authors | Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, Michael C. Yip
Affiliation | UC San Diego; University of Washington

Frequency-based Search-control in Dyna
Link | https://openreview.net/pdf?id=B1gskyStwr
Authors | Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand
Affiliation | University of Alberta; Vector Institute; University of Toronto

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
Link | https://openreview.net/pdf?id=S1ltg1rFDS
Authors | Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou
Affiliation | Google Research; University of Texas, Austin

CAQL: Continuous Action Q-Learning
Link | https://openreview.net/pdf?id=BkxXe0Etwr
Authors | Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier
Affiliation | Google Research

Reinforced active learning for image segmentation
Link | https://openreview.net/pdf?id=SkgC6TNFvr
Authors | Arantxa Casanova, Pedro O. Pinheiro, Negar Rostamzadeh, Christopher J. Pal
Affiliation | MILA; Element AI

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget
Link | https://openreview.net/pdf?id=Hye1kTVFDS
Authors | Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine

Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation
Link | https://openreview.net/pdf?id=H1gzR2VKDH
Authors | Suraj Nair, Chelsea Finn
Affiliation | Stanford University; Google Brain

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning
Link | https://openreview.net/pdf?id=BJliakStvH
Authors | Dexter R.R. Scobee, S. Shankar Sastry
Affiliation | UC Berkeley

AutoQ: Automated Kernel-Wise Neural Network Quantization
Link | https://openreview.net/pdf?id=rygfnn4twS
Authors | Qian Lou, Feng Guo, Minje Kim, Lantao Liu, Lei Jiang

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Link | https://openreview.net/pdf?id=Hkl9JlBYvr
Authors | Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
Affiliation | University of Oxford; Microsoft Research

Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards
Link | https://openreview.net/pdf?id=SJg5J6NtDr
Authors | Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn
Affiliation | Google Brain; UC Berkeley; Stanford

Population-Guided Parallel Policy Search for Reinforcement Learning
Link | https://openreview.net/pdf?id=rJeINp4KwH
Authors | Whiyoung Jung, Giseung Park, Youngchul Sung

Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning
Link | https://openreview.net/pdf?id=HJgcvJBFvB
Authors | Kimin Lee, Kibok Lee, Jinwoo Shin, Honglak Lee
Affiliation | University of Michigan; Google Brain

On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Link | https://openreview.net/pdf?id=H1eCw3EKvH
Authors | Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend

State Alignment-based Imitation Learning
Link | https://openreview.net/pdf?id=rylrdxHFDr
Authors | Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su
Affiliation | University of California San Diego

Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
Link | https://openreview.net/pdf?id=rylvYaNYDH
Authors | Christian Rupprecht, Cyril Ibrahim, Christopher J. Pal
Affiliation | University of Oxford; Element AI; MILA

Model-Augmented Actor-Critic: Backpropagating through Paths
Link | https://openreview.net/pdf?id=Skln2A4YDB
Authors | Ignasi Clavera, Yao Fu, Pieter Abbeel

Behaviour Suite for Reinforcement Learning
Link | https://openreview.net/pdf?id=rygf-kSYwH
Authors | Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt
Affiliation | DeepMind

Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning
Link | https://openreview.net/pdf?id=BJluxREKDB
Authors | Gil Lederman, Markus Rabe, Sanjit Seshia, Edward A. Lee
Affiliation | UC Berkeley; Google Research

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Link | https://openreview.net/pdf?id=Bkg0u3Etwr
Authors | Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
Affiliation | University of Alberta

Hypermodels for Exploration
Link | https://openreview.net/pdf?id=ryx6WgStPB
Authors | Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy

Sub-policy Adaptation for Hierarchical Reinforcement Learning
Link | https://openreview.net/pdf?id=ByeWogStDS
Authors | Alexander Li, Carlos Florensa, Ignasi Clavera, Pieter Abbeel
Affiliation | UC Berkeley

SVQN: Sequential Variational Soft Q-Learning Networks
Link | https://openreview.net/pdf?id=r1xPh2VtPB
Authors | Shiyu Huang, Hang Su, Jun Zhu, Ting Chen
Affiliation | Tsinghua University

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
Link | https://openreview.net/pdf?id=BJeGlJStPr
Authors | Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica
Affiliation | UC Berkeley

Ranking Policy Gradient
Link | https://openreview.net/pdf?id=rJld3hEYvS
Authors | Kaixiang Lin, Jiayu Zhou
Affiliation | Michigan State University

Model-based reinforcement learning for biological sequence design
Link | https://openreview.net/pdf?id=HklxbgBKvr
Authors | Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, Lucy Colwell
Affiliation | Google Research; Caltech

Learning Nearly Decomposable Value Functions Via Communication Minimization
Link | https://openreview.net/pdf?id=HJx-3grYDB
Authors | Tonghan Wang, Jianhao Wang, Chongyi Zheng, Chongjie Zhang
Affiliation | Tsinghua University

Implementing Inductive bias for different navigation tasks through diverse RNN attractors
Link | https://openreview.net/pdf?id=Byx4NkrtDS
Authors | Tie Xu, Omri Barak

Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control
Link | https://openreview.net/pdf?id=SylL0krYPS
Authors | Tsui-Wei Weng, Krishnamurthy (Dj) Dvijotham, Jonathan Uesato, Kai Xiao, Sven Gowal, Robert Stanforth, Pushmeet Kohli
Affiliation | MIT; DeepMind

Learning Efficient Parameter Server Synchronization Policies for Distributed SGD
Link | https://openreview.net/pdf?id=rJxX8T4Kvr
Authors | Rong Zhu, Sheng Yang, Andreas Pfadler, Zhengping Qian, Jingren Zhou

Episodic Reinforcement Learning with Associative Memory
Link | https://openreview.net/pdf?id=HkxjqxBYDB
Authors | Guangxiang Zhu, Zichuan Lin, Guangwen Yang, Chongjie Zhang
Affiliation | Tsinghua University

Logic and the 2-Simplicial Transformer
Link | https://openreview.net/pdf?id=rkecJ6VFvr
Authors | James Clift, Dmitry Doryn, Daniel Murfet, James Wallbridge
Affiliation | University of Melbourne

Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning
Link | https://openreview.net/pdf?id=rkl3m1BFDB
Authors | Akanksha Atrey, Kaleigh Clary, David Jensen
Affiliation | University of Massachusetts Amherst

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP
Link | https://openreview.net/pdf?id=S1xnXRVFwH
Authors | Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos
Affiliation | Facebook AI Research


