Hot Papers 2020-08-31

1. On the model-based stochastic value gradient for continuous reinforcement learning

Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson

retweets: 35, favorites: 155 (09/01/2020 08:26:39)
links: abs | pdf
cs.LG | cs.AI | cs.RO | stat.ML

Model-based reinforcement learning approaches add explicit domain knowledge to agents in hopes of improving sample efficiency relative to model-free agents. However, in practice model-based methods are unable to achieve the same asymptotic performance on challenging continuous control tasks due to the complexity of learning and controlling an explicit world model. In this paper we investigate the stochastic value gradient (SVG), a well-known family of methods for controlling continuous systems that includes model-based approaches which distill a model-based value expansion into a model-free policy. We consider a variant of the model-based SVG that scales to larger systems and uses 1) entropy regularization to help with exploration, 2) a learned deterministic world model to improve the short-horizon value estimate, and 3) a learned model-free value estimate after the model's rollout. This SVG variant recovers the model-free soft actor-critic method as the special case where the model rollout horizon is zero, and otherwise uses short-horizon model rollouts to improve the value estimate for the policy update. We surpass the asymptotic performance of other model-based methods on the proprioceptive MuJoCo locomotion tasks from OpenAI Gym, including a humanoid. Notably, we achieve these results with a simple deterministic world model, without requiring an ensemble.
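The value estimate described in the abstract can be sketched on a scalar toy problem. This is a minimal sketch under loud assumptions: the linear model, quadratic reward and critic, fixed policy standard deviation, and every constant are hypothetical stand-ins for the learned networks, and the paper's exact placement of the entropy term may differ. It rolls a deterministic model forward for H steps, adds entropy-regularized rewards, bootstraps with a model-free soft Q estimate, and with H = 0 collapses to the soft actor-critic policy objective Q(s, a) - alpha * log pi(a|s). The gradient is taken through the reparameterized actions with the noise held fixed, approximated here by central differences instead of autodiff.

```python
import math
import random

# Hypothetical scalar toy problem (not from the paper): all constants are assumptions.
STD = 0.5      # fixed policy standard deviation
ALPHA = 0.1    # entropy temperature
GAMMA = 0.99   # discount factor

def model(s, a):
    """Stand-in for a learned deterministic world model (linear here)."""
    return 0.9 * s + 0.5 * a

def reward(s, a):
    """Stand-in reward model: penalize distance from the origin and large actions."""
    return -(s ** 2 + 0.1 * a ** 2)

def critic(s, a):
    """Stand-in for the learned model-free soft Q estimate used after the rollout."""
    return -2.0 * s ** 2 - 0.1 * a ** 2

def act(theta, s, eps):
    """Reparameterized Gaussian policy a = theta*s + STD*eps, plus log pi(a|s)."""
    a = theta * s + STD * eps
    log_prob = -0.5 * eps ** 2 - math.log(STD) - 0.5 * math.log(2 * math.pi)
    return a, log_prob

def value_expansion(theta, s0, eps, horizon):
    """Entropy-regularized H-step estimate:
    sum_{t<H} gamma^t (r_t - alpha*log pi_t) + gamma^H (Q(s_H, a_H) - alpha*log pi_H).
    With horizon=0 this is the SAC-style actor objective Q(s, a) - alpha*log pi(a|s).
    """
    s, total, discount = s0, 0.0, 1.0
    for t in range(horizon):
        a, log_prob = act(theta, s, eps[t])
        total += discount * (reward(s, a) - ALPHA * log_prob)
        s = model(s, a)  # roll the deterministic model forward one step
        discount *= GAMMA
    a, log_prob = act(theta, s, eps[horizon])  # action at the end of the rollout
    return total + discount * (critic(s, a) - ALPHA * log_prob)

def policy_gradient(theta, s0, eps, horizon, h=1e-5):
    """Pathwise d(objective)/d(theta) with the noise eps held fixed,
    approximated by central differences (autodiff would be used in practice)."""
    return (value_expansion(theta + h, s0, eps, horizon)
            - value_expansion(theta - h, s0, eps, horizon)) / (2 * h)

random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(6)]  # horizon 5 needs 6 noise draws
for H in (0, 5):
    print(H, value_expansion(-0.5, 1.0, eps, H), policy_gradient(-0.5, 1.0, eps, H))
```

Holding `eps` fixed while perturbing `theta` is what makes this a reparameterized (pathwise) gradient: the only dependence on the policy parameter flows through the actions, the model, and the critic.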

In our new paper we scale model-based reinforcement learning to the gym humanoid by using short-horizon model rollouts followed by a learned model-free value estimate.

Paper: https://t.co/UytqwsqKdz
Videos: https://t.co/cjZ60UJg6Z

With @sam_d_stanton @denisyarats @andrewgwils pic.twitter.com/6RIZ3YuIHh
— Brandon Amos (@brandondamos) August 31, 2020

2. Causal blankets: Theory and algorithmic framework

Fernando E. Rosas, Pedro A.M. Mediano, Martin Biehl, Shamil Chandaria, Daniel Polani

retweets: 23, favorites: 79 (09/01/2020 08:26:39)
links: abs | pdf
nlin.AO | cs.AI | q-bio.NC

We introduce a novel framework to identify perception-action loops (PALOs) directly from data based on the principles of computational mechanics. Our approach is based on the notion of a causal blanket, which captures sensory and active variables as dynamical sufficient statistics, i.e., as the "differences that make a difference." Moreover, our theory provides a broadly applicable procedure to construct PALOs that requires neither steady-state nor Markovian dynamics. Using our theory, we show that every bipartite stochastic process has a causal blanket, but the extent to which this leads to an effective PALO formulation varies depending on the integrated information of the bipartition.
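The "differences that make a difference" idea behind a dynamical sufficient statistic can be illustrated with a toy. This is not the paper's causal-blanket construction. It is a hypothetical simplification assuming a bipartite binary process with an external series `e` and an internal series `x`, a history length of one step, and a rounding tolerance `digits`. Two external values are merged when they induce the same empirical distribution of the next internal state given the current internal state, so the resulting classes keep only the external distinctions that matter for the internal system. The paper's theory works with full histories and does not assume Markovian dynamics; the function name and parameters here are illustrative only.

```python
import random
from collections import Counter, defaultdict

def sufficient_partition(e, x, digits=2):
    """Hypothetical sketch: group values of e_t that induce the same empirical
    distribution of x_{t+1} given x_t (history length 1, rounded to `digits`)."""
    counts = defaultdict(Counter)  # (e_t, x_t) -> Counter over x_{t+1}
    for t in range(len(x) - 1):
        counts[(e[t], x[t])][x[t + 1]] += 1
    groups = defaultdict(list)  # predictive signature -> external values sharing it
    for ev in sorted(set(e[:-1])):
        signature = []
        for xv in sorted(set(x[:-1])):
            c = counts[(ev, xv)]
            n = sum(c.values())
            # Rounded conditional distribution of the next internal state (empty if unseen).
            dist = tuple(sorted((k, round(v / n, digits)) for k, v in c.items())) if n else ()
            signature.append((xv, dist))
        groups[tuple(signature)].append(ev)
    return sorted(groups.values())

# Toy process (an assumption): the internal bit flips exactly when e_t is odd,
# so only the parity of e makes a difference to x.
rng = random.Random(0)
e = [rng.randrange(4) for _ in range(1000)]
x = [0]
for t in range(len(e) - 1):
    x.append(x[-1] ^ (e[t] % 2))
print(sufficient_partition(e, x))  # expected: [[0, 2], [1, 3]]
```

With noisy data the exact-match test on rounded distributions is fragile; a practical method would compare distributions statistically rather than by equality.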

Preprint time: “Causal Blankets: Theory and algorithmic framework” https://t.co/DAicqZpNKX

Imagine having data from two interactive systems, and wondering if it can be described as a perception-action loop? We think it can always be, but depends on its integrated information.
— Fernando Rosas (@_fernando_rosas) August 31, 2020

3. AllenAct: A Framework for Embodied AI Research

Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi

retweets: 7, favorites: 61 (09/01/2020 08:26:39)
links: abs | pdf
cs.CV | cs.AI | cs.LG | cs.MA | cs.RO

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities. This growth has been facilitated by the creation of a large number of simulated environments (such as AI2-THOR, Habitat and CARLA), tasks (like point navigation, instruction following, and embodied question answering), and associated leaderboards. While this diversity has been beneficial and organic, it has also fragmented the community: a huge amount of effort is required to do something as simple as taking a model trained in one environment and testing it in another. This discourages good science. We introduce AllenAct, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research. AllenAct provides first-class support for a growing collection of embodied environments, tasks and algorithms, provides reproductions of state-of-the-art models and includes extensive documentation, tutorials, start-up code, and pre-trained models. We hope that our framework makes Embodied AI more accessible and encourages new researchers to join this exciting area. The framework can be accessed at: https://allenact.org/

This graph shows the growth of Embodied AI over the past 5 years. We used the data from @SemanticScholar to create it.

More details here: https://t.co/zokjsbKHK1

figure credit: @anikembhavi pic.twitter.com/qeN7pAPsqC
— Roozbeh Mottaghi (@RoozbehMottaghi) August 31, 2020

Published 1 Sep 2020

ML Lead at Beatrust (https://beatrust.com). Tatsuya Shirakawa on Twitter