1. SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
Model-free deep reinforcement learning (RL) has been successful in a range of challenging domains. However, there are some remaining issues, such as stabilizing the optimization of nonlinear function approximators, preventing error propagation due to the Bellman backup in Q-learning, and efficient exploration. To mitigate these issues, we present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy RL algorithms. SUNRISE integrates three key ingredients: (a) bootstrap with random initialization, which improves the stability of the learning process by training a diverse ensemble of agents, (b) weighted Bellman backups, which prevent error propagation in Q-learning by reweighting sample transitions based on uncertainty estimates from the ensembles, and (c) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. Our experiments show that SUNRISE significantly improves the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks on both low-dimensional and high-dimensional environments. Our training code is available at https://github.com/pokaxpoka/sunrise.
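To make ingredients (b) and (c) concrete, here is a minimal, self-contained sketch (not the authors' code) of how an ensemble of Q-functions can drive upper-confidence-bound action selection and an uncertainty-weighted Bellman target for discrete actions. The toy linear Q-functions, the sigmoid-based weight, and the hyperparameter values are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the two ensemble signals in SUNRISE:
# UCB-based action selection and an uncertainty-weighted Bellman target.
import numpy as np

rng = np.random.default_rng(0)
N_ENSEMBLE, STATE_DIM, N_ACTIONS = 5, 4, 3
GAMMA, UCB_LAMBDA, WEIGHT_TEMP = 0.99, 1.0, 10.0

# Toy ensemble: each member is a random linear Q-function Q_i(s) -> R^{N_ACTIONS}.
ensemble = [rng.normal(size=(STATE_DIM, N_ACTIONS)) for _ in range(N_ENSEMBLE)]

def q_values(state):
    """Stack Q_i(state, .) over ensemble members: shape (N_ENSEMBLE, N_ACTIONS)."""
    return np.stack([state @ W for W in ensemble])

def ucb_action(state):
    """Pick the action with the highest upper-confidence bound: mean + lambda * std."""
    q = q_values(state)
    return int(np.argmax(q.mean(0) + UCB_LAMBDA * q.std(0)))

def weighted_target(reward, next_state, done):
    """Bellman target, down-weighted when the ensemble disagrees on the next Q-value."""
    q_next = q_values(next_state)
    a_next = ucb_action(next_state)  # illustrative; each member could also act greedily
    std = q_next.std(0)[a_next]
    # Weight in (0.5, 1.0]: close to 1 when members agree, near 0.5 when they disagree.
    weight = 1.0 / (1.0 + np.exp(std * WEIGHT_TEMP)) + 0.5
    target = reward + GAMMA * (1.0 - done) * q_next.mean(0)[a_next]
    return weight, target

s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
print("action:", ucb_action(s))
print("weight, target:", weighted_target(reward=1.0, next_state=s_next, done=0.0))
```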
Can ensembles improve off-policy RL by handling various issues?
— Kimin (@kimin_le2) July 10, 2020
Yes! We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy RL algorithms. New work with @MishaLaskin @AravSrinivas and @pabbeel
Paper: https://t.co/RMIGNVcqGU
1/N
SUNRISE can easily be applied to off-policy methods, like SAC and Rainbow DQN, improving their performance on OpenAI Gym, DM Control, and Atari.
— Pieter Abbeel (@pabbeel) July 10, 2020
Paper: https://t.co/LXGi6nAzTZ
Code: https://t.co/p4kVU5CkW1
w/@kimin_le2 @MishaLaskin @AravSrinivas
2. Reformulation of the No-Free-Lunch Theorem for Entangled Data Sets
Kunal Sharma, M. Cerezo, Zoë Holmes, Lukasz Cincio, Andrew Sornborger, Patrick J. Coles
The No-Free-Lunch (NFL) theorem is a celebrated result in learning theory that limits one’s ability to learn a function with a training data set. With the recent rise of quantum machine learning, it is natural to ask whether there is a quantum analog of the NFL theorem, which would restrict a quantum computer’s ability to learn a unitary process (the quantum analog of a function) with quantum training data. However, in the quantum setting, the training data can possess entanglement, a strong correlation with no classical analog. In this work, we show that entangled data sets lead to an apparent violation of the (classical) NFL theorem. This motivates a reformulation that accounts for the degree of entanglement in the training set. As our main result, we prove a quantum NFL theorem whereby the fundamental limit on the learnability of a unitary is reduced by entanglement. We employ Rigetti’s quantum computer to test both the classical and quantum NFL theorems. Our work establishes that entanglement is a commodity in quantum machine learning.
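The quantity that matters in the reformulated bound is how entangled each training state is with a reference system. As a quick, self-contained illustration (not from the paper), the Schmidt rank of a bipartite training state can be read off from an SVD of the reshaped state vector:

```python
# Small illustration (not from the paper): the Schmidt rank of a bipartite training
# state, i.e. the degree of entanglement with a reference system.
import numpy as np

def schmidt_rank(state, dim_a, dim_b, tol=1e-10):
    """Number of nonzero Schmidt coefficients of a pure state on H_A (x) H_B."""
    coeffs = np.linalg.svd(state.reshape(dim_a, dim_b), compute_uv=False)
    return int(np.sum(coeffs > tol))

# |0>|0>: a product state, Schmidt rank 1 (no entanglement with the reference).
product = np.kron([1, 0], [1, 0]).astype(complex)

# (|00> + |11>)/sqrt(2): maximally entangled, Schmidt rank 2.
bell = np.zeros(4, dtype=complex)
bell[0] = bell[3] = 1 / np.sqrt(2)

print(schmidt_rank(product, 2, 2))  # -> 1
print(schmidt_rank(bell, 2, 2))     # -> 2
```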
New work: Reformulating the Quantum No-Free-Lunch Theorem for Entangled Data Sets https://t.co/EF4j5LMqiv
— Marco Cerezo (@MvsCerezo) July 10, 2020
In collaboration w/ @kunal_phy, Z. Holmes, L. Cincio, @sornborg, and @PatrickColes314
Our results were verified w/ @rigetti's device!
👇See thread for details 👇 pic.twitter.com/tB0cPrfTwb
Our new paper is best summarized by the cartoon below.
— Patrick Coles (@PatrickColes314) July 10, 2020
Joint work with @kunal_phy, @MvsCerezo, Z. Holmes, L. Cincio, @sornborg.
Scirate link: https://t.co/Jlibl14cfj pic.twitter.com/gyDu9JAypP
check out our new results:
— Kunal Sharma (@kunal_phy) July 10, 2020
"Reformulating the Quantum No-Free-Lunch Theorem for Entangled Data Sets" https://t.co/NwAjQWI8Js
in collaboration with @MvsCerezo, Zoe, Lukasz, @sornborg, and @PatrickColes314
see @MvsCerezo's thread for a short summary of our results. 👇 https://t.co/ovAOiKnItx
3. Learning Graph Structure With A Finite-State Automaton Layer
Daniel D. Johnson, Hugo Larochelle, Daniel Tarlow
Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream task (e.g., results of relevant program analyses). In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure. Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation. The result is a differentiable Graph Finite-State Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs. Additionally, we combine the GFSA layer with a larger graph-based model trained end-to-end on the variable misuse program understanding task, and find that using the GFSA layer leads to better performance than using hand-engineered semantic edges or other baseline methods for adding learned edge types.
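As a rough, simplified sketch of the idea (not the GFSA implementation itself), one can run a differentiable automaton-controlled random walk over the base graph and collect, for every start node, the probability of halting in an accept state at each other node; that halting distribution is the new weighted adjacency matrix. The automaton parameterization below is a deliberately stripped-down assumption.

```python
# Rough sketch (not the GFSA implementation): a differentiable finite-state automaton
# walks over a base graph; its halting distribution defines a new weighted edge type.
import numpy as np

def gfsa_like_edges(adj, n_fsa_states, move_logits, accept_logits, n_steps=8):
    """adj: (n, n) 0/1 base adjacency.
    move_logits: (n_fsa_states, n_fsa_states) preference for switching FSA state.
    accept_logits: (n_fsa_states,) preference for halting and emitting an edge.
    Returns an (n, n) matrix of halting probabilities, one row per start node."""
    n = adj.shape[0]
    walk = adj / np.maximum(adj.sum(1, keepdims=True), 1)        # uniform walk on edges
    fsa = np.exp(move_logits); fsa /= fsa.sum(1, keepdims=True)  # softmax FSA transitions
    p_accept = 1 / (1 + np.exp(-accept_logits))                  # per-FSA-state halt prob

    # dist[s, m, v]: prob. a walk started at s is at node v in FSA state m, not yet halted.
    dist = np.zeros((n, n_fsa_states, n))
    dist[:, 0, :] = np.eye(n)                                    # all walks start in state 0
    new_edges = np.zeros((n, n))
    for _ in range(n_steps):
        new_edges += np.einsum("smv,m->sv", dist, p_accept)      # absorb halting mass
        dist = np.einsum("smv,m->smv", dist, 1 - p_accept)       # keep the rest walking
        dist = np.einsum("smv,mk,vw->skw", dist, fsa, walk)      # FSA step + graph step
    return new_edges

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # tiny path graph
rng = np.random.default_rng(0)
print(gfsa_like_edges(adj, 2, rng.normal(size=(2, 2)), rng.normal(size=2)).round(2))
```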
We are excited to present the Graph Finite-State Automaton (GFSA) layer, which learns to add long-distance edges to graphs end-to-end based on a downstream objective! https://t.co/o81LidfMRD
— Daniel Johnson (@hexahedria) July 10, 2020
(With @numbercrunching and @hugo_larochelle. 1/9) pic.twitter.com/5IC4SLbwZd
4. Words as Art Materials: Generating Paintings with Sequential GANs
Azmi Can Özgen, Hazım Kemal Ekenel
Converting text descriptions into images using Generative Adversarial Networks has become a popular research area. Visually appealing images have been generated successfully in recent years. Inspired by these studies, we investigated the generation of artistic images on a large-variance dataset. This dataset includes images with variations, for example, in shape, color, and content. These variations provide originality, which is an important factor for artistic essence. One major characteristic of our work is that we used keywords as image descriptions, instead of sentences. As the network architecture, we proposed a sequential Generative Adversarial Network model. The first stage of this sequential model processes the word vectors and creates a base image, whereas the next stages focus on creating high-resolution artistic-style images without working on word vectors. To deal with the unstable nature of GANs, we employed a mixture of techniques such as Wasserstein loss, spectral normalization, and minibatch discrimination. Ultimately, we were able to generate painting images with a variety of styles. We evaluated our results using the Fréchet Inception Distance score and conducted a user study with 186 participants.
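As an illustration of the kind of stabilization the authors mention (not the paper's actual architecture), here is a minimal PyTorch critic that combines spectral normalization with a Wasserstein loss; layer sizes and input shapes are arbitrary placeholders.

```python
# Illustrative sketch: a spectrally normalized critic trained with a Wasserstein loss,
# two of the GAN stabilization techniques listed in the abstract.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Critic(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(channels, 64, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            spectral_norm(nn.Linear(128 * 16 * 16, 1)),  # assumes 64x64 inputs
        )

    def forward(self, x):
        return self.net(x)

critic = Critic()
real = torch.randn(8, 3, 64, 64)
fake = torch.randn(8, 3, 64, 64)  # stand-in for generator output
# Wasserstein critic loss: push real scores up and fake scores down.
critic_loss = critic(fake).mean() - critic(real).mean()
critic_loss.backward()
```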
Words as Art Materials: Generating Paintings with Sequential GANs
— AK (@ak92501) July 10, 2020
pdf: https://t.co/pcAoNebBUY
abs: https://t.co/H7bW3MmhXe
github: https://t.co/6nsj0uiNWq
demo: https://t.co/iAyp4glCn9 pic.twitter.com/WElYGl1EUB
5. ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
Chuang Gan, Jeremy Schwartz, Seth Alter, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Damian Mrowca, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, James J. DiCarlo, Josh McDermott, Joshua B. Tenenbaum, Daniel L.K. Yamins
We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. With TDW, users can simulate high-fidelity sensory data and physical interactions between mobile agents and objects in a wide variety of rich 3D environments. TDW has several unique properties: 1) real-time near photo-realistic image rendering quality; 2) a library of objects and environments with materials for high-quality rendering, and routines enabling user customization of the asset library; 3) generative procedures for efficiently building classes of new environments; 4) high-fidelity audio rendering; 5) believable and realistic physical interactions for a wide variety of material types, including cloths, liquids, and deformable objects; 6) a range of “avatar” types that serve as embodiments of AI agents, with the option for user avatar customization; and 7) support for human interactions with VR devices. TDW also provides a rich API enabling multiple agents to interact within a simulation and return a range of sensor and physics data representing the state of the world. We present initial experiments enabled by the platform around emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, multi-agent interactions, models that “learn like a child”, and attention studies in humans and neural networks. The simulation platform will be made publicly available.
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation https://t.co/A5M4AUBdKJ pic.twitter.com/iZ6I23YpY8
— sim2real (@sim2realAIorg) July 10, 2020
Physics + Sound + Interaction! Check out our TDW platform, ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation. Project page: https://t.co/yuryI8YQvS Paper link: https://t.co/UTboxDYGLu
— Chuang Gan (@gan_chuang) July 10, 2020
6. Finite mixture models are typically inconsistent for the number of components
Diana Cai, Trevor Campbell, Tamara Broderick
Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Practitioners commonly use a Dirichlet process mixture model (DPMM) for this purpose; in particular, they count the number of clusters---i.e. components containing at least one data point---in the DPMM posterior. But Miller and Harrison (2013) warn that the DPMM cluster-count posterior is severely inconsistent for the number of latent components when the data are truly generated from a finite mixture; that is, the cluster-count posterior probability on the true generating number of components goes to zero in the limit of infinite data. A potential alternative is to use a finite mixture model (FMM) with a prior on the number of components. Past work has shown the resulting FMM component-count posterior is consistent. But existing results crucially depend on the assumption that the component likelihoods are perfectly specified. In practice, this assumption is unrealistic, and empirical evidence (Miller and Dunson, 2019) suggests that the FMM posterior on the number of components is sensitive to the likelihood choice. In this paper, we add rigor to data-analysis folk wisdom by proving that under even the slightest model misspecification, the FMM posterior on the number of components is ultraseverely inconsistent: for any finite k, the posterior probability that the number of components is k converges to 0 in the limit of infinite data. We illustrate practical consequences of our theory on simulated and real data sets.
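Written out, the headline result stated in the abstract says that under misspecification the FMM component-count posterior concentrates on no finite value:

```latex
% K is the number of components under the FMM prior, X_{1:n} the observed data,
% generated from a distribution outside the assumed finite-mixture family.
\[
  \Pr\left(K = k \mid X_{1:n}\right) \xrightarrow[n \to \infty]{} 0
  \qquad \text{for every finite } k .
\]
```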
New preprint with Trevor Campbell and Tamara Broderick: https://t.co/1yiCsx0OZf
— Diana Cai (@dianarycai) July 10, 2020
Finite mixture models are typically inconsistent for the posterior # of components -- we say "typically" because misspecification of the component distributions is almost unavoidable in practice
7. Improving Style-Content Disentanglement in Image-to-Image Translation
Aviv Gabbay, Yedid Hoshen
Unsupervised image-to-image translation methods have achieved tremendous success in recent years. However, their learned representations often contain significant entanglement, which hurts translation performance. In this work, we propose a principled approach for improving style-content disentanglement in image-to-image translation. By considering the information flow into each of the representations, we introduce an additional loss term that serves as a content bottleneck. We show that the results of our method are significantly more disentangled than those produced by current methods, while further improving the visual quality and translation diversity.
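As a generic sketch of what a content bottleneck can look like in practice (the paper's actual loss term, derived from its information-flow analysis, may differ), one can limit the capacity of the content code, for example by injecting noise and penalizing its norm, so that style information is pushed into the style representation.

```python
# Generic sketch of a content-bottleneck term (not necessarily the authors' exact loss):
# a noisy, norm-penalized content code acts as a lossy channel, pushing style
# information into the style code. Encoders and decoder are toy placeholders.
import torch
import torch.nn as nn

content_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
style_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 8))
decoder = nn.Linear(128 + 8, 3 * 64 * 64)

x = torch.randn(4, 3, 64, 64)                                # a toy batch of images
content = content_enc(x)
style = style_enc(x)
content_noisy = content + 0.1 * torch.randn_like(content)   # noise = lossy channel
recon = decoder(torch.cat([content_noisy, style], dim=1)).view_as(x)

recon_loss = (recon - x).pow(2).mean()
bottleneck_loss = content.pow(2).mean()                      # shrink content-code capacity
loss = recon_loss + 0.01 * bottleneck_loss
loss.backward()
```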
Improving Style-Content Disentanglement in Image-to-Image Translation
— AK (@ak92501) July 10, 2020
pdf: https://t.co/86UflGKakj
abs: https://t.co/rBR4t1PdgT
project page: https://t.co/f2O8lOxF9q pic.twitter.com/OcghxoOIZz