Hot Papers 2020-10-28

1. COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning

Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine

  • retweets: 556, favorites: 131 (10/29/2020 09:41:19)
  • links: abs | pdf
  • cs.LG | cs.RO

Reinforcement learning has been applied to a wide variety of robotics problems, but most such applications involve collecting data from scratch for each new task. Since the amount of robot data we can collect for any single task is limited by time and cost considerations, the learned behavior is typically narrow: the policy can only execute the task in a handful of scenarios that it was trained on. What if there were a way to incorporate a large amount of prior data, either from previously solved tasks or from unsupervised or undirected environment interaction, to extend and generalize learned behaviors? While most prior work on extending robotic skills using pre-collected data focuses on building explicit hierarchies or skill decompositions, we show in this paper that we can reuse prior data to extend new skills simply through dynamic programming. We show that even when the prior data does not actually succeed at solving the new task, it can still be utilized for learning a better policy, by providing the agent with a broader understanding of the mechanics of its environment. We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task, with our hardest experimental setting involving composing four robotic skills in a row: picking, placing, drawer opening, and grasping, where a +1/0 sparse reward is provided only on task completion. We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands, and present results in both simulated and real world domains. Additional materials and source code can be found on our project website: https://sites.google.com/view/cog-rl
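
The sketch below is a toy illustration of how dynamic programming can stitch prior data onto a new task. It runs tabular Q-iteration over a tiny hypothetical dataset. This is not COG itself, which trains deep policies end-to-end from images. The states, action names, and rewards are invented for illustration. The prior transitions carry zero reward. Once they are pooled with the task data, however, the sparse +1 propagates back to a state that only the prior data visit.

```python
from collections import defaultdict


def offline_q_iteration(transitions, gamma=0.9, iters=50):
    """Tabular Q-iteration restricted to (state, action) pairs in the data."""
    actions_in = defaultdict(set)  # state -> actions observed from that state
    for s, a, _, _, _ in transitions:
        actions_in[s].add(a)
    q = {}
    for _ in range(iters):
        new_q = {}
        for s, a, r, s2, done in transitions:
            # Bootstrap only from actions seen in the data at s2: a crude
            # stand-in for keeping the learned policy close to the dataset.
            nxt = max((q.get((s2, b), 0.0) for b in actions_in.get(s2, ())), default=0.0)
            new_q[(s, a)] = r + (0.0 if done else gamma * nxt)
        q = new_q
    return q


def state_value(q, s):
    """Greedy value of state s (0.0 if the data never visit s)."""
    return max((v for (st, _), v in q.items() if st == s), default=0.0)


# Hypothetical prior data: unrewarded interaction, e.g. opening a drawer (0 -> 1 -> 2).
prior = [(0, "open_drawer", 0.0, 1, False), (1, "reach", 0.0, 2, False)]
# Hypothetical new-task data: starts at state 2, sparse +1 only on success.
task = [(2, "grasp", 1.0, 3, True)]

q_task_only = offline_q_iteration(task)
q_combined = offline_q_iteration(prior + task)
print(state_value(q_task_only, 0))  # 0.0: state 0 never appears in the task data
print(state_value(q_combined, 0))   # about 0.81, i.e. 0.9 ** 2
```

Pooling the datasets is what lets the sparse reward reach state 0. The task data alone say nothing about that state.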

2. Scientific intuition inspired by machine learning generated hypotheses

Pascal Friederich, Mario Krenn, Isaac Tamblyn, Alan Aspuru-Guzik

Machine learning has become a widely used tool for questions in the physical sciences, successfully applied to classification, regression and optimization tasks in many areas. Research mostly focuses on improving the accuracy of machine learning models' numerical predictions, while scientific understanding is still almost exclusively generated by human researchers who analyse numerical results and draw conclusions. In this work, we shift the focus to the insights and knowledge obtained by the machine learning models themselves. In particular, we study how this knowledge can be extracted and used to inspire human scientists, sharpening their intuition and understanding of natural systems. We apply gradient boosting with decision trees to extract human-interpretable insights from large data sets in chemistry and physics. In chemistry, we not only rediscover widely known rules of thumb but also find new, interesting motifs that tell us how to control the solubility and energy levels of organic molecules. In quantum physics, we gain new understanding of experiments for quantum entanglement. The ability to go beyond numerics and enter the realm of scientific insight and hypothesis generation opens the door to using machine learning to accelerate the discovery of conceptual understanding in some of the most challenging domains of science.
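
To make "gradient boosting with decision trees" concrete, here is a minimal plain-Python sketch. It boosts depth-one trees (stumps) on squared loss and credits each split's error reduction to the feature it used. It is a hypothetical toy on synthetic data, not the authors' pipeline, features, or data sets. A real study would use a library implementation. The point is only that this kind of credit can flag the features a human should look at.

```python
import random


def fit_stump(X, residuals):
    """Find the (feature, threshold) split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best


def boost(X, y, rounds=20, lr=0.3):
    """Boost stumps on squared loss; return per-feature error reduction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        # Residuals are the negative gradient of 1/2 * squared loss.
        residuals = [t - p for t, p in zip(y, pred)]
        before = sum(r * r for r in residuals)
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        sse, j, thr, lm, rm = stump
        importance[j] += before - sse  # credit the split's gain to feature j
        pred = [p + lr * (lm if row[j] <= thr else rm) for p, row in zip(pred, X)]
    return importance


random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(60)]  # synthetic data
y = [3.0 if row[0] > 0.5 else 0.0 for row in X]              # only feature 0 matters
importance = boost(X, y)
print(importance)
```

Here only feature 0 drives the target, and it receives all of the credit. A signal like this is what a human could then turn into a readable hypothesis.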

3. MELD: Meta-Reinforcement Learning from Images via Latent State Models

Tony Z. Zhao, Anusha Nagabandi, Kate Rakelly, Chelsea Finn, Sergey Levine

Meta-reinforcement learning algorithms can enable autonomous agents, such as robots, to quickly acquire new behaviors by leveraging prior experience in a set of related training tasks. However, the onerous data requirements of meta-training, compounded by the challenge of learning from sensory inputs such as images, have made meta-RL challenging to apply to real robotic systems. Latent state models, which learn compact state representations from a sequence of observations, can accelerate representation learning from visual inputs. In this paper, we leverage the perspective of meta-learning as task inference to show that latent state models can also perform meta-learning given an appropriately defined observation space. Building on this insight, we develop meta-RL with latent dynamics (MELD), an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills given observations and rewards. MELD outperforms prior meta-RL methods on several simulated image-based robotic control problems, and enables a real WidowX robotic arm to insert an Ethernet cable into new locations given a sparse task completion signal after only 8 hours of real world meta-training. To our knowledge, MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.
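
The "meta-learning as task inference" view can be illustrated with a much simpler model than MELD's. The sketch keeps a discrete Bayesian belief over which task is active and updates it from sparse completion signals. Everything here is hypothetical: four candidate insertion locations, a 10% false-negative rate, and no false positives. MELD instead infers a latent state from image observations and rewards.

```python
def update_belief(belief, tried_slot, reward, p_false_neg=0.1):
    """Bayes update of P(target = k) after trying one slot and seeing a 0/1 reward.

    Hypothetical observation model: trying the true target succeeds with
    probability 1 - p_false_neg; trying any other slot never succeeds.
    """
    posterior = []
    for k, prior in enumerate(belief):
        if k == tried_slot:
            lik = (1 - p_false_neg) if reward == 1 else p_false_neg
        else:
            lik = 0.0 if reward == 1 else 1.0  # no false positives assumed
        posterior.append(prior * lik)
    z = sum(posterior)
    return [p / z for p in posterior]


belief = [0.25] * 4  # uniform prior over four made-up candidate locations
belief = update_belief(belief, tried_slot=0, reward=0)
belief = update_belief(belief, tried_slot=1, reward=0)
print([round(b, 3) for b in belief])  # mass shifts to the untried slots 2 and 3
```

Two failed attempts are enough to concentrate the belief on the untried locations. A single success then identifies the task outright.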

4. Wavelet Flow: Fast Training of High Resolution Normalizing Flows

Jason J. Yu, Konstantinos G. Derpanis, Marcus A. Brubaker

  • retweets: 121, favorites: 69 (10/29/2020 09:41:19)
  • links: abs | pdf
  • cs.CV | cs.LG

Normalizing flows are a class of probabilistic generative models that allow both fast density computation and efficient sampling, and are effective at modelling complex distributions such as images. A drawback of current methods is their significant training cost, sometimes requiring months of GPU training time to achieve state-of-the-art results. This paper introduces Wavelet Flow, a multi-scale normalizing flow architecture based on wavelets. A Wavelet Flow has an explicit representation of signal scale that inherently includes models of lower-resolution signals and conditional generation of higher-resolution signals, i.e., super-resolution. A major advantage of Wavelet Flow is the ability to construct generative models for high-resolution data (e.g., 1024 × 1024 images) that are impractical with previous models. Furthermore, Wavelet Flow is competitive with previous normalizing flows in terms of bits per dimension on standard (low-resolution) benchmarks while being up to 15× faster to train.
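
The multi-scale split behind a wavelet-based flow can be seen in one level of the standard 2D Haar transform, sketched below in plain Python. This is the generic transform, not the paper's implementation. The low-pass band is a half-resolution image, and the three detail bands carry exactly what is needed to return to full resolution. A model of the low band plus a conditional model of the details therefore covers the whole image.

```python
def haar2d(img):
    """Split an even-sized image (nested lists) into (low, (horiz, vert, diag))."""
    low, dh, dv, dd = [], [], [], []
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 4)  # block average (low-pass)
            rows[1].append((a - b + c - d) / 4)  # left minus right
            rows[2].append((a + b - c - d) / 4)  # top minus bottom
            rows[3].append((a - b - c + d) / 4)  # diagonal difference
        for band, row in zip((low, dh, dv, dd), rows):
            band.append(row)
    return low, (dh, dv, dd)


def inverse_haar2d(low, details):
    """Exactly undo haar2d, doubling the resolution."""
    dh, dv, dd = details
    out = []
    for i, row in enumerate(low):
        top, bottom = [], []
        for j, s in enumerate(row):
            x, y, z = dh[i][j], dv[i][j], dd[i][j]
            top += [s + x + y + z, s - x + y - z]
            bottom += [s + x - y - z, s - x - y + z]
        out += [top, bottom]
    return out


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
low, details = haar2d(img)
print(low)                                   # [[3.5, 5.5], [11.5, 13.5]]
print(inverse_haar2d(low, details) == img)   # True
```

The transform is linear and exactly invertible, which makes it convenient as a fixed, parameter-free layer between scales. Applying it repeatedly to the low band gives a full resolution pyramid.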

5. Toward Better Generalization Bounds with Locally Elastic Stability

Zhun Deng, Hangfeng He, Weijie J. Su

  • retweets: 42, favorites: 51 (10/29/2020 09:41:19)
  • links: abs | pdf
  • cs.LG | cs.NE

Classical approaches in learning theory are often seen to yield very loose generalization bounds for deep neural networks. Using the example of “stability and generalization” (Bousquet and Elisseeff, 2002), however, we demonstrate that generalization bounds can be significantly improved by taking into account refined characteristics of modern neural networks. Specifically, this paper proposes a new notion of algorithmic stability termed locally elastic stability, in light of the local elasticity phenomenon observed in the training of neural networks (He and Su, 2020). We prove that locally elastic stability implies a tighter generalization bound than that derived based on uniform stability in many situations. When applied to deep neural networks, our new generalization bound attaches much more meaningful confidence statements to the performance on unseen data than existing algorithmic stability notions, thereby shedding light on the effectiveness of modern neural networks in real-world applications.
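
The contrast between the two stability notions can be written out as follows. This is a paraphrase, not a quotation from the paper. The notation is assumed: A_S is the model trained on a sample S of size m, S^{\setminus i} is S with its i-th point z_i removed, and \ell is the loss. The paper gives the exact statements and quantifiers.

```latex
% Uniform stability: a single constant \beta_m for every S, every i, and every test point z
\bigl|\ell(A_S, z) - \ell(A_{S^{\setminus i}}, z)\bigr| \le \beta_m

% Locally elastic stability: the bound may vary with the removed point z_i and the test point z
\bigl|\ell(A_S, z) - \ell(A_{S^{\setminus i}}, z)\bigr| \le \beta_m(z_i, z)

% Taking the worst case over pairs recovers a valid uniform constant
\beta_m = \sup_{z', z} \beta_m(z', z)
```

Roughly, β_m(z_i, z) can be far below its supremum for most pairs, for example when z_i and z are dissimilar. A bound built from the pairwise quantity can therefore be tighter than one that always pays the worst case.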

6. Apps Against the Spread: Privacy Implications and User Acceptance of COVID-19-Related Smartphone Apps on Three Continents

Christine Utz, Steffen Becker, Theodor Schnitzler, Florian M. Farke, Franziska Herbert, Leonie Schaewitz, Martin Degeling, Markus Dürmuth

  • retweets: 72, favorites: 12 (10/29/2020 09:41:20)
  • links: abs | pdf
  • cs.HC | cs.CY

The COVID-19 pandemic has fueled the development of smartphone applications to assist disease management. These “corona apps” may require widespread adoption to be effective, which has sparked public debates about the privacy, security, and societal implications of government-backed health applications. We conducted a representative online study in Germany (n = 1,003), the US (n = 1,003), and China (n = 1,019) to investigate user acceptance of corona apps, using a vignette design based on the contextual integrity framework. We explored apps for contact tracing, symptom checks, quarantine enforcement, health certificates, and purely informational purposes. Our results provide insights into data processing practices that foster adoption and reveal significant differences between countries, with user acceptance being highest in China and lowest in the US. Chinese participants prefer the collection of personalized data, while German and US participants favor anonymity. Across countries, contact tracing is viewed more positively than quarantine enforcement, and technical malfunctions negatively impact user acceptance.

7. FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention

Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-yi Lee, Lin-shan Lee

Any-to-any voice conversion aims to convert voices from and to any speakers, even those unseen during training. This is much more challenging than one-to-one or many-to-many conversion, but much more attractive in real-world scenarios. In this paper we propose FragmentVC, in which the latent phonetic structure of the utterance from the source speaker is obtained from Wav2Vec 2.0, while the spectral features of the utterance(s) from the target speaker are obtained from log mel-spectrograms. By aligning the hidden structures of the two different feature spaces with a two-stage training process, FragmentVC extracts fine-grained voice fragments from the target speaker utterance(s) and fuses them into the desired utterance end-to-end, relying on the attention mechanism of the Transformer, as verified by analysis of the attention maps. This approach is trained with reconstruction loss only, without any disentanglement considerations between content and speaker information, and does not require parallel data. Objective evaluation based on speaker verification and subjective evaluation with MOS both showed that this approach outperformed SOTA approaches such as AdaIN-VC and AutoVC.
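
The fusion step rests on cross-attention. Each source frame scores every target frame, and the output is a softmax-weighted mix of target features. The sketch below shows generic scaled dot-product cross-attention in plain Python with made-up 2-dimensional features. It omits the actual model's learned projections, multiple heads, and Wav2Vec 2.0 and mel-spectrogram front ends.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Return (outputs, attention weights); one weight row per query."""
    d = len(keys[0])
    out, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output frame is a weighted mix of the target-speaker frames.
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out, weights


source = [[4.0, 0.0], [0.0, 4.0]]         # hypothetical source frames (queries)
target_keys = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical target frames (keys)
target_vals = [[10.0, 0.0], [0.0, 20.0]]  # hypothetical target spectral features
fused, attn = cross_attention(source, target_keys, target_vals)
print([[round(w, 3) for w in row] for row in attn])
```

Each row of `attn` is one query's attention map, the kind of map the abstract's analysis inspects. In this toy, each source frame attends almost entirely to its matching target frame.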

8. Generating 3D Molecular Structures Conditional on a Receptor Binding Site with Deep Generative Models

Tomohide Masuda, Matthew Ragoza, David Ryan Koes

Deep generative models have been applied with increasing success to the generation of molecules in two-dimensional representations such as SMILES strings and molecular graphs. In this work we describe, for the first time, a deep generative model that can generate three-dimensional (3D) molecular structures conditioned on a 3D binding pocket. Using convolutional neural networks, we encode atomic density grids into separate receptor and ligand latent spaces. The ligand latent space is variational to support sampling of new molecules. A decoder network generates atomic densities of novel ligands conditioned on the receptor. Discrete atoms are then fit to these continuous densities to create molecular structures. We show that valid and unique molecules can be readily sampled from the variational latent space defined by a reference ‘seed’ structure, and that generated structures have reasonable interactions with the binding site. As structures are sampled farther in latent space from the seed structure, the novelty of the generated structures increases, but the predicted binding affinity decreases. Overall, we demonstrate the feasibility of conditional 3D molecular structure generation and provide a starting point for methods that also explicitly optimize for desired molecular properties, such as high binding affinity.
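
The round trip between discrete atoms and continuous densities can be illustrated with a toy atomic density grid and a naive atom-recovery step. The choices here are assumptions, not the paper's: a single atom channel, an isotropic Gaussian per atom, a small cubic grid with 0.5 Å spacing, and recovery by picking the densest voxel. Real pipelines use one channel per atom type and a more careful fitting procedure.

```python
import math


def density_grid(atoms, n=8, spacing=0.5, sigma=0.5):
    """Sum of Gaussians at voxel centres; atoms are (x, y, z) in Å (assumed units)."""
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = (i * spacing, j * spacing, k * spacing)
                for a in atoms:
                    r2 = sum((pc - ac) ** 2 for pc, ac in zip(p, a))
                    grid[i][j][k] += math.exp(-r2 / (2 * sigma ** 2))
    return grid


def densest_voxel(grid, spacing=0.5):
    """Naive 'atom fitting': return the coordinates of the densest voxel."""
    n = len(grid)
    best = max(((i, j, k) for i in range(n) for j in range(n) for k in range(n)),
               key=lambda idx: grid[idx[0]][idx[1]][idx[2]])
    return tuple(c * spacing for c in best)


grid = density_grid([(1.0, 1.5, 2.0)])  # one hypothetical atom on a voxel centre
print(densest_voxel(grid))              # (1.0, 1.5, 2.0)
```

With several atoms close together, their Gaussians overlap, and picking the single densest voxel is no longer enough. That is why real fitting procedures are more involved.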

9. Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators

Ryuichi Yamamoto, Eunwoo Song, Min-Jae Hwang, Jae-Min Kim

This paper proposes voicing-aware conditional discriminators for Parallel WaveGAN-based waveform synthesis systems. In this framework, we adopt a projection-based conditioning method that can significantly improve the discriminator’s performance. Furthermore, the conventional discriminator is separated into two waveform discriminators for modeling voiced and unvoiced speech. As each discriminator learns the distinctive characteristics of the harmonic and noise components, respectively, the adversarial training process becomes more efficient, allowing the generator to produce more realistic speech waveforms. Subjective test results demonstrate the superiority of the proposed method over the conventional Parallel WaveGAN and WaveNet systems. In particular, our speaker-independently trained model within a FastSpeech 2-based text-to-speech framework achieves mean opinion scores of 4.20, 4.18, 4.21, and 4.31 for four Japanese speakers, respectively.
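
Separate discriminators for voiced and unvoiced speech can be read as per-sample masking of two discriminators' outputs by a voicing flag. The sketch below assumes that reading. It also assumes frame-level voicing flags (for example, derived from F0 > 0), a toy hop size, stand-in discriminators that are plain functions, and a least-squares GAN generator loss. The paper's exact formulation may differ.

```python
def sample_level_mask(frame_voiced, hop):
    """Repeat each frame-level voiced/unvoiced flag `hop` times."""
    return [v for v in frame_voiced for _ in range(hop)]


def voicing_aware_generator_loss(wave, frame_voiced, hop, d_voiced, d_unvoiced):
    """LSGAN-style generator loss with per-sample V/UV masking (assumed form)."""
    mask = sample_level_mask(frame_voiced, hop)
    if len(mask) != len(wave):
        raise ValueError("waveform length must equal frames * hop")
    sv, su = d_voiced(wave), d_unvoiced(wave)  # one score per sample each
    n_v = sum(mask)
    n_u = len(mask) - n_v
    # Each discriminator is only judged on the samples of its own class.
    loss_v = sum((s - 1.0) ** 2 for s, m in zip(sv, mask) if m) / max(n_v, 1)
    loss_u = sum((s - 1.0) ** 2 for s, m in zip(su, mask) if not m) / max(n_u, 1)
    return loss_v + loss_u


def always_real(wave):  # hypothetical discriminator that is fully fooled
    return [1.0] * len(wave)


def always_fake(wave):  # hypothetical discriminator that rejects everything
    return [0.0] * len(wave)


loss = voicing_aware_generator_loss([0.1, 0.2, 0.3, 0.4], [True, False], 2,
                                    always_real, always_fake)
print(loss)  # 1.0: only the unvoiced discriminator is not fooled
```

Splitting the loss this way means the unvoiced discriminator is never penalised on harmonic regions, and the voiced discriminator is never penalised on noise-like regions.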

10. Random walks and community detection in hypergraphs

Timoteo Carletti, Duccio Fanelli, Renaud Lambiotte

We propose a one-parameter family of random walk processes on hypergraphs, where the parameter biases the dynamics of the walker towards hyperedges of low or high cardinality. We show that for each value of the parameter the resulting process defines its own hypergraph projection onto a weighted network. We then explore the differences between these projections by considering the community structure associated with each random walk process. To do so, we generalise the Markov stability framework to hypergraphs and test it on artificial and real-world hypergraphs.
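
A parameter-dependent projection of a hypergraph onto a weighted network can be sketched as follows. The exact weighting is an assumption here, and the paper gives the actual form. In this sketch, two distinct nodes gain weight (|e| - 1) ** sigma for every hyperedge e that contains both. The walker then moves with probability proportional to these weights, so sigma > 0 favours large hyperedges and sigma < 0 favours small ones.

```python
from collections import defaultdict


def projected_weights(hyperedges, sigma):
    """Weighted adjacency of the projection for a given bias parameter sigma."""
    w = defaultdict(float)
    for e in hyperedges:
        c = len(e)
        if c < 2:
            continue  # a singleton hyperedge offers nowhere to move
        for i in e:
            for j in e:
                if i != j:
                    w[(i, j)] += (c - 1) ** sigma
    return w


def transition_matrix(hyperedges, sigma):
    """Row-normalise the projected weights into walker transition probabilities."""
    w = projected_weights(hyperedges, sigma)
    totals = defaultdict(float)
    for (i, _), x in w.items():
        totals[i] += x
    return {(i, j): x / totals[i] for (i, j), x in w.items()}


H = [{0, 1}, {0, 2, 3, 4}]  # hypothetical hypergraph: one pair, one 4-node hyperedge
for sigma in (-1.0, 0.0, 1.0):
    T = transition_matrix(H, sigma)
    print(sigma, round(T[(0, 1)], 3))  # chance that node 0 steps along the pair edge
```

From node 0, the probability of using the small hyperedge drops from 0.5 to 0.25 to 0.1 as sigma goes from -1 to 0 to 1. Each sigma thus yields a different weighted network, and so potentially a different community structure.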

11. Examining the causal structures of deep neural networks using information theory

Simon Mattsson, Eric J. Michaud, Erik Hoel

  • retweets: 30, favorites: 25 (10/29/2020 09:41:20)
  • links: abs | pdf
  • cs.LG | cs.AI

Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets. Yet DNNs can also be examined at the level of causation, exploring “what does what” within the layers of the network itself. Historically, analyzing the causal structure of DNNs has received less attention than understanding their responses to input. Yet, by definition, generalizability must be a function of a DNN’s causal structure, since it reflects how the DNN responds to unseen or even not-yet-defined future inputs. Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training. Specifically, we introduce the effective information (EI) of a feedforward DNN, which is the mutual information between layer input and output following a maximum-entropy perturbation. The EI can be used to assess the degree of causal influence that nodes and edges have over their downstream targets in each layer. We show that the EI can be further decomposed in order to examine the sensitivity of a layer (measured by how well edges transmit perturbations) and the degeneracy of a layer (measured by how edge overlap interferes with transmission), along with estimates of the amount of integrated information of a layer. Together, these properties define where each layer lies in the “causal plane”, which can be used to visualize how layer connectivity becomes more sensitive or degenerate over time, and how integration changes during training, revealing how the layer-by-layer causal structure differentiates. These results may help in understanding the generalization capabilities of DNNs and provide foundational tools for making DNNs both more generalizable and more explainable.
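
The definition of effective information can be checked by hand on a tiny binary layer. Set the inputs to the maximum-entropy (uniform) distribution, then measure the mutual information between input and output. The sketch assumes binary nodes and hard-threshold units, which is far simpler than the real-valued layers studied in the paper.

```python
import itertools
import math
from collections import Counter


def effective_information(layer, n_inputs):
    """I(X; Y) in bits for Y = layer(X), with X uniform over {0,1}^n_inputs."""
    inputs = list(itertools.product((0, 1), repeat=n_inputs))
    outputs = Counter(layer(x) for x in inputs)
    probs = [count / len(inputs) for count in outputs.values()]
    # The layer is deterministic, so H(Y | X) = 0 and I(X; Y) = H(Y).
    return -sum(p * math.log2(p) for p in probs)


def threshold_layer(weights, bias):
    """Hypothetical binary layer: output unit fires if w . x + b > 0."""
    def f(x):
        return tuple(int(sum(w * xi for w, xi in zip(row, x)) + b > 0)
                     for row, b in zip(weights, bias))
    return f


identity = threshold_layer([[1, 0], [0, 1]], [-0.5, -0.5])    # copies both bits
degenerate = threshold_layer([[1, 1], [1, 1]], [-0.5, -0.5])  # both outputs = OR
print(effective_information(identity, 2))    # 2.0
print(effective_information(degenerate, 2))   # about 0.811
```

The identity layer transmits both input bits. In the second layer, the edges overlap so that both outputs compute the same OR, and about 1.19 of the 2 bits are lost. This is a toy instance of what the abstract calls degeneracy.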

12. Benchmarking Deep Learning Interpretability in Time Series Predictions

Aya Abdelsalam Ismail, Mohamed Gunady, Héctor Corrada Bravo, Soheil Feizi

Saliency methods are used extensively to highlight the importance of input features in model predictions. These methods are mostly used in vision and language tasks, and their application to time series data is relatively unexplored. In this paper, we set out to extensively compare the performance of various saliency-based interpretability methods across diverse neural architectures, including Recurrent Neural Networks, Temporal Convolutional Networks, and Transformers, on a new benchmark of synthetic time series data. We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time, using both precision (i.e., whether identified features contain meaningful signals) and recall (i.e., the number of features with signal identified as important). Through several experiments, we show that (i) in general, network architectures and saliency methods fail to reliably and accurately identify feature importance over time in time series data, (ii) this failure is mainly due to the conflation of time and feature domains, and (iii) the quality of saliency maps can be improved substantially by using our proposed two-step temporal saliency rescaling (TSR) approach, which first calculates the importance of each time step before calculating the importance of each feature at a time step.
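
The two-step idea can be sketched in plain Python with a toy saliency map. The details here are assumptions rather than the paper's exact procedure. `saliency(x)` returns a T x F map, masking means replacing values with 0, and time steps whose relevance is at most a threshold `alpha` get zero feature relevance. The toy saliency is |w * x| for a fixed linear model, chosen purely for illustration.

```python
def tsr(x, saliency, alpha=0.0):
    """Two-step rescaling: time-step relevance first, then feature relevance."""
    base = saliency(x)
    T, F = len(x), len(x[0])

    def masked(ts, fs):
        # Copy of x with entries at times ts and features fs set to 0.
        return [[0.0 if (t in ts and f in fs) else x[t][f] for f in range(F)]
                for t in range(T)]

    def total_change(other):
        return sum(abs(base[t][f] - other[t][f]) for t in range(T) for f in range(F))

    # Step 1: relevance of each time step = change in saliency when it is masked.
    time_rel = [total_change(saliency(masked({t}, set(range(F))))) for t in range(T)]
    scores = [[0.0] * F for _ in range(T)]
    for t in range(T):
        if time_rel[t] <= alpha:
            continue  # skip time steps judged irrelevant
        # Step 2: relevance of each feature within a relevant time step.
        for f in range(F):
            feat_rel = total_change(saliency(masked({t}, {f})))
            scores[t][f] = time_rel[t] * feat_rel
    return scores


W = [[0.0, 0.0], [0.0, 3.0], [0.0, 0.0]]  # hypothetical model: only (t=1, f=1) matters


def toy_saliency(x):
    return [[abs(w * v) for w, v in zip(wr, xr)] for wr, xr in zip(W, x)]


x = [[1.0, 1.0], [1.0, 2.0], [1.0, 1.0]]
scores = tsr(x, toy_saliency)
print(scores)  # all credit lands on time step 1, feature 1
```

Because feature relevance is only computed inside time steps that passed the first step, irrelevant time steps cannot leak credit into the final map. The abstract identifies this conflation of time and feature domains as the main failure mode.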