All Articles

Hot Papers 2020-11-04

1. Learning Deformable Tetrahedral Meshes for 3D Reconstruction

Jun Gao, Wenzheng Chen, Tommy Xiang, Alec Jacobson, Morgan McGuire, Sanja Fidler

  • retweets: 1625, favorites: 310 (11/05/2020 11:13:31)
  • links: abs | pdf
  • cs.CV

3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics. Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations. We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem. Unlike existing volumetric approaches, DefTet optimizes for both vertex placement and occupancy, and is differentiable with respect to standard 3D reconstruction loss functions. It is thus simultaneously high-precision, volumetric, and amenable to learning-based neural architectures. We show that it can represent arbitrary, complex topology, is efficient in both memory and computation, and can produce high-fidelity reconstructions with a significantly smaller grid size than alternative volumetric approaches. The predicted surfaces are also inherently defined as tetrahedral meshes and thus require no post-processing. We demonstrate that DefTet matches or exceeds both the quality of the previous best approaches and the performance of the fastest ones. Our approach obtains high-quality tetrahedral meshes computed directly from noisy point clouds, and is the first to showcase high-quality 3D tet-mesh results using only a single image as input.
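
To make the core idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of a DefTet-style prediction head in PyTorch: a network regresses per-vertex offsets that deform a fixed tetrahedral grid plus a per-tetrahedron occupancy, both of which could be trained end-to-end against reconstruction losses. The names `base_vertices` and the MLP sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DefTetHead(nn.Module):
    """Sketch of a head predicting vertex offsets and per-tet occupancy."""
    def __init__(self, feat_dim, num_vertices, num_tets):
        super().__init__()
        self.offset_mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, num_vertices * 3)
        )
        self.occ_mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, num_tets)
        )

    def forward(self, feat, base_vertices):
        # feat: (B, feat_dim) global feature from an image or point-cloud encoder
        # base_vertices: (num_vertices, 3) vertices of the regular tetrahedral grid
        offsets = self.offset_mlp(feat).view(-1, base_vertices.shape[0], 3)
        vertices = base_vertices.unsqueeze(0) + offsets     # deformed grid
        occupancy = torch.sigmoid(self.occ_mlp(feat))        # per-tet occupancy in [0, 1]
        return vertices, occupancy
```

Because both outputs are plain tensors, any differentiable surface or volumetric loss can backpropagate into vertex positions and occupancies alike, which is the property the abstract highlights.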

2. Sample-efficient reinforcement learning using deep Gaussian processes

Charles Gadd, Markus Heinonen, Harri Lähdesmäki, Samuel Kaski

Reinforcement learning provides a framework for learning, through trial and error, which actions to take to complete a task. In many applications observing interactions is costly, necessitating sample-efficient learning. In model-based reinforcement learning, efficiency is improved by learning to simulate the world dynamics. The challenge is that model inaccuracies rapidly accumulate over planned trajectories. We introduce deep Gaussian processes, in which the depth of the composition adds model complexity while the incorporation of prior knowledge about the dynamics brings smoothness and structure. Our approach is able to sample a Bayesian posterior over trajectories. We demonstrate greatly improved early sample efficiency over competing methods. This is shown across a number of continuous control tasks, including the half-cheetah, whose contact dynamics have previously posed an insurmountable problem for earlier sample-efficient Gaussian process based models.
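
As a rough illustration of the model-based loop the abstract describes, the sketch below rolls out trajectories by repeatedly sampling a learned probabilistic dynamics model; `dynamics.sample_next` is a hypothetical stand-in for a draw from the paper's deep GP posterior over next states, not its actual API.

```python
import numpy as np

def sample_trajectories(dynamics, policy, s0, horizon, n_samples, rng):
    """Roll out n_samples trajectories by sampling the learned dynamics model."""
    trajs = np.zeros((n_samples, horizon + 1, s0.shape[-1]))
    trajs[:, 0] = s0
    for i in range(n_samples):
        s = s0.copy()
        for t in range(horizon):
            a = policy(s)
            # One posterior draw of the next state; uncertainty propagates
            # through the rollout instead of collapsing to a point estimate.
            s = dynamics.sample_next(s, a, rng)
            trajs[i, t + 1] = s
    return trajs
```

Planning against a distribution of sampled rollouts, rather than a single deterministic one, is what keeps accumulated model error from being mistaken for certainty.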

3. StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

Ahmed Mustafa, Nicola Pia, Guillaume Fuchs

In recent years, neural vocoders have surpassed classical speech generation approaches in naturalness and perceptual quality of the synthesized speech. Computationally heavy models like WaveNet and WaveGlow achieve the best results, while lightweight GAN models, e.g. MelGAN and Parallel WaveGAN, remain inferior in terms of perceptual quality. We therefore propose StyleMelGAN, a lightweight neural vocoder allowing synthesis of high-fidelity speech with low computational complexity. StyleMelGAN employs temporal adaptive normalization to style a low-dimensional noise vector with the acoustic features of the target speech. For efficient training, multiple random-window discriminators adversarially evaluate the speech signal analyzed by a filter bank, with regularization provided by a multi-scale spectral reconstruction loss. The highly parallelizable speech generation is several times faster than real-time on CPUs and GPUs. MUSHRA and P.800 listening tests show that StyleMelGAN outperforms prior neural vocoders in copy-synthesis and Text-to-Speech scenarios.
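
The temporal adaptive normalization idea can be sketched roughly as follows (an assumption-laden illustration, not the paper's architecture): a hidden activation is normalized and then re-styled with scale and shift signals predicted from the mel-spectrogram conditioning, upsampled to the activation's time resolution. The kernel sizes and the choice of instance normalization here are guesses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAdaptiveNorm(nn.Module):
    """Sketch: normalize, then modulate with conditioning-derived scale/shift."""
    def __init__(self, channels, mel_channels):
        super().__init__()
        self.norm = nn.InstanceNorm1d(channels, affine=False)
        self.to_gamma = nn.Conv1d(mel_channels, channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv1d(mel_channels, channels, kernel_size=3, padding=1)

    def forward(self, x, mel):
        # x: (B, channels, T) hidden activation; mel: (B, mel_channels, T_mel)
        mel = F.interpolate(mel, size=x.shape[-1], mode="nearest")  # match time resolution
        return self.norm(x) * self.to_gamma(mel) + self.to_beta(mel)
```

The design lets the acoustic features steer the noise-driven generator at every layer rather than only at the input, which is what "styling" the noise vector refers to.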

4. Recommendations for Bayesian hierarchical model specifications for case-control studies in mental health

Vincent Valton, Toby Wise, Oliver J. Robinson

Hierarchical model fitting has become commonplace for case-control studies of cognition and behaviour in mental health. However, these techniques require us to formalise assumptions about the data-generating process at the group level, which may not be known. Specifically, researchers typically must choose whether to assume all subjects are drawn from a common population, or to model them as deriving from separate populations. These assumptions have profound implications for computational psychiatry, as they affect the resulting inference (latent parameter recovery) and may conflate or mask true group-level differences. To test these assumptions, we ran systematic simulations on synthetic multi-group behavioural data from a commonly used multi-armed bandit task (reinforcement learning task). We then examined recovery of group differences in latent parameter space under the two commonly used generative modelling assumptions: (1) modelling groups under a common, shared group-level prior (assuming all participants are generated from a common distribution and are likely to share common characteristics); (2) modelling separate groups based on symptomatology or diagnostic labels, resulting in separate group-level priors. We evaluated the robustness of these approaches to variations in data quality and prior specifications on a variety of metrics. We found that fitting groups separately (assumption 2) provided the most accurate and robust inference across all conditions. Our results suggest that when dealing with data from multiple clinical groups, researchers should analyse patient and control groups separately, as this provides the most accurate and robust recovery of the parameters of interest.
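
For readers unfamiliar with the two specifications, here is a deliberately simplified PyMC3 sketch (not the authors' models): it treats per-subject parameter estimates as observations rather than fitting the full reinforcement-learning likelihood, and contrasts a shared group-level prior (assumption 1) with separate patient/control priors (assumption 2).

```python
import pymc3 as pm

def shared_prior_model(alpha_patients, alpha_controls):
    # Assumption 1: all participants drawn from one common population.
    with pm.Model() as model:
        mu = pm.Normal("mu", mu=0.0, sigma=1.0)
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        pm.Normal("patients", mu=mu, sigma=sigma, observed=alpha_patients)
        pm.Normal("controls", mu=mu, sigma=sigma, observed=alpha_controls)
    return model

def separate_prior_model(alpha_patients, alpha_controls):
    # Assumption 2: patients and controls each get their own group-level prior.
    with pm.Model() as model:
        mu_p = pm.Normal("mu_p", mu=0.0, sigma=1.0)
        mu_c = pm.Normal("mu_c", mu=0.0, sigma=1.0)
        sigma_p = pm.HalfNormal("sigma_p", sigma=1.0)
        sigma_c = pm.HalfNormal("sigma_c", sigma=1.0)
        pm.Normal("patients", mu=mu_p, sigma=sigma_p, observed=alpha_patients)
        pm.Normal("controls", mu=mu_c, sigma=sigma_c, observed=alpha_controls)
    return model
```

In the shared-prior model, shrinkage pulls both groups toward a single population mean, which is exactly the mechanism that can mask true group differences; fitting groups separately avoids that pooling.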

5. The Complexity of Gradient Descent: CLS = PPAD ∩ PLS

John Fearnley, Paul W. Goldberg, Alexandros Hollender, Rahul Savani

We study search problems that can be solved by performing Gradient Descent on a bounded convex polytopal domain and show that this class is equal to the intersection of two well-known classes: PPAD and PLS. As our main underlying technical contribution, we show that computing a Karush-Kuhn-Tucker (KKT) point of a continuously differentiable function over the domain [0,1]^2 is PPAD ∩ PLS-complete. This is the first natural problem to be shown complete for this class. Our results also imply that the class CLS (Continuous Local Search) - which was defined by Daskalakis and Papadimitriou as a more “natural” counterpart to PPAD ∩ PLS and contains many interesting problems - is itself equal to PPAD ∩ PLS.
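
For reference, what it means for a point of the box-constrained minimization problem to be a KKT point can be stated as follows (a standard restatement, not quoted from the paper): informally, projected gradient descent can make no local progress at such a point.

```latex
% x* in [0,1]^2 is a KKT point of a continuously differentiable f iff,
% for each coordinate i in {1, 2}, one of the following holds:
\[
\frac{\partial f}{\partial x_i}(x^{*}) = 0,
\qquad\text{or}\qquad
x^{*}_i = 0 \ \text{and}\ \frac{\partial f}{\partial x_i}(x^{*}) \ge 0,
\qquad\text{or}\qquad
x^{*}_i = 1 \ \text{and}\ \frac{\partial f}{\partial x_i}(x^{*}) \le 0 .
\]
```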

6. CharBERT: Character-aware Pre-trained Language Model

Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu

  • retweets: 90, favorites: 50 (11/05/2020 11:13:33)
  • links: abs | pdf
  • cs.CL

Most pre-trained language models (PLMs) construct word representations at the subword level with Byte-Pair Encoding (BPE) or its variations, with which OOV (out-of-vocabulary) words are almost entirely avoided. However, those methods split a word into subword units, making the representation incomplete and fragile. In this paper, we propose a character-aware pre-trained language model named CharBERT, which improves on previous methods (such as BERT and RoBERTa) to tackle these problems. We first construct the contextual word embedding for each token from the sequential character representations, then fuse the character representations and the subword representations via a novel heterogeneous interaction module. We also propose a new pre-training task named NLM (Noisy LM) for unsupervised character representation learning. We evaluate our method on question answering, sequence labeling, and text classification tasks, both on the original datasets and adversarial misspelling test sets. The experimental results show that our method can significantly improve the performance and robustness of PLMs simultaneously. Pretrained models, evaluation sets, and code are available at https://github.com/wtma/CharBERT
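
The fusion step can be pictured with the speculative sketch below; the real heterogeneous interaction module differs in detail (see the repository linked above), and the projection/concatenation scheme here is only an assumption about how token-aligned character and subword representations might be combined.

```python
import torch
import torch.nn as nn

class CharSubwordFusion(nn.Module):
    """Sketch: fuse token-aligned character-derived and subword representations."""
    def __init__(self, hidden_size):
        super().__init__()
        self.char_proj = nn.Linear(hidden_size, hidden_size)
        self.subword_proj = nn.Linear(hidden_size, hidden_size)
        self.fuse = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, char_repr, subword_repr):
        # char_repr, subword_repr: (B, seq_len, hidden_size), aligned per token
        h_char = torch.tanh(self.char_proj(char_repr))
        h_sub = torch.tanh(self.subword_proj(subword_repr))
        return self.fuse(torch.cat([h_char, h_sub], dim=-1))
```

Keeping a character channel alongside the subword channel is what gives the model something to fall back on when misspellings break the BPE segmentation.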

7. Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Beliz Gunel, Jingfei Du, Alexis Conneau, Ves Stoyanov

  • retweets: 62, favorites: 40 (11/05/2020 11:13:33)
  • links: abs | pdf
  • cs.CL | cs.LG

State-of-the-art natural language understanding classification models follow two stages: pre-training a large language model on an auxiliary task, and then fine-tuning the model on a task-specific labeled dataset using cross-entropy loss. Cross-entropy loss has several shortcomings that can lead to sub-optimal generalization and instability. Driven by the intuition that good generalization requires capturing the similarity between examples in one class and contrasting them with examples in other classes, we propose a supervised contrastive learning (SCL) objective for the fine-tuning stage. Combined with cross-entropy, the SCL loss we propose obtains improvements over a strong RoBERTa-Large baseline on multiple datasets of the GLUE benchmark in both the high-data and low-data regimes, and it does not require any specialized architecture, data augmentation of any kind, memory banks, or additional unsupervised data. We also demonstrate that the new objective leads to models that are more robust to different levels of noise in the training data, and can generalize better to related tasks with limited labeled task data.
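
A minimal PyTorch sketch of a supervised contrastive term combined with cross-entropy is shown below; it follows the standard SCL formulation, and the temperature and mixing weight are illustrative defaults, not the paper's tuned values.

```python
import torch
import torch.nn.functional as F

def scl_loss(features, labels, temperature=0.3):
    # features: (B, d) encoder outputs; labels: (B,) integer class labels
    z = F.normalize(features, dim=-1)
    sim = z @ z.t() / temperature                        # pairwise similarities (B, B)
    mask_self = torch.eye(len(labels), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float("-inf"))      # drop i == a from the denominator
    log_prob = sim - torch.logsumexp(sim, dim=-1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self
    n_pos = positives.sum(dim=-1).clamp(min=1)
    # Average log-probability of same-class pairs, averaged over the batch.
    return -(log_prob.masked_fill(~positives, 0.0).sum(dim=-1) / n_pos).mean()

def total_loss(logits, features, labels, lam=0.1):
    # Weighted combination of cross-entropy and the SCL term (lam is illustrative).
    return (1 - lam) * F.cross_entropy(logits, labels) + lam * scl_loss(features, labels)
```

Because the contrastive term only needs the in-batch representations and labels, it adds no memory bank, augmentation pipeline, or architectural change, which is the practical point the abstract emphasizes.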

8. Transforming Gaussian Processes With Normalizing Flows

Juan Maroñas, Oliver Hamelijnck, Jeremias Knoblauch, Theodoros Damoulas

  • retweets: 64, favorites: 35 (11/05/2020 11:13:33)
  • links: abs | pdf
  • cs.LG

Gaussian Processes (GPs) can be used as flexible, non-parametric function priors. Inspired by the growing body of work on Normalizing Flows, we enlarge this class of priors through a parametric invertible transformation that can be made input-dependent. Doing so also allows us to encode interpretable prior knowledge (e.g., boundedness constraints). We derive a variational approximation to the resulting Bayesian inference problem, which is as fast as stochastic variational GP regression (Hensman et al., 2013; Dezfouli and Bonilla, 2015). This makes the model a computationally efficient alternative to other hierarchical extensions of GP priors (Lazaro-Gredilla, 2012; Damianou and Lawrence, 2013). The resulting algorithm’s computational and inferential performance is excellent, and we demonstrate this on a range of data sets. For example, even with only 5 inducing points and an input-dependent flow, our method is consistently competitive with a standard sparse GP fitted using 100 inducing points.
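
A toy illustration of the prior construction (not the paper's model or inference scheme): draw a sample from a GP prior and warp it marginally with an invertible, input-dependent transformation; here a softplus warp enforces positivity, standing in for the kind of boundedness constraint the abstract mentions.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.3, variance=1.0):
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
K = rbf_kernel(x) + 1e-8 * np.eye(len(x))                # jitter for stability
f = rng.multivariate_normal(np.zeros(len(x)), K)          # GP prior draw

# Input-dependent invertible warp: parameters vary with x, and the softplus
# output keeps the transformed prior sample positive (a boundedness constraint).
shift, scale = 0.5 * x, 1.0 + x
g = np.log1p(np.exp(scale * f + shift))
```

The paper's contribution is doing variational inference under such transformed priors at the same cost as a standard sparse GP; the snippet only shows what the transformed prior samples look like.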

9. Revisiting Adaptive Convolutions for Video Frame Interpolation

Simon Niklaus, Long Mai, Oliver Wang

  • retweets: 56, favorites: 28 (11/05/2020 11:13:33)
  • links: abs | pdf
  • cs.CV

Video frame interpolation, the synthesis of novel views in time, is an increasingly popular research direction with many new papers further advancing the state of the art. But as each new method comes with a host of variables that affect the interpolation quality, it can be hard to tell what is actually important for this task. In this work, we show, somewhat surprisingly, that it is possible to achieve near state-of-the-art results with an older, simpler approach, namely adaptive separable convolutions, through a subtle set of low-level improvements. In doing so, we propose a number of intuitive but effective techniques to improve frame interpolation quality, which also hold potential for other applications of adaptive convolutions such as burst image denoising, joint image filtering, or video prediction.
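
For context, the underlying adaptive separable convolution can be sketched as below (a simplified illustration, not the authors' implementation): per-pixel vertical and horizontal 1D kernels, predicted by a network that is omitted here, are applied to local patches of each input frame via an outer product.

```python
import torch
import torch.nn.functional as F

def sepconv(frame, k_v, k_h):
    # frame: (B, C, H, W); k_v, k_h: (B, K, H, W) per-pixel 1D kernels, K odd
    b, c, h, w = frame.shape
    k = k_v.size(1)
    patches = F.unfold(frame, kernel_size=k, padding=k // 2)       # (B, C*K*K, H*W)
    patches = patches.view(b, c, k, k, h, w)
    # Outer product of the vertical and horizontal kernels gives a K x K kernel per pixel.
    weights = k_v.view(b, 1, k, 1, h, w) * k_h.view(b, 1, 1, k, h, w)
    return (patches * weights).sum(dim=(2, 3))                     # (B, C, H, W)

# The interpolated frame is then the sum of the two adaptively filtered inputs, e.g.
# out = sepconv(frame0, kv0, kh0) + sepconv(frame1, kv1, kh1)
```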

10. Levels of Coupling in Dyadic Interaction: An Analysis of Neural and Behavioral Complexity

Georgina Montserrat Reséndiz-Benhumea, Ekaterina Sangati, Tom Froese

  • retweets: 30, favorites: 20 (11/05/2020 11:13:33)
  • links: abs | pdf
  • cs.MA

From an enactive approach, previous studies have demonstrated that social interaction plays a fundamental role in the dynamics of neural and behavioral complexity of embodied agents. In particular, it has been shown that agents with a limited internal structure (2-neuron brains) that evolve in interaction can overcome this limitation and exhibit chaotic neural activity, typically associated with more complex dynamical systems (at least 3-dimensional). In the present paper we make two contributions to this line of work. First, we propose a conceptual distinction in levels of coupling between agents that could have an effect on neural and behavioral complexity. Second, we test the generalizability of previous results using agents with a richer internal structure, evolved in a richer, yet non-social, environment. We demonstrate that such agents can achieve levels of complexity comparable to agents that evolve in interactive settings. We discuss the significance of this result for the study of interaction.