All Articles

Hot Papers 2020-07-27

1. The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation

Eitan Richardson, Yair Weiss

  • retweets: 74, favorites: 361 (07/28/2020 10:32:23)
  • links: abs | pdf
  • cs.CV

Unsupervised image-to-image translation is an inherently ill-posed problem. Recent methods based on deep encoder-decoder architectures have shown impressive results, but we show that they only succeed due to a strong locality bias, and they fail to learn very simple nonlocal transformations (e.g. mapping upside down faces to upright faces). When the locality bias is removed, the methods are too powerful and may fail to learn simple local transformations. In this paper we introduce linear encoder-decoder architectures for unsupervised image to image translation. We show that learning is much easier and faster with these architectures and yet the results are surprisingly effective. In particular, we show a number of local problems for which the results of the linear methods are comparable to those of state-of-the-art architectures but with a fraction of the training time, and a number of nonlocal problems for which the state-of-the-art fails while linear methods succeed.

2. Multi-view adaptive graph convolutions for graph classification

Nikolas Adaloglou, Nicholas Vretos, Petros Daras

  • retweets: 21, favorites: 118 (07/28/2020 10:32:24)
  • links: abs | pdf
  • cs.CV | cs.LG

In this paper, a novel multi-view methodology for graph-based neural networks is proposed. A systematic and methodological adaptation of the key concepts of classical deep learning methods such as convolution, pooling and multi-view architectures is developed for the context of non-Euclidean manifolds. The aim of the proposed work is to present a novel multi-view graph convolution layer, as well as a new view pooling layer making use of: a) a new hybrid Laplacian that is adjusted based on feature distance metric learning, b) multiple trainable representations of a feature matrix of a graph, using trainable distance matrices, adapting the notion of views to graphs and c) a multi-view graph aggregation scheme called graph view pooling, in order to synthesise information from the multiple generated views. The aforementioned layers are used in an end-to-end graph neural network architecture for graph classification and show competitive results to other state-of-the-art methods.

3. The Representation Theory of Neural Networks

Marco Antonio Armenta, Pierre-Marc Jodoin

In this work, we show that neural networks can be represented via the mathematical theory of quiver representations. More specifically, we prove that a neural network is a quiver representation with activation functions, a mathematical object that we represent using a {\em network quiver}. Also, we show that network quivers gently adapt to common neural network concepts such as fully-connected layers, convolution operations, residual connections, batch normalization, and pooling operations. We show that this mathematical representation is by no means an approximation of what neural networks are as it exactly matches reality. This interpretation is algebraic and can be studied with algebraic methods. We also provide a quiver representation model to understand how a neural network creates representations from the data. We show that a neural network saves the data as quiver representations, and maps it to a geometrical space called the {\em moduli space}, which is given in terms of the underlying oriented graph of the network. This results as a consequence of our defined objects and of understanding how the neural network computes a prediction in a combinatorial and algebraic way. Overall, representing neural networks through the quiver representation theory leads to 13 consequences that we believe are of great interest to better understand what neural networks are and how they work.

4. Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

Evonne Ng, Hanbyul Joo, Shiry Ginosar, Trevor Darrell

  • retweets: 17, favorites: 59 (07/28/2020 10:32:24)
  • links: abs | pdf
  • cs.CV

We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures. Our model builds upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings. We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone. Trained with 3D pose estimations obtained from a large-scale dataset of internet videos, our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker’s arms as input. We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation. We demonstrate that our method outperforms previous state-of-the-art approaches and can generalize beyond the monologue-based training data to multi-person conversations. Video results are available at http://people.eecs.berkeley.edu/~evonne_ng/projects/body2hands/.

5. Unsupervised Discovery of 3D Physical Objects from Video

Yilun Du, Kevin Smith, Tomer Ulman, Joshua Tenenbaum, Jiajun Wu

  • retweets: 10, favorites: 44 (07/28/2020 10:32:24)
  • links: abs | pdf
  • cs.CV | cs.LG

We study the problem of unsupervised physical object discovery. Unlike existing frameworks that aim to learn to decompose scenes into 2D segments purely based on each object’s appearance, we explore how physics, especially object interactions, facilitates learning to disentangle and segment instances from raw videos, and to infer the 3D geometry and position of each object, all without supervision. Drawing inspiration from developmental psychology, our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes, and infer properties of those objects. Our model reliably segments objects on both synthetic and real scenes. The discovered object properties can also be used to reason about physical events.