
Hot Papers 2020-07-23

1. DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation

Alexandre Carlier, Martin Danelljan, Alexandre Alahi, Radu Timofte

  • retweets: 93, favorites: 422 (07/24/2020 21:17:58)
  • links: abs | pdf
  • cs.CV

Scalable Vector Graphics (SVG) are ubiquitous in modern 2D interfaces due to their ability to scale to different resolutions. However, despite the success of deep learning-based models applied to rasterized images, the problem of vector graphics representation learning and generation remains largely unexplored. In this work, we propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and interpolation. Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. The network directly predicts a set of shapes in a non-autoregressive fashion. We introduce the task of complex SVG icons generation by releasing a new large-scale dataset along with an open-source library for SVG manipulation. We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool by performing interpolations and other latent space operations. Our code is available at https://github.com/alexandre01/deepsvg.

2. CrossTransformers: spatially-aware few-shot transfer

Carl Doersch, Ankush Gupta, Andrew Zisserman

  • retweets: 45, favorites: 212 (07/24/2020 21:17:59)
  • links: abs | pdf
  • cs.CV

Given new tasks with very little data—such as new classes in a classification problem or a domain shift in the input—performance of modern vision systems degrades remarkably quickly. In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains. We then propose two methods to mitigate this problem. First, we employ self-supervised learning to encourage general-purpose features that transfer better. Second, we propose a novel Transformer based neural network architecture called CrossTransformers, which can take a small number of labeled images and an unlabeled query, find coarse spatial correspondence between the query and the labeled images, and then infer class membership by computing distances between spatially-corresponding features. The result is a classifier that is more robust to task and domain shift, which we demonstrate via state-of-the-art performance on Meta-Dataset, a recent dataset for evaluating transfer from ImageNet to many other vision datasets.

3. Unsupervised Shape and Pose Disentanglement for 3D Meshes

Keyang Zhou, Bharat Lal Bhatnagar, Gerard Pons-Moll

  • retweets: 38, favorites: 195 (07/24/2020 21:17:59)
  • links: abs | pdf
  • cs.CV

Parametric models of humans, faces, hands and animals have been widely used for a range of tasks such as image-based reconstruction, shape correspondence estimation, and animation. Their key strength is the ability to factor surface variations into shape and pose dependent components. Learning such models requires lots of expert knowledge and hand-defined object-specific constraints, making the learning approach unscalable to novel objects. In this paper, we present a simple yet effective approach to learn disentangled shape and pose representations in an unsupervised setting. We use a combination of self-consistency and cross-consistency constraints to learn pose and shape space from registered meshes. We additionally incorporate as-rigid-as-possible (ARAP) deformation into the training loop to avoid degenerate solutions. We demonstrate the usefulness of learned representations through a number of tasks including pose transfer and shape retrieval. The experiments on datasets of 3D humans, faces, hands and animals demonstrate the generality of our approach. Code is made available at https://virtualhumans.mpi-inf.mpg.de/unsup_shape_pose/.

4. Neural Sparse Voxel Fields

Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, Christian Theobalt

Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is over 10 times faster than the state-of-the-art (namely, NeRF) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering.

5. Coarse Graining Molecular Dynamics with Graph Neural Networks

Brooke E. Husic, Nicholas E. Charron, Dominik Lemm, Jiang Wang, Adrià Pérez, Andreas Krämer, Yaoyi Chen, Simon Olsson, Gianni de Fabritiis, Frank Noé, Cecilia Clementi

Coarse graining enables the investigation of molecular dynamics for larger systems and at longer timescales than is possible at atomic resolution. However, a coarse graining model must be formulated such that the conclusions we draw from it are consistent with the conclusions we would draw from a model at a finer level of detail. It has been proven that a force matching scheme defines a thermodynamically consistent coarse-grained model for an atomistic system in the variational limit. Wang et al. [ACS Cent. Sci. 5, 755 (2019)] demonstrated that the existence of such a variational limit enables the use of a supervised machine learning framework to generate a coarse-grained force field, which can then be used for simulation in the coarse-grained space. Their framework, however, requires the manual input of molecular features upon which to machine learn the force field. In the present contribution, we build upon the advance of Wang et al. and introduce a hybrid architecture for the machine learning of coarse-grained force fields that learns its own features via a subnetwork that leverages continuous filter convolutions on a graph neural network architecture. We demonstrate that this framework succeeds at reproducing the thermodynamics for small biomolecular systems. Since the learned molecular representations are inherently transferable, the architecture presented here sets the stage for the development of machine-learned, coarse-grained force fields that are transferable across molecular systems.
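
The force matching objective mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the array shapes, the function names, and the harmonic toy model are assumptions chosen only to show the idea of penalizing the squared difference between the forces a coarse-grained model predicts and the reference forces mapped from the atomistic simulation.

```python
import numpy as np


def force_matching_loss(predicted_forces, reference_forces):
    """Mean over frames of the squared force error summed over beads and axes.

    Both arrays are assumed to have shape (n_frames, n_beads, 3).
    """
    diff = predicted_forces - reference_forces
    return float(np.mean(np.sum(diff ** 2, axis=(1, 2))))


def harmonic_forces(coords, k):
    """Forces of the toy potential U(x) = 0.5 * k * |x|^2, i.e. F = -k * x."""
    return -k * coords


# Toy data: 5 frames of 4 coarse-grained beads in 3D (hypothetical values).
rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 4, 3))
reference = harmonic_forces(coords, k=2.0)

# A model with the right spring constant matches the reference exactly;
# a model with the wrong constant is penalized.
loss_exact = force_matching_loss(harmonic_forces(coords, k=2.0), reference)
loss_wrong = force_matching_loss(harmonic_forces(coords, k=1.0), reference)
```

In a learned force field the predicted forces would come from the negative gradient of a neural network energy rather than from a closed-form toy potential.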

6. Exploratory Search with Sentence Embeddings

Austin Silveria

  • retweets: 15, favorites: 73 (07/24/2020 21:18:00)
  • links: abs | pdf
  • cs.CL | cs.IR

Exploratory search aims to guide users through a corpus rather than pinpointing exact information. We propose an exploratory search system based on hierarchical clusters and document summaries using sentence embeddings. With sentence embeddings, we represent documents as the mean of their embedded sentences, extract summaries containing sentences close to this document representation and extract keyphrases close to the document representation. To evaluate our search system, we scrape our personal search history over the past year and report our experience with the system. We then discuss motivating use cases of an exploratory search system of this nature and conclude with possible directions of future work.
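
The document representation and extractive summary described above can be sketched in a few lines. The snippet assumes sentence embeddings are already available as rows of a NumPy array (the two-dimensional vectors below are made up for illustration) and uses cosine similarity as the notion of "close", which the abstract does not specify.

```python
import numpy as np


def cosine_similarity(matrix, vector):
    """Cosine similarity between each row of `matrix` and `vector`."""
    row_norms = np.linalg.norm(matrix, axis=1)
    return (matrix @ vector) / (row_norms * np.linalg.norm(vector))


def summarize(sentences, embeddings, k):
    """Return the k sentences closest to the mean embedding, in document order."""
    doc_vector = embeddings.mean(axis=0)           # document = mean of its sentences
    scores = cosine_similarity(embeddings, doc_vector)
    top = np.argsort(-scores)[:k]                  # indices of the k best scores
    return [sentences[i] for i in sorted(top)]


# Hypothetical sentences and embeddings; a real system would use a trained encoder.
sentences = ["Cats purr.", "Cats often purr.", "Stocks fell."]
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])

summary = summarize(sentences, embeddings, k=2)
```

The same scoring function could rank candidate keyphrases, given embeddings for those phrases.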

7. Online Monitoring of Global Attitudes Towards Wildlife

Joss Wright, Robert Lennox, Diogo Veríssimo

  • retweets: 19, favorites: 63 (07/24/2020 21:18:00)
  • links: abs | pdf
  • cs.CY

Human factors are increasingly recognised as central to conservation of biodiversity. Despite this, there are no existing systematic efforts to monitor global trends in perceptions of wildlife. With traditional news reporting now largely online, the internet presents a powerful means to monitor global attitudes towards species. In this work we develop a method using the Global Database of Events, Language, and Tone (GDELT) to scan global news media, allowing us to identify and download conservation-related articles. Applying supervised machine learning techniques, we filter irrelevant articles to create a continually updated global dataset of news coverage for seven target taxa: lion, tiger, saiga, rhinoceros, pangolins, elephants, and orchids, and observe that over two-thirds of articles matching a simple keyword search were irrelevant. We examine coverage of each taxon in different regions, and find that elephants, rhinos, tigers, and lions receive the most coverage, with daily peaks of around 200 articles. Mean sentiment was positive for all taxa, except saiga for which it was neutral. Coverage was broadly distributed, with articles from 73 countries across all continents. Elephants and tigers received coverage in the most countries overall, whilst orchids and saiga were mentioned in the smallest number of countries. We further find that sentiment towards charismatic megafauna is most positive in non-range countries, with the opposite being true for pangolins and orchids. Despite promising results, there remain substantial obstacles to achieving globally representative results. Disparities in internet access between low and high income countries and users are a major source of bias, with the need to focus on a diversity of data sources and languages, presenting sizable technical challenges…

8. Undercutting Bitcoin Is Not Profitable

Tiantian Gong, Mohsen Minaei, Wenhai Sun, Aniket Kate

  • retweets: 17, favorites: 47 (07/24/2020 21:18:00)
  • links: abs | pdf
  • cs.CR | cs.GT

A fixed block reward and voluntary transaction fees are two sources of economic incentives for mining in Bitcoin and other cryptocurrencies. For Bitcoin, the block reward halves every 210,000 blocks and it is supposed to vanish gradually. The remaining incentive of transaction fees is optional and arbitrary, and an undercutting attack becomes a potential threat, where the attacker deliberately forks an existing chain by leaving wealthy transactions unclaimed to attract other miners. We look into the profitability of the undercutting attack in this work. Our numerical simulations and experiments demonstrate that (i) only miners with mining power > 40% have a reasonable probability of successfully undercutting. (ii) As honest miners do not shift to the fork immediately in the first round, an undercutter’s profit drops with the number of honest miners. Given the current transaction fee rate distribution in Bitcoin, with half of the miners being honest, undercutting cannot be profitable at all; with 25% honest mining power, an undercutter with > 45% mining power can expect income more than its “fair share”; with no honest miners present, the threshold mining power for a profitable undercutting is 42%. (iii) For the current largest Bitcoin mining pool with 17.2% mining power, the probability of successfully launching an undercutting attack is tiny and the expected returns are far below honest mining gains. (iv) The larger the prize the undercutter leaves unclaimed, the higher the probability of the attack succeeding, but the attack’s profits also go down. Finally, we analyze the best responses to undercutting for other rational miners. (v) For two rational miners and one of them being the potential undercutter with 45% mining power, we find the dominant strategy for the responding rational miner is to typical rational.
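
The halving schedule that motivates the paper is easy to make concrete. The sketch below computes the block subsidy at a given height; the 210,000-block interval comes from the abstract, while the initial subsidy of 50 BTC and the 100,000,000 satoshi-per-BTC unit are well-known Bitcoin parameters added here for illustration. The function name is hypothetical.

```python
SATOSHI_PER_BTC = 100_000_000
INITIAL_SUBSIDY_SATOSHI = 50 * SATOSHI_PER_BTC  # 50 BTC at height 0
HALVING_INTERVAL = 210_000                      # blocks between halvings


def block_subsidy(height):
    """Block subsidy in satoshi at the given block height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        # Guard against oversized shifts; the subsidy is long gone by then.
        return 0
    # Integer right shift halves the subsidy once per elapsed interval.
    return INITIAL_SUBSIDY_SATOSHI >> halvings


subsidy_at_genesis = block_subsidy(0)
subsidy_after_three_halvings = block_subsidy(3 * HALVING_INTERVAL)
subsidy_after_33_halvings = block_subsidy(33 * HALVING_INTERVAL)
```

Because the subsidy is an integer number of satoshi, repeated halving reaches exactly zero after a finite number of intervals, which is the regime where transaction fees become the only mining incentive.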

9. IBM Federated Learning: an Enterprise Framework White Paper V0.1

Heiko Ludwig, Nathalie Baracaldo, Gegi Thomas, Yi Zhou, Ali Anwar, Shashank Rajamoni, Yuya Ong, Jayaram Radhakrishnan, Ashish Verma, Mathieu Sinn, Mark Purcell, Ambrish Rawat, Tran Minh, Naoise Holohan, Supriyo Chakraborty, Shalisha Whitherspoon, Dean Steuer, Laura Wynter, Hifaz Hassan, Sean Laguna, Mikhail Yurochkin, Mayank Agarwal, Ebube Chuba, Annie Abay

Federated Learning (FL) is an approach to conduct machine learning without centralizing training data in a single place, for reasons of privacy, confidentiality or data volume. However, solving federated machine learning problems raises issues above and beyond those of centralized machine learning. These issues include setting up communication infrastructure between parties, coordinating the learning process, integrating party results, understanding the characteristics of the training data sets of different participating parties, handling data heterogeneity, and operating with the absence of a verification data set. IBM Federated Learning provides infrastructure and coordination for federated learning. Data scientists can design and run federated learning jobs based on existing, centralized machine learning models and can provide high-level instructions on how to run the federation. The framework applies to both Deep Neural Networks and “traditional” approaches for the most common machine learning libraries. IBM Federated Learning enables data scientists to expand their scope from centralized to federated machine learning, minimizing the learning curve at the outset while also providing the flexibility to deploy to different compute environments and design custom fusion algorithms.
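
To give a sense of what a fusion algorithm does when "integrating party results", here is a minimal sketch of sample-weighted parameter averaging in the style of federated averaging. It does not use the IBM Federated Learning API; the function name, the flat parameter vectors, and the sample counts are all assumptions for illustration.

```python
import numpy as np


def fuse_by_weighted_average(party_parameters, party_sample_counts):
    """Average per-party parameter vectors, weighted by local sample counts."""
    stacked = np.stack(party_parameters)                 # (n_parties, n_params)
    counts = np.asarray(party_sample_counts, dtype=float)
    return np.average(stacked, axis=0, weights=counts)


# Two hypothetical parties; the second holds three times as much data.
party_parameters = [np.array([0.0, 0.0]), np.array([4.0, 8.0])]
party_sample_counts = [1, 3]

fused = fuse_by_weighted_average(party_parameters, party_sample_counts)
```

Only the parameter vectors and counts cross the party boundary in this scheme; the training data itself stays local.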

10. EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu

Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of decision-making policies by leveraging past experience. However, in the offline RL setting (where a fixed collection of interactions is provided and no further interactions are allowed), it has been shown that standard off-policy RL methods can significantly underperform. Recently proposed methods aim to address this shortcoming by regularizing learned policies to remain close to the given dataset of interactions. However, these methods involve several configurable components such as learning a separate policy network on top of a behavior cloning actor, and explicitly constraining action spaces through clipping or reward penalties. Striving for simultaneous simplicity and performance, in this work we present a novel backup operator, Expected-Max Q-Learning (EMaQ), which naturally restricts learned policies to remain within the support of the offline dataset without any explicit regularization, while retaining desirable theoretical properties such as contraction. We demonstrate that EMaQ is competitive with Soft Actor Critic (SAC) in online RL, and surpasses SAC in the deployment-efficient setting. In the offline RL setting (the main focus of this work), through EMaQ we are able to make important observations regarding key components of offline RL, and the nature of standard benchmark tasks. Finally, we observe that EMaQ achieves state-of-the-art performance with fewer moving parts such as one less function approximation, making it a strong, yet easy to implement baseline for future work.
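
The abstract does not spell out the operator, so the following is only a sketch of one plausible reading of the name "Expected-Max": draw a handful of candidate actions for the next state from a model of the behavior policy, and back up the maximum Q-value among them. All function names and the toy Q-function are hypothetical, and a single call gives one sample of the expectation rather than the expectation itself.

```python
import numpy as np


def emaq_style_target(reward, discount, next_state, q_function,
                      sample_behavior_actions, num_samples, rng):
    """One-sample estimate of r + discount * E[max_i Q(s', a_i)], a_i ~ behavior."""
    candidates = sample_behavior_actions(next_state, num_samples, rng)
    values = np.array([q_function(next_state, a) for a in candidates])
    # The max ranges only over sampled actions, so the backup never queries
    # Q at actions the behavior model would not propose.
    return reward + discount * values.max()


def toy_q_function(state, action):
    """Hypothetical Q-function peaked at action = 1."""
    return -(action - 1.0) ** 2


def fixed_behavior_sampler(state, num_samples, rng):
    """Stand-in for a learned behavior model; returns fixed candidates."""
    return np.array([0.0, 0.5, 2.0])[:num_samples]


rng = np.random.default_rng(0)
target = emaq_style_target(
    reward=1.0, discount=0.9, next_state=None,
    q_function=toy_q_function,
    sample_behavior_actions=fixed_behavior_sampler,
    num_samples=3, rng=rng,
)
```

Under this reading, the number of sampled actions controls how far the backup moves from evaluating the behavior policy (one sample) toward a maximization over its support (many samples).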

11. Interpolating GANs to Scaffold Autotelic Creativity

Ziv Epstein, Océane Boulais, Skylar Gordon, Matt Groh

The latent space modeled by generative adversarial networks (GANs) represents a large possibility space. By interpolating categories generated by GANs, it is possible to create novel hybrid images. We present “Meet the Ganimals,” a casual creator built on interpolations of BigGAN that can generate novel, hybrid animals called ganimals by efficiently searching this possibility space. Like traditional casual creators, the system supports a simple creative flow that encourages rapid exploration of the possibility space. Users can discover new ganimals, create their own, and share their reactions to aesthetic, emotional, and morphological characteristics of the ganimals. As users provide input to the system, the system adapts and changes the distribution of categories upon which ganimals are generated. As one of the first GAN-based casual creators, Meet the Ganimals is an example of how casual creators can leverage human curation and citizen science to discover novel artifacts within a large possibility space.
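
Category interpolation can be sketched as a convex combination of class-conditioning vectors. The example below is an assumption about how such a hybrid might be specified for a class-conditional generator; it does not call BigGAN or reproduce the Ganimals system, and the three-class one-hot vectors are toy values.

```python
import numpy as np


def interpolate_categories(class_vector_a, class_vector_b, alpha):
    """Blend two class vectors; alpha = 0 gives A, alpha = 1 gives B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * class_vector_a + alpha * class_vector_b


# Hypothetical one-hot class vectors for two animal categories.
first_category = np.array([1.0, 0.0, 0.0])
second_category = np.array([0.0, 0.0, 1.0])

# A hybrid that is 75% the first category and 25% the second.
hybrid = interpolate_categories(first_category, second_category, alpha=0.25)
```

Feeding such a blended vector to a class-conditional generator, together with a noise vector, is one way a hybrid image could be requested.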