All Articles

Hot Papers 2020-12-17

1. Point Transformer

Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun

  • retweets: 5798, favorites: 399 (12/18/2020 18:07:55)
  • links: abs | pdf
  • cs.CV

Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for tasks such as semantic scene segmentation, object part segmentation, and object classification. Our Point Transformer design improves upon prior work across domains and tasks. For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time.
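
The key ingredient is a vector self-attention layer whose per-channel attention weights come from a subtraction relation between transformed point features plus a relative positional encoding. Below is a minimal PyTorch sketch of this idea, simplified to full attention over all points (the paper restricts attention to local k-nearest-neighbor neighborhoods); layer names and sizes are illustrative, not the authors' exact architecture.

```python
# Minimal sketch of a Point Transformer-style vector self-attention layer
# (simplified: full N x N attention instead of kNN neighborhoods).
import torch
import torch.nn as nn

class PointVectorAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # positional encoding computed from relative 3D coordinates
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        # maps the subtraction relation to per-channel attention weights
        self.attn_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feats, coords):
        # feats: (N, dim) point features; coords: (N, 3) point positions
        q, k, v = self.to_q(feats), self.to_k(feats), self.to_v(feats)
        rel_pos = self.pos_mlp(coords[:, None] - coords[None, :])  # (N, N, dim)
        # vector attention: per-channel weights from the subtraction relation
        attn = torch.softmax(self.attn_mlp(q[:, None] - k[None, :] + rel_pos), dim=1)
        return (attn * (v[None, :] + rel_pos)).sum(dim=1)          # (N, dim)
```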

2. Analysing the Social Spread of Behaviour: Integrating Complex Contagions into Network Based Diffusions

Josh A. Firth, Gregory F. Albery, Kristina B. Beck, Ivan Jarić, Lewis G. Spurgin, Ben C. Sheldon, Will Hoppitt

The spread of socially-learnt behaviours occurs in many animal species, and understanding how behaviours spread can provide novel insights into the causes and consequences of sociality. Within wild populations, behaviour spread is often assumed to occur as a “simple contagion”. Yet, emerging evidence suggests behaviours may frequently spread as “complex contagions”, and this holds significant ramifications for the modes and extent of transmission. We present a new framework enabling comprehensive examination of behavioural contagions by integrating social-learning strategies into network-based diffusion analyses. We show how our approach allows determination of the relationship between social bonds and behavioural transmission, identification of individual-level transmission rules, and examination of population-level social structure effects. We provide resources that allow general applications across diverse systems, and demonstrate how further study-specific developments can be made. Finally, we outline the new opportunities this framework facilitates, the conceptual contributions to understanding sociality, and its applications across fields.
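
To make the simple/complex distinction concrete, here is a small, self-contained simulation with hypothetical parameters (not the authors' NBDA framework): under a simple contagion each informed contact independently transmits the behaviour, while under a complex contagion adoption requires a critical fraction of informed contacts.

```python
# Illustrative simulation contrasting "simple" and "complex" contagion
# on a random social network (toy parameters).
import numpy as np

rng = np.random.default_rng(0)
N = 100
A = (rng.random((N, N)) < 0.05).astype(float)   # random adjacency matrix
A = np.triu(A, 1)
A = A + A.T                                     # undirected, no self-loops

def spread(A, steps=50, mode="simple", beta=0.05, threshold=0.3):
    informed = np.zeros(len(A), dtype=bool)
    informed[0] = True                          # seed individual
    for _ in range(steps):
        n_informed = A @ informed               # informed neighbours per node
        if mode == "simple":
            # each informed contact independently transmits with prob. beta
            p = 1 - (1 - beta) ** n_informed
        else:
            # complex: adoption requires a critical fraction of informed contacts
            deg = np.maximum(A.sum(axis=1), 1)
            p = (n_informed / deg >= threshold).astype(float)
        informed |= rng.random(len(A)) < p
    return informed.sum()

print("simple: ", spread(A, mode="simple"))
print("complex:", spread(A, mode="complex"))
```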

3. Learning Continuous Image Representation with Local Implicit Image Function

Yinbo Chen, Sifei Liu, Xiaolong Wang

  • retweets: 3310, favorites: 345 (12/18/2020 18:07:56)
  • links: abs | pdf
  • cs.CV | cs.LG

How should we represent an image? While the visual world is presented in a continuous manner, machines store and see images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by recent progress in 3D reconstruction with implicit functions, we propose the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output. Since the coordinates are continuous, LIIF can be presented at arbitrary resolution. To generate the continuous representation for pixel-based images, we train an encoder with the LIIF representation via a self-supervised super-resolution task. The learned continuous representation can be rendered at arbitrary resolution, even extrapolating to ×30 higher resolution than any provided during training. We further show that the LIIF representation builds a bridge between discrete and continuous representations in 2D: it naturally supports learning tasks with size-varied image ground truths and significantly outperforms methods that resize the ground truths. Our project page with code is at https://yinboc.github.io/liif/ .
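
A minimal sketch of the querying step: an MLP maps a local latent code (the nearest deep feature) together with the relative coordinate to an RGB value. Feature unfolding, cell decoding, and the local ensemble from the paper are omitted, so this shows the concept rather than the full method.

```python
# Minimal sketch of querying a LIIF-style local implicit image function.
import torch
import torch.nn as nn

class LIIFDecoder(nn.Module):
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))                       # RGB output

    def forward(self, feat_map, coords):
        # feat_map: (C, H, W) encoder features; coords: (Q, 2) in [-1, 1]
        C, H, W = feat_map.shape
        # index of the nearest feature vector for each continuous coordinate
        iy = ((coords[:, 0] + 1) / 2 * (H - 1)).round().long().clamp(0, H - 1)
        ix = ((coords[:, 1] + 1) / 2 * (W - 1)).round().long().clamp(0, W - 1)
        z = feat_map[:, iy, ix].t()                     # (Q, C) local latent codes
        # relative offset from the chosen feature's position
        rel = coords - torch.stack([iy / (H - 1) * 2 - 1,
                                    ix / (W - 1) * 2 - 1], dim=1)
        return self.mlp(torch.cat([z, rel], dim=1))     # (Q, 3) predicted RGB
```

Because any grid of continuous coordinates can be queried, the same decoder renders the image at any target resolution.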

4. Sketch Generation with Drawing Process Guided by Vector Flow and Grayscale

Zhengyan Tong, Xuanhong Chen, Bingbing Ni, Xiaohang Wang

  • retweets: 1804, favorites: 246 (12/18/2020 18:07:56)
  • links: abs | pdf
  • cs.CV

We propose a novel image-to-pencil translation method that not only generates high-quality pencil sketches but also reproduces the drawing process. Existing pencil sketch algorithms are based on texture rendering rather than direct imitation of strokes, so they can show only a final result, not the drawing process. To address this challenge, we first establish a pencil-stroke imitation mechanism. Next, we develop a framework with three branches to guide stroke drawing: the first branch guides the direction of the strokes, the second determines their shade, and the third further enhances details. Under this framework’s guidance, we can produce a pencil sketch by drawing one stroke at a time. Our method is fully interpretable. Comparison with existing pencil drawing algorithms shows that our method is superior in terms of texture quality, style, and user evaluation.
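
As a rough illustration of the direction branch, a per-pixel stroke direction field can be derived from image gradients, with strokes running along edges (perpendicular to the gradient). This is only a guess at the underlying idea; the paper's actual vector-flow construction may differ.

```python
# Hedged sketch: stroke direction and shade cues from image gradients.
import numpy as np
from scipy import ndimage

def stroke_directions(gray):
    # gray: (H, W) float array in [0, 1]
    gx = ndimage.sobel(gray, axis=1)         # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)         # vertical gradient
    theta = np.arctan2(gy, gx) + np.pi / 2   # rotate 90 deg: along-edge direction
    magnitude = np.hypot(gx, gy)             # edge strength -> shade cue
    return theta, magnitude
```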

5. Pre-Training Transformers as Energy-Based Cloze Models

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

  • retweets: 499, favorites: 177 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.CL

We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
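
A minimal sketch of the scoring side: a transformer encodes the full, unmasked sentence and a linear head assigns one scalar energy per token, so a sentence can be re-ranked by the negative sum of its token energies. Module names here are hypothetical, and the noise-contrastive training loop is omitted.

```python
# Minimal sketch of Electric-style per-token energy scoring.
import torch
import torch.nn as nn

class EnergyScorer(nn.Module):
    def __init__(self, vocab_size, dim=256, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.energy_head = nn.Linear(dim, 1)     # scalar energy per token

    def forward(self, token_ids):
        # token_ids: (B, T); no masking -- every token sees its full context
        h = self.encoder(self.embed(token_ids))
        return self.energy_head(h).squeeze(-1)   # (B, T) energies

model = EnergyScorer(vocab_size=30522)
tokens = torch.randint(0, 30522, (2, 16))
scores = -model(tokens).sum(dim=1)  # higher score = more plausible sentence
```

Because a single forward pass scores every token at once, this kind of re-ranking avoids the per-position passes a masked language model would need.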

6. Beyond pairwise network similarity: exploring Mediation and Suppression between networks

Lucas Lacasa, Sebastiano Stramaglia, Daniele Marinazzo

Network similarity measures quantify how and when two networks are symmetrically related. These include measures of statistical association, such as pairwise distances or other correlations between networks or between the layers of a multiplex network. However, such measures can neither directly unveil whether hidden confounding network factors are present nor establish when a correlation is underpinned by a causal relation. In this work we extend this pairwise conceptual framework to triplets of networks and quantify how and when a network is related to a second network directly or via the indirect mediation of, or interaction with, a third network. Accordingly, we develop a simple and intuitive set-theoretic approach to quantify mediation and suppression between networks. We validate our theory with synthetic models and further apply it to triplets of real-world networks, unveiling mediation and suppression effects that emerge when considering different modes of interaction in online social networks and different routes of information processing in the brain.
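
As a toy illustration of a set-theoretic view on network triplets (not the paper's actual mediation and suppression measures), one can decompose the edge overlap of two networks into the part shared with a candidate mediator network and the part independent of it:

```python
# Toy set-theoretic decomposition of edge overlap among three networks.
def edge_set(adj):
    n = len(adj)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]}

def overlap_decomposition(A, B, C):
    ab = edge_set(A) & edge_set(B)      # edges shared by networks A and B
    via_c = ab & edge_set(C)            # A-B overlap also present in mediator C
    return len(via_c), len(ab - edge_set(C))
```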

7. C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer

Dongxu Wei, Xiaowei Xu, Haibin Shen, Kejie Huang

  • retweets: 184, favorites: 87 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.CV

Human video motion transfer (HVMT) aims to synthesize videos in which one person imitates the actions of others. Although existing GAN-based HVMT methods have achieved great success, they either fail to preserve appearance details, due to the loss of spatial consistency between synthesized and exemplary images, or generate incoherent videos, due to the lack of temporal consistency among video frames. In this paper, we propose the Coarse-to-Fine Flow Warping Network (C2F-FWN) for spatial-temporally consistent HVMT. In particular, C2F-FWN utilizes coarse-to-fine flow warping and a Layout-Constrained Deformable Convolution (LC-DConv) to improve spatial consistency, and employs a Flow Temporal Consistency (FTC) loss to enhance temporal consistency. In addition, provided with multi-source appearance inputs, C2F-FWN supports appearance attribute editing with great flexibility and efficiency. Besides public datasets, we also collected a large-scale HVMT dataset named SoloDance for evaluation. Extensive experiments on our SoloDance dataset and the iPER dataset show that our approach outperforms state-of-the-art HVMT methods in terms of both spatial and temporal consistency. Source code and the SoloDance dataset are available at https://github.com/wswdx/C2F-FWN.
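
The core operation, flow-based warping, resamples a source image with a dense flow field. A minimal PyTorch sketch using the standard grid_sample op follows; the paper's coarse-to-fine cascade, LC-DConv, and FTC loss are not shown.

```python
# Minimal sketch of warping an image with a dense flow field.
import torch
import torch.nn.functional as F

def warp(image, flow):
    # image: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x_t = xs.float() + flow[:, 0]                 # target x for every pixel
    y_t = ys.float() + flow[:, 1]                 # target y for every pixel
    # normalize to [-1, 1], the coordinate range grid_sample expects
    grid = torch.stack([x_t / (W - 1) * 2 - 1,
                        y_t / (H - 1) * 2 - 1], dim=-1)   # (B, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)
```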

8. Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie

  • retweets: 162, favorites: 57 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.CV

The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g. point clouds) is notoriously hard. For example, the number of scenes (e.g. indoor rooms) that can be accessed and scanned might be limited; even given sufficient data, acquiring 3D labels (e.g. instance masks) requires intensive human labor. In this paper, we explore data-efficient learning for 3D point clouds. As a first step in this direction, we propose Contrastive Scene Contexts, a 3D pre-training method that makes use of both point-level correspondences and spatial contexts in a scene. Our method achieves state-of-the-art results on a suite of benchmarks where training data or labels are scarce. Our study reveals that exhaustive labelling of 3D point clouds might be unnecessary; and remarkably, on ScanNet, even using 0.1% of point labels, we still achieve 89% (instance segmentation) and 96% (semantic segmentation) of the baseline performance obtained with full annotations.
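
A hedged sketch of a point-level contrastive objective in this spirit: matched points across two views of a scene are positives, and all other sampled points serve as negatives. The paper's additional partitioning of the loss by spatial context is omitted here.

```python
# Hedged sketch of a point-level InfoNCE loss over cross-view correspondences.
import torch
import torch.nn.functional as F

def point_info_nce(feats_a, feats_b, temperature=0.07):
    # feats_a, feats_b: (N, D) features of N corresponding points in two views
    feats_a = F.normalize(feats_a, dim=1)
    feats_b = F.normalize(feats_b, dim=1)
    logits = feats_a @ feats_b.t() / temperature    # (N, N) similarities
    targets = torch.arange(len(feats_a))            # i-th row matches i-th column
    return F.cross_entropy(logits, targets)
```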

9. Causality is Graphically Simple

Carlos Baquero

  • retweets: 156, favorites: 45 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.DC

Events in distributed systems include sending or receiving messages, or changing some state in a node. Not all events are related, but some events can cause and influence how other, later events occur. For instance, a reply to a received mail message is influenced by that message, and perhaps by other previously received messages. This article provides an introduction to classic causality tracking mechanisms and covers some more recent developments. The presentation is supported by a new graphical notation that allows an intuitive interpretation of the described causality relations.
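
Among the classic mechanisms such an introduction covers are vector clocks; a minimal sketch (illustrative, not taken from the article) is below. Each node keeps one counter per node, merges clocks on message receipt, and pointwise comparison of clocks recovers the happened-before partial order.

```python
# Minimal vector-clock sketch for causality tracking.
class VectorClock:
    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.clock = [0] * n_nodes

    def local_event(self):
        self.clock[self.node_id] += 1      # tick own counter

    def send(self):
        self.local_event()
        return list(self.clock)            # attach a copy to the message

    def receive(self, msg_clock):
        # pointwise maximum merges the sender's causal history with ours
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.local_event()

def happened_before(c1, c2):
    # c1 -> c2 iff c1 <= c2 pointwise and the clocks differ somewhere
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2
```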

10. Network experimentation at scale

Brian Karrer, Liang Shi, Monica Bhole, Matt Goldman, Tyrone Palmer, Charlie Gelman, Mikael Konutgan, Feng Sun

We describe our framework, deployed at Facebook, that accounts for interference between experimental units through cluster-randomized experiments. We document this system, including the design and estimation procedures, and detail insights we have gained from the many experiments that have used this system at scale. We introduce a cluster-based regression adjustment that substantially improves precision for estimating global treatment effects, as well as testing for interference as part of our estimation procedure. With this regression adjustment, we find that imbalanced clusters can better account for interference than balanced clusters without sacrificing accuracy. In addition, we show how logging exposure to a treatment can be used for additional variance reduction. Interference is a widely acknowledged issue in online field experiments, yet there is little evidence from real-world experiments demonstrating interference in online settings. We fill this gap by describing two case studies that capture significant network effects and highlight the value of this experimentation framework.
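
For intuition, here is an illustrative sketch of a cluster-randomized experiment with a regression adjustment using a pre-experiment covariate. The data, column names, and the exact estimator are placeholders, not Facebook's system.

```python
# Illustrative cluster-randomized experiment with regression adjustment.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
clusters = pd.DataFrame({
    "cluster_id": range(200),
    "pre_metric": rng.normal(size=200),              # pre-experiment covariate
})
clusters["treated"] = (rng.random(200) < 0.5).astype(int)  # cluster-level assignment
clusters["outcome"] = (0.3 * clusters["treated"]
                       + 0.8 * clusters["pre_metric"]
                       + rng.normal(scale=0.5, size=200))

# the covariate absorbs pre-existing variation, tightening the
# confidence interval on the estimated treatment effect
fit = smf.ols("outcome ~ treated + pre_metric", data=clusters).fit()
print(fit.params["treated"])                         # adjusted effect estimate
```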

11. On Avoiding the Union Bound When Answering Multiple Differentially Private Queries

Badih Ghazi, Ravi Kumar, Pasin Manurangsi

In this work, we study the problem of answering $k$ queries with $(\epsilon, \delta)$-differential privacy, where each query has sensitivity one. We give an algorithm for this task that achieves an expected $\ell_\infty$ error bound of $O(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}})$, which is known to be tight (Steinke and Ullman, 2016). A very recent work by Dagan and Kur (2020) provides a similar result, albeit via a completely different approach. One difference between our work and theirs is that our guarantee holds even when $\delta < 2^{-\Omega(k/(\log k)^8)}$, whereas theirs does not apply in this case. On the other hand, the algorithm of Dagan and Kur has the remarkable advantage that the $\ell_\infty$ error bound of $O(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}})$ holds not only in expectation but always (i.e., with probability one), while we can only obtain a high-probability (or expected) guarantee on the error.
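
For context (standard background, not from the paper), the baseline this improves on is the Gaussian mechanism combined with a union bound, which is what "avoiding the union bound" buys:

```latex
% Standard background, not from the paper: the Gaussian mechanism adds
% N(0, \sigma^2) noise to each of the k sensitivity-1 answers, with
\[
  \sigma = \Theta\!\left(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}}\right),
\]
% and a union bound over the k noisy coordinates yields
\[
  \mathbb{E}\Big[\max_{i \le k} |Z_i|\Big]
    = O\!\left(\sigma\sqrt{\log k}\right)
    = O\!\left(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}}\cdot\sqrt{\log k}\right),
\]
% so avoiding the union bound removes the extra \sqrt{\log k} factor.
```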

12. Improved StyleGAN Embedding: Where are the Good Latents?

Peihao Zhu, Rameen Abdal, Yipeng Qin, Peter Wonka

  • retweets: 30, favorites: 60 (12/18/2020 18:07:58)
  • links: abs | pdf
  • cs.CV | cs.GR

StyleGAN is able to produce photorealistic images almost indistinguishable from real ones. Embedding images into the StyleGAN latent space is not a trivial task due to the trade-off between reconstruction quality and editing quality. In this paper, we first introduce a new normalized space to analyze the diversity and the quality of the reconstructed latent codes. This space can help answer the question of where good latent codes are located in latent space. Second, we propose a framework to analyze the quality of different embedding algorithms. Third, we propose an improved embedding algorithm based on our analysis. We compare our results with the current state-of-the-art methods and achieve a better trade-off between reconstruction quality and editing quality.
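
A minimal sketch of optimization-based embedding, the family of algorithms the paper analyzes: gradient descent on a latent code to reconstruct a target image. The generator interface, loss, and initialization are placeholders; the paper's improved algorithm and normalized space are not reproduced here.

```python
# Minimal sketch of optimization-based StyleGAN embedding.
import torch

def embed(generator, target, w_init, steps=500, lr=0.01):
    # w_init: a starting latent, e.g. the generator's mean latent vector
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(w)                 # synthesize from current latent
        loss = torch.nn.functional.mse_loss(recon, target)
        loss.backward()
        opt.step()
    return w.detach()                        # embedded latent code
```

In practice, perceptual losses and the choice of latent space (Z, W, or W+) strongly affect the reconstruction/editing trade-off the paper studies.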

13. DECOR-GAN: 3D Shape Detailization by Conditional Refinement

Zhiqin Chen, Vladimir Kim, Matthew Fisher, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri

We introduce a deep generative network for 3D shape detailization, akin to stylization with the style being geometric details. We address the challenge of creating large varieties of high-resolution and detailed 3D geometry from a small set of exemplars by treating the problem as one of geometric detail transfer. Given a low-resolution coarse voxel shape, our network refines it, via voxel upsampling, into a higher-resolution shape enriched with geometric details. The output shape preserves the overall structure (or content) of the input, while its detail generation is conditioned on an input “style code” corresponding to a detailed exemplar. Our 3D detailization via conditional refinement is realized by a generative adversarial network, coined DECOR-GAN. The network utilizes a 3D CNN generator for upsampling coarse voxels and a 3D PatchGAN discriminator that encourages local patches of the generated model to be similar to those in the detailed training shapes. During testing, a style code is fed into the generator to condition the refinement. We demonstrate that our method can refine a coarse shape into a variety of detailed shapes with different styles. The generated results are evaluated in terms of content preservation, plausibility, and diversity. Comprehensive ablation studies are conducted to validate our network designs.
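
A minimal sketch of a style-conditioned voxel-upsampling generator in this spirit (channel sizes and the conditioning scheme are illustrative, not the paper's exact network):

```python
# Minimal sketch of a style-conditioned 3D voxel-upsampling generator.
import torch
import torch.nn as nn

class VoxelUpsampler(nn.Module):
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1 + style_dim, 32, 3, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(8, 1, 3, padding=1), nn.Sigmoid())   # occupancy in [0, 1]

    def forward(self, coarse_voxels, style_code):
        # coarse_voxels: (B, 1, D, D, D); style_code: (B, style_dim)
        B, _, D, _, _ = coarse_voxels.shape
        style = style_code.view(B, -1, 1, 1, 1).expand(-1, -1, D, D, D)
        # two stride-2 transposed convs give 4x upsampling: (B, 1, 4D, 4D, 4D)
        return self.net(torch.cat([coarse_voxels, style], dim=1))
```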

14. StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

Jinshan Zeng, Qi Chen, Yunxin Liu, Mingwen Wang, Yuan Yao

The generation of stylish Chinese fonts is an important problem with many applications. Most existing generation methods are based on deep generative models, particularly generative adversarial network (GAN) based models. However, these deep generative models may suffer from mode collapse, which significantly degrades the diversity and quality of the generated results. In this paper, we introduce a one-bit stroke encoding to capture the key mode information of Chinese characters and incorporate it into CycleGAN, a popular deep generative model for Chinese font generation. The resulting method, called StrokeGAN, is motivated by the observation that the stroke encoding carries a large amount of mode information about Chinese characters. To reconstruct the one-bit stroke encoding of the generated characters, we impose a stroke-encoding reconstruction loss on the discriminator. Equipped with the one-bit stroke encoding and this reconstruction loss, CycleGAN’s mode collapse is significantly alleviated, with improved preservation of strokes and diversity of the generated characters. The effectiveness of StrokeGAN is demonstrated on a series of generation tasks over nine datasets with different fonts. The numerical results show that StrokeGAN generally outperforms state-of-the-art methods in terms of content and recognition accuracy as well as stroke error, and also generates more realistic characters.
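
A hedged sketch of the stroke-encoding reconstruction loss: an auxiliary discriminator head predicts the one-bit stroke encoding of a generated character and is trained against the target encoding. The head and tensor shapes are illustrative, not the paper's exact design.

```python
# Hedged sketch of a stroke-encoding reconstruction loss.
import torch.nn.functional as F

def stroke_reconstruction_loss(disc_stroke_logits, target_stroke_bits):
    # disc_stroke_logits: (B, n_stroke_types) auxiliary discriminator output
    # target_stroke_bits: (B, n_stroke_types), 1 if the character contains
    #                     that stroke type, 0 otherwise
    return F.binary_cross_entropy_with_logits(disc_stroke_logits,
                                              target_stroke_bits.float())
```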

15. No Budget? Don’t Flex! Cost Consideration when Planning to Adopt NLP for Your Business

Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Radityo Eko Prasojo, Alham Fikri Aji

  • retweets: 32, favorites: 21 (12/18/2020 18:07:58)
  • links: abs | pdf
  • cs.CL

Recent advances in Natural Language Processing (NLP) have largely pushed deep transformer-based models as the go-to state-of-the-art technique without much regard to production and utilization cost. Companies planning to adopt these methods into their business face difficulties because they lack the machine and human resources to build them. In this work, we compare both the performance and the cost of classical learning algorithms and the latest models on common sequence and text labeling tasks. We find that classical models often perform on par with deep neural ones despite their lower cost. We argue that under many circumstances smaller and lighter models are a better fit for AI-pivoting businesses, and we call for more research into low-cost models, especially for under-resourced languages.
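
As a concrete example of the low-cost end of that comparison, a classical text classifier can be trained in seconds on a CPU (the data here is a placeholder):

```python
# Illustrative low-cost classical baseline: TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible service", "works fine", "broke instantly"]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)                 # trains in seconds on a CPU
print(model.predict(["really great service"]))
```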

16. A Closer Look at the Robustness of Vision-and-Language Pre-trained Models

Linjie Li, Zhe Gan, Jingjing Liu

  • retweets: 25, favorites: 27 (12/18/2020 18:07:58)
  • links: abs | pdf
  • cs.CV | cs.CL

Large-scale pre-trained multimodal transformers, such as ViLBERT and UNITER, have propelled the state of the art in vision-and-language (V+L) research to a new level. Although they achieve impressive performance on standard tasks, it remains unclear to date how robust these pre-trained models are. To investigate, we conduct a host of thorough evaluations of existing pre-trained models over four different types of V+L-specific model robustness: (i) Linguistic Variation; (ii) Logical Reasoning; (iii) Visual Content Manipulation; and (iv) Answer Distribution Shift. Interestingly, with standard model finetuning, pre-trained V+L models already exhibit better robustness than many task-specific state-of-the-art methods. To further enhance model robustness, we propose Mango, a generic and efficient approach that learns a Multimodal Adversarial Noise GeneratOr in the embedding space to fool pre-trained V+L models. Differing from previous studies focused on one specific type of robustness, Mango is task-agnostic and enables a universal performance lift for pre-trained models over diverse tasks designed to evaluate broad aspects of robustness. Comprehensive experiments demonstrate that Mango achieves a new state of the art on 7 out of 9 robustness benchmarks, surpassing existing methods by a significant margin. As the first comprehensive study on V+L robustness, this work puts the robustness of pre-trained models into sharper focus, pointing to new directions for future study.
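
For intuition, a simpler gradient-based stand-in for the idea of perturbing inputs in embedding space is sketched below: perturb the input embeddings to increase the task loss, then train on the perturbed embeddings. Mango instead learns the noise with a generator network, so this is only illustrative.

```python
# Hedged sketch of a one-step embedding-space adversarial perturbation.
import torch

def embedding_space_attack(model, embeddings, labels, loss_fn, eps=0.1):
    # model is assumed to accept embeddings directly rather than token ids
    delta = torch.zeros_like(embeddings, requires_grad=True)
    loss = loss_fn(model(embeddings + delta), labels)
    loss.backward()                                   # gradient w.r.t. the noise
    with torch.no_grad():
        # one signed-gradient step, kept inside an epsilon ball
        delta = (eps * delta.grad.sign()).clamp(-eps, eps)
    return (embeddings + delta).detach()              # adversarial embeddings
```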