All Articles

Hot Papers 2020-12-17

1. Point Transformer

Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun

  • retweets: 5798, favorites: 399 (12/18/2020 18:07:55)
  • links: abs | pdf
  • cs.CV

Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for tasks such as semantic scene segmentation, object part segmentation, and object classification. Our Point Transformer design improves upon prior work across domains and tasks. For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time.
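
The key ingredient is a vector self-attention layer whose per-channel attention weights come from a subtraction relation between transformed point features plus a relative positional encoding. Below is a minimal PyTorch sketch of this idea, simplified to full attention over all points (the paper restricts attention to local k-nearest-neighbor neighborhoods); layer names and sizes are illustrative, not the authors' exact architecture.

```python
# Minimal sketch of a Point Transformer-style vector self-attention layer
# (simplified: full N x N attention instead of kNN neighborhoods).
import torch
import torch.nn as nn

class PointVectorAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # positional encoding computed from relative 3D coordinates
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        # maps the subtraction relation to per-channel attention weights
        self.attn_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feats, coords):
        # feats: (N, dim) point features; coords: (N, 3) point positions
        q, k, v = self.to_q(feats), self.to_k(feats), self.to_v(feats)
        rel_pos = self.pos_mlp(coords[:, None] - coords[None, :])  # (N, N, dim)
        # vector attention: per-channel weights from the subtraction relation
        attn = torch.softmax(self.attn_mlp(q[:, None] - k[None, :] + rel_pos), dim=1)
        return (attn * (v[None, :] + rel_pos)).sum(dim=1)          # (N, dim)
```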

2. Analysing the Social Spread of Behaviour: Integrating Complex Contagions into Network Based Diffusions

Josh A. Firth, Gregory F. Albery, Kristina B. Beck, Ivan Jarić, Lewis G. Spurgin, Ben C. Sheldon, Will Hoppitt

The spread of socially-learnt behaviours occurs in many animal species, and understanding how behaviours spread can provide novel insights into the causes and consequences of sociality. Within wild populations, behaviour spread is often assumed to occur as a “simple contagion”. Yet, emerging evidence suggests behaviours may frequently spread as “complex contagions”, and this holds significant ramifications for the modes and extent of transmission. We present a new framework enabling comprehensive examination of behavioural contagions by integrating social-learning strategies into network-based diffusion analyses. We show how our approach allows determination of the relationship between social bonds and behavioural transmission, identification of individual-level transmission rules, and examination of population-level social structure effects. We provide resources that allow general applications across diverse systems, and demonstrate how further study-specific developments can be made. Finally, we outline the new opportunities this framework facilitates, the conceptual contributions to understanding sociality, and its applications across fields.
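
To make the simple/complex distinction concrete, here is a small, self-contained simulation with hypothetical parameters (not the authors' NBDA framework): under a simple contagion each informed contact independently transmits the behaviour, while under a complex contagion adoption requires a critical fraction of informed contacts.

```python
# Illustrative simulation contrasting "simple" and "complex" contagion
# on a random social network (toy parameters).
import numpy as np

rng = np.random.default_rng(0)
N = 100
A = (rng.random((N, N)) < 0.05).astype(float)   # random adjacency matrix
A = np.triu(A, 1)
A = A + A.T                                     # undirected, no self-loops

def spread(A, steps=50, mode="simple", beta=0.05, threshold=0.3):
    informed = np.zeros(len(A), dtype=bool)
    informed[0] = True                          # seed individual
    for _ in range(steps):
        n_informed = A @ informed               # informed neighbours per node
        if mode == "simple":
            # each informed contact independently transmits with prob. beta
            p = 1 - (1 - beta) ** n_informed
        else:
            # complex: adoption requires a critical fraction of informed contacts
            deg = np.maximum(A.sum(axis=1), 1)
            p = (n_informed / deg >= threshold).astype(float)
        informed |= rng.random(len(A)) < p
    return informed.sum()

print("simple: ", spread(A, mode="simple"))
print("complex:", spread(A, mode="complex"))
```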

3. Learning Continuous Image Representation with Local Implicit Image Function

Yinbo Chen, Sifei Liu, Xiaolong Wang

  • retweets: 3310, favorites: 345 (12/18/2020 18:07:56)
  • links: abs | pdf
  • cs.CV | cs.LG

How should we represent an image? While the visual world is presented in a continuous manner, machines store and see images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by recent progress in 3D reconstruction with implicit functions, we propose the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output. Since the coordinates are continuous, LIIF can be presented at arbitrary resolution. To generate the continuous representation for pixel-based images, we train an encoder with the LIIF representation via a self-supervised super-resolution task. The learned continuous representation can be rendered at arbitrary resolution, even extrapolating to ×30 higher resolution than any provided during training. We further show that the LIIF representation builds a bridge between discrete and continuous representations in 2D: it naturally supports learning tasks with size-varied image ground truths and significantly outperforms methods that resize the ground truths. Our project page with code is at https://yinboc.github.io/liif/ .
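
A minimal sketch of the querying step: an MLP maps a local latent code (the nearest deep feature) together with the relative coordinate to an RGB value. Feature unfolding, cell decoding, and the local ensemble from the paper are omitted, so this shows the concept rather than the full method.

```python
# Minimal sketch of querying a LIIF-style local implicit image function.
import torch
import torch.nn as nn

class LIIFDecoder(nn.Module):
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))                       # RGB output

    def forward(self, feat_map, coords):
        # feat_map: (C, H, W) encoder features; coords: (Q, 2) in [-1, 1]
        C, H, W = feat_map.shape
        # index of the nearest feature vector for each continuous coordinate
        iy = ((coords[:, 0] + 1) / 2 * (H - 1)).round().long().clamp(0, H - 1)
        ix = ((coords[:, 1] + 1) / 2 * (W - 1)).round().long().clamp(0, W - 1)
        z = feat_map[:, iy, ix].t()                     # (Q, C) local latent codes
        # relative offset from the chosen feature's position
        rel = coords - torch.stack([iy / (H - 1) * 2 - 1,
                                    ix / (W - 1) * 2 - 1], dim=1)
        return self.mlp(torch.cat([z, rel], dim=1))     # (Q, 3) predicted RGB
```

Because any grid of continuous coordinates can be queried, the same decoder renders the image at any target resolution.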

4. Sketch Generation with Drawing Process Guided by Vector Flow and Grayscale

Zhengyan Tong, Xuanhong Chen, Bingbing Ni, Xiaohang Wang

  • retweets: 1804, favorites: 246 (12/18/2020 18:07:56)
  • links: abs | pdf
  • cs.CV

We propose a novel image-to-pencil translation method that not only generates high-quality pencil sketches but also reproduces the drawing process. Existing pencil sketch algorithms are based on texture rendering rather than direct imitation of strokes, so they can show only a final result, not the drawing process. To address this challenge, we first establish a pencil-stroke imitation mechanism. Next, we develop a framework with three branches to guide stroke drawing: the first branch guides the direction of the strokes, the second determines their shade, and the third further enhances details. Under this framework’s guidance, we can produce a pencil sketch by drawing one stroke at a time. Our method is fully interpretable. Comparison with existing pencil drawing algorithms shows that our method is superior in terms of texture quality, style, and user evaluation.
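
As a rough illustration of the direction branch, a per-pixel stroke direction field can be derived from image gradients, with strokes running along edges (perpendicular to the gradient). This is only a guess at the underlying idea; the paper's actual vector-flow construction may differ.

```python
# Hedged sketch: stroke direction and shade cues from image gradients.
import numpy as np
from scipy import ndimage

def stroke_directions(gray):
    # gray: (H, W) float array in [0, 1]
    gx = ndimage.sobel(gray, axis=1)         # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)         # vertical gradient
    theta = np.arctan2(gy, gx) + np.pi / 2   # rotate 90 deg: along-edge direction
    magnitude = np.hypot(gx, gy)             # edge strength -> shade cue
    return theta, magnitude
```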

5. Pre-Training Transformers as Energy-Based Cloze Models

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

  • retweets: 499, favorites: 177 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.CL

We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
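
A minimal sketch of the scoring side: a transformer encodes the full, unmasked sentence and a linear head assigns one scalar energy per token, so a sentence can be re-ranked by the negative sum of its token energies. Module names here are hypothetical, and the noise-contrastive training loop is omitted.

```python
# Minimal sketch of Electric-style per-token energy scoring.
import torch
import torch.nn as nn

class EnergyScorer(nn.Module):
    def __init__(self, vocab_size, dim=256, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.energy_head = nn.Linear(dim, 1)     # scalar energy per token

    def forward(self, token_ids):
        # token_ids: (B, T); no masking -- every token sees its full context
        h = self.encoder(self.embed(token_ids))
        return self.energy_head(h).squeeze(-1)   # (B, T) energies

model = EnergyScorer(vocab_size=30522)
tokens = torch.randint(0, 30522, (2, 16))
scores = -model(tokens).sum(dim=1)  # higher score = more plausible sentence
```

Because a single forward pass scores every token at once, this kind of re-ranking avoids the per-position passes a masked language model would need.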

6. Beyond pairwise network similarity: exploring Mediation and Suppression between networks

Lucas Lacasa, Sebastiano Stramaglia, Daniele Marinazzo

Network similarity measures quantify how and when two networks are symmetrically related. These include measures of statistical association, such as pairwise distances or other correlations between networks or between the layers of a multiplex network. However, such measures can neither directly unveil whether hidden confounding network factors are present nor establish when a correlation is underpinned by a causal relation. In this work we extend this pairwise conceptual framework to triplets of networks and quantify how and when a network is related to a second network directly or via the indirect mediation of, or interaction with, a third network. Accordingly, we develop a simple and intuitive set-theoretic approach to quantify mediation and suppression between networks. We validate our theory with synthetic models and further apply it to triplets of real-world networks, unveiling mediation and suppression effects that emerge when considering different modes of interaction in online social networks and different routes of information processing in the brain.
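
As a toy illustration of a set-theoretic view on network triplets (not the paper's actual mediation and suppression measures), one can decompose the edge overlap of two networks into the part shared with a candidate mediator network and the part independent of it:

```python
# Toy set-theoretic decomposition of edge overlap among three networks.
def edge_set(adj):
    n = len(adj)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]}

def overlap_decomposition(A, B, C):
    ab = edge_set(A) & edge_set(B)      # edges shared by networks A and B
    via_c = ab & edge_set(C)            # A-B overlap also present in mediator C
    return len(via_c), len(ab - edge_set(C))
```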

7. C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer

Dongxu Wei, Xiaowei Xu, Haibin Shen, Kejie Huang

  • retweets: 184, favorites: 87 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.CV

Human video motion transfer (HVMT) aims to synthesize videos in which one person imitates the actions of others. Although existing GAN-based HVMT methods have achieved great success, they either fail to preserve appearance details, due to the loss of spatial consistency between synthesized and exemplary images, or generate incoherent videos, due to the lack of temporal consistency among video frames. In this paper, we propose the Coarse-to-Fine Flow Warping Network (C2F-FWN) for spatial-temporally consistent HVMT. In particular, C2F-FWN utilizes coarse-to-fine flow warping and a Layout-Constrained Deformable Convolution (LC-DConv) to improve spatial consistency, and employs a Flow Temporal Consistency (FTC) loss to enhance temporal consistency. In addition, provided with multi-source appearance inputs, C2F-FWN supports appearance attribute editing with great flexibility and efficiency. Besides public datasets, we also collected a large-scale HVMT dataset named SoloDance for evaluation. Extensive experiments on our SoloDance dataset and the iPER dataset show that our approach outperforms state-of-the-art HVMT methods in terms of both spatial and temporal consistency. Source code and the SoloDance dataset are available at https://github.com/wswdx/C2F-FWN.
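
The core operation, flow-based warping, resamples a source image with a dense flow field. A minimal PyTorch sketch using the standard grid_sample op follows; the paper's coarse-to-fine cascade, LC-DConv, and FTC loss are not shown.

```python
# Minimal sketch of warping an image with a dense flow field.
import torch
import torch.nn.functional as F

def warp(image, flow):
    # image: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x_t = xs.float() + flow[:, 0]                 # target x for every pixel
    y_t = ys.float() + flow[:, 1]                 # target y for every pixel
    # normalize to [-1, 1], the coordinate range grid_sample expects
    grid = torch.stack([x_t / (W - 1) * 2 - 1,
                        y_t / (H - 1) * 2 - 1], dim=-1)   # (B, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)
```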

8. Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie

  • retweets: 162, favorites: 57 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.CV

The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g. point clouds) is notoriously hard. For example, the number of scenes (e.g. indoor rooms) that can be accessed and scanned might be limited; even given sufficient data, acquiring 3D labels (e.g. instance masks) requires intensive human labor. In this paper, we explore data-efficient learning for 3D point clouds. As a first step in this direction, we propose Contrastive Scene Contexts, a 3D pre-training method that makes use of both point-level correspondences and spatial contexts in a scene. Our method achieves state-of-the-art results on a suite of benchmarks where training data or labels are scarce. Our study reveals that exhaustive labelling of 3D point clouds might be unnecessary; and remarkably, on ScanNet, even using 0.1% of point labels, we still achieve 89% (instance segmentation) and 96% (semantic segmentation) of the baseline performance obtained with full annotations.
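
A hedged sketch of a point-level contrastive objective in this spirit: matched points across two views of a scene are positives, and all other sampled points serve as negatives. The paper's additional partitioning of the loss by spatial context is omitted here.

```python
# Hedged sketch of a point-level InfoNCE loss over cross-view correspondences.
import torch
import torch.nn.functional as F

def point_info_nce(feats_a, feats_b, temperature=0.07):
    # feats_a, feats_b: (N, D) features of N corresponding points in two views
    feats_a = F.normalize(feats_a, dim=1)
    feats_b = F.normalize(feats_b, dim=1)
    logits = feats_a @ feats_b.t() / temperature    # (N, N) similarities
    targets = torch.arange(len(feats_a))            # i-th row matches i-th column
    return F.cross_entropy(logits, targets)
```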

9. Causality is Graphically Simple

Carlos Baquero

  • retweets: 156, favorites: 45 (12/18/2020 18:07:57)
  • links: abs | pdf
  • cs.DC

Events in distributed systems include sending or receiving messages, or changing some state in a node. Not all events are related, but some events can cause and influence how other, later events occur. For instance, a reply to a received mail message is influenced by that message, and perhaps by other previously received messages. This article provides an introduction to classic causality tracking mechanisms and covers some more recent developments. The presentation is supported by a new graphical notation that allows an intuitive interpretation of the described causality relations.
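
Among the classic mechanisms such an introduction covers are vector clocks; a minimal sketch (illustrative, not taken from the article) is below. Each node keeps one counter per node, merges clocks on message receipt, and pointwise comparison of clocks recovers the happened-before partial order.

```python
# Minimal vector-clock sketch for causality tracking.
class VectorClock:
    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.clock = [0] * n_nodes

    def local_event(self):
        self.clock[self.node_id] += 1      # tick own counter

    def send(self):
        self.local_event()
        return list(self.clock)            # attach a copy to the message

    def receive(self, msg_clock):
        # pointwise maximum merges the sender's causal history with ours
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.local_event()

def happened_before(c1, c2):
    # c1 -> c2 iff c1 <= c2 pointwise and the clocks differ somewhere
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2
```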

10. Network experimentation at scale

Brian Karrer, Liang Shi, Monica Bhole, Matt Goldman, Tyrone Palmer, Charlie Gelman, Mikael Konutgan, Feng Sun

We describe our framework, deployed at Facebook, that accounts for interference between experimental units through cluster-randomized experiments. We document this system, including the design and estimation procedures, and detail insights we have gained from the many experiments that have used this system at scale. We introduce a cluster-based regression adjustment that substantially improves precision for estimating global treatment effects, as well as testing for interference as part of our estimation procedure. With this regression adjustment, we find that imbalanced clusters can better account for interference than balanced clusters without sacrificing accuracy. In addition, we show how logging exposure to a treatment can be used for additional variance reduction. Interference is a widely acknowledged issue in online field experiments, yet there is little evidence from real-world experiments demonstrating interference in online settings. We fill this gap by describing two case studies that capture significant network effects and highlight the value of this experimentation framework.
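
For intuition, here is an illustrative sketch of a cluster-randomized experiment with a regression adjustment using a pre-experiment covariate. The data, column names, and the exact estimator are placeholders, not Facebook's system.

```python
# Illustrative cluster-randomized experiment with regression adjustment.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
clusters = pd.DataFrame({
    "cluster_id": range(200),
    "pre_metric": rng.normal(size=200),              # pre-experiment covariate
})
clusters["treated"] = (rng.random(200) < 0.5).astype(int)  # cluster-level assignment
clusters["outcome"] = (0.3 * clusters["treated"]
                       + 0.8 * clusters["pre_metric"]
                       + rng.normal(scale=0.5, size=200))

# the covariate absorbs pre-existing variation, tightening the
# confidence interval on the estimated treatment effect
fit = smf.ols("outcome ~ treated + pre_metric", data=clusters).fit()
print(fit.params["treated"])                         # adjusted effect estimate
```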

11. On Avoiding the Union Bound When Answering Multiple Differentially Private Queries

Badih Ghazi, Ravi Kumar, Pasin Manurangsi

In this work, we study the problem of answering $k$ queries with $(\epsilon, \delta)$-differential privacy, where each query has sensitivity one. We give an algorithm for this task that achieves an expected $\ell_\infty$ error bound of $O(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}})$, which is known to be tight (Steinke and Ullman, 2016). A very recent work by Dagan and Kur (2020) provides a similar result, albeit via a completely different approach. One difference between our work and theirs is that our guarantee holds even when $\delta < 2^{-\Omega(k/(\log k)^8)}$, whereas theirs does not apply in this case. On the other hand, the algorithm of Dagan and Kur has the remarkable advantage that the $\ell_\infty$ error bound of $O(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}})$ holds not only in expectation but always (i.e., with probability one), while we can only obtain a high-probability (or expected) guarantee on the error.
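
For context (standard background, not from the paper), the baseline this improves on is the Gaussian mechanism combined with a union bound, which is what "avoiding the union bound" buys:

```latex
% Standard background, not from the paper: the Gaussian mechanism adds
% N(0, \sigma^2) noise to each of the k sensitivity-1 answers, with
\[
  \sigma = \Theta\!\left(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}}\right),
\]
% and a union bound over the k noisy coordinates yields
\[
  \mathbb{E}\Big[\max_{i \le k} |Z_i|\Big]
    = O\!\left(\sigma\sqrt{\log k}\right)
    = O\!\left(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}}\cdot\sqrt{\log k}\right),
\]
% so avoiding the union bound removes the extra \sqrt{\log k} factor.
```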

12. Improved StyleGAN Embedding: Where are the Good Latents?

Peihao Zhu, Rameen Abdal, Yipeng Qin, Peter Wonka

  • retweets: 30, favorites: 60 (12/18/2020 18:07:58)
  • links: abs | pdf
  • cs.CV | cs.GR

StyleGAN is able to produce photorealistic images almost indistinguishable from real ones. Embedding images into the StyleGAN latent space is not a trivial task due to the trade-off between reconstruction quality and editing quality. In this paper, we first introduce a new normalized space to analyze the diversity and the quality of the reconstructed latent codes. This space can help answer the question of where good latent codes are located in latent space. Second, we propose a framework to analyze the quality of different embedding algorithms. Third, we propose an improved embedding algorithm based on our analysis. We compare our results with the current state-of-the-art methods and achieve a better trade-off between reconstruction quality and editing quality.
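
A minimal sketch of optimization-based embedding, the family of algorithms the paper analyzes: gradient descent on a latent code to reconstruct a target image. The generator interface, loss, and initialization are placeholders; the paper's improved algorithm and normalized space are not reproduced here.

```python
# Minimal sketch of optimization-based StyleGAN embedding.
import torch

def embed(generator, target, w_init, steps=500, lr=0.01):
    # w_init: a starting latent, e.g. the generator's mean latent vector
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(w)                 # synthesize from current latent
        loss = torch.nn.functional.mse_loss(recon, target)
        loss.backward()
        opt.step()
    return w.detach()                        # embedded latent code
```

In practice, perceptual losses and the choice of latent space (Z, W, or W+) strongly affect the reconstruction/editing trade-off the paper studies.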

13. DECOR-GAN: 3D Shape Detailization by Conditional Refinement

Zhiqin Chen, Vladimir Kim, Matthew Fisher, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri

We introduce a deep generative network for 3D shape detailization, akin to stylization with the style being geometric details. We address the challenge of creating large varieties of high-resolution and detailed 3D geometry from a small set of exemplars by treating the problem as one of geometric detail transfer. Given a low-resolution coarse voxel shape, our network refines it, via voxel upsampling, into a higher-resolution shape enriched with geometric details. The output shape preserves the overall structure (or content) of the input, while its detail generation is conditioned on an input “style code” corresponding to a detailed exemplar. Our 3D detailization via conditional refinement is realized by a generative adversarial network, coined DECOR-GAN. The network utilizes a 3D CNN generator for upsampling coarse voxels and a 3D PatchGAN discriminator that encourages local patches of the generated model to be similar to those in the detailed training shapes. During testing, a style code is fed into the generator to condition the refinement. We demonstrate that our method can refine a coarse shape into a variety of detailed shapes with different styles. The generated results are evaluated in terms of content preservation, plausibility, and diversity. Comprehensive ablation studies are conducted to validate our network designs.
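
A minimal sketch of a style-conditioned voxel-upsampling generator in this spirit (channel sizes and the conditioning scheme are illustrative, not the paper's exact network):

```python
# Minimal sketch of a style-conditioned 3D voxel-upsampling generator.
import torch
import torch.nn as nn

class VoxelUpsampler(nn.Module):
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1 + style_dim, 32, 3, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(8, 1, 3, padding=1), nn.Sigmoid())   # occupancy in [0, 1]

    def forward(self, coarse_voxels, style_code):
        # coarse_voxels: (B, 1, D, D, D); style_code: (B, style_dim)
        B, _, D, _, _ = coarse_voxels.shape
        style = style_code.view(B, -1, 1, 1, 1).expand(-1, -1, D, D, D)
        # two stride-2 transposed convs give 4x upsampling: (B, 1, 4D, 4D, 4D)
        return self.net(torch.cat([coarse_voxels, style], dim=1))
```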

14. StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

Jinshan Zeng, Qi Chen, Yunxin Liu, Mingwen Wang, Yuan Yao

The generation of stylish Chinese fonts is an important problem with many applications. Most existing generation methods are based on deep generative models, particularly generative adversarial network (GAN) based models. However, these deep generative models may suffer from mode collapse, which significantly degrades the diversity and quality of the generated results. In this paper, we introduce a one-bit stroke encoding to capture the key mode information of Chinese characters and incorporate it into CycleGAN, a popular deep generative model for Chinese font generation. The resulting method, called StrokeGAN, is motivated by the observation that the stroke encoding carries a large amount of mode information about Chinese characters. To reconstruct the one-bit stroke encoding of the generated characters, we impose a stroke-encoding reconstruction loss on the discriminator. Equipped with the one-bit stroke encoding and this reconstruction loss, CycleGAN’s mode collapse is significantly alleviated, with improved preservation of strokes and diversity of the generated characters. The effectiveness of StrokeGAN is demonstrated on a series of generation tasks over nine datasets with different fonts. The numerical results show that StrokeGAN generally outperforms state-of-the-art methods in terms of content and recognition accuracy as well as stroke error, and also generates more realistic characters.
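
A hedged sketch of the stroke-encoding reconstruction loss: an auxiliary discriminator head predicts the one-bit stroke encoding of a generated character and is trained against the target encoding. The head and tensor shapes are illustrative, not the paper's exact design.

```python
# Hedged sketch of a stroke-encoding reconstruction loss.
import torch.nn.functional as F

def stroke_reconstruction_loss(disc_stroke_logits, target_stroke_bits):
    # disc_stroke_logits: (B, n_stroke_types) auxiliary discriminator output
    # target_stroke_bits: (B, n_stroke_types), 1 if the character contains
    #                     that stroke type, 0 otherwise
    return F.binary_cross_entropy_with_logits(disc_stroke_logits,
                                              target_stroke_bits.float())
```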

15. No Budget? Don’t Flex! Cost Consideration when Planning to Adopt NLP for Your Business

Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Radityo Eko Prasojo, Alham Fikri Aji

  • retweets: 32, favorites: 21 (12/18/2020 18:07:58)
  • links: abs | pdf
  • cs.CL

Recent advances in Natural Language Processing (NLP) have largely pushed deep transformer-based models as the go-to state-of-the-art technique without much regard to production and utilization cost. Companies planning to adopt these methods into their business face difficulties because they lack the machine and human resources to build them. In this work, we compare both the performance and the cost of classical learning algorithms and the latest models on common sequence and text labeling tasks. We find that classical models often perform on par with deep neural ones despite their lower cost. We argue that under many circumstances smaller and lighter models are a better fit for AI-pivoting businesses, and we call for more research into low-cost models, especially for under-resourced languages.
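
As a concrete example of the low-cost end of that comparison, a classical text classifier can be trained in seconds on a CPU (the data here is a placeholder):

```python
# Illustrative low-cost classical baseline: TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible service", "works fine", "broke instantly"]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)                 # trains in seconds on a CPU
print(model.predict(["really great service"]))
```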

16. A Closer Look at the Robustness of Vision-and-Language Pre-trained Models

Linjie Li, Zhe Gan, Jingjing Liu

  • retweets: 25, favorites: 27 (12/18/2020 18:07:58)
  • links: abs | pdf
  • cs.CV | cs.CL

Large-scale pre-trained multimodal transformers, such as ViLBERT and UNITER, have propelled the state of the art in vision-and-language (V+L) research to a new level. Although they achieve impressive performance on standard tasks, it remains unclear to date how robust these pre-trained models are. To investigate, we conduct a host of thorough evaluations of existing pre-trained models over four different types of V+L-specific model robustness: (i) Linguistic Variation; (ii) Logical Reasoning; (iii) Visual Content Manipulation; and (iv) Answer Distribution Shift. Interestingly, with standard model finetuning, pre-trained V+L models already exhibit better robustness than many task-specific state-of-the-art methods. To further enhance model robustness, we propose Mango, a generic and efficient approach that learns a Multimodal Adversarial Noise GeneratOr in the embedding space to fool pre-trained V+L models. Differing from previous studies focused on one specific type of robustness, Mango is task-agnostic and enables a universal performance lift for pre-trained models over diverse tasks designed to evaluate broad aspects of robustness. Comprehensive experiments demonstrate that Mango achieves a new state of the art on 7 out of 9 robustness benchmarks, surpassing existing methods by a significant margin. As the first comprehensive study on V+L robustness, this work puts the robustness of pre-trained models into sharper focus, pointing to new directions for future study.
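
For intuition, a simpler gradient-based stand-in for the idea of perturbing inputs in embedding space is sketched below: perturb the input embeddings to increase the task loss, then train on the perturbed embeddings. Mango instead learns the noise with a generator network, so this is only illustrative.

```python
# Hedged sketch of a one-step embedding-space adversarial perturbation.
import torch

def embedding_space_attack(model, embeddings, labels, loss_fn, eps=0.1):
    # model is assumed to accept embeddings directly rather than token ids
    delta = torch.zeros_like(embeddings, requires_grad=True)
    loss = loss_fn(model(embeddings + delta), labels)
    loss.backward()                                   # gradient w.r.t. the noise
    with torch.no_grad():
        # one signed-gradient step, kept inside an epsilon ball
        delta = (eps * delta.grad.sign()).clamp(-eps, eps)
    return (embeddings + delta).detach()              # adversarial embeddings
```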