Hot Papers 2021-03-10

1. NeX: Real-time View Synthesis with Neural Basis Expansion

Suttisak Wizadwongsa, Pakkapon Phongthawee, Jiraphon Yenphraphai, Supasorn Suwajanakorn

We present NeX, a new approach to novel view synthesis based on enhancements of multiplane image (MPI) that can reproduce next-level view-dependent effects — in real time. Unlike traditional MPI that uses a set of simple RGBα planes, our technique models view-dependent effects by instead parameterizing each pixel as a linear combination of basis functions learned from a neural network. Moreover, we propose a hybrid implicit-explicit modeling strategy that improves upon fine detail and produces state-of-the-art results. Our method is evaluated on benchmark forward-facing datasets as well as our newly-introduced dataset designed to test the limit of view-dependent modeling with significantly more challenging effects such as rainbow reflections on a CD. Our method achieves the best overall scores across all major metrics on these datasets with more than 1000× faster rendering time than the state of the art. For real-time demos, visit https://nex-mpi.github.io/
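The per-pixel parameterization can be sketched in a few lines. In NeX the basis functions come from a shared MLP; the polynomial basis and variable names below are hypothetical stand-ins used purely to illustrate "color = base RGB plus a view-weighted combination of learned coefficients":

```python
import numpy as np

def view_basis(view_dir, n_basis=4):
    """Hypothetical stand-in for NeX's learned basis: in the paper the
    H_n(v) come from a shared MLP; here simple powers of the viewing
    direction's z-component are used purely for illustration."""
    z = view_dir[2]
    return np.array([z ** n for n in range(1, n_basis + 1)])

def pixel_color(k0, ks, view_dir):
    """Render one pixel: view-independent base RGB k0 plus a linear
    combination of per-pixel RGB coefficients ks (n_basis x 3),
    weighted by the view-dependent basis values."""
    h = view_basis(view_dir, n_basis=ks.shape[0])
    return k0 + h @ ks  # shape (3,)

k0 = np.array([0.5, 0.4, 0.3])  # base color of this pixel
ks = 0.1 * np.ones((4, 3))      # per-pixel view-dependent coefficients
color_front = pixel_color(k0, ks, np.array([0.0, 0.0, 1.0]))
color_side = pixel_color(k0, ks, np.array([1.0, 0.0, 0.0]))
```

Because the coefficients `k0, ks` are stored explicitly per pixel while only the small basis network runs per view, rendering reduces to a handful of multiply-adds per pixel, which is what enables the real-time claim.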

2. Pretrained Transformers as Universal Computation Engines

Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

  • retweets: 1310, favorites: 314 (03/11/2021 08:39:33)
  • links: abs | pdf
  • cs.LG | cs.AI

We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning — in particular, without finetuning of the self-attention and feedforward layers of the residual blocks. We consider such a model, which we call a Frozen Pretrained Transformer (FPT), and study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction. In contrast to prior works which investigate finetuning on the same modality as the pretraining dataset, we show that pretraining on natural language improves performance and compute efficiency on non-language downstream tasks. In particular, we find that such pretraining enables FPT to generalize zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.
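The finetuning recipe amounts to a predicate over parameter names: freeze the attention and feed-forward weights inside the residual blocks, train only the input embedding, layer norms, and output head. The parameter names below are hypothetical; the real names depend on the model library:

```python
# Sketch of the FPT finetuning recipe: freeze the residual-block internals
# of a pretrained transformer and train only the modality-specific
# input/output layers plus layer norm.
FROZEN_SUBSTRINGS = ("attn", "mlp")           # self-attention and feedforward
TRAINED_SUBSTRINGS = ("embed", "ln", "head")  # what FPT finetunes

def is_trainable(param_name):
    """Return True if this parameter should receive gradients under FPT."""
    return any(s in param_name for s in TRAINED_SUBSTRINGS) and not any(
        s in param_name for s in FROZEN_SUBSTRINGS
    )

params = [
    "input_embed.weight",
    "block0.attn.qkv.weight",
    "block0.attn.proj.weight",
    "block0.ln1.weight",
    "block0.mlp.fc1.weight",
    "output_head.weight",
]
trainable = [p for p in params if is_trainable(p)]
```

In a real framework one would apply this predicate when building the optimizer (e.g. only passing the selected parameters to it), leaving the frozen weights exactly as pretrained.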

3. GAN Vocoder: Multi-Resolution Discriminator Is All You Need

Jaeseong You, Dalhyun Kim, Gyuhyeon Nam, Geumbyeol Hwang, Gyeongsu Chae

Several of the latest GAN-based vocoders show remarkable achievements, outperforming autoregressive and flow-based competitors in both qualitative and quantitative measures while synthesizing orders of magnitude faster. In this work, we hypothesize that the common factor underlying their success is the multi-resolution discriminating framework, not the minute details in architecture, loss function, or training strategy. We experimentally test the hypothesis by evaluating six different generators paired with one shared multi-resolution discriminating framework. For all evaluative measures with respect to text-to-speech synthesis and for all perceptual metrics, their performances are not distinguishable from one another, which supports our hypothesis.
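The core of any multi-resolution framework is viewing the same waveform through several short-time Fourier transform settings, so that both fine temporal and fine spectral structure get checked. A minimal sketch of that front end, with illustrative resolutions rather than the settings of any particular vocoder:

```python
import numpy as np

def stft_mag(x, fft_size, hop):
    """Magnitude spectrogram at one resolution (Hann window, no padding)."""
    win = np.hanning(fft_size)
    frames = [x[i:i + fft_size] * win
              for i in range(0, len(x) - fft_size + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

# A multi-resolution discriminator (or auxiliary loss) scores the waveform
# at several (fft_size, hop) settings; these values are illustrative only.
RESOLUTIONS = [(512, 128), (1024, 256), (2048, 512)]

# 1 second of a 440 Hz sine at 16 kHz as a toy "synthesized" waveform
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
specs = [stft_mag(x, n, h) for n, h in RESOLUTIONS]
```

Each spectrogram would then be fed to its own small discriminator (or compared against the ground-truth spectrogram in a loss); the paper's hypothesis is that this multi-scale view, not the discriminator internals, is what matters.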

4. Advances in Inference and Representation for Simultaneous Localization and Mapping

David M. Rosen, Kevin J. Doherty, Antonio Teran Espinoza, John J. Leonard

  • retweets: 558, favorites: 112 (03/11/2021 08:39:34)
  • links: abs | pdf
  • cs.RO | cs.CV

Simultaneous localization and mapping (SLAM) is the process of constructing a global model of an environment from local observations of it; this is a foundational capability for mobile robots, supporting such core functions as planning, navigation, and control. This article reviews recent progress in SLAM, focusing on advances in the expressive capacity of the environmental models used in SLAM systems (representation) and the performance of the algorithms used to estimate these models from data (inference). A prominent theme of recent SLAM research is the pursuit of environmental representations (including learned representations) that go beyond the classical attributes of geometry and appearance to model properties such as hierarchical organization, affordance, dynamics, and semantics; these advances equip autonomous agents with a more comprehensive understanding of the world, enabling more versatile and intelligent operation. A second major theme is a revitalized interest in the mathematical properties of the SLAM estimation problem itself (including its computational and information-theoretic performance limits); this work has led to the development of novel classes of certifiable and robust inference methods that dramatically improve the reliability of SLAM systems in real-world operation. We survey these advances with an emphasis on their ramifications for achieving robust, long-duration autonomy, and conclude with a discussion of open challenges and a perspective on future research directions.

5. Model Complexity of Deep Learning: A Survey

Xia Hu, Lingyang Chu, Jian Pei, Weiqing Liu, Jiang Bian

  • retweets: 468, favorites: 133 (03/11/2021 08:39:34)
  • links: abs | pdf
  • cs.LG | cs.AI

Model complexity is a fundamental problem in deep learning. In this paper we conduct a systematic overview of the latest studies on model complexity in deep learning. Model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies on those two categories along four important factors, including model framework, model size, optimization process and data complexity. We also discuss the applications of deep learning model complexity including understanding model generalization capability, model optimization, and model selection and design. We conclude by proposing several interesting future directions.

6. The Physics of Financial Networks

Marco Bardoscia, Paolo Barucca, Stefano Battiston, Fabio Caccioli, Giulio Cimini, Diego Garlaschelli, Fabio Saracco, Tiziano Squartini, Guido Caldarelli

The field of Financial Networks is a paramount example of the novel applications of Statistical Physics made possible by the present data revolution. As the total value of the global financial market has vastly outgrown the value of the real economy, financial institutions on this planet have created a web of interactions whose size and topology call for quantitative analysis by means of Complex Networks. Financial Networks are not only a playground for basic tools of statistical physics such as ensemble representation and entropy maximization; rather, their particular dynamics and evolution have triggered theoretical advances such as the definition of DebtRank to measure the impact and diffusion of shocks across the whole system. In this review we present the state of the art in this field, starting from the different definitions of financial networks (based on loans, on asset ownership, or on contracts involving several parties such as credit default swaps, up to multiplex representations in which firms enter the game and a link with the real economy is drawn), and then discussing the various dynamics of financial contagion as well as applications in financial network inference and validation. We believe that this analysis is particularly timely, since financial stability, as well as recent innovations in climate finance, once properly analysed and understood in terms of complex network theory, can play a pivotal role in the transformation of our society towards a more sustainable world.

7. Pixel-wise Anomaly Detection in Complex Driving Scenes

Giancarlo Di Biase, Hermann Blum, Roland Siegwart, Cesar Cadena

  • retweets: 72, favorites: 37 (03/11/2021 08:39:35)
  • links: abs | pdf
  • cs.CV

The inability of state-of-the-art semantic segmentation methods to detect anomaly instances hinders them from being deployed in safety-critical and complex applications, such as autonomous driving. Recent approaches have focused on either leveraging segmentation uncertainty to identify anomalous areas or re-synthesizing the image from the semantic label map to find dissimilarities with the input image. In this work, we demonstrate that these two methodologies contain complementary information and can be combined to produce robust predictions for anomaly segmentation. We present a pixel-wise anomaly detection framework that uses uncertainty maps to improve over existing re-synthesis methods in finding dissimilarities between the input and generated images. Our approach works as a general framework around already trained segmentation networks, which ensures anomaly detection without compromising segmentation accuracy, while significantly outperforming all similar methods. Top-2 performance across a range of different anomaly datasets shows the robustness of our approach to handling different anomaly instances.
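The complementarity argument can be made concrete: an anomaly should score high only where the segmenter is uncertain *and* the re-synthesized image disagrees with the input. The paper learns this combination with a network; the hand-crafted product of normalized maps below is only an illustrative stand-in:

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1] (assumes a non-constant map)."""
    return (m - m.min()) / (m.max() - m.min())

def anomaly_score(uncertainty, dissimilarity):
    """Illustrative fusion: high only where the segmentation model is
    uncertain AND the re-synthesis disagrees with the input image."""
    return normalize(uncertainty) * normalize(dissimilarity)

# Toy 2x2 "images": top-right pixel is both uncertain and dissimilar
uncertainty = np.array([[0.1, 0.9], [0.2, 0.8]])
dissimilarity = np.array([[0.0, 1.0], [0.1, 0.2]])
score = anomaly_score(uncertainty, dissimilarity)
```

The point of fusing the two maps is that each signal alone produces false positives (uncertainty fires at ordinary class boundaries, dissimilarity fires at re-synthesis artifacts); requiring both suppresses them.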

8. Symbolic integration by integrating learning models with different strengths and weaknesses

Hazumi Kubota, Yuta Tokuoka, Takahiro G. Yamada, Akira Funahashi

  • retweets: 72, favorites: 23 (03/11/2021 08:39:35)
  • links: abs | pdf
  • cs.LG | cs.SC

Integration is indispensable, not only in mathematics, but also in a wide range of other fields. A deep learning method has recently been developed and shown to be capable of integrating mathematical functions that could not previously be integrated on a computer. However, that method treats integration as equivalent to natural language translation and does not reflect mathematical information. In this study, we adjusted the learning model to take mathematical information into account and developed a wide range of learning models that learn the order of numerical operations more robustly. In this way, we achieved a 98.80% correct answer rate with symbolic integration, a higher rate than that of any existing method. We judged the correctness of the integration based on whether the derivative of the primitive function was consistent with the integrand. By building an integrated model based on this strategy, we achieved a 99.79% rate of correct answers with symbolic integration.
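The paper's correctness criterion (differentiate the candidate primitive and compare with the integrand) is easy to sketch. The authors do this symbolically; the version below substitutes a central finite difference so it stays self-contained, and the example functions are our own:

```python
import math

def is_valid_primitive(F, f, points, tol=1e-6, h=1e-6):
    """Accept F as an antiderivative of f if F'(x) matches f(x) at the
    sample points; a central finite difference stands in for the
    symbolic differentiation used in the paper."""
    for x in points:
        deriv = (F(x + h) - F(x - h)) / (2 * h)
        if abs(deriv - f(x)) > tol:
            return False
    return True

# Candidate primitive F(x) = x*sin(x) + cos(x) for f(x) = x*cos(x),
# since d/dx [x*sin(x) + cos(x)] = sin(x) + x*cos(x) - sin(x) = x*cos(x).
F = lambda x: x * math.sin(x) + math.cos(x)
f = lambda x: x * math.cos(x)
ok = is_valid_primitive(F, f, points=[0.5, 1.0, 2.0])
wrong = is_valid_primitive(F, lambda x: math.cos(x), points=[0.5, 1.0, 2.0])
```

Grading by differentiation rather than string match is what lets the evaluation accept any antiderivative that differs from the reference by a constant or by algebraically equivalent rewriting.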

9. Knowledge Evolution in Neural Networks

Ahmed Taha, Abhinav Shrivastava, Larry Davis

Deep learning relies on the availability of a large corpus of data (labeled or unlabeled). Thus, one challenging unsettled question is: how to train a deep network on a relatively small dataset? To tackle this question, we propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution (KE) approach splits a deep network into two hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations. This approach not only boosts performance, but also learns a slim network with a smaller inference cost. KE integrates seamlessly with both vanilla and residual convolutional networks. KE reduces both overfitting and the burden for data collection. We evaluate KE on various network architectures and loss functions. We evaluate KE using relatively small datasets (e.g., CUB-200) and randomly initialized deep networks. KE achieves an absolute 21% improvement margin on a state-of-the-art baseline. This performance improvement is accompanied by a relative 73% reduction in inference cost. KE achieves state-of-the-art results on classification and metric learning benchmarks. Code available at http://bit.ly/3uLgwYb
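The fit/reset split can be sketched with a binary mask over each weight tensor: the fit-hypothesis is kept across generations while the reset-hypothesis is re-initialized each generation. The training step is a placeholder here (identity), and the mask construction and scales are illustrative, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(weights, fit_mask, generations, train_step):
    """One KE loop: train, then re-randomize the reset-hypothesis
    (the weights where fit_mask == 0) before the next generation."""
    for _ in range(generations):
        weights = train_step(weights)                      # placeholder training
        reset = rng.standard_normal(weights.shape) * 0.01  # fresh initialization
        weights = fit_mask * weights + (1 - fit_mask) * reset
    return weights

w = rng.standard_normal((4, 4))
mask = (rng.random((4, 4)) < 0.5).astype(float)  # selects the fit-hypothesis
fit_part_before = (mask * w).copy()
# Identity "training" makes the invariant visible: the fit-hypothesis
# weights survive every generation unchanged.
w_after = evolve(w, mask, generations=3, train_step=lambda W: W)
```

After training for several generations, keeping only the fit-hypothesis yields the slim network with lower inference cost that the abstract mentions.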

10. ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu

  • retweets: 35, favorites: 42 (03/11/2021 08:39:35)
  • links: abs | pdf
  • cs.CV | cs.LG

The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur. Thus, benchmarking and advancing digital forgery analysis have become a pressing issue. However, existing face forgery datasets either have limited diversity or only support coarse-grained analysis. To counter this emerging threat, we construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification; 2) Spatial Forgery Localization, which segments the manipulated area of fake images compared to their corresponding source real images; 3) Video Forgery Classification, which re-defines video-level forgery classification with manipulated frames in random positions (this task is important because attackers in the real world are free to manipulate any target frame); and 4) Temporal Forgery Localization, which localizes the temporal segments that are manipulated. ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data scale (2.9 million images, 221,247 videos), manipulations (7 image-level approaches, 8 video-level approaches), perturbations (36 independent and more mixed perturbations), and annotations (6.3 million classification labels, 2.9 million manipulated-area annotations, and 221,247 temporal forgery segment labels). We perform extensive benchmarking and studies of existing face forensics methods and obtain several valuable observations.

11. Higher-order Network Analysis Takes Off, Fueled by Classical Ideas and New Data

Austin R. Benson, David F. Gleich, Desmond J. Higham

Higher-order network analysis uses the ideas of hypergraphs, simplicial complexes, multilinear and tensor algebra, and more, to study complex systems. These are by now well established mathematical abstractions. What’s new is that the ideas can be tested and refined on the type of large-scale data arising in today’s digital world. This research area therefore is making an impact across many applications. Here, we provide a brief history, guide, and survey.