Hot Papers 2021-05-20

1. E(n) Equivariant Normalizing Flows for Molecule Generation in 3D

Victor Garcia Satorras, Emiel Hoogeboom, Fabian B. Fuchs, Ingmar Posner, Max Welling

retweets: 2619, favorites: 288 (05/21/2021 07:15:34)
links: abs | pdf
cs.LG | physics.chem-ph | stat.ML

This paper introduces a generative model equivariant to Euclidean symmetries: E(n) Equivariant Normalizing Flows (E-NFs). To construct E-NFs, we take the discriminative E(n) graph neural networks and integrate them as a differential equation to obtain an invertible equivariant function: a continuous-time normalizing flow. We demonstrate that E-NFs considerably outperform baselines and existing methods from the literature on particle systems such as DW4 and LJ13, and on molecules from QM9 in terms of log-likelihood. To the best of our knowledge, this is the first likelihood-based deep generative model that generates molecules in 3D.
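As a rough illustration of what "integrating an equivariant function as a differential equation" means, here is a minimal sketch. It is not the paper's model: the E(n) graph neural network is replaced by a hand-written pairwise vector field, the integrator is fixed-step Euler, and the log-likelihood (divergence) term is omitted. The names `equivariant_velocity` and `euler_flow` are hypothetical.

```python
import numpy as np

def equivariant_velocity(x):
    """Toy E(n)-equivariant dynamics (an assumption, not the paper's EGNN).

    Each particle moves along relative position vectors, weighted by a
    function of the squared distance, which is invariant to rotations
    and translations.
    """
    diff = x[:, None, :] - x[None, :, :]                # (N, N, D): x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)   # (N, N, 1)
    weight = np.exp(-dist2)                             # depends on invariants only
    return np.sum(diff * weight, axis=1)                # (N, D)

def euler_flow(x0, steps=50, dt=0.01):
    """Integrate dx/dt = equivariant_velocity(x) with fixed-step Euler."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * equivariant_velocity(x)
    return x
```

Because the velocity is built only from relative positions and distances, rotating and translating the input particles rotates and translates the output in the same way, which is the property the abstract relies on.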

Very excited to share our latest work E(n) Equivariant Normalizing Flows for Molecule Generation in 3D. Joint work with @emiel_hoogeboom @FabianFuchsML @IngmarPosner @wellingmax.

Paper: https://t.co/glBZVfzKTu pic.twitter.com/DXSQZ44nev
— Víctor Garcia Satorras (@vgsatorras) May 20, 2021

2. Compositional Processing Emerges in Neural Networks Solving Math Problems

Jacob Russin, Roland Fernandez, Hamid Palangi, Eric Rosen, Nebojsa Jojic, Paul Smolensky, Jianfeng Gao

retweets: 962, favorites: 157 (05/21/2021 07:15:35)
links: abs | pdf
cs.LG | cs.AI | cs.CL

A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings (e.g., the quantities corresponding to numerals) should be composed according to structured rules (e.g., order of operations). Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.

Compositional Processing Emerges in Neural Networks Solving Math Problems
pdf: https://t.co/2GwgUxHmSz
abs: https://t.co/aKriRDhlpI pic.twitter.com/Ufb4MrFWpA
— AK (@ak92501) May 20, 2021

3. Recursive-NeRF: An Efficient and Dynamically Growing NeRF

Guo-Wei Yang, Wen-Yang Zhou, Hao-Yang Peng, Dun Liang, Tai-Jiang Mu, Shi-Min Hu

retweets: 306, favorites: 114 (05/21/2021 07:15:35)
links: abs | pdf
cs.CV

View synthesis methods using implicit continuous shape representations learned from a set of images, such as the Neural Radiance Field (NeRF) method, have gained increasing attention due to their high quality imagery and scalability to high resolution. However, the heavy computation required by its volumetric approach prevents NeRF from being useful in practice; minutes are taken to render a single image of a few megapixels. Now, an image of a scene can be rendered in a level-of-detail manner, so we posit that a complicated region of the scene should be represented by a large neural network while a small neural network is capable of encoding a simple region, enabling a balance between efficiency and quality. Recursive-NeRF is our embodiment of this idea, providing an efficient and adaptive rendering and training approach for NeRF. The core of Recursive-NeRF learns uncertainties for query coordinates, representing the quality of the predicted color and volumetric intensity at each level. Only query coordinates with high uncertainties are forwarded to the next level to a bigger neural network with a more powerful representational capability. The final rendered image is a composition of results from neural networks of all levels. Our evaluation on three public datasets shows that Recursive-NeRF is more efficient than NeRF while providing state-of-the-art quality. The code will be available at https://github.com/Gword/Recursive-NeRF.
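The routing idea in the abstract (only uncertain query coordinates are forwarded to a bigger network) can be sketched as follows. This is a toy under stated assumptions: the "networks" are plain callables returning a value and an uncertainty, the threshold rule is a guess at the simplest possible exit criterion, and `recursive_query`, `small_level`, and `large_level` are hypothetical names.

```python
import numpy as np

def recursive_query(points, levels, threshold):
    """Route query points through progressively larger predictors.

    levels: list of callables mapping (M, D) points to
            (values of shape (M,), uncertainty of shape (M,)).
    A point exits at the first level whose uncertainty is <= threshold;
    the last level accepts whatever remains.
    """
    outputs = np.zeros(len(points))
    exit_level = np.full(len(points), -1)
    active = np.arange(len(points))            # indices still being refined
    for depth, level in enumerate(levels):
        values, uncertainty = level(points[active])
        is_last = depth == len(levels) - 1
        done = np.ones(len(active), dtype=bool) if is_last else uncertainty <= threshold
        outputs[active[done]] = values[done]   # merge results from this level
        exit_level[active[done]] = depth
        active = active[~done]                 # forward only uncertain points
        if active.size == 0:
            break
    return outputs, exit_level

def small_level(p):
    # Hypothetical cheap predictor: confident only near the origin.
    return p.sum(axis=1), np.linalg.norm(p, axis=1)

def large_level(p):
    # Hypothetical expensive predictor: always confident.
    return 2.0 * p.sum(axis=1), np.zeros(len(p))
```

Calling `recursive_query(points, [small_level, large_level], threshold=1.0)` evaluates the large predictor only on the points the small one is unsure about, and the returned array combines the results from all levels.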

Recursive-NeRF: An Efficient and Dynamically Growing NeRF
pdf: https://t.co/x7qdU1BgKV
abs: https://t.co/eLxxBeWotE

learns uncertainties for query coordinates, representing the quality of the predicted color and volumetric intensity at each level pic.twitter.com/JFqCvSK8VT
— AK (@ak92501) May 20, 2021

4. Ab-initio study of interacting fermions at finite temperature with neural canonical transformation

Hao Xie, Linfeng Zhang, Lei Wang

retweets: 240, favorites: 72 (05/21/2021 07:15:35)
links: abs | pdf
cond-mat.str-el | cond-mat.quant-gas | cond-mat.stat-mech | cs.LG | physics.comp-ph

We present a variational density matrix approach to the thermal properties of interacting fermions in the continuum. The variational density matrix is parametrized by a permutation equivariant many-body unitary transformation together with a discrete probabilistic model. The unitary transformation is implemented as a quantum counterpart of neural canonical transformation, which incorporates correlation effects via a flow of fermion coordinates. As the first application, we study electrons in a two-dimensional quantum dot with an interaction-induced crossover from Fermi liquid to Wigner molecule. The present approach provides accurate results in the low-temperature regime, where conventional quantum Monte Carlo methods face severe difficulties due to the fermion sign problem. The approach is general and flexible for further extensions, thus holds the promise to deliver new physical results on strongly correlated fermions in the context of ultracold quantum gases, condensed matter, and warm dense matter physics.

FermiFlow: Ab-initio study of fermions at finite temperature

Code: https://t.co/qlryo5fzT5…
Paper: https://t.co/1NPzfrzROK

Animation shows the flow of electrons in a quantum dot towards the so called "Wigner molecule" structure. pic.twitter.com/ZgMhqicj65
— Lei Wang (@wangleiphy) May 20, 2021

5. High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Jie Liang, Hui Zeng, Lei Zhang

retweets: 225, favorites: 85 (05/21/2021 07:15:35)
links: abs | pdf
cs.CV

Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or long inference time due to their heavy computational burden on the convolution of high-resolution feature maps. In this paper, we focus on speeding-up the high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we reveal that the attribute transformations, such as illumination and color manipulation, relate more to the low-frequency component, while the content details can be adaptively refined on high-frequency components. We consequently propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks, where we design a lightweight network for translating the low-frequency component with reduced resolution and a progressive masking strategy to efficiently refine the high-frequency ones. Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details. Extensive experimental results on various tasks demonstrate that the proposed method can translate 4K images in real-time using one normal GPU while achieving comparable transformation performance against existing methods. Datasets and codes are available: https://github.com/csjliang/LPTN.
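The closed-form decomposition and reconstruction the method builds on can be sketched in a few lines. This is a simplified stand-in, not the LPTN code: it uses 2x2 block averaging and nearest-neighbor upsampling instead of a Gaussian kernel, works on single-channel arrays whose sides are divisible by 2 to the power of `levels`, and all function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (box-filter stand-in)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbor repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Return high-frequency residuals (fine to coarse) and the low-frequency base."""
    residuals = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        residuals.append(current - upsample(low))  # detail lost by downsampling
        current = low
    return residuals, current

def reconstruct(residuals, low):
    """Invert build_pyramid exactly by adding residuals back, coarse to fine."""
    current = low
    for residual in reversed(residuals):
        current = upsample(current) + residual
    return current
```

In this sketch, editing only the small low-frequency base (for example, adding a constant to brighten it) and then reconstructing changes the global appearance while the stored residuals keep the fine details, which mirrors the split between attribute translation and detail refinement described above.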

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
pdf: https://t.co/VcTs6HbIzd
abs: https://t.co/sbXrdixFot
github: https://t.co/cc3c0wQ3up pic.twitter.com/gcLUdbzcn4
— AK (@ak92501) May 20, 2021

6. Pathdreamer: A World Model for Indoor Navigation

Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

retweets: 120, favorites: 75 (05/21/2021 07:15:35)
links: abs | pdf
cs.CV | cs.LG

People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals. Towards equipping computational agents with similar capabilities, we introduce Pathdreamer, a visual world model for agents navigating in novel indoor environments. Given one or more previous visual observations, Pathdreamer generates plausible high-resolution 360 visual observations (RGB, semantic segmentation and depth) for viewpoints that have not been visited, in buildings not seen during training. In regions of high uncertainty (e.g. predicting around corners, imagining the contents of an unseen room), Pathdreamer can predict diverse scenes, allowing an agent to sample multiple realistic outcomes for a given trajectory. We demonstrate that Pathdreamer encodes useful and accessible visual, spatial and semantic knowledge about human environments by using it in the downstream task of Vision-and-Language Navigation (VLN). Specifically, we show that planning ahead with Pathdreamer brings about half the benefit of looking ahead at actual observations from unobserved parts of the environment. We hope that Pathdreamer will help unlock model-based approaches to challenging embodied navigation tasks such as navigating to specified objects and VLN.

Pathdreamer: A World Model for Indoor Navigation
pdf: https://t.co/Ia9jYWAUMN
abs: https://t.co/eNEBInFwPX pic.twitter.com/4WzEGcR19F
— AK (@ak92501) May 20, 2021

Pathdreamer: A World Model for Indoor Navigation

Pathdreamer generates plausible high-res 360 visual observations for viewpoints that have not been visited, in buildings not seen during training.

abs: https://t.co/xhwvq91jTt
video: https://t.co/F18Wjq9wyN
— Aran Komatsuzaki (@arankomatsuzaki) May 20, 2021

7. Effective Attention Sheds Light On Interpretability

Kaiser Sun, Ana Marasović

retweets: 78, favorites: 72 (05/21/2021 07:15:35)
links: abs | pdf
cs.CL

An attention matrix of a transformer self-attention sublayer can provably be decomposed into two components and only one of them (effective attention) contributes to the model output. This leads us to ask whether visualizing effective attention gives different conclusions than interpretation of standard attention. Using a subset of the GLUE tasks and BERT, we carry out an analysis to compare the two attention matrices, and show that their interpretations differ. Effective attention is less associated with the features related to the language modeling pretraining such as the separator token, and it has more potential to illustrate linguistic features captured by the model for solving the end-task. Given the found differences, we recommend using effective attention for studying a transformer’s behavior since it is more pertinent to the model output by design.
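One way to picture the decomposition is the sketch below. It assumes the null-space formulation from prior work on transformer identifiability: the part of each attention row that lies in the left null space of the value matrix is multiplied to zero and can be dropped. The function name and the tolerance are hypothetical, and this is not the authors' code.

```python
import numpy as np

def effective_attention(attn, values, tol=1e-10):
    """Drop the component of each attention row that cannot affect attn @ values.

    attn:   (n, n) attention weights.
    values: (n, d) value vectors; when n > d the left null space is non-trivial.
    """
    u, s, _ = np.linalg.svd(values, full_matrices=False)
    basis = u[:, s > tol * s.max()]     # orthonormal basis of the column space
    return attn @ basis @ basis.T       # project rows onto that column space
```

The result generally differs from the raw attention matrix (its rows need not be non-negative or sum to one), yet multiplying either matrix by `values` gives the same output, which is why the two can lead to different visual interpretations of the same computation.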

Our paper “Effective Attention Sheds Light On Interpretability” (w/ @anmarasovic) was accepted into Findings of ACL2021 #ACL2021NLP #NLProc

Pre-print available at: https://t.co/730HNeNiUC
Thread⬇️ pic.twitter.com/7fThrXDCmY
— Kaiser Sun (@KaiserWhoLearns) May 20, 2021

Effective Attention Sheds Light On Interpretability
pdf: https://t.co/07oPOjXsnR
abs: https://t.co/22pqtKfeFX

Effective attention is less associated with the features related to the language modeling pretraining such as the separator token pic.twitter.com/aG7mvKO4ax
— AK (@ak92501) May 20, 2021

8. Large-scale Localization Datasets in Crowded Indoor Spaces

Donghwan Lee, Soohyun Ryu, Suyong Yeon, Yonghan Lee, Deokhwa Kim, Cheolho Han, Yohann Cabon, Philippe Weinzaepfel, Nicolas Guérin, Gabriela Csurka, Martin Humenberger

retweets: 81, favorites: 55 (05/21/2021 07:15:36)
links: abs | pdf
cs.CV

Estimating the precise location of a camera using visual localization enables interesting applications such as augmented reality or robot navigation. This is particularly useful in indoor environments where other localization technologies, such as GNSS, fail. Indoor spaces impose interesting challenges on visual localization algorithms: occlusions due to people, textureless surfaces, large viewpoint changes, low light, repetitive textures, etc. Existing indoor datasets are either comparably small or do only cover a subset of the mentioned challenges. In this paper, we introduce 5 new indoor datasets for visual localization in challenging real-world environments. They were captured in a large shopping mall and a large metro station in Seoul, South Korea, using a dedicated mapping platform consisting of 10 cameras and 2 laser scanners. In order to obtain accurate ground truth camera poses, we developed a robust LiDAR SLAM which provides initial poses that are then refined using a novel structure-from-motion based optimization. We present a benchmark of modern visual localization algorithms on these challenging datasets showing superior performance of structure-based methods using robust image features. The datasets are available at: https://naverlabs.com/datasets

Large-scale Localization Datasets in Crowded Indoor Spaces
pdf: https://t.co/jfWfeBeiiF
abs: https://t.co/xNfgD1KaQ8

5 new indoor datasets for visual localization in challenging real-world environments pic.twitter.com/wOAgLgFiId
— AK (@ak92501) May 20, 2021

Large-scale Localization Datasets in Crowded Indoor Spaces
Donghwan Lee et al (incl. @WeinzaepfelP and @naverlabseurope )

tl;dr: multisensor dataset + benchmark for indoor VisLoc.
P.S. ESAC and PoseNet fail, structure methods rule. https://t.co/x5yKa5NGuE pic.twitter.com/8K8rR1y8f7
— Dmytro Mishkin (@ducha_aiki) May 20, 2021

9. Multi-Person Extreme Motion Prediction with Cross-Interaction Attention

Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno

retweets: 74, favorites: 28 (05/21/2021 07:15:36)
links: abs | pdf
cs.CV

Human motion prediction aims to forecast future human poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper we explore this problem from a novel perspective, involving humans performing collaborative tasks. We assume that the input of our system are two sequences of past skeletons for two interacting persons, and we aim to predict the future motion for each of them. For this purpose, we devise a novel cross interaction attention mechanism that exploits historical information of both persons and learns to predict cross dependencies between self poses and the poses of the other person in spite of their spatial or temporal distance. Since no dataset to train such interactive situations is available, we have captured ExPI (Extreme Pose Interaction), a new lab-based person interaction dataset of professional dancers performing acrobatics. ExPI contains 115 sequences with 30k frames and 60k instances with annotated 3D body poses and shapes. We thoroughly evaluate our cross-interaction network on this dataset and show that both in short-term and long-term predictions, it consistently outperforms baselines that independently reason for each person. We plan to release our code jointly with the dataset and the train/test splits to spur future research on the topic.
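For readers unfamiliar with attending across two sequences, here is a generic scaled dot-product cross-attention sketch in which one person's pose features query the other's. It is an assumption-laden illustration, not the paper's cross-interaction attention module: there are no learned projections, and `cross_attention` is a hypothetical name.

```python
import numpy as np

def cross_attention(query_seq, context_seq):
    """Attend from one person's pose history to the other's.

    query_seq:   (Tq, d) pose features of the person being predicted.
    context_seq: (Tc, d) pose features of the interacting person.
    Returns the attended features (Tq, d) and the weights (Tq, Tc).
    """
    d = query_seq.shape[-1]
    scores = query_seq @ context_seq.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over the context frames
    return weights @ context_seq, weights
```

Every frame of the query sequence can attend to every frame of the other person's history, so the dependency does not shrink with temporal distance.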

Interested in extreme pose estimation in interactive scenarios? Check our pre-print (https://t.co/8xOpnNEJ6v) and dataset (https://t.co/GEFtIDAleU). Joint work with @fmorenoguer @wen80560669 @BieXiaoyu pic.twitter.com/CtTBc8MSuw
— Xavier Alameda-Pineda (@xavirema) May 20, 2021

10. Tool- and Domain-Agnostic Parameterization of Style Transfer Effects Leveraging Pretrained Perceptual Metrics

Hiromu Yakura, Yuki Koyama, Masataka Goto

retweets: 42, favorites: 46 (05/21/2021 07:15:36)
links: abs | pdf
cs.LG | cs.CV | cs.HC

Current deep learning techniques for style transfer would not be optimal for design support since their “one-shot” transfer does not fit exploratory design processes. To overcome this gap, we propose parametric transcription, which transcribes an end-to-end style transfer effect into parameter values of specific transformations available in an existing content editing tool. With this approach, users can imitate the style of a reference sample in the tool that they are familiar with and thus can easily continue further exploration by manipulating the parameters. To enable this, we introduce a framework that utilizes an existing pretrained model for style transfer to calculate a perceptual style distance to the reference sample and uses black-box optimization to find the parameters that minimize this distance. Our experiments with various third-party tools, such as Instagram and Blender, show that our framework can effectively leverage deep learning techniques for computational design support.
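The optimization loop described here can be sketched with toy stand-ins. Everything below is an assumption for illustration: `apply_edit` plays the role of an editing tool with two parameters, `style_distance` replaces the pretrained perceptual metric with a comparison of mean and standard deviation, and the black-box optimizer is a simple random-search hill climber rather than whatever the authors use.

```python
import numpy as np

def apply_edit(img, params):
    """Hypothetical editing tool with two knobs: contrast and brightness."""
    contrast, brightness = params
    return contrast * img + brightness

def style_distance(a, b):
    """Stand-in for a pretrained perceptual style metric."""
    return (a.mean() - b.mean()) ** 2 + (a.std() - b.std()) ** 2

def transcribe(content, reference, iters=2000, seed=0):
    """Search for tool parameters that bring the content close to the reference.

    Treats both the tool and the metric as black boxes: only their
    outputs are used, never their gradients.
    """
    rng = np.random.default_rng(seed)
    best_params = np.array([1.0, 0.0])   # identity edit as the starting point
    best_dist = style_distance(apply_edit(content, best_params), reference)
    for _ in range(iters):
        candidate = best_params + rng.normal(scale=0.1, size=2)
        dist = style_distance(apply_edit(content, candidate), reference)
        if dist < best_dist:             # keep only improvements
            best_params, best_dist = candidate, dist
    return best_params, best_dist
```

The output is a set of parameter values rather than a finished image, so a user could load them into the tool and keep adjusting by hand, which is the point of parametric transcription.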

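The research I had been working on through ACT-X and other programs has been accepted to IJCAI 2021, surviving tougher-than-expected competition with a 13.9% acceptance rate! I think it turned out to be an interesting study that makes good use of existing pretrained models for computational design. It is also up on arXiv, so please take a look if you like! https://t.co/nVkYaOW9nY https://t.co/EwK5aeF0UI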
— Hiromu Yakura (@hiromu1996) May 20, 2021

Published 21 May 2021

Tatsuya Shirakawa, ML Lead at Beatrust (https://beatrust.com)