1. Geometric Deep Learning on Molecular Representations
Kenneth Atz, Francesca Grisoni, Gisbert Schneider
- retweets: 7316, favorites: 358 (07/29/2021 10:09:49)
- links: abs | pdf
- physics.chem-ph | cs.AI | cs.LG | q-bio.BM
Geometric deep learning (GDL), which is based on neural network architectures that incorporate and process symmetry information, has emerged as a recent paradigm in artificial intelligence. GDL bears particular promise in molecular modeling applications, in which various molecular representations with different symmetry properties and levels of abstraction exist. This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry. Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors. It concludes with an overview of current challenges and opportunities and a forecast of the future of GDL for molecular sciences.
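The symmetry idea at the heart of GDL can be made concrete with a toy example. Below is a minimal sketch (our illustration, not code from the review; all names are made up) of a message-passing update over a molecular point cloud that conditions messages only on pairwise interatomic distances, so the learned features are invariant to rotating or translating the molecule:

```python
# Minimal sketch (not the authors' code) of an E(3)-invariant message-passing
# update on a molecular point cloud. Using only pairwise distances keeps the
# learned features unchanged under rotations and translations of the molecule.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Tiny two-layer perceptron with ReLU."""
    return np.maximum(x @ w1, 0.0) @ w2

def invariant_message_passing(h, pos, w1, w2):
    """One round of message passing.

    h:   (n_atoms, d) atom features
    pos: (n_atoms, 3) 3D coordinates
    Messages depend on (h_i, h_j, ||pos_i - pos_j||), an E(3)-invariant input.
    """
    n = h.shape[0]
    h_new = h.copy()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = np.linalg.norm(pos[i] - pos[j])
            msg_in = np.concatenate([h[i], h[j], [dist]])
            h_new[i] += mlp(msg_in, w1, w2)
    return h_new

# Toy molecule: 4 atoms with 8-dimensional features.
d = 8
h = rng.normal(size=(4, d))
pos = rng.normal(size=(4, 3))
w1 = rng.normal(size=(2 * d + 1, 16)) * 0.1
w2 = rng.normal(size=(16, d)) * 0.1

out = invariant_message_passing(h, pos, w1, w2)

# Rotating the molecule leaves the output unchanged (invariance check).
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
out_rot = invariant_message_passing(h, pos @ R.T, w1, w2)
assert np.allclose(out, out_rot)
```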
How can geometric #DeepLearning help us tackle modeling challenges in molecular sciences? Check out our latest review "Geometric Deep Learning on Molecular Representations" https://t.co/nfffFMemME @keennethy @ETH_en #MachineLearning #AI #chemtwitter #compchem pic.twitter.com/SjIv62W0Bf
— Francesca Grisoni (@fra_grisoni) July 28, 2021
2. Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP
Daniil Pakhomov, Sanchit Hira, Narayani Wagle, Kemar E. Green, Nassir Navab
We introduce a method that automatically segments images into semantically meaningful regions without human supervision. Derived regions are consistent across different images and coincide with human-defined semantic classes on some datasets. In cases where semantic regions might be hard for humans to define and consistently label, our method is still able to find meaningful and consistent semantic classes. In our work, we use the pretrained StyleGAN2 (Karras et al., 2020) generative model: clustering in its feature space allows us to discover semantic classes. Once classes are discovered, a synthetic dataset of generated images and corresponding segmentation masks can be created. A segmentation model trained on this synthetic dataset is then able to generalize to real images. Additionally, by using CLIP (Radford et al., 2021), we are able to use natural-language prompts to discover desired semantic classes. We test our method on publicly available datasets and show state-of-the-art results.
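The pipeline lends itself to a compact sketch. The following is our illustration, not the authors' code: `generate_features` is a hypothetical stand-in for a StyleGAN2 forward pass that exposes intermediate per-pixel feature maps, and the clustering step uses scikit-learn's KMeans. Per-pixel features are clustered across generated samples, the cluster assignments serve as segmentation masks, and the resulting (image, mask) pairs form a synthetic training set:

```python
# Illustrative sketch of the pipeline (stand-in generator; `generate_features`
# is a hypothetical hook, not the StyleGAN2 API): cluster per-pixel generator
# features across samples to discover classes, then emit (image, mask) pairs.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def generate_features(z):
    """Stand-in for a StyleGAN2 forward pass that also returns intermediate
    per-pixel feature maps: (image HxWx3, features HxWxC)."""
    img = rng.random((64, 64, 3))
    feats = rng.normal(size=(64, 64, 32))
    return img, feats

# 1) Collect per-pixel features from a batch of generated images.
samples = [generate_features(rng.normal(size=512)) for _ in range(8)]
pixels = np.concatenate([f.reshape(-1, 32) for _, f in samples])

# 2) Cluster the feature space; clusters act as discovered semantic classes.
k = 5
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

# 3) Build a synthetic (image, segmentation mask) dataset from the clusters;
#    a standard segmentation model can then be trained on these pairs.
dataset = [(img, kmeans.predict(f.reshape(-1, 32)).reshape(64, 64))
           for img, f in samples]
```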
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP
— AK (@ak92501) July 28, 2021
pdf: https://t.co/KSjm8txURN
abs: https://t.co/olELgzAMqI
a method that automatically segments images into semantically meaningful regions without human supervision pic.twitter.com/or7MmWgL15
3. H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
Eduard Ramon, Gil Triginer, Janna Escur, Albert Pumarola, Jaime Garcia, Xavier Giro-i-Nieto, Francesc Moreno-Noguer
Recent learning approaches that implicitly represent surface geometry using coordinate-based neural representations have shown impressive results in the problem of multi-view 3D reconstruction. The effectiveness of these techniques is, however, subject to the availability of a large number (several tens) of input views of the scene and to computationally demanding optimizations. In this paper, we tackle these limitations for the specific problem of few-shot full 3D head reconstruction, by endowing coordinate-based representations with a probabilistic shape prior that enables faster convergence and better generalization when using few input images (down to three). First, we learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations. At test time, we jointly overfit two coordinate-based neural networks to the scene, one modeling the geometry and another estimating the surface radiance, using implicit differentiable rendering. We devise a two-stage optimization strategy in which the learned prior is used to initialize and constrain the geometry during an initial optimization phase. Then, the prior is unfrozen and fine-tuned to the scene. By doing this, we achieve high-fidelity head reconstructions, including hair and shoulders, with a level of detail that consistently outperforms both state-of-the-art methods based on 3D morphable models in the few-shot scenario and non-parametric methods when large sets of views are available.
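The two-stage optimization strategy can be sketched schematically. In the sketch below (our illustration; `geometry_net`, `radiance_net`, `render`, and `photometric_loss` are hypothetical stand-ins for the paper's components), the pretrained shape prior stays frozen in the first stage while a latent code is fit to the images, then everything is unfrozen and fine-tuned in the second:

```python
# Schematic two-stage fit (not the authors' code): `geometry_net` (the learned
# shape prior), `radiance_net`, `render`, and `photometric_loss` are
# hypothetical stand-ins for the paper's components.
import torch

def fit_head(geometry_net, radiance_net, render, photometric_loss, images,
             steps_stage1=500, steps_stage2=500):
    # Stage 1: keep the pretrained shape prior frozen; optimize only a latent
    # code (and the radiance network) so the prior initializes and constrains
    # the geometry.
    latent = torch.zeros(1, 256, requires_grad=True)
    for p in geometry_net.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam([latent] + list(radiance_net.parameters()), lr=1e-3)
    for _ in range(steps_stage1):
        loss = photometric_loss(render(geometry_net, radiance_net, latent), images)
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: unfreeze the prior and fine-tune everything to the scene.
    for p in geometry_net.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam([latent] + list(geometry_net.parameters())
                           + list(radiance_net.parameters()), lr=1e-4)
    for _ in range(steps_stage2):
        loss = photometric_loss(render(geometry_net, radiance_net, latent), images)
        opt.zero_grad(); loss.backward(); opt.step()
    return latent
```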
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
— AK (@ak92501) July 28, 2021
pdf: https://t.co/NsZ0nPgunn
abs: https://t.co/3JH1SgeGJg
project page: https://t.co/WWwTqCoeGx pic.twitter.com/w1IHGlzA1j
4. TaikoNation: Patterning-focused Chart Generation for Rhythm Action Games
Emily Halina, Matthew Guzdial
Generating rhythm game charts from songs via machine learning has been a problem of increasing interest in recent years. However, all existing systems struggle to replicate human-like patterning: the placement of game objects in relation to each other to form congruent patterns based on events in the song. Patterning is a key identifier of high-quality rhythm game content, seen as a necessary component in human rankings. We establish a new approach for chart generation that produces charts with more congruent, human-like patterning than seen in prior work.
My first academic paper is now public!
— Emily (@livingsuitcase) July 28, 2021
TaikoNation: Patterning-focused Chart Generation for Rhythm Action Games
"We establish a new approach for chart generation that produces charts with more congruent, human-like patterning than seen in prior work."https://t.co/2915W3IuWM pic.twitter.com/lM5JALsenR
5. QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
Anna Rogers, Matt Gardner, Isabelle Augenstein
Alongside the huge volume of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of “reasoning types” in question answering and propose a new taxonomy. We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources. The study is aimed both at practitioners looking for pointers to the wealth of existing data, and at researchers working on new resources.
#NLPaperAlert: QA Dataset Explosion!🔥
— Anna Rogers (@annargrs) July 28, 2021
A survey of 200+ QA/RC datasets proposing a taxonomy of formats & reasoning skills. Also in the bag: modalities, conversational QA, domains & beyond-English data.
Honored to work on this with @nlpmattg & @IAugenstein https://t.co/xaz9GIXjI4 pic.twitter.com/HVfAPvm3OC
6. Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning
Pedro A. Tsividis, Joao Loula, Jake Burga, Nathan Foss, Andres Campero, Thomas Pouncy, Samuel J. Gershman, Joshua B. Tenenbaum
Reinforcement learning (RL) studies how an agent comes to achieve reward in an environment through interactions over time. Recent advances in machine RL have surpassed human expertise at the world’s oldest board games and many classic video games, but they require vast quantities of experience to learn successfully — none of today’s algorithms account for the human ability to learn so many different tasks, so quickly. Here we propose a new approach to this challenge based on a particularly strong form of model-based RL which we call Theory-Based Reinforcement Learning, because it uses human-like intuitive theories — rich, abstract, causal models of physical objects, intentional agents, and their interactions — to explore and model an environment, and plan effectively to achieve task goals. We instantiate the approach in a video game playing agent called EMPA (the Exploring, Modeling, and Planning Agent), which performs Bayesian inference to learn probabilistic generative models expressed as programs for a game-engine simulator, and runs internal simulations over these models to support efficient object-based, relational exploration and heuristic planning. EMPA closely matches human learning efficiency on a suite of 90 challenging Atari-style video games, learning new games in just minutes of game play and generalizing robustly to new game situations and new levels. The model also captures fine-grained structure in people’s exploration trajectories and learning dynamics. Its design and behavior suggest a way forward for building more general human-like AI systems.
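The theory-based loop can be caricatured in a few lines. The toy sketch below (our illustration, not EMPA's implementation; the model and environment interfaces are made up) maintains a posterior over candidate generative models of the game, plans by simulating rollouts under the most probable model, and reweights the models by how well they predict each observed transition:

```python
# Toy sketch of the theory-based RL loop (names are illustrative, not EMPA's
# API): keep a posterior over candidate world models ("theories"), plan by
# simulating under the MAP model, and update beliefs from observed transitions.
def theory_based_rl(candidate_models, env, actions, episodes=10, horizon=20):
    # Uniform prior over candidate generative models of the game.
    posterior = {m: 1.0 / len(candidate_models) for m in candidate_models}
    for _ in range(episodes):
        state = env.reset()
        for _ in range(horizon):
            # Plan: score each action by its simulated return under the
            # currently most probable model.
            best_model = max(posterior, key=posterior.get)
            action = max(actions,
                         key=lambda a: best_model.simulated_return(state, a))
            next_state, reward = env.step(action)
            # Bayesian update: reweight each model by how well it predicted
            # the observed transition, then renormalize.
            for m in posterior:
                posterior[m] *= m.likelihood(state, action, next_state, reward)
            z = sum(posterior.values())
            posterior = {m: p / z for m, p in posterior.items()}
            state = next_state
    return posterior
```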
Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning
— AK (@ak92501) July 28, 2021
pdf: https://t.co/6nDz0DhsUu
abs: https://t.co/4x0kAzHoaB
closely matches human learning efficiency on a suite of 90 Atari-style video games, learning new games in just minutes pic.twitter.com/kdapVulrmD
7. Language Grounding with 3D Objects
Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer
- retweets: 76, favorites: 67 (07/29/2021 10:09:51)
- links: abs | pdf
- cs.CL | cs.AI | cs.CV | cs.LG | cs.RO
Seemingly simple natural language requests to a robot are generally underspecified, for example “Can you bring me the wireless mouse?” When viewing mice on the shelf, the number of buttons or presence of a wire may not be visible from certain angles or positions. Flat images of candidate mice may not provide the discriminative information needed for “wireless”. The world, and objects in it, are not flat images but complex 3D shapes. If a human requests an object based on any of its basic properties, such as color, shape, or texture, robots should perform the necessary exploration to accomplish the task. In particular, while substantial effort and progress have been made on understanding explicitly visual attributes like color and category, comparatively little progress has been made on understanding language about shapes and contours. In this work, we introduce a novel reasoning task that targets both visual and non-visual language about 3D objects. Our new benchmark, ShapeNet Annotated with Referring Expressions (SNARE), requires a model to choose which of two objects is being referenced by a natural language description. We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, these models are weaker at understanding the 3D nature of objects, properties which play a key role in manipulation. In particular, we find that adding view estimation to language grounding models improves accuracy both on SNARE and when identifying objects referred to in language on a robot platform.
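A minimal CLIP-based scorer in the spirit of the paper's baselines might look as follows (our sketch using OpenAI's public CLIP package, not the paper's exact models): each candidate object is embedded by averaging CLIP image embeddings over its views, and the object whose embedding is closest to the text embedding is selected:

```python
# Sketch of a multi-view CLIP scorer for the two-object referring task
# (our illustration, not the paper's model).
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def score_objects(view_images_a, view_images_b, description):
    """view_images_*: lists of PIL images (multiple vantage points per object).
    Returns 0 or 1: the index of the object the description refers to."""
    def embed_views(views):
        ims = torch.stack([preprocess(v) for v in views]).to(device)
        feats = model.encode_image(ims)
        feats = feats / feats.norm(dim=-1, keepdim=True)
        return feats.mean(dim=0)  # aggregate views into one object embedding

    text = model.encode_text(clip.tokenize([description]).to(device))[0]
    text = text / text.norm()
    sims = [embed_views(v) @ text for v in (view_images_a, view_images_b)]
    return 0 if sims[0] > sims[1] else 1
```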
Robots can go beyond image-based grounding by looking at objects from multiple vantage points.
— Jesse Thomason (@_jessethomason_) July 28, 2021
To study this ability, "Language Grounding with 3D Objects" presents the ShapeNet Annotated with Referring Expressions (SNARE) benchmark https://t.co/qgspOJeb4m https://t.co/IRWvvIujt4 pic.twitter.com/ebRwSyJeFf
Language Grounding with 3D Objects
— AK (@ak92501) July 28, 2021
pdf: https://t.co/0pnKXWu72R
abs: https://t.co/a5rnWkUnFk pic.twitter.com/5RQFhO7bns
8. Ensemble Learning For Mega Man Level Generation
Bowei Li, Ruohan Chen, Yuqing Xue, Ricky Wang, Wenwen Li, Matthew Guzdial
Procedural content generation via machine learning (PCGML) is the process of procedurally generating game content using models trained on existing game content. PCGML methods can struggle to capture the true variance present in underlying data with a single model. In this paper, we investigate the use of ensembles of Markov chains for procedurally generating Mega Man levels. We conduct an initial investigation of our approach and evaluate it on measures of playability and stylistic similarity in comparison to an existing, non-ensemble Markov chain approach.
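The ensemble idea is simple enough to sketch end to end. The toy example below (our illustration, not the paper's implementation; the column encoding is made up) trains one Markov chain per subset of levels, then generates a level by letting randomly chosen ensemble members propose the next column:

```python
# Toy sketch of an ensemble of Markov chains for level generation (our
# illustration, not the paper's code): levels are read as sequences of column
# strings; each chain is trained on a subset, and generation samples from
# whichever ensemble members know the current state.
import random
from collections import defaultdict

def train_markov_chain(levels):
    """Count column-to-column transitions across the given levels."""
    transitions = defaultdict(list)
    for level in levels:  # level = list of column strings
        for a, b in zip(level, level[1:]):
            transitions[a].append(b)
    return transitions

def sample_level(chains, start, length, rng):
    """Generate a level; at each step a randomly chosen ensemble member that
    has seen the current column proposes the next one."""
    level = [start]
    for _ in range(length - 1):
        candidates = [c for c in chains if level[-1] in c]
        if not candidates:
            break  # no member knows this state; stop early
        chain = rng.choice(candidates)
        level.append(rng.choice(chain[level[-1]]))
    return level

rng = random.Random(0)
# Hypothetical training data: columns encoded as strings ('-' air, 'X' solid).
levels = [["---X", "--XX", "---X", "XXXX"], ["---X", "XXXX", "--XX", "---X"]]
chains = [train_markov_chain([lv]) for lv in levels]  # one chain per subset
print(sample_level(chains, "---X", 6, rng))
```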
Ensemble Learning For Mega Man Level Generation
— AK (@ak92501) July 28, 2021
pdf: https://t.co/1VEL3UBQVI
abs: https://t.co/3IU5E4aq5W pic.twitter.com/mZHhhTQqZe
9. Don’t Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers
Danielle Rothermel, Margaret Li, Tim Rocktäschel, Jakob Foerster
Self-supervised pre-training of large-scale transformer models on text corpora, followed by finetuning, has achieved state-of-the-art results on a number of natural language processing tasks. Recently, Lu et al. (2021, arXiv:2103.05247) claimed that frozen pretrained transformers (FPTs) match or outperform both training from scratch and unfrozen (fine-tuned) pretrained transformers on a set of transfer tasks to other modalities. In our work, we find that this result is, in fact, an artifact of not tuning the learning rates. After carefully redesigning the empirical setup, we find that with properly tuned learning rates, pretrained transformers do outperform or match training from scratch in all of our tasks, but only as long as the entire model is finetuned. Thus, while transfer from pretrained language models to other modalities does indeed provide gains and hints at exciting possibilities for future work, properly tuning hyperparameters is important for arriving at robust findings.
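The methodological point translates directly into an experimental loop. In the sketch below (our illustration; `make_model`, `train_eval`, and the `transformer_blocks` attribute are hypothetical), the learning rate is swept independently for the frozen-transformer and fully-finetuned settings, so each setting is compared at its own best hyperparameter:

```python
# Schematic of the paper's point (our sketch; `make_model`, `train_eval`, and
# `transformer_blocks` are hypothetical stand-ins): sweep the learning rate
# separately for the frozen and fully finetuned settings before comparing.
def sweep(make_model, train_eval, task_data,
          lrs=(1e-5, 3e-5, 1e-4, 3e-4, 1e-3)):
    results = {}
    for freeze_transformer in (True, False):
        best = None
        for lr in lrs:
            model = make_model(pretrained=True)
            if freeze_transformer:
                # FPT setting: only input/output layers stay trainable.
                for p in model.transformer_blocks.parameters():
                    p.requires_grad_(False)
            score = train_eval(model, task_data, lr=lr)
            best = max(best, (score, lr)) if best else (score, lr)
        results["frozen" if freeze_transformer else "full"] = best
    # Comparing both settings at one shared lr would confound the comparison;
    # each setting is reported at its own best lr.
    return results
```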
Don’t Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers
— AK (@ak92501) July 28, 2021
pdf: https://t.co/0OmBiMNeEN
show that, across a variety of tasks, the best results are obtained when finetuning all of the weights of a pretrained model pic.twitter.com/VXWTebdcH5