Hot Papers 2021-01-18

1. Counterfactual Generative Networks

Axel Sauer, Andreas Geiger

retweets: 576, favorites: 138 (01/19/2021 08:57:46)
links: abs | pdf
cs.LG | cs.CV

Neural networks are prone to learning shortcuts — they often model simple correlations, ignoring more complex ones that potentially generalize better. Prior works on image classification show that instead of learning a connection to object shape, deep classifiers tend to exploit spurious correlations with low-level texture or the background for solving the classification task. In this work, we take a step towards more robust and interpretable classifiers that explicitly expose the task’s causal structure. Building on current advances in deep generative modeling, we propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision. By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background; hence, they allow for generating counterfactual images. We demonstrate the ability of our model to generate such images on MNIST and ImageNet. Further, we show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task, despite being synthetic. Lastly, our generative model can be trained efficiently on a single GPU, exploiting common pre-trained models as inductive biases.
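As a rough illustration of how independently generated shape, texture, and background could be combined into one image, here is a minimal sketch. The alpha-blending rule and all names (`composite`, `shape_mask`, and so on) are assumptions made for illustration; the abstract does not state how the mechanisms are composed. Swapping one input while holding the others fixed gives a counterfactual image in this toy sense.

```python
def composite(mask, foreground, background):
    """Blend per pixel: mask * foreground + (1 - mask) * background.

    Assumed composition rule for illustration only; not taken from the abstract.
    """
    return [
        [m * f + (1.0 - m) * b for m, f, b in zip(m_row, f_row, b_row)]
        for m_row, f_row, b_row in zip(mask, foreground, background)
    ]


# Hypothetical 2x2 single-channel outputs of three independent mechanisms.
shape_mask = [[1.0, 0.0], [0.5, 0.0]]        # where the object is
texture = [[0.8, 0.8], [0.8, 0.8]]           # what the object looks like
background = [[0.2, 0.2], [0.2, 0.2]]        # the original scene
other_background = [[0.6, 0.6], [0.6, 0.6]]  # background from another sample

image = composite(shape_mask, texture, background)
# Counterfactual: same shape and texture, different background.
counterfactual = composite(shape_mask, texture, other_background)
```

Pixels fully covered by the mask are unchanged between `image` and `counterfactual`; only pixels where the background shows through differ.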

Counterfactual Generative Networks
pdf: https://t.co/VRRIIMoSIU
abs: https://t.co/QGdRnvsTXv pic.twitter.com/FZnHRSaoYI
— AK (@ak92501) January 18, 2021

2. Supervised Transfer Learning at Scale for Medical Imaging

Basil Mustafa, Aaron Loh, Jan Freyberg, Patricia MacWilliams, Alan Karthikesalingam, Neil Houlsby, Vivek Natarajan

retweets: 132, favorites: 53 (01/19/2021 08:57:46)
links: abs | pdf
cs.CV

Transfer learning is a standard technique to improve performance on tasks with limited data. However, for medical imaging, the value of transfer learning is less clear. This is likely due to the large domain mismatch between the usual natural-image pre-training (e.g. ImageNet) and medical images. However, recent advances in transfer learning have shown substantial improvements from scale. We investigate whether modern methods can change the fortune of transfer learning for medical imaging. For this, we study the class of large-scale pre-trained networks presented by Kolesnikov et al. on three diverse imaging tasks: chest radiography, mammography, and dermatology. We study both transfer performance and critical properties for the deployment in the medical domain, including: out-of-distribution generalization, data-efficiency, sub-group fairness, and uncertainty estimation. Interestingly, we find that for some of these properties transfer from natural to medical images is indeed extremely effective, but only when performed at sufficient scale.

New paper from our team @GoogleHealth/@GoogleAI (https://t.co/va8o7COLib) Pre-training at scale improves AI accuracy, generalisation + fairness in many medical imaging tasks: Chest X-Ray, Dermatology & Mammography! Led by @_basilM, @JanFreyberg, Aaron Loh, @neilhoulsby, @vivnat pic.twitter.com/zncpXLnKhD
— Alan Karthikesalingam (@alan_karthi) January 18, 2021

3. The Geometry of Deep Generative Image Models and its Applications

Binxu Wang, Carlos R. Ponce

retweets: 90, favorites: 65 (01/19/2021 08:57:47)
links: abs | pdf
cs.LG | cs.NE | math.NA

Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets, such as natural images. These networks are trained to map random inputs in their latent space to new samples representative of the learned data. However, the structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator, which limits the usefulness of the models. Understanding the latent space requires a way to identify input codes for existing real-world images (inversion), and a way to identify directions with known image transformations (interpretability). Here, we use a geometric framework to address both issues simultaneously. We develop an architecture-agnostic method to compute the Riemannian metric of the image manifold created by GANs. The eigen-decomposition of the metric isolates axes that account for different levels of image variability. An empirical analysis of several pretrained GANs shows that image variation around each position is concentrated along surprisingly few major axes (the space is highly anisotropic) and the directions that create this large variation are similar at different positions in the space (the space is homogeneous). We show that many of the top eigenvectors correspond to interpretable transforms in the image space, with a substantial part of eigenspace corresponding to minor transforms which could be compressed out. This geometric understanding unifies key previous results related to GAN interpretability. We show that the use of this metric allows for more efficient optimization in the latent space (e.g. GAN inversion) and facilitates unsupervised discovery of interpretable axes. Our results illustrate that defining the geometry of the GAN image manifold can serve as a general framework for understanding GANs.
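The idea of a metric on the latent space and its eigen-decomposition can be sketched on a toy example. The sketch below assumes squared Euclidean distance between outputs, in which case the metric at a latent point is the pullback J^T J of the generator Jacobian J; the image distance the authors actually use may differ. The `generator` function is a hypothetical stand-in for a GAN, and power iteration is used only to keep the example dependency-free.

```python
import math


def generator(z):
    """Toy stand-in for a GAN generator: 2-D latent code -> 3 'pixels'."""
    z0, z1 = z
    return [3.0 * z0, 0.1 * z1, z0 * z1]


def jacobian(g, z, eps=1e-5):
    """Central finite-difference Jacobian; rows are outputs, columns latent dims."""
    n = len(z)
    cols = []
    for j in range(n):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(g(zp), g(zm))])
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def pullback_metric(g, z):
    """Metric J^T J at z (assumes squared Euclidean distance in output space)."""
    J = jacobian(g, z)
    n = len(z)
    return [[sum(row[a] * row[b] for row in J) for b in range(n)] for a in range(n)]


def top_eigenpair(M, iters=200):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    n = len(M)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


metric = pullback_metric(generator, [0.0, 0.0])
top_value, top_vector = top_eigenpair(metric)
trace = metric[0][0] + metric[1][1]
anisotropy = top_value / trace  # share of local variation along the top axis
```

In this toy case almost all of the local variation lies along the first latent axis, which is the kind of anisotropy the abstract describes for real GANs.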

The Geometry of Deep Generative Image Models and its Applications
pdf: https://t.co/MXi8fsBOBa
abs: https://t.co/g8GL4Xtvj5 pic.twitter.com/hNTzVwy9jY
— AK (@ak92501) January 18, 2021

4. SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning

Yifeng Jiang, Tingnan Zhang, Daniel Ho, Yunfei Bai, C. Karen Liu, Sergey Levine, Jie Tan

retweets: 100, favorites: 41 (01/19/2021 08:57:47)
links: abs | pdf
cs.RO

As learning-based approaches progress towards automating robot controller design, transferring learned policies to new domains with different dynamics (e.g. sim-to-real transfer) still demands manual effort. This paper introduces SimGAN, a framework to tackle domain adaptation by identifying a hybrid physics simulator to match the simulated trajectories to the ones from the target domain, using a learned discriminative loss to address the limitations associated with manual loss design. Our hybrid simulator combines neural networks and traditional physics simulation to balance expressiveness and generalizability, and alleviates the need for a carefully selected parameter set in System ID. Once the hybrid simulator is identified via adversarial reinforcement learning, it can be used to refine policies for the target domain, without the need to collect more data. We show that our approach outperforms multiple strong baselines on six robotic locomotion tasks for domain adaptation.

SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning
pdf: https://t.co/rLjq2WgAwU
abs: https://t.co/CbAuEAapuC pic.twitter.com/EjQcP0rIUn
— AK (@ak92501) January 18, 2021

5. The Challenge of Value Alignment: from Fairer Algorithms to AI Safety

Iason Gabriel, Vafa Ghazavi

retweets: 72, favorites: 20 (01/19/2021 08:57:47)
links: abs | pdf
cs.CY

This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. This question does not exist in a vacuum: there has long been an interest in the ability of technology to ‘lock in’ different value systems. There has also been considerable thought about how to align technologies with specific social values, including through participatory design processes. In this paper we look more closely at the question of AI value alignment and suggest that the power and autonomy of AI systems give rise to opportunities and challenges in the domain of value that have not been encountered before. Drawing important continuities between the work of the fairness, accountability, transparency and ethics community, and work being done by technical AI safety researchers, we suggest that more attention needs to be paid to the question of ‘social value alignment’, that is, how to align AI systems with the plurality of values endorsed by groups of people, especially on the global level.

✨Considerations of fair process and epistemic virtue point toward the need for a properly inclusive discussion around the ethics of AI alignment✨

Check out this new survey piece, co-authored with @GhazaviVD + forthcoming in an OUP volume @carissaveliz: https://t.co/SZXR85GBoC
— Iason Gabriel (@IasonGabriel) January 18, 2021

6. Nets with Mana: A Framework for Chemical Reaction Modelling

Fabrizio Romano Genovese, Fosco Loregian, Daniele Palombi

retweets: 30, favorites: 51 (01/19/2021 08:57:47)
links: abs | pdf
math.CT | cs.FL | q-bio.MN

We use categorical methods to define a new flavor of Petri nets which could be useful in modelling chemical reactions.

"Borrowing the terminology from the popular Turing machine Magic:The gathering [19, 6] we propose a possible solution to this problem by endowing transitions in a [Petri] net with mana" https://t.co/2acuWyfp6b

[19] is Wikipedia
[6] is arXiv:1904.09828
— theHigherGeometer (@HigherGeometer) January 18, 2021

7. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

Toby Perrett, Alessandro Masullo, Tilo Burghardt, Majid Mirmehdi, Dima Damen

retweets: 27, favorites: 28 (01/19/2021 08:57:47)
links: abs | pdf
cs.CV

We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set. Distinct from previous few-shot action recognition works, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video representations are formed from ordered tuples of varying numbers of frames, which allows sub-sequences of actions at different speeds and temporal offsets to be compared. Our proposed Temporal-Relational CrossTransformers achieve state-of-the-art results on both Kinetics and Something-Something V2 (SSv2), outperforming prior work on SSv2 by a wide margin (6.8%) due to the method’s ability to model temporal relations. A detailed ablation showcases the importance of matching to multiple support set videos and learning higher-order relational CrossTransformers. Code is available at https://github.com/tobyperrett/trx
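A much-simplified sketch of the two ideas in this abstract, ordered frame tuples and query-specific class prototypes built by attention over all support videos, might look as follows. It omits the learned projections a real CrossTransformer would use, and every function name and the toy 1-D frame features are assumptions for illustration rather than the authors' implementation (see their repository for that).

```python
import math
from itertools import combinations


def ordered_tuples(frames, size=2):
    """All temporally ordered frame tuples, each as concatenated frame features."""
    return [
        sum((frames[i] for i in idx), [])
        for idx in combinations(range(len(frames)), size)
    ]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_specific_prototype(query_tuple, support_tuples):
    """Attention-weighted average of support tuples (no learned projections here)."""
    scale = math.sqrt(len(query_tuple))
    scores = [sum(q * s for q, s in zip(query_tuple, t)) / scale for t in support_tuples]
    weights = softmax(scores)
    return [
        sum(w * t[d] for w, t in zip(weights, support_tuples))
        for d in range(len(query_tuple))
    ]


def class_distance(query_frames, support_videos, size=2):
    """Mean squared distance between query tuples and their class prototypes."""
    support = [t for video in support_videos for t in ordered_tuples(video, size)]
    dists = []
    for q in ordered_tuples(query_frames, size):
        p = query_specific_prototype(q, support)
        dists.append(sum((a - b) ** 2 for a, b in zip(q, p)))
    return sum(dists) / len(dists)


# Toy 1-D frame features: one 'rising' and one 'falling' support class.
rising = [[[0.0], [1.0], [2.0]]]
falling = [[[2.0], [1.0], [0.0]]]
query = [[0.0], [1.0], [2.0]]

d_rising = class_distance(query, rising)
d_falling = class_distance(query, falling)
predicted = "rising" if d_rising < d_falling else "falling"
```

Because tuples keep frame order, the rising and falling classes are separable here even though they contain the same set of frames.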

Temporal-Relational CrossTransformers for Few-Shot Action Recognition
pdf: https://t.co/H6qZrr5Fc9
abs: https://t.co/o0LqPP4agY
github: https://t.co/9hwa9nNWuM pic.twitter.com/BjXyFJneDZ
— AK (@ak92501) January 18, 2021

8. Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge

Violetta Shevchenko, Damien Teney, Anthony Dick, Anton van den Hengel

retweets: 20, favorites: 34 (01/19/2021 08:57:47)
links: abs | pdf
cs.CV | cs.LG

The limits of applicability of vision-and-language models are defined by the coverage of their training data. Tasks like vision question answering (VQA) often require commonsense and factual information beyond what can be learned from task-specific datasets. This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers. We use an auxiliary training objective that encourages the learned representations to align with graph embeddings of matching entities in a KB. We empirically study the relevance of various KBs to multiple tasks and benchmarks. The technique brings clear benefits to knowledge-demanding question answering tasks (OK-VQA, FVQA) by capturing semantic and relational knowledge absent from existing models. More surprisingly, the technique also benefits visual reasoning tasks (NLVR2, SNLI-VE). We perform probing experiments and show that the injection of additional knowledge regularizes the space of embeddings, which improves the representation of lexical and semantic similarities. The technique is model-agnostic and can expand the applicability of any vision-and-language transformer with minimal computational overhead.
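The auxiliary objective described in this abstract could be sketched as a weighted alignment term added to the task loss. The cosine-distance form, the `weight` value, and the assumption that token representations and KB embeddings already share a dimension are all illustrative choices; the abstract does not specify them.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def alignment_loss(token_reprs, kb_embeddings):
    """Mean (1 - cosine) over tokens matched to a KB entity; None means no match."""
    terms = [
        1.0 - cosine_similarity(h, e)
        for h, e in zip(token_reprs, kb_embeddings)
        if e is not None
    ]
    return sum(terms) / len(terms) if terms else 0.0


def total_loss(task_loss, token_reprs, kb_embeddings, weight=0.1):
    """Task loss plus the weighted auxiliary alignment term (weight is assumed)."""
    return task_loss + weight * alignment_loss(token_reprs, kb_embeddings)


# Hypothetical 2-D representations for three tokens; the second has no KB match.
token_reprs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kb_embeddings = [[2.0, 0.0], None, [1.0, -1.0]]

aux = alignment_loss(token_reprs, kb_embeddings)
loss = total_loss(1.0, token_reprs, kb_embeddings)
```

Tokens without a matching entity contribute nothing, so the auxiliary term only shapes representations where the knowledge base has something to say.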

Here's a new general-purpose technique to inject knowledge in transformers for vision and language. It helps with question answering but also (surprisingly) with visual reasoning, suggesting subtle regularization effects from the additional knowledge. https://t.co/Y2bW0aMjoc pic.twitter.com/o9DVQycRoo
— Damien Teney (@DamienTeney) January 18, 2021

Published 19 Jan 2021

Tatsuya Shirakawa, ML Lead at Beatrust (https://beatrust.com)