1. Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in the light of this variance. We show the counter-intuitive result that adding more sources of variation to an imperfect estimator brings it closer to the ideal estimator, at 51 times lower compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
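To see what such a multi-trial protocol looks like in practice, here is a minimal, hypothetical sketch; the dataset, model, and seed scheme are illustrative stand-ins, not the authors' benchmark, and only two sources of variation (data sampling and model randomness) are varied:

```python
# Toy sketch (not the authors' pipeline): estimate a benchmark's variance by
# re-randomizing sources of variation across independent trials.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def run_trial(seed: int) -> float:
    X, y = make_classification(n_samples=2000, random_state=0)
    # Data sampling: the train/test split depends on the seed.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=seed)
    # "Initialization": the model's internal randomness depends on the seed.
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(X_tr, y_tr).score(X_te, y_te)

scores = np.array([run_trial(s) for s in range(20)])
# A claimed improvement of algorithm A over B is only meaningful if it
# clearly exceeds this spread.
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
```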
New preprint: Accounting for Variance in Machine Learning Benchmarks https://t.co/iu5QuMWzw8
— Gael Varoquaux (@GaelVaroquaux) March 5, 2021
Led by @bouthilx and @Mila_Quebec friends
We show that ML benchmarks contain multiple sources of uncontrolled variation, not only inits. We propose a procedure for reliable conclusions 1/8
2. Out of Distribution Generalization in Machine Learning
Martin Arjovsky
Machine learning has achieved tremendous success in a variety of domains in recent years. However, a lot of these success stories have been in places where the training and the testing distributions are extremely similar to each other. In everyday situations, when models are tested on slightly different data than they were trained on, ML algorithms can fail spectacularly. This research attempts to formally define this problem, the sets of assumptions that are reasonable to make about our data, and the kinds of guarantees we can hope to obtain from them. Then, we focus on a certain class of out of distribution problems and their assumptions, and introduce simple algorithms that follow from these assumptions and are able to provide more reliable generalization. A central topic in the thesis is the strong link between discovering the causal structure of the data, finding features that are reliable (when using them to predict) regardless of their context, and out of distribution generalization.
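One concrete instance of the "simple algorithms" this line of work introduces is Invariant Risk Minimization (IRM), from the same author's earlier work. The sketch below shows the IRMv1 penalty term on toy tensors, under the assumption of a binary classifier shared across environments; it is an illustration, not the thesis code:

```python
# Sketch of the IRMv1 penalty (Arjovsky et al., 2019): the squared gradient
# of the per-environment risk w.r.t. a fixed dummy scale.
import torch
import torch.nn.functional as F

def irm_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    # Zero iff the shared classifier is already optimal in this environment,
    # which is what "invariant across environments" demands.
    return (grad ** 2).sum()

logits, y = torch.randn(8), torch.randint(0, 2, (8,)).float()
print(irm_penalty(logits, y))
```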
Out of Distribution Generalization in Machine Learning
— Aran Komatsuzaki (@arankomatsuzaki) March 5, 2021
Martin Arjovsky's PhD thesis to review, contextualize, and clarify the current knowledge in out of distribution generalization. https://t.co/BQojqTcbPy
3. Perceiver: General Perception with Iterative Attention
Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
Biological systems understand the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning, on the other hand, are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but also lock models to individual modalities. In this paper we introduce the Perceiver - a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs. We show that this architecture performs competitively with or beyond strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video and video+audio. The Perceiver obtains performance comparable to ResNet-50 on ImageNet without convolutions and by directly attending to 50,000 pixels. It also surpasses state-of-the-art results for all modalities in AudioSet.
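The core mechanism is easy to sketch. The following toy PyTorch snippet uses stock attention modules and made-up dimensions, not the paper's exact architecture; what it shows is the asymmetry that makes the cost linear in the input size:

```python
# Toy Perceiver-style bottleneck: a small latent array repeatedly
# cross-attends to a large flat input array.
import torch
import torch.nn as nn

n_inputs, n_latents, dim = 10_000, 256, 128
inputs = torch.randn(1, n_inputs, dim)    # large flat input (e.g. pixels)
latents = torch.randn(1, n_latents, dim)  # small learned latent array

cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
self_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

# Iteratively distill the input into the tight latent bottleneck:
# cross-attention costs O(n_inputs * n_latents) rather than the
# O(n_inputs^2) of a standard Transformer over the raw input.
for _ in range(4):
    latents, _ = cross_attn(latents, inputs, inputs)
    latents, _ = self_attn(latents, latents, latents)
print(latents.shape)  # torch.Size([1, 256, 128])
```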
Perceiver: General Perception with Iterative Attention
— Aran Komatsuzaki (@arankomatsuzaki) March 5, 2021
- Competitive perf on classification tasks across various modalities: images, audio and video.
- Obtains perf comparable to ResNet on ImageNet w/o convs and by directly attending to 50,000 pixels. https://t.co/I8qpvFH2Z8 pic.twitter.com/Lgf7Khfr08
Perceiver: General Perception with Iterative Attention https://t.co/OHfgdlA0CD
— ワクワクさん(ミジンコ) (@mosko_mule) March 5, 2021
A Transformer variant that can handle high-dimensional data directly by cross-attending between latent vectors and the raw signal. Images, video, audio, point clouds, and more can all be handled with the same architecture. On par with ResNet-50 on ImageNet. pic.twitter.com/KuHcr8cKjw
Perceiver: General Perception with Iterative Attention
— AK (@ak92501) March 5, 2021
pdf: https://t.co/1EACnTm1Sj
abs: https://t.co/JWverNYmDR pic.twitter.com/cQRsBIcNWm
4. Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel
Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations. Experience replay improves sample-efficiency by reusing experiences from the past, and convolutional neural networks (CNNs) process high-dimensional inputs effectively. However, such techniques demand high memory and computational bandwidth. In this paper, we present Stored Embeddings for Efficient Reinforcement Learning (SEER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements. To reduce the computational overhead of gradient updates in CNNs, we freeze the lower layers of CNN encoders early in training due to early convergence of their parameters. Additionally, we reduce memory requirements by storing the low-dimensional latent vectors for experience replay instead of high-dimensional images, enabling an adaptive increase in the replay buffer capacity, a useful technique in constrained-memory settings. In our experiments, we show that SEER does not degrade the performance of RL agents while significantly saving computation and memory across a diverse set of DeepMind Control environments and Atari games. Finally, we show that SEER is useful for computation-efficient transfer learning in RL because lower layers of CNNs extract generalizable features, which can be used for different tasks and domains.
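Both ideas reduce to a few lines. This sketch uses a hypothetical toy encoder, not the authors' architecture, to show freezing the lower layers and caching latents:

```python
# Toy illustration of SEER's two ideas on a made-up pixel encoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(50),
)

# (1) Once the lower conv layers converge early in training, freeze them to
# skip their gradient computation.
for p in encoder[:4].parameters():
    p.requires_grad = False

# (2) Store compact latents in the replay buffer instead of raw frames:
# 50 numbers per observation instead of 3*84*84 = 21,168 pixel values.
replay_buffer = []
obs = torch.randn(1, 3, 84, 84)
with torch.no_grad():
    replay_buffer.append(encoder(obs))
print(replay_buffer[0].shape)  # torch.Size([1, 50])
```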
New paper, SEER, improving both compute and memory efficiency of pixel-based RL.
— Aravind (@AravSrinivas) March 5, 2021
Using two simple ideas:
(1) Freeze lower layers of CNN encoders early on in training; (2) Store latents in replay buffer instead of pixels.
🎓https://t.co/j4kDnNhO0N
💻https://t.co/8Wxi3BaqGf pic.twitter.com/F97qooStgT
Two ML papers with methods called "SEER" in 24 hours - self-supervised image recognition from Facebook: https://t.co/e65Jr00hVH
— Miles Brundage (@Miles_Brundage) March 5, 2021
And more efficient visual RL from Berkeley: https://t.co/2TzzO04N9K
Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
— Aran Komatsuzaki (@arankomatsuzaki) March 5, 2021
SEER saves significant computation and memory across DeepMind Control environments and Atari games without degrading the performance. https://t.co/37vJ1Otg7i pic.twitter.com/noNhexMVYe
5. Catala: A Programming Language for the Law
Denis Merigoux, Nicolas Chataing, Jonathan Protzenko
Law at large underpins modern society, codifying and governing many aspects of citizens’ daily lives. Oftentimes, law is subject to interpretation, debate and challenges throughout various courts and jurisdictions. But in some other areas, law leaves little room for interpretation, and essentially aims to rigorously describe a computation, a decision procedure or, simply said, an algorithm. Unfortunately, prose remains a woefully inadequate tool for the job. The lack of formalism leaves room for ambiguities; the structure of legal statutes, with many paragraphs and sub-sections spread across multiple pages, makes it hard to compute the intended outcome of the algorithm underlying a given text; and, as with any other piece of poorly-specified critical software, the use of informal language leaves corner cases unaddressed. We introduce Catala, a new programming language that we specifically designed to allow a straightforward and systematic translation of statutory law into an executable implementation. Catala aims to bring together lawyers and programmers through a shared medium, which together they can understand, edit and evolve, bridging a gap that often results in dramatically incorrect implementations of the law. We have implemented a compiler for Catala, and have proven the correctness of its core compilation steps using the F* proof assistant. We evaluate Catala on several legal texts that are algorithms in disguise, notably section 121 of the US federal income tax and the byzantine French family benefits; in doing so, we uncover a bug in the official implementation. We observe as a consequence of the formalization process that using Catala enables rich interactions between lawyers and programmers, leading to a greater understanding of the original legislative intent, while producing a correct-by-construction executable specification reusable by the greater software ecosystem.
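For a flavor of "statute as algorithm", here is a drastically simplified reading of the Section 121 exclusion written in plain Python rather than Catala syntax; the real statute carries many more conditions, so treat this only as an illustration of why prose law invites executable specifications:

```python
# NOT Catala syntax: a toy Python rendering of (a fragment of) IRC Section 121,
# the exclusion of gain on the sale of a principal residence.
def section_121_exclusion(gain: int, years_owned_and_used: float,
                          joint_return: bool = False) -> int:
    """Return the portion of the gain excluded from gross income."""
    if years_owned_and_used < 2:          # ownership-and-use requirement
        return 0
    cap = 500_000 if joint_return else 250_000
    return min(gain, cap)

print(section_121_exclusion(gain=300_000, years_owned_and_used=3))  # 250000
```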
Catala: a programming language for the law https://t.co/Qd9ZhwPFNH
— ly(s)xia (@lysxia) March 5, 2021
"We evaluate Catala on several legal texts (...), notably section 121 of the US federal income tax and the byzantine French family benefits; in doing so, we uncover a bug in the official implementation."
What is the future of legal expert systems? How to make sure taxes are computed correctly? Can we efficiently translate law into #rulesascode?
— Denis Merigoux (@DMerigoux) March 5, 2021
New paper with @NChataing and @_protz_:
➡️ https://t.co/TMkXbpy0s4
📖 https://t.co/WBkLp3mmz5
🚀 https://t.co/wH1usUT9Cm pic.twitter.com/Gs8qp9xRjA
6. GenoML: Automated Machine Learning for Genomics
Mary B. Makarious, Hampton L. Leonard, Dan Vitale, Hirotaka Iwaki, David Saffo, Lana Sargent, Anant Dadu, Eduardo Salmerón Castaño, John F. Carter, Melina Maleknia, Juan A. Botia, Cornelis Blauwendraat, Roy H. Campbell, Sayed Hadi Hashemi, Andrew B. Singleton, Mike A. Nalls, Faraz Faghri
GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomic data require significant domain expertise to clean, pre-process, harmonize, and quality-control. Furthermore, tuning, validation, and interpretation must take into account the biology and possibly the limitations of the underlying data collection, protocols, and technology. GenoML's mission is to bring machine learning for genomics and clinical data to non-experts by developing an easy-to-use tool that automates the full development, evaluation, and deployment process. Emphasis is put on open science to make workflows easily accessible, replicable, and transferable within the scientific community. Source code and documentation are available at https://genoml.com.
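The kind of workflow GenoML automates resembles the generic scikit-learn sketch below; this illustrates the pattern (QC, preprocessing, model fitting, tuning behind one object), not GenoML's actual interface:

```python
# Generic automated-pipeline sketch; the QC step is a crude stand-in for
# real genotype quality control.
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=100, random_state=0)
pipe = Pipeline([
    ("qc", VarianceThreshold()),          # drop uninformative features
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0]}, cv=5)  # tuning
print(search.fit(X, y).best_score_)
```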
GenoML: Automated Machine Learning for Genomics
— AK (@ak92501) March 5, 2021
pdf: https://t.co/E4NNpz9g4R
abs: https://t.co/1CfAufCsZY
project page: https://t.co/5NRkDdePpf pic.twitter.com/mCLtAeAHyT
7. Anycost GANs for Interactive Image Synthesis and Editing
Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu
Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing. However, due to the high computational cost of large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the results of a single edit on edge devices, prohibiting interactive user experience. In this paper, we take inspiration from modern rendering software and propose Anycost GAN for interactive natural image editing. We train the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds. Running subsets of the full generator produces outputs that are perceptually similar to the full generator's, making them a good proxy for previews. By using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated at various configurations while achieving better image quality compared to separately trained models. Furthermore, we develop new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed at various cost budgets (up to 10x computation reduction) and adapts to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model can provide perceptually similar previews at 6-12x speedup, enabling interactive image editing. The code and demo are publicly available: https://github.com/mit-han-lab/anycost-gan.
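The "elastic channels" idea can be illustrated in isolation. The snippet below runs a single convolution at a reduced channel budget; it is a toy, not the StyleGAN2-based generator from the paper:

```python
# Toy elastic-channels convolution: slice the full kernel to run a cheaper
# sub-network for previews.
import torch
import torch.nn.functional as F

weight = torch.randn(64, 32, 3, 3)      # full kernel: 64 out, 32 in channels
x = torch.randn(1, 32, 16, 16)

def conv_subset(x: torch.Tensor, weight: torch.Tensor, ratio: float):
    # Keep only a fraction of the output channels for a faster preview pass.
    k = int(weight.shape[0] * ratio)
    return F.conv2d(x, weight[:k], padding=1)

full = conv_subset(x, weight, 1.0)       # full quality
preview = conv_subset(x, weight, 0.25)   # ~4x fewer output channels
print(full.shape, preview.shape)
```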
Anycost GANs for Interactive Image Synthesis and Editing
— AK (@ak92501) March 5, 2021
pdf: https://t.co/i81p9MRaj3
abs: https://t.co/2hDmFy9OTG
github: https://t.co/W6PHWlLWX5
project page: https://t.co/mFoPI7PI1B pic.twitter.com/0hJdqHqlXf
8. Self-supervised Geometric Perception
Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun
We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a large corpus of visual measurements (e.g., images, point clouds). Under this optimization formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other block. This analysis naturally leads to our second contribution: the SGP algorithm, which performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms: a teacher that performs robust model fitting given learned features to generate geometric pseudo-labels, and a student that performs deep feature learning under noisy supervision of the pseudo-labels. As a third contribution, we apply SGP to two perception problems on large-scale real datasets, namely relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on par with or superior to the supervised oracles trained using ground-truth labels.
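The alternation is straightforward to write down schematically; `robust_fit` and `train_descriptor` below are placeholder stubs for RANSAC-style fitting and descriptor learning, not the paper's code:

```python
# Schematic of SGP's alternating minimization.
def robust_fit(pair, features):
    # Stand-in for robust geometric fitting (e.g., RANSAC) that turns the
    # current feature matches into a geometric pseudo-label.
    return {"pair": pair, "model": "pseudo-label"}

def train_descriptor(pairs, pseudo_labels):
    # Stand-in for training the descriptor under noisy pseudo-label supervision.
    return f"descriptor@{len(pseudo_labels)}"

def sgp(image_pairs, features, n_rounds=5):
    for _ in range(n_rounds):
        pseudo_labels = [robust_fit(p, features) for p in image_pairs]  # teacher
        features = train_descriptor(image_pairs, pseudo_labels)         # student
    return features

# Per the tweet thread below, the loop can bootstrap from handcrafted SIFT.
print(sgp(image_pairs=[("img_a", "img_b")], features="SIFT"))
```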
Self-supervised Geometric Perception
— AK (@ak92501) March 5, 2021
pdf: https://t.co/fi3XAQLhCM
abs: https://t.co/vLjCN1l2NO pic.twitter.com/cJuEDeqntk
Self-supervised Geometric Perception, with W. Dong, @lucacarlone1, V. Koltun, is accepted as #CVPR2021 Oral. Appreciated the discussion w/ @ducha_aiki on difference b/w SGP and reconstruction-based supervised learning. Check out future research on Page 8: https://t.co/1oSejKamCA pic.twitter.com/07eq2WqLl8
— Heng Yang (@hankyang94) March 5, 2021
Self-supervised Geometric Perception
@hankyang94, @wdong397, @lucacarlone1, Vladlen Koltun
— Dmytro Mishkin (@ducha_aiki) March 5, 2021
main idea: train CAPS by @Jimantha with Es generated by RANSAC. Use SIFT descs for 1st iteration, then use learned descriptor. https://t.co/FZfFf2gB8b
Short review: in the thread pic.twitter.com/46oTwcrG4i
9. COIN: COmpression with Implicit Neural representations
Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, Arnaud Doucet
We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
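The whole approach fits in a few lines. This sketch uses a random stand-in image and plain ReLU layers; the paper itself uses MLPs with sine activations and quantizes the weights afterwards, neither of which is shown here:

```python
# Minimal COIN-style sketch: overfit an MLP mapping (x, y) -> RGB to one
# image; the (quantized) MLP weights then serve as the compressed image.
import torch
import torch.nn as nn

H = W = 32
image = torch.rand(H, W, 3)                                # stand-in image
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)      # pixel locations
targets = image.reshape(-1, 3)

mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3), nn.Sigmoid())
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(500):                                       # "encode" = overfit
    opt.zero_grad()
    loss = ((mlp(coords) - targets) ** 2).mean()
    loss.backward()
    opt.step()

decoded = mlp(coords).reshape(H, W, 3)                     # "decode" = evaluate
print(f"reconstruction MSE: {loss.item():.4f}")
```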
Galaxy brain compression method: "instead of storing the RGB values for each pixel..., we store the weights of a neural network overfitted to the image. ... not yet competitive with state of the art compression methods [but] various attractive properties": https://t.co/LP56NeztFY
— Miles Brundage (@Miles_Brundage) March 5, 2021
10. Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions
Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan K. Asari
The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we show a systematic design (from 2D to 3D) for how conventional networks and other forms of constraints can be incorporated into the attention framework for learning long-range dependencies for the task of pose estimation. The contribution of this paper is a systematic approach to designing and training attention-based models for end-to-end pose estimation, with the flexibility and scalability to take arbitrary video sequences as input. We achieve this by adapting the temporal receptive field via a multi-scale structure of dilated convolutions. Moreover, the proposed architecture can easily be adapted to a causal model enabling real-time performance. Any off-the-shelf 2D pose estimation system, e.g. Mocap libraries, can be easily integrated in an ad-hoc fashion. Our method achieves state-of-the-art performance and outperforms existing methods by reducing the mean per joint position error to 33.4 mm on the Human3.6M dataset.
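A minimal sketch of the multi-scale dilated-convolution idea, with made-up channel sizes rather than the paper's configuration:

```python
# Parallel dilated temporal convolutions over a 2D-pose sequence: each
# dilation d spans 2*d+1 frames, so concatenating branches mixes short- and
# long-range temporal context at the same sequence length.
import torch
import torch.nn as nn

frames = torch.randn(1, 34, 243)    # (batch, 17 joints * 2D coords, frames)
branches = nn.ModuleList([
    nn.Conv1d(34, 32, kernel_size=3, dilation=d, padding=d) for d in (1, 2, 4)
])
multi_scale = torch.cat([b(frames) for b in branches], dim=1)
print(multi_scale.shape)            # torch.Size([1, 96, 243])
```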
Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions
— AK (@ak92501) March 5, 2021
pdf: https://t.co/uUqyjvGdG6
abs: https://t.co/J2f1jHUimX
github: https://t.co/KVYpcQtHcR pic.twitter.com/Bje0eBS7yu
11. Data Augmentation for Object Detection via Differentiable Neural Rendering
Guanghan Ning, Guang Chen, Chaowei Tan, Si Luo, Liefeng Bo, Heng Huang
It is challenging to train a robust object detector when annotated data is scarce. Existing approaches to this problem include semi-supervised learning, which interpolates labeled data from unlabeled data, and self-supervised learning, which exploits signals within unlabeled data via pretext tasks. Without changing the supervised learning paradigm, we introduce an offline data augmentation method for object detection which semantically interpolates the training data with novel views. Specifically, our proposed system generates controllable views of training images based on differentiable neural rendering, together with corresponding bounding box annotations, involving no human intervention. First, we extract and project pixel-aligned image features into point clouds while estimating depth maps. We then re-project them with a target camera pose and render a novel-view 2D image. Objects in the form of keypoints are marked in point clouds to recover annotations in new views. The method is fully compatible with online data augmentation methods, such as affine transforms, image mixup, etc. Extensive experiments show that our method, as a cost-free tool to enrich images and labels, can significantly boost the performance of object detection systems with scarce training data. Code is available at https://github.com/Guanghan/DANR.
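The annotation-transfer step can be caricatured in a few lines; the camera model and numbers below are toy simplifications, not the paper's rendering pipeline:

```python
# Toy annotation transfer: move object keypoints into a novel camera view,
# then recover the new-view bounding box from the moved keypoints.
import numpy as np

keypoints = np.array([[10., 20., 2.], [50., 80., 2.5]])   # (x, y, depth)
theta = np.deg2rad(5)                                     # small camera yaw
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
moved = keypoints @ R.T                                   # re-posed 3D points
xs = moved[:, 0] / moved[:, 2]                            # toy pinhole divide
ys = moved[:, 1] / moved[:, 2]
bbox = (xs.min(), ys.min(), xs.max(), ys.max())           # new-view annotation
print(bbox)
```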
Data Augmentation for Object Detection via Differentiable Neural Rendering
— AK (@ak92501) March 5, 2021
pdf: https://t.co/DbG4JX1tM2
abs: https://t.co/OCspgEedYe pic.twitter.com/Dhts6ym93t
12. MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
Jinshu Chen, Qihui Xu, Qi Kang, MengChu Zhou
In most interactive image generation tasks, given regions of interest (ROI) specified by users, the generated results are expected to have adequate diversity in appearance while maintaining correct and reasonable structures from the original images. Such tasks become more challenging if only limited data is available. Recently proposed generative models can complete training based on only one image, but they pay much attention to the monolithic features of the sample while ignoring the actual semantic information of the different objects inside it. As a result, for ROI-based generation tasks, they may produce inappropriate samples with excessive randomness that fail to maintain the related objects' correct structures. To address this issue, this work introduces a MOrphologic-structure-aware Generative Adversarial Network, named MOGAN, that produces random samples with diverse appearances and reliable structures based on only one image. To train on the ROI, we propose to utilize augmented data from the original image and introduce a novel module that transforms such augmented data into knowledge containing both structures and appearances, thus enhancing the model's comprehension of the sample. To learn the areas outside the ROI, we employ binary masks to keep the generation isolated from the ROI. Finally, we arrange these learning processes in parallel and hierarchical branches. Compared with other single-image GAN schemes, our approach focuses on internal features, including the maintenance of rational structures and variation in appearance. Experiments confirm a better capacity of our model on ROI-based image generation tasks than its competitive peers.
MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
— AK (@ak92501) March 5, 2021
pdf: https://t.co/IDS2mAA8vb
abs: https://t.co/6in9hRE5CO pic.twitter.com/XE5WaNmxSU
13. Continuous Coordination As a Realistic Scenario for Lifelong Learning
Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar
Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents’ policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi — a partially-observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.
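Schematically, the testbed reduces to a sequential loop over expert partners; `train_with` and `evaluate` below are placeholder stubs, not the benchmark's API:

```python
# Schematic lifelong-learning loop in the spirit of Lifelong Hanabi: each
# "task" is coordinating with the next expert partner in the sequence.
def train_with(agent, partner, budget):
    return f"{agent}->{partner}"           # stub: train against this partner

def evaluate(agent, partner):
    return 0.0                             # stub: Hanabi score with partner

agent = "learner"
partners = ["expert_1", "expert_2", "expert_3"]  # tasks arrive sequentially
for i, partner in enumerate(partners):
    agent = train_with(agent, partner, budget=1000)
    # Measure forgetting: re-evaluate coordination with all partners seen so far.
    history = [evaluate(agent, p) for p in partners[:i + 1]]
    print(partner, history)
```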
Are you tired of manually creating new tasks for Lifelong RL? We introduce Lifelong Hanabi in which every task is coordinating with a partner that's an expert player of Hanabi. Work led by @HadiNekoei and @akileshbadri.
— sarath chandar (@apsarathchandar) March 5, 2021
paper: https://t.co/8b4Gk3MlOI @Mila_Quebec 1/n pic.twitter.com/LmQBrpyFxC