1. Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift
Marvin Zhang, Henrik Marklund, Abhishek Gupta, Sergey Levine, Chelsea Finn
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested on data that are structurally different from the training set, either due to temporal correlations, particular end users, or other factors. In this work, we consider the setting where test examples are not drawn from the training distribution. Prior work has approached this problem by attempting to be robust to all possible test time distributions, which may degrade average performance, or by “peeking” at the test examples during training, which is not always feasible. In contrast, we propose to learn models that are adaptable, such that they can adapt to distribution shift at test time using a batch of unlabeled test data points. We acquire such models by learning to adapt to training batches sampled according to different sub-distributions, which simulate structural distribution shifts that may occur at test time. We introduce the problem of adaptive risk minimization (ARM), a formalization of this setting that lends itself to meta-learning methods. Compared to a variety of methods under the paradigms of empirical risk minimization and robust optimization, our approach provides substantial empirical gains on image classification problems in the presence of distribution shift.
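The training procedure suggested by the abstract is a meta-learning loop: sample a batch from a single training group to simulate a test-time shift, adapt the model using only that unlabeled batch, and backpropagate the post-adaptation loss through both the model and the adaptation module. Below is a minimal sketch of one such step in PyTorch, loosely in the style of a context-network variant; the module names, shapes, and the `groups` structure are illustrative assumptions rather than the authors' code.

```python
# Illustrative sketch of an ARM-style meta-training step (not the authors' code).
# Assumption: `groups` is a list of (x, y) tensors, one per training sub-distribution.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))        # task model
context_net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # adaptation module
opt = torch.optim.Adam(list(model.parameters()) + list(context_net.parameters()), lr=1e-3)

def meta_train_step(groups):
    x, y = groups[torch.randint(len(groups), (1,)).item()]  # one group = simulated shift
    # Adaptation uses only the unlabeled batch: here, a batch-level context vector.
    context = context_net(x).mean(dim=0, keepdim=True)
    logits = model(x) + context                              # adapted predictions
    loss = nn.functional.cross_entropy(logits, y)            # post-adaptation loss
    opt.zero_grad()
    loss.backward()                                          # trains model and adapter jointly
    opt.step()
    return loss.item()
```

At test time, the same context computation would run on the unlabeled test batch before predictions are made.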
Supervised ML methods (i.e. ERM) assume that train & test data are from the same distribution, & deteriorate when this assumption is broken.
— Chelsea Finn (@chelseabfinn) July 7, 2020
To help, we introduce adaptive risk minimization (ARM):https://t.co/y3l2KCmmiB
With M Zhang, H Marklund @abhishekunique7 @svlevine
(1/6)
2. SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows
Didrik Nielsen, Priyank Jaini, Emiel Hoogeboom, Ole Winther, Max Welling
Normalizing flows and variational autoencoders are powerful generative models that can represent complicated density functions. However, they both impose constraints on the models: Normalizing flows use bijective transformations to model densities whereas VAEs learn stochastic transformations that are non-invertible and thus typically do not provide tractable estimates of the marginal likelihood. In this paper, we introduce SurVAE Flows: A modular framework of composable transformations that encompasses VAEs and normalizing flows. SurVAE Flows bridge the gap between normalizing flows and VAEs with surjective transformations, wherein the transformations are deterministic in one direction — thereby allowing exact likelihood computation, and stochastic in the reverse direction — hence providing a lower bound on the corresponding likelihood. We show that several recently proposed methods, including dequantization and augmented normalizing flows, can be expressed as SurVAE Flows. Finally, we introduce common operations such as the max value, the absolute value, sorting and stochastic permutation as composable layers in SurVAE Flows.
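A concrete example of a composable surjective layer is the absolute-value transformation listed in the abstract: deterministic in one direction (z = |x|) and stochastic in the other (sample a sign). The sketch below illustrates that idea only; the per-element log(1/2) term is a stand-in for the likelihood-contribution bookkeeping worked out in the paper, and the class interface is an assumption, not the released SurVAE API.

```python
# Simplified illustration of an absolute-value surjection (not the official SurVAE code).
import math
import torch

class AbsSurjection:
    def forward(self, x):
        # Deterministic direction: many x map to the same z, so the map is surjective.
        z = x.abs()
        # Stand-in likelihood contribution: the sign is assumed sampled uniformly
        # in the inverse, contributing log(1/2) per element to the bound.
        ldj = -math.log(2.0) * x[0].numel() * torch.ones(x.shape[0])
        return z, ldj

    def inverse(self, z):
        # Stochastic direction: sample a sign for every element.
        sign = torch.randint(0, 2, z.shape).to(z.dtype) * 2 - 1
        return sign * z

layer = AbsSurjection()
x = torch.randn(4, 3)
z, ldj = layer.forward(x)
x_recon = layer.inverse(z)   # matches |x| up to the sampled signs
```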
Merging VAEs with Flows into SurVAE flows. Great work by Didrik Nielsen who was on a highly successful ELLIS exchange from Denmark TU at U. Amsterdam. (w/ E. Hoogeboom, P. Jaini, O. Winther).https://t.co/WBwOuzj3o3
— Max Welling (@wellingmax) July 7, 2020
1/9 Excited to present SurVAE Flows with the brilliant @nielsen_didrik , @emiel_hoogeboom, Ole Winther and @wellingmax that bridge the gap between Flows and VAEs using Surjections.
— Priyank Jaini (@priyankjaini) July 7, 2020
Paper: https://t.co/HH4FYcCpE7
Code: https://t.co/RZJ9sxsLEs
Thread below. pic.twitter.com/I5pgASaxJb
SurVAE Flows supports, as transformations in generative models, surjections (discretization, max value, sorting operations, etc.) in addition to bijections (normalizing flows) and stochastic transformations (VAEs); these can be freely composed as long as they implement three interfaces (the transformation, its inverse, and the likelihood contribution), and the framework includes many recently proposed models as special cases. https://t.co/o5frcWBHun
— Daisuke Okanohara (@hillbig) July 7, 2020
3. BézierSketch: A generative model for scalable vector sketches
Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song
The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints. However, this leads to low-resolution image generation and failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best fit Bézier curve. This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.
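The stroke-embedding step amounts to representing each stroke by its best-fit Bézier curve. As a rough stand-in for the paper's learned inverse-graphics encoder, the classical way to obtain such a fit is linear least squares over the Bernstein basis with chord-length parameterization, sketched below.

```python
# Least-squares fit of a cubic Bézier curve to stroke points (numpy/scipy).
# A classical baseline, not the paper's learned inverse-graphics encoder.
import numpy as np
from scipy.special import comb

def fit_bezier(points, degree=3):
    """points: (N, 2) array of stroke coordinates; returns (degree+1, 2) control points."""
    # Chord-length parameterization: t proportional to cumulative arc length.
    d = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)])
    t = t / t[-1]
    # Bernstein design matrix B[i, k] = C(n, k) * t_i^k * (1 - t_i)^(n - k).
    n = degree
    B = np.stack([comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1)], axis=1)
    ctrl, *_ = np.linalg.lstsq(B, points, rcond=None)
    return ctrl

stroke = np.cumsum(np.random.randn(50, 2), axis=0)   # a fake stroke
control_points = fit_bezier(stroke)                   # best-fit cubic Bézier
```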
BézierSketch: A generative model for scalable vector sketches
— AK (@ak92501) July 7, 2020
pdf: https://t.co/rmEPsRv4ZH
abs: https://t.co/v2mOg2BZtp pic.twitter.com/wRM2IcnGfD
Nice results using control points for Bézier curves instead of pen stroke locations in a generative model applied on QuickDraw doodles.
— hardmaru (@hardmaru) July 7, 2020
Will be interesting to incorporate Bézier curve models as a prior for sketch-based models for pixel images. https://t.co/TuZvnqHAej #ECCV2020 https://t.co/CzTz2TRB5a pic.twitter.com/opX3Ptz5by
4. Meta-Learning through Hebbian Plasticity in Random Networks
Elias Najarro, Sebastian Risi
Lifelong learning and adaptability are two defining aspects of biological agents. Modern reinforcement learning (RL) approaches have shown significant progress in solving complex tasks, however once training is concluded, the found solutions are typically static and incapable of adapting to new information or perturbations. While it is still not completely understood how biological brains learn and adapt so efficiently from experience, it is believed that synaptic plasticity plays a prominent role in this process. Inspired by this biological mechanism, we propose a search method that, instead of optimizing the weight parameters of neural networks directly, only searches for synapse-specific Hebbian learning rules that allow the network to continuously self-organize its weights during the lifetime of the agent. We demonstrate our approach on several reinforcement learning tasks with different sensory modalities and more than 450K trainable plasticity parameters. We find that starting from completely random weights, the discovered Hebbian rules enable an agent to navigate a dynamical 2D-pixel environment; likewise they allow a simulated 3D quadrupedal robot to learn how to walk while adapting to different morphological damage in the absence of any explicit reward or error signal.
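Searches of this kind typically optimize generalized Hebbian ("ABCD") rules: each synapse carries evolved coefficients that turn pre- and post-synaptic activity into a weight change, while the weights themselves start random. The following is a minimal sketch of one lifetime update under that assumption; the coefficient names and shapes are illustrative, not the authors' implementation.

```python
# Minimal sketch of a generalized Hebbian ("ABCD") weight update (numpy).
# The evolved parameters are the per-synapse coefficients A, B, C, D and a rate eta,
# not the weights, which start random and self-organize during the agent's lifetime.
import numpy as np

def hebbian_step(W, pre, post, A, B, C, D, eta=0.01):
    """W: (n_out, n_in) weights; pre: (n_in,), post: (n_out,) activations.
    A, B, C, D: (n_out, n_in) per-synapse plasticity coefficients."""
    outer = np.outer(post, pre)                                   # correlation term
    dW = A * outer + B * pre[None, :] + C * post[:, None] + D
    return W + eta * dW

n_in, n_out = 8, 4
W = np.random.randn(n_out, n_in) * 0.1                            # random initial weights
A, B, C, D = (np.random.randn(n_out, n_in) for _ in range(4))
pre = np.random.randn(n_in)
post = np.tanh(W @ pre)
W = hebbian_step(W, pre, post, A, B, C, D)
```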
@enasmel and myself are excited to announce our paper "Meta-Learning through Hebbian Plasticity in Random Networks" https://t.co/UxUnRgOJRB
— Sebastian Risi (@risi1979) July 7, 2020
Instead of optimizing the neural network's weights directly, we only search for synapse-specific Hebbian learning rules. Thread 👇 pic.twitter.com/zDiZEUuKLL
5. Descent-to-Delete: Gradient-Based Methods for Machine Unlearning
Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi
We study the data deletion problem for convex models. By leveraging techniques from convex optimization and reservoir sampling, we give the first data deletion algorithms that are able to handle an arbitrarily long sequence of adversarial updates while promising both per-deletion run-time and steady-state error that do not grow with the length of the update sequence. We also introduce several new conceptual distinctions: for example, we can ask that after a deletion, the entire state maintained by the optimization algorithm is statistically indistinguishable from the state that would have resulted had we retrained, or we can ask for the weaker condition that only the observable output is statistically indistinguishable from the observable output that would have resulted from retraining. We are able to give more efficient deletion algorithms under this weaker deletion criterion.
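At a high level, a gradient-based unlearning step of this kind warm-starts from the current model, runs a few optimization steps on the remaining data, and releases a perturbed copy so that the published state looks like the result of retraining. The sketch below shows only that pattern for a ridge-regularized linear model; the step count, learning rate, and noise scale are arbitrary placeholders rather than the calibrated values from the paper's analysis.

```python
# Illustrative deletion update for a regularized linear model (not the paper's exact algorithm).
import numpy as np

def grad(theta, X, y, lam=0.1):
    # Gradient of ridge-regularized squared loss (strongly convex).
    return X.T @ (X @ theta - y) / len(y) + lam * theta

def delete_and_update(theta, X, y, remove_idx, steps=5, lr=0.1, noise_scale=0.01):
    keep = np.ones(len(y), dtype=bool)
    keep[remove_idx] = False
    X, y = X[keep], y[keep]
    for _ in range(steps):                      # warm-started descent on the remaining data
        theta = theta - lr * grad(theta, X, y)
    # Publish a noisy copy; this noise scale is arbitrary, not a calibrated guarantee.
    return theta + noise_scale * np.random.randn(*theta.shape), (X, y)

X = np.random.randn(100, 5)
y = X @ np.ones(5) + 0.1 * np.random.randn(100)
theta = np.zeros(5)
for _ in range(200):                            # initial training
    theta = theta - 0.1 * grad(theta, X, y)
theta_pub, (X, y) = delete_and_update(theta, X, y, remove_idx=3)
```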
New Preprint out today with @aaroth, and Saeed Sharifi-Malvajerdi! “Descent-to-Delete: Gradient-Based Methods for Machine Unlearning” (https://t.co/QfdD14fpIk) Motivated by GDPR's "Right to be Forgotten" we study the problem of efficiently deleting user data from AI models (1/n)
— Seth Neel (@SethVNeel) July 7, 2020
6. A Unifying View of Optimism in Episodic Reinforcement Learning
Gergely Neu, Ciara Pike-Burke
The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. These two classes of algorithms were previously thought to be distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
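For intuition about the value-optimistic side of this equivalence, a bonus-based optimistic backup looks like standard dynamic programming with an additive exploration bonus. The snippet below is a generic UCB-style illustration of that backup, not the specific construction derived in the paper.

```python
# Generic bonus-based value-optimistic backup for a tabular MDP (numpy).
# Illustrates "optimism in the value space"; not the paper's specific algorithm.
import numpy as np

def optimistic_values(P_hat, r_hat, bonus, H):
    """P_hat: (S, A, S) estimated transitions; r_hat, bonus: (S, A); H: horizon."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(H):                                  # backward induction over the horizon
        Q = np.minimum(r_hat + bonus + P_hat @ V, H)    # optimism enters as an additive bonus
        V = Q.max(axis=1)
    return Q, V

S, A, H = 4, 2, 10
P_hat = np.random.dirichlet(np.ones(S), size=(S, A))   # estimated transition model
r_hat = np.random.rand(S, A)                            # estimated rewards
counts = np.random.randint(1, 50, size=(S, A))
bonus = 1.0 / np.sqrt(counts)                           # e.g. scales like 1/sqrt(visit count)
Q, V = optimistic_values(P_hat, r_hat, bonus, H)
```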
New EPIC paper with @CiaraPikeBurke finally online!
— Gergely Neu (@neu_rips) July 7, 2020
We provide a unifying framework for optimistic RL algorithms that formally shows how optimism in the model space is *equivalent* to optimism in the value space. Also works for linear FA.https://t.co/EPvyCZUbxY
THREAD👇
1/10 pic.twitter.com/HLkRSAdF5u
7. Scaling Imitation Learning in Minecraft
Artemij Amiranashvili, Nicolai Dorka, Wolfram Burgard, Vladlen Koltun, Thomas Brox
Imitation learning is a powerful family of techniques for learning sensorimotor coordination in immersive environments. We apply imitation learning to attain state-of-the-art performance on hard exploration problems in the Minecraft environment. We report experiments that highlight the influence of network architecture, loss function, and data augmentation. An early version of our approach reached second place in the MineRL competition at NeurIPS 2019. Here we report stronger results that can be used as a starting point for future competition entries and related research. Our code is available at https://github.com/amiranas/minerl_imitation_learning.
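The approach is essentially large-scale behavior cloning: a convolutional policy trained with a classification loss on demonstration frames, with architecture, loss, and augmentation choices accounting for the gains. A generic training step along those lines is sketched below; the backbone, action space size, and augmentation are placeholders rather than the configuration reported in the paper (see the released code for that).

```python
# Generic behavior-cloning step on image observations (PyTorch); the backbone and
# augmentation are placeholders, not the architecture reported in the paper.
import torch
import torch.nn as nn

n_actions = 10
policy = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 13 * 13, n_actions),   # sized for 64x64 inputs
)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_step(frames, actions):
    frames = frames * (0.8 + 0.4 * torch.rand(1))        # crude brightness-jitter stand-in
    logits = policy(frames)
    loss = nn.functional.cross_entropy(logits, actions)  # imitate the demonstrated actions
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

frames = torch.randn(8, 3, 64, 64)                       # fake demonstration frames
actions = torch.randint(0, n_actions, (8,))
bc_step(frames, actions)
```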
Scaling Imitation Learning in Minecraft
— AK (@ak92501) July 7, 2020
pdf: https://t.co/T8BZ6GKYCh
abs: https://t.co/H11XBX8JTW pic.twitter.com/9hWxYQpUnA
8. HoughNet: Integrating near and long-range evidence for bottom-up object detection
Nermin Samet, Samet Hicsonmez, Emre Akbas
This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet achieves 46.4 AP (and 65.1 AP_50), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in another task, namely, “labels to photo” image generation, by integrating the voting module of HoughNet into two different GAN models and showing that the accuracy is significantly improved in both cases. Code is available at: https://github.com/nerminsamet/houghnet
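The voting idea can be pictured as every spatial location casting votes into regions around itself whose radii grow geometrically, so that evidence for an object at a location is the sum of votes arriving from both near and far. The snippet below is a heavily simplified, single-class accumulation loop meant only to convey that picture; the learned per-bin vote maps, the actual log-polar field, and the efficient implementation are in the released code.

```python
# Heavily simplified single-class vote accumulation inspired by log-polar voting.
# Real HoughNet uses learned per-bin vote maps and an efficient GPU implementation.
import numpy as np

def accumulate_votes(vote_map, radii=(1, 3, 9, 27)):
    """vote_map: (H, W) per-location voting strength. Each location spreads its vote
    over annuli whose radii grow geometrically (a log-polar-style field)."""
    H, W = vote_map.shape
    evidence = np.zeros_like(vote_map)
    ys, xs = np.mgrid[0:H, 0:W]
    for y in range(H):
        for x in range(W):
            if vote_map[y, x] == 0:
                continue
            dist = np.sqrt((ys - y) ** 2 + (xs - x) ** 2)
            r_in = 0.0
            for r_out in radii:                          # annuli: near rings are small, far rings large
                ring = (dist >= r_in) & (dist < r_out)
                if ring.any():
                    evidence[ring] += vote_map[y, x] / ring.sum()
                r_in = r_out
    return evidence

vote_map = np.zeros((32, 32))
vote_map[10, 10] = 1.0                       # a single strong voter
evidence = accumulate_votes(vote_map)        # evidence spreads to near and far locations
```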
New paper! "HoughNet: Integrating near and long-range evidence for bottom-up object detection" by Nermin Samet, Samet Hicsonmez and Emre Akbas, accepted to #ECCV2020. Paper: https://t.co/K4KfmbIVH9 Code: https://t.co/3MIXqy4YbV @nemka_ @giddyyupp @eakbas2 pic.twitter.com/1d3mY3BN4A
— METU ImageLab (@metu_imagelab) July 7, 2020
9. Finding Symmetry Breaking Order Parameters with Euclidean Neural Networks
Tess E. Smidt, Mario Geiger, Benjamin Kurt Miller
Curie’s principle states that “when effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them”. We demonstrate that symmetry equivariant neural networks uphold Curie’s principle and this property can be used to uncover symmetry breaking order parameters necessary to make input and output data symmetrically compatible. We prove these properties mathematically and demonstrate them numerically by training a Euclidean symmetry equivariant neural network to learn symmetry breaking input to deform a square into a rectangle.
New @arxiv pre-print "Finding Symmetry Breaking Order Parameters with Euclidean Neural Networks" https://t.co/8dvE5xZyCf
— Dr. Tess Smidt (@tesssmidt) July 7, 2020
If you like symmetry and want to see how much you can say about a neural network trained on one example, this is the pre-print for you! pic.twitter.com/2PcuCpMiXL