1. Growing 3D Artefacts and Functional Machines with Neural Cellular Automata
Shyam Sudhakaran, Djordje Grbic, Siyan Li, Adam Katona, Elias Najarro, Claire Glanois, Sebastian Risi
Neural Cellular Automata (NCAs) have been proven effective in simulating morphogenetic processes, the continuous construction of complex structures from very few starting cells. Recent developments in NCAs lie in the 2D domain, namely reconstructing target images from a single pixel or infinitely growing 2D textures. In this work, we propose an extension of NCAs to 3D, utilizing 3D convolutions in the proposed neural network architecture. Minecraft is selected as the environment for our automaton since it allows the generation of both static structures and moving machines. We show that despite their simplicity, NCAs are capable of growing complex entities such as castles, apartment blocks, and trees, some of which are composed of over 3,000 blocks. Additionally, when trained for regeneration, the system is able to regrow parts of simple functional machines, significantly expanding the capabilities of simulated morphogenetic systems.
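To make the setup concrete, here is a minimal sketch of what a single 3D NCA update rule could look like, assuming a 16-channel cell state, 3D convolutions for perception, and the alive-masking scheme common in 2D NCA work; the module names and hyperparameters are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCA3D(nn.Module):
    """Minimal 3D neural cellular automaton: every cell updates from its
    3x3x3 neighborhood through a small, shared network of 3D convolutions."""

    def __init__(self, channels=16, hidden=128, alive_threshold=0.1):
        super().__init__()
        self.alive_threshold = alive_threshold
        self.perceive = nn.Conv3d(channels, hidden, kernel_size=3, padding=1)
        self.update = nn.Conv3d(hidden, channels, kernel_size=1)
        nn.init.zeros_(self.update.weight)   # start as the do-nothing CA
        nn.init.zeros_(self.update.bias)

    def alive_mask(self, x):
        # A cell is "alive" if it or a neighbor has alpha above threshold
        # (channel 3 playing the role of the aliveness channel).
        alpha = x[:, 3:4]
        return F.max_pool3d(alpha, 3, stride=1, padding=1) > self.alive_threshold

    def forward(self, x, steps=1, fire_rate=0.5):
        for _ in range(steps):
            pre_alive = self.alive_mask(x)
            dx = self.update(F.relu(self.perceive(x)))
            # Stochastic update: each cell fires independently per step.
            fire = (torch.rand_like(x[:, :1]) < fire_rate).float()
            x = x + dx * fire
            x = x * (pre_alive & self.alive_mask(x)).float()
        return x

# Grow from a single seed cell in a 32^3 volume.
state = torch.zeros(1, 16, 32, 32, 32)
state[:, 3:, 16, 16, 16] = 1.0
model = NCA3D()
state = model(state, steps=48)
```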
Growing 3D Artefacts and Functional Machines with Neural Cellular Automata
— hardmaru (@hardmaru) March 17, 2021
They use Neural CA to generate moving creatures in Minecraft. When they're cut into pieces, each piece has the ability to regenerate into a fully formed creature. Super cool work! https://t.co/Z1vjX0kvdt https://t.co/tXWe3x1bLo
Excited to share our work on Morphogenesis in Minecraft! We show that neural cellular automata can learn to grow not only complex 3D artifacts with over 3,000 blocks but also functional Minecraft machines that can regenerate when cut in half 🐛🔪=🐛🐛
— Sebastian Risi (@risi1979) March 17, 2021
PDF: https://t.co/hi573xzWIG pic.twitter.com/m19572pcIe
2. Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
Paul-Edouard Sarlin, Ajaykumar Unagar, Måns Larsson, Hugo Germain, Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars Hammarstrand, Fredrik Kahl, Torsten Sattler
Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms. Many regress precise geometric quantities, like poses or 3D points, from an input image. This either fails to generalize to new viewpoints or ties the model parameters to a specific scene. In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms. We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model. Our approach is based on the direct alignment of multiscale deep features, casting camera localization as metric learning. PixLoc learns strong data priors by end-to-end training from pixels to pose and exhibits exceptional generalization to new scenes by separating model parameters and scene geometry. The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching by jointly refining keypoints and poses with little overhead. The code will be publicly available at https://github.com/cvg/pixloc.
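The core idea, direct feature-metric alignment, can be sketched as follows, assuming a pinhole camera, a precomputed deep feature map of the query image, and reference descriptors attached to the 3D points; plain gradient descent stands in for the Levenberg-Marquardt solver the paper uses, and all names are illustrative rather than the released PixLoc code.

```python
import torch
import torch.nn.functional as F

def hat(w):
    z = torch.zeros((), dtype=w.dtype)
    return torch.stack([torch.stack([z, -w[2], w[1]]),
                        torch.stack([w[2], z, -w[0]]),
                        torch.stack([-w[1], w[0], z])])

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = w.norm().clamp_min(1e-8)
    K = hat(w / theta)
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def sample_features(feat, uv, height, width):
    """Bilinearly sample a (1, C, H, W) feature map at pixel coordinates uv (N, 2)."""
    grid = torch.stack([2 * uv[:, 0] / (width - 1) - 1,
                        2 * uv[:, 1] / (height - 1) - 1], dim=-1)
    out = F.grid_sample(feat, grid.view(1, 1, -1, 2), align_corners=True)
    return out[0, :, 0].T                                    # (N, C)

def align_pose(query_feat, points_3d, ref_descriptors, K_intr, w, t, steps=200):
    """Refine a pose (w: axis-angle, t: translation) by minimizing the
    feature-metric error between reference descriptors of 3D map points and
    features sampled at their projections in the query image."""
    w, t = w.clone().requires_grad_(True), t.clone().requires_grad_(True)
    opt = torch.optim.Adam([w, t], lr=1e-2)                  # stand-in for LM
    _, _, H, W = query_feat.shape
    for _ in range(steps):
        R = so3_exp(w)
        pc = points_3d @ R.T + t                             # world -> camera frame
        uv = (pc[:, :2] / pc[:, 2:].clamp_min(1e-6)) @ K_intr[:2, :2].T + K_intr[:2, 2]
        res = sample_features(query_feat, uv, H, W) - ref_descriptors
        loss = (res ** 2).sum(-1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return so3_exp(w.detach()), t.detach()
```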
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
— Tomasz Malisiewicz (@quantombone) March 17, 2021
This paper is based on the alignment of deep multiscale features and combines the best of deep learning with multiview geometry. Checkout those learned heatmaps! https://t.co/JQXvvSEdgF #computervision pic.twitter.com/NsRC36DK6t
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
— AK (@ak92501) March 17, 2021
pdf: https://t.co/Vj51BlgvY7
abs: https://t.co/jmOSZyNJr5 pic.twitter.com/eWlTtHsLk4
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose @pesarlin + 10 authors https://t.co/7OEOyvfYhd
— Dmytro Mishkin (@ducha_aiki) March 17, 2021
Main idea: roughly good (R,t) is all you need. The rest can be obtained from the alignment of the deep features. pic.twitter.com/XkHhzGzTdw
3. Is it Enough to Optimize CNN Architectures on ImageNet?
Lukas Tuggener, Jürgen Schmidhuber, Thilo Stadelmann
An implicit but pervasive hypothesis of modern computer vision research is that convolutional neural network (CNN) architectures that perform better on ImageNet will also perform better on other vision datasets. We challenge this hypothesis through an extensive empirical study for which we train 500 sampled CNN architectures on ImageNet as well as 8 other image classification datasets from a wide array of application domains. The relationship between architecture and performance varies wildly, depending on the datasets. For some of them, the performance correlation with ImageNet is even negative. Clearly, it is not enough to optimize architectures solely for ImageNet when aiming for progress that is relevant for all applications. Therefore, we identify two dataset-specific performance indicators: the cumulative width across layers as well as the total depth of the network. Lastly, we show that the range of dataset variability covered by ImageNet can be significantly extended by adding ImageNet subsets restricted to few classes.
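The paper's central measurement can be reproduced in spirit in a few lines: given per-architecture accuracies on ImageNet and on another dataset, a rank correlation shows whether the ImageNet ordering transfers. The accuracy arrays below are placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder accuracies for a handful of sampled architectures; the study
# evaluates 500 architectures on ImageNet and 8 other datasets.
imagenet_acc = np.array([0.71, 0.74, 0.69, 0.77, 0.73])
other_acc    = np.array([0.88, 0.84, 0.90, 0.83, 0.86])

rho, pval = spearmanr(imagenet_acc, other_acc)
print(f"rank correlation with ImageNet: {rho:.2f} (p={pval:.3f})")
# A negative rho, as observed for some datasets in the paper, means that
# architectures ranked higher on ImageNet tend to rank *lower* here.
```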
Is it Enough to Optimize CNN Architectures on ImageNet?
— Aran Komatsuzaki (@arankomatsuzaki) March 17, 2021
The relationship between architecture and performance varies wildly, depending on the datasets. For some of them the performance correlation with ImageNet is even negative. https://t.co/WLdQl3HprY pic.twitter.com/LooMfCIRY5
Is it Enough to Optimize CNN Architectures on ImageNet?
— AK (@ak92501) March 17, 2021
pdf: https://t.co/zC5jToLTto
abs: https://t.co/oIYWstLrIf pic.twitter.com/RrEHLNJEqa
4. Autonomous Drone Racing with Deep Reinforcement Learning
Yunlong Song, Mats Steinweg, Elia Kaufmann, Davide Scaramuzza
In many robotic tasks, such as drone racing, the goal is to travel through a set of waypoints as fast as possible. A key challenge for this task is planning the minimum-time trajectory, which is typically solved by assuming perfect knowledge of the waypoints to pass in advance. The resulting solutions are either highly specialized for a single-track layout, or suboptimal due to simplifying assumptions about the platform dynamics. In this work, a new approach to minimum-time trajectory generation for quadrotors is presented. Leveraging deep reinforcement learning and relative gate observations, this approach can adaptively compute near-time-optimal trajectories for random track layouts. Our method exhibits a significant computational advantage over approaches based on trajectory optimization for non-trivial track configurations. The proposed approach is evaluated on a set of race tracks in simulation and the real world, achieving speeds of up to 17 m/s with a physical quadrotor.
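One plausible reading of "relative gate observations" is that upcoming gate centers are expressed in the drone's body frame, which is what lets a single policy handle random track layouts. The exact observation layout below is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def relative_gate_observation(drone_pos, drone_R, gate_positions, num_gates=2):
    """Express the next `num_gates` gate centers in the drone's body frame.

    drone_pos: (3,) world position, drone_R: (3, 3) body-to-world rotation,
    gate_positions: (G, 3) upcoming gate centers in world coordinates.
    """
    rel_world = gate_positions[:num_gates] - drone_pos   # world-frame offsets
    rel_body = rel_world @ drone_R                       # rotate into the body frame
    return rel_body.reshape(-1)                          # flat observation vector

# Example: two upcoming gates seen from a drone at the origin, yawed 90 degrees.
yaw90 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
obs = relative_gate_observation(np.zeros(3), yaw90,
                                np.array([[5., 0., 2.], [10., 3., 2.]]))
print(obs)
```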
Autonomous Drone Racing with Deep Reinforcement Learning
— AK (@ak92501) March 17, 2021
pdf: https://t.co/sZeU9pcRTL
abs: https://t.co/rnHLQPEoPw pic.twitter.com/ZnyhUapzca
5. Deep learning: a statistical viewpoint
Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
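A toy numerical illustration of the benign-overfitting story in the survey's overparametrized linear regression setting: the minimum-norm interpolator fits noisy training labels exactly, yet its test error stays close to the noise level because a few strong directions carry the signal while the long tail of weak directions absorbs the noise. The dimensions, scales, and noise level are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d = 200, 5, 2000                   # samples, strong features, total features
sigma = 0.5                               # label noise

# Anisotropic design: a few strong directions carry the signal, a long tail
# of weak directions is available to absorb noise (the "spiky" component).
scales = np.concatenate([np.ones(k), 0.05 * np.ones(d - k)])
theta_star = np.concatenate([np.ones(k), np.zeros(d - k)])

X = rng.normal(size=(n, d)) * scales
y = X @ theta_star + sigma * rng.normal(size=n)

# Minimum-norm interpolator (the solution gradient descent from zero finds).
theta_hat = np.linalg.pinv(X) @ y
print("train MSE:", np.mean((X @ theta_hat - y) ** 2))          # ~0: perfect fit

X_test = rng.normal(size=(5000, d)) * scales
y_test = X_test @ theta_star                                    # noiseless targets
print("test MSE :", np.mean((X_test @ theta_hat - y_test) ** 2))
print("null risk:", np.mean(y_test ** 2))                       # predict-zero baseline
```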
Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin: Deep learning: a statistical viewpoint https://t.co/0CNfVp4BcI https://t.co/lBmGKZEygZ
— arXiv math.ST Statistics Theory (@mathSTb) March 17, 2021
6. Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning
Jama Hussein Mohamud, Lloyd Acquaye Thompson, Aissatou Ndoye, Laurent Besacier
This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020. After a series of lectures and labs on speech data collection using mobile applications and on self-supervised representation learning from speech, a small group of students and the lecturer continued working on an automatic speech recognition (ASR) project for three languages: Wolof, Ga, and Somali. This paper describes how data was collected and ASR systems developed with a small amount (1h) of transcribed speech as training data. In these low-resource conditions, pre-training a model on large amounts of raw speech was fundamental for the efficiency of the ASR systems developed.
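The low-resource recipe can be sketched schematically: a self-supervised speech encoder pre-trained on untranscribed audio is frozen, and only a small CTC head is trained on the hour of transcribed data. The encoder below is a stand-in module with made-up shapes, not the authors' model or data.

```python
import torch
import torch.nn as nn

class PretrainedSpeechEncoder(nn.Module):
    """Stand-in for a self-supervised encoder (e.g. a wav2vec-style model)
    pre-trained on large amounts of untranscribed speech."""

    def __init__(self, dim=512):
        super().__init__()
        # ~20 ms frames from raw 16 kHz audio, then a small transformer stack.
        self.conv = nn.Conv1d(1, dim, kernel_size=400, stride=320)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4)

    def forward(self, wav):                            # wav: (batch, samples)
        z = self.conv(wav.unsqueeze(1)).transpose(1, 2)  # (batch, frames, dim)
        return self.encoder(z)

vocab_size = 40                                  # characters, CTC blank at index 0
encoder = PretrainedSpeechEncoder()
for p in encoder.parameters():                   # freeze the pre-trained weights
    p.requires_grad = False
ctc_head = nn.Linear(512, vocab_size)            # the only part trained on 1h of data
ctc_loss = nn.CTCLoss(blank=0)
opt = torch.optim.Adam(ctc_head.parameters(), lr=3e-4)

# One (synthetic) training step on a tiny transcribed batch.
wav = torch.randn(2, 16000)                      # 1 s of 16 kHz audio per utterance
targets = torch.randint(1, vocab_size, (2, 12))  # dummy character indices
feats = encoder(wav)
log_probs = ctc_head(feats).log_softmax(-1).transpose(0, 1)   # (frames, batch, vocab)
input_lens = torch.full((2,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((2,), 12, dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lens, target_lens)
opt.zero_grad(); loss.backward(); opt.step()
```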
An informal collab. during the 2020 African Master of Machine Intelligence (AMMI) led to this paper on ASR for Wolof, Ga, and Somali. We leveraged self supervised speech representation learning to train with only 1h of transcribed speech https://t.co/Wxa2PUM0N2
— laurent besacier (@laurent_besacie) March 17, 2021
7. A Systematic Literature Review and Taxonomy of Modern Code Review
Nicole Davila, Ingrid Nunes
Modern Code Review (MCR) is a widely known practice of software quality assurance. However, the existing body of knowledge of MCR is currently not understood as a whole. Objective: Our goal is to identify the state of the art on MCR, providing a structured overview and an in-depth analysis of the research done in this field. Method: We performed a systematic literature review, selecting publications from four digital libraries. Results: A total of 139 papers were selected and analyzed in three main categories. Foundational studies are those that analyze existing or collected data from the adoption of MCR. Proposals consist of techniques and tools to support MCR, while evaluations are studies to assess an approach or compare a set of them. Conclusion: The most represented category is foundational studies, mainly aiming to understand the motivations for adopting MCR, its challenges and benefits, and which influence factors lead to which MCR outcomes. The most common types of proposals are code reviewer recommender and support to code checking. Evaluations of MCR-supporting approaches have been done mostly offline, without involving human subjects. Five main research gaps have been identified, which point out directions for future work in the area.
Lots of papers on #CodeReview were published. But what do we know? What makes code review effective? What kind of support does science provide? See the answers to these questions in our paper (with @nicoleNCD) accepted by @JSSoftware:
— Ingrid Nunes (@ingridnunesIN) March 17, 2021
Preprint: https://t.co/O41QffvKwx
8. Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling
Đorđe Miladinović, Aleksandar Stanić, Stefan Bauer, Jürgen Schmidhuber, Joachim M. Buhmann
How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechanism that distributes contextual information across 2-D space. We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation over baseline convolutional architectures and the state-of-the-art among the models within the same class. Furthermore, we demonstrate that SDN can be applied to large images by synthesizing samples of high quality and coherence. In a vanilla VAE setting, we find that a powerful SDN decoder also improves learning disentangled representations, indicating that neural architectures play an important role in this task. Our results suggest favoring spatial dependency over convolutional layers in various VAE settings. The accompanying source code is given at https://github.com/djordjemila/sdn.
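A rough sketch of the kind of sequential, gating-based spatial propagation the abstract describes, reduced to a single top-to-bottom row sweep with a convolutional GRU-style gate; the real SDN layers combine several sweep directions inside a hierarchical VAE decoder, and the linked repository is the reference implementation.

```python
import torch
import torch.nn as nn

class RowSweepSDN(nn.Module):
    """One directional sweep: each row of the feature map is updated from the
    row above through a convolutional GRU-style gate, so contextual
    information propagates coherently across 2-D space rather than only
    through local convolutional receptive fields."""

    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv1d(2 * channels, 2 * channels, kernel_size=3, padding=1)
        self.cand = nn.Conv1d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x):                        # x: (batch, C, H, W)
        b, c, h, w = x.shape
        state = torch.zeros(b, c, w, device=x.device)
        rows = []
        for i in range(h):                       # sweep top -> bottom
            inp = torch.cat([x[:, :, i], state], dim=1)              # (b, 2C, W)
            update, reset = torch.sigmoid(self.gates(inp)).chunk(2, dim=1)
            cand = torch.tanh(self.cand(torch.cat([x[:, :, i], reset * state], dim=1)))
            state = (1 - update) * state + update * cand
            rows.append(state)
        return torch.stack(rows, dim=2)          # (b, C, H, W)

# Example: plug one sweep into a decoder feature map.
layer = RowSweepSDN(channels=64)
out = layer(torch.randn(2, 64, 16, 16))
print(out.shape)                                 # torch.Size([2, 64, 16, 16])
```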
Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling
— AK (@ak92501) March 17, 2021
pdf: https://t.co/vo8UnnmneG
abs: https://t.co/KxGX5CsgTN
github: https://t.co/YxzwQpTTQo pic.twitter.com/jyzybfvEfb
Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling
— Aran Komatsuzaki (@arankomatsuzaki) March 17, 2021
Achieves nearly SotA NLL in image modeling by augmenting the decoder of a hierarchical VAE by spatial dependency layers.
abs: https://t.co/nsg2DgR7pd
github: https://t.co/CD5hB7O28g
9. Understanding the Representation and Representativeness of Age in AI Data Sets
Joon Sung Park, Michael S. Bernstein, Robin N. Brewer, Ece Kamar, Meredith Ringel Morris
A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by focusing on the representation of age by asking whether older adults are represented proportionally to the population at large in AI data sets. We examine publicly-available information about 92 face data sets to understand how they codify age as a case study to investigate how the subjects’ ages are recorded and whether older generations are represented. We find that older adults are very under-represented; five data sets in the study that explicitly documented the closed age intervals of their subjects included older adults (defined as older than 65 years), while only one included oldest-old adults (defined as older than 85 years). Additionally, we find that only 24 of the data sets include any age-related information in their documentation or metadata, and that there is no consistent method followed across these data sets to collect and record the subjects’ ages. We recognize the unique difficulties in creating representative data sets in terms of age, but raise it as an important dimension that researchers and engineers interested in inclusive AI should consider.
Posted a new arXiv pre-print "Understanding the Representation and Representativeness of Age in #AI Data Sets" w/ @joon_s_pk @msbernst @_rnbrewer @ecekamar - https://t.co/EQLDDLTmKX
— Meredith Ringel Morris (@merrierm) March 17, 2021
10. Flow-based Self-supervised Density Estimation for Anomalous Sound Detection
Kota Dohi, Takashi Endo, Harsh Purohit, Ryo Tanabe, Yohei Kawaguchi
- retweets: 62, favorites: 19 (03/18/2021 09:55:41)
- eess.AS | cs.LG | cs.SD | stat.ML
To develop a machine sound monitoring system, a method for detecting anomalous sound is proposed. Exact likelihood estimation using Normalizing Flows is a promising technique for unsupervised anomaly detection, but it can fail at out-of-distribution detection since the likelihood is affected by the smoothness of the data. To improve the detection performance, we train the model to assign higher likelihood to target machine sounds and lower likelihood to sounds from other machines of the same machine type. We demonstrate that this enables the model to incorporate a self-supervised classification-based approach. Experiments conducted using the DCASE 2020 Challenge Task2 dataset showed that the proposed method improves the AUC by 4.6% on average when using Masked Autoregressive Flow (MAF) and by 5.8% when using Glow, which is a significant improvement over the previous method.
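A compact sketch of the training idea, with a minimal affine-coupling flow standing in for MAF or Glow: raise the likelihood of the target machine's sounds while pushing down the likelihood of sounds from other machines of the same type. The feature dimensionality, margin, and hinge-style objective are illustrative simplifications, not the paper's exact loss.

```python
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: half the dimensions condition a scale/shift
    applied to the other half, keeping the Jacobian log-determinant cheap."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        xa, xb = x.chunk(2, dim=-1)
        log_s, t = self.net(xa).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                 # keep the scales well behaved
        yb = xb * torch.exp(log_s) + t
        return torch.cat([xa, yb], dim=-1), log_s.sum(-1)

class TinyFlow(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))

    def log_prob(self, x):
        log_det = 0.0
        for layer in self.layers:
            x = torch.flip(x, dims=[1])           # alternate which half is transformed
            x, ld = layer(x)
            log_det = log_det + ld
        base = -0.5 * (x ** 2).sum(-1) - 0.5 * x.shape[-1] * math.log(2 * math.pi)
        return base + log_det

dim = 64                                          # e.g. a log-mel feature vector
flow = TinyFlow(dim)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
margin = 50.0                                     # illustrative hinge margin

target = torch.randn(32, dim)                     # sounds from the target machine
other = torch.randn(32, dim) + 0.5                # same machine type, other machines

# Raise the likelihood of target sounds; lower it for other machines, but only
# while their mean log-likelihood is above -margin (the hinge prevents the
# "other" term from being pushed toward minus infinity).
loss = -flow.log_prob(target).mean() \
       + torch.relu(flow.log_prob(other).mean() + margin)
opt.zero_grad(); loss.backward(); opt.step()
```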