1. Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D’Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley
ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
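The definition in the abstract can be made concrete with a toy sketch. Everything below is a hypothetical illustration, not one of the paper's experiments: two hand-written predictors (`predictor_a`, `predictor_b`) stand in for two models that a pipeline might return, for example under different random seeds. In the assumed training domain the features `x1` and `x2` are identical, so both predictors score the same on held-out data; in an assumed stress-test domain `x2` no longer tracks `x1`, and the two predictors diverge.

```python
import random

def make_data(n, correlated, seed):
    """Toy data: the label depends only on x1; x2 copies x1 when `correlated`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        # Training domain: x2 duplicates x1. Stress domain: x2 is independent noise.
        x2 = x1 if correlated else rng.uniform(-1.0, 1.0)
        y = 1 if x1 > 0 else 0
        rows.append((x1, x2, y))
    return rows

def predictor_a(x1, x2):
    # Relies on the feature that actually determines the label.
    return 1 if x1 > 0 else 0

def predictor_b(x1, x2):
    # Relies on a feature that only matches the label in the training domain.
    return 1 if x2 > 0 else 0

def accuracy(predictor, rows):
    return sum(predictor(x1, x2) == y for x1, x2, y in rows) / len(rows)

held_out = make_data(2000, correlated=True, seed=0)   # same distribution as training
stress = make_data(2000, correlated=False, seed=1)    # assumed deployment-like shift

results = {
    name: {"held_out": accuracy(f, held_out), "stress": accuracy(f, stress)}
    for name, f in [("a", predictor_a), ("b", predictor_b)]
}
for name, scores in results.items():
    print(name, scores)
```

Held-out accuracy alone cannot separate the two predictors here; only the stress-test column does, which is the kind of check the abstract argues a deployment pipeline needs.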
NEW from a big collaboration at Google: Underspecification Presents Challenges for Credibility in Modern Machine Learning
— Alexander D'Amour (@alexdamour) November 9, 2020
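Explores a common failure mode when applying ML to real-world problems. 🧵 1/14 https://t.co/7vX5D8yMhq pic.twitter.com/AqtoNBGzd5
Many cases of unexpected performance degradation when machine learning models are applied to real problems are mainly caused by underspecification, where multiple solutions achieve the same performance on the evaluation data. Selecting a solution from the solution set (marginalization is not enough), designing stress tests that reflect the intended use, and task-specific regularization are important. https://t.co/cA1wgA1Pq2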
— Daisuke Okanohara (@hillbig) November 9, 2020
Underspecification Presents Challenges for Credibility in Modern Machine Learning
— Aran Komatsuzaki (@arankomatsuzaki) November 9, 2020
Massive collaboration by Googlers to show underspecified ML pipeline can lead to various instability and poor model behavior, incl. shortcuts and spurious correlations. https://t.co/fpJQuiMLiP pic.twitter.com/uNk1L9H7v3
2. Modular Primitives for High-Performance Differentiable Rendering
Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, Timo Aila
We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines. Our design supports all crucial operations in a modern graphics pipeline: rasterizing large numbers of triangles, attribute interpolation, filtered texture lookups, as well as user-programmable shading and geometry processing, all in high resolutions. Our modular primitives allow custom, high-performance graphics pipelines to be built directly within automatic differentiation frameworks such as PyTorch or TensorFlow. As a motivating application, we formulate facial performance capture as an inverse rendering problem and show that it can be solved efficiently using our tools. Our results indicate that this simple and straightforward approach achieves excellent geometric correspondence between rendered results and reference imagery.
Modular Primitives for High-Performance Differentiable Rendering
— AK (@ak92501) November 9, 2020
pdf: https://t.co/ZxEH9LWNo2
abs: https://t.co/eQV4Ty6zdJ
github: https://t.co/H59YVj58ym pic.twitter.com/PsIoGsv49p
3. Complex Query Answering with Neural Link Predictors
Erik Arakelyan, Daniel Daza, Pasquale Minervini, Michael Cochez
Neural link predictors are immensely useful for identifying missing edges in large scale Knowledge Graphs. However, it is still not clear how to use these models for answering more complex queries that arise in a number of domains, such as queries using logical conjunctions, disjunctions, and existential quantifiers, while accounting for missing edges. In this work, we propose a framework for efficiently answering complex queries on incomplete Knowledge Graphs. We translate each query into an end-to-end differentiable objective, where the truth value of each atom is computed by a pre-trained neural link predictor. We then analyse two solutions to the optimisation problem, including gradient-based and combinatorial search. In our experiments, the proposed approach produces more accurate results than state-of-the-art methods — black-box neural models trained on millions of generated queries — without the need of training on a large and diverse set of complex queries. Using orders of magnitude less training data, we obtain relative improvements ranging from 8% up to 40% in Hits@3 across different knowledge graphs containing factual information. Finally, we demonstrate that it is possible to explain the outcome of our model in terms of the intermediate solutions identified for each of the complex query atoms.
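As a rough sketch of the combinatorial-search idea, consider the two-hop query "find T such that there exists V with r1(anchor, V) and r2(V, T)". The code below is an illustration under assumptions, not the authors' implementation: the `SCORES` table is a hypothetical stand-in for a pre-trained neural link predictor, the conjunction is scored with a product t-norm (the abstract does not say which aggregation is used), and the beam width is arbitrary.

```python
ENTITIES = ["a", "b", "c", "d"]

# Hypothetical atom scores in [0, 1], standing in for a neural link predictor.
SCORES = {
    ("r1", "a", "b"): 0.9,
    ("r1", "a", "c"): 0.6,
    ("r2", "b", "c"): 0.8,
    ("r2", "b", "d"): 0.5,
    ("r2", "c", "d"): 0.9,
}

def link_score(rel, head, tail):
    """Truth value of a single atom rel(head, tail); unseen triples score 0."""
    return SCORES.get((rel, head, tail), 0.0)

def answer_two_hop(anchor, rel1, rel2, beam=2):
    """Rank targets T for: exists V. rel1(anchor, V) and rel2(V, T)."""
    # Step 1: keep only the `beam` best substitutions for the variable V.
    candidates = sorted(
        ENTITIES, key=lambda v: link_score(rel1, anchor, v), reverse=True
    )[:beam]
    ranking = []
    for target in ENTITIES:
        best_score, best_v = 0.0, None
        for v in candidates:
            # Product t-norm for the conjunction (an assumption of this sketch).
            s = link_score(rel1, anchor, v) * link_score(rel2, v, target)
            if s > best_score:
                best_score, best_v = s, v
        ranking.append((target, best_score, best_v))
    # Step 2: sort targets by the score of their best intermediate entity.
    return sorted(ranking, key=lambda item: item[1], reverse=True)

ranking = answer_two_hop("a", "r1", "r2")
for target, score, via in ranking:
    print(target, round(score, 2), "via", via)
```

Each answer comes with the intermediate entity that produced its score, which is the sense in which the abstract says outcomes can be explained through the solutions found for individual query atoms.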
Do we need deep black-box models to answer complex logical queries in KGs?
— Pasquale Minervini (@PMinervini) November 9, 2020
We show how a neural link predictor can be used to produce more accurate results than SOTA models while improving explanations! w/ @_kire_kara_ @danieldazac @michaelcochez, arxiv: https://t.co/eWA2QlskUr pic.twitter.com/eJfMLOx0oT
4. Disentangling 3D Prototypical Networks For Few-Shot Concept Learning
Mihir Prabhudesai, Shamit Lal, Darshan Patil, Hsiao-Yu Tung, Adam W Harley, Katerina Fragkiadaki
We present neural architectures that disentangle RGB-D images into objects’ shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification. Our networks incorporate architectural biases that reflect the image formation process, 3D geometry of the world scene, and shape-style interplay. They are trained end-to-end self-supervised by predicting views in static scenes, alongside a small number of 3D object boxes. Objects and scenes are represented in terms of 3D feature grids in the bottleneck of the network. We show that the proposed 3D neural representations are compositional: they can generate novel 3D scene feature maps by mixing object shapes and styles, resizing and adding the resulting object 3D feature maps over background scene feature maps. We show that classifiers for object categories, color, materials, and spatial relationships trained over the disentangled 3D feature sub-spaces generalize better with dramatically fewer examples than the current state-of-the-art, and enable a visual question answering system that uses them as its modules to generalize one-shot to novel objects in the scene.
Disentangling 3D Prototypical Networks For Few-Shot Concept Learning
— AK (@ak92501) November 9, 2020
pdf: https://t.co/lR2jqohJKq
abs: https://t.co/XdNiOIoiDv
project page: https://t.co/tdB2z6rsFL pic.twitter.com/8QG1oPa2fb
5. Large-scale multilingual audio visual dubbing
Yi Yang, Brendan Shillingford, Yannis Assael, Miaosen Wang, Wendi Liu, Yutian Chen, Yu Zhang, Eren Sezener, Luis C. Cobo, Misha Denil, Yusuf Aytar, Nando de Freitas
We describe a system for large-scale audiovisual translation and dubbing, which translates videos from one language to another. The source language’s speech content is transcribed to text, translated, and automatically synthesized into target language speech using the original speaker’s voice. The visual content is translated by synthesizing lip movements for the speaker to match the translated audio, creating a seamless audiovisual experience in the target language. The audio and visual translation subsystems each contain a large-scale generic synthesis model trained on thousands of hours of data in the corresponding domain. These generic models are fine-tuned to a specific speaker before translation, either using an auxiliary corpus of data from the target speaker, or using the video to be translated itself as the input to the fine-tuning process. This report gives an architectural overview of the full system, as well as an in-depth discussion of the video dubbing component. The role of the audio and text components in relation to the full system is outlined, but their design is not discussed in detail. Translated and dubbed demo videos generated using our system can be viewed at https://www.youtube.com/playlist?list=PLSi232j2ZA6_1Exhof5vndzyfbxAhhEs5
Here is our latest work on automated video translation https://t.co/p8OLpi1YLH! @yangyi02, @BrendanShilling, @MiaosenWang, Wendi Liu, @yutianc, Yu Zhang, Eren Sezener, Luis C. Cobo, @notmisha, @yusufaytar, @NandoDF
— Yannis Assael (@iassael) November 9, 2020
Large-scale multilingual audio visual dubbing
— AK (@ak92501) November 9, 2020
pdf: https://t.co/9ig4Ct7PhC
abs: https://t.co/G1h17gMPHR pic.twitter.com/IhLpMyy7hb
6. “What’s This?” — Learning to Segment Unknown Objects from Manipulation Sequences
Wout Boerdijk, Martin Sundermeyer, Maximilian Durner, Rudolph Triebel
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator. Our method successively learns an agnostic foreground segmentation followed by a distinction between manipulator and object solely by observing the motion between consecutive RGB frames. In contrast to previous approaches, we propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge. Furthermore, while the motion of the manipulator and the object are substantial cues for our algorithm, we present means to robustly deal with distraction objects moving in the background, as well as with completely static scenes. Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data. By extensive experimental evaluation we demonstrate the superiority of our framework and provide detailed insights on its capability of dealing with the aforementioned extreme cases of motion. We also show that training a semantic segmentation network with the automatically labeled data achieves results on par with manually annotated training data. Code and pretrained models will be made publicly available.
“What’s This?” - Learning to Segment Unknown Objects from Manipulation Sequences https://t.co/bc8mqhMSIm pic.twitter.com/QgRzdwwmRU
— sim2real (@sim2realAIorg) November 9, 2020
7. RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer
Daniel Ho, Kanishka Rao, Zhuo Xu, Eric Jang, Mohi Khansari, Yunfei Bai
The success of deep reinforcement learning (RL) and imitation learning (IL) in vision-based robotic manipulation typically hinges on the expense of large scale data collection. With simulation, data to train a policy can be collected efficiently at scale, but the visual gap between sim and real makes deployment in the real world difficult. We introduce RetinaGAN, a generative adversarial network (GAN) approach to adapt simulated images to realistic ones with object-detection consistency. RetinaGAN is trained in an unsupervised manner without task loss dependencies, and preserves general object structure and texture in adapted images. We evaluate our method on three real world tasks: grasping, pushing, and door opening. RetinaGAN improves upon the performance of prior sim-to-real methods for RL-based object instance grasping and continues to be effective even in the limited data regime. When applied to a pushing task in a similar visual domain, RetinaGAN demonstrates transfer with no additional real data requirements. We also show our method bridges the visual gap for a novel door opening task using imitation learning in a new visual domain. Visit the project website at https://retinagan.github.io/
RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer
— AK (@ak92501) November 9, 2020
pdf: https://t.co/w5rPnI0fp4
abs: https://t.co/XLgicF2bfO pic.twitter.com/6HbFTGH8PG
RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer https://t.co/l8C7lo94LC pic.twitter.com/IHuZ0ZbM9i
— sim2real (@sim2realAIorg) November 9, 2020
8. RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning
Rinu Boney, Jussi Sainio, Mikko Kaivola, Arno Solin, Juho Kannala
Current robot platforms available for research are either very expensive or unable to handle the abuse of exploratory controls in reinforcement learning. We develop RealAnt, a minimal low-cost physical version of the popular ‘Ant’ benchmark used in reinforcement learning. RealAnt costs only $410 in materials and can be assembled in less than an hour. We validate the platform with reinforcement learning experiments and provide baseline results on a set of benchmark tasks. We demonstrate that the TD3 algorithm can learn to walk the RealAnt from less than 45 minutes of experience. We also provide simulator versions of the robot (with the same dimensions, state-action spaces, and delayed noisy observations) in the MuJoCo and PyBullet simulators. We open-source hardware designs, supporting software, and baseline results for ease of reproducibility.
🐜 3D print your own quadruped Ant for #RL 🐜
— Arno Solin (@arnosolin) November 9, 2020
"RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning" (\w @rinuboney, @JussiSainio, Mikko, and Juho)
📄: https://t.co/iMoDKJ3cyQ
🖨: https://t.co/NPb1LF7nRI
🧮: https://t.co/bZV1pWCO3x pic.twitter.com/KHU3jc7B47
9. STReSSD: Sim-To-Real from Sound for Stochastic Dynamics
Carolyn Matl, Yashraj Narang, Dieter Fox, Ruzena Bajcsy, Fabio Ramos
Sound is an information-rich medium that captures dynamic physical events. This work presents STReSSD, a framework that uses sound to bridge the simulation-to-reality gap for stochastic dynamics, demonstrated for the canonical case of a bouncing ball. A physically-motivated noise model is presented to capture stochastic behavior of the balls upon collision with the environment. A likelihood-free Bayesian inference framework is used to infer the parameters of the noise model, as well as a material property called the coefficient of restitution, from audio observations. The same inference framework and the calibrated stochastic simulator are then used to learn a probabilistic model of ball dynamics. The predictive capabilities of the dynamics model are tested in two robotic experiments. First, open-loop predictions anticipate probabilistic success of bouncing a ball into a cup. The second experiment integrates audio perception with a robotic arm to track and deflect a bouncing ball in real-time. We envision that this work is a step towards integrating audio-based inference for dynamic robotic tasks. Experimental results can be viewed at https://youtu.be/b7pOrgZrArk.
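To show what likelihood-free inference of a coefficient of restitution can look like, here is a minimal sketch that swaps in plain rejection ABC, which is a simpler method than whatever inference scheme the paper uses. It relies on the textbook fact that, ignoring drag, the flight time between successive bounces shrinks by a factor equal to the coefficient of restitution. The Gaussian per-bounce noise, the fixed noise level, the prior range, and all function names are assumptions of this sketch; the "observed" intervals are simulated stand-ins for impact times that would be picked out of an audio recording.

```python
import random

def simulate_intervals(e, noise_std, first_interval, n_intervals, rng):
    """Times between successive impacts; each bounce draws a noisy restitution."""
    intervals = [first_interval]
    for _ in range(n_intervals - 1):
        # Assumed noise model: Gaussian jitter on e, clamped to stay physical.
        e_k = min(max(rng.gauss(e, noise_std), 0.05), 1.0)
        intervals.append(intervals[-1] * e_k)
    return intervals

def summary(intervals):
    """Summary statistic: mean ratio of consecutive inter-bounce intervals."""
    ratios = [b / a for a, b in zip(intervals, intervals[1:])]
    return sum(ratios) / len(ratios)

def abc_rejection(observed, n_samples, tol, noise_std, rng):
    """Keep prior draws of e whose simulated summary lands near the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(n_samples):
        e = rng.uniform(0.3, 0.95)  # assumed uniform prior
        sim = simulate_intervals(e, noise_std, observed[0], len(observed), rng)
        if abs(summary(sim) - target) < tol:
            accepted.append(e)
    return accepted

rng = random.Random(0)
TRUE_E = 0.8  # hidden value used only to generate the stand-in observation
observed = simulate_intervals(TRUE_E, 0.02, 0.6, 6, rng)
posterior = abc_rejection(observed, 5000, 0.01, 0.02, rng)
estimate = sum(posterior) / len(posterior)
print(len(posterior), "accepted samples, posterior mean e =", round(estimate, 3))
```

Unlike the framework described in the abstract, this sketch infers only the coefficient of restitution and keeps the noise level fixed; inferring the noise parameters as well would mean sampling them from a prior in the same loop.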
STReSSD: Sim-To-Real from Sound for Stochastic Dynamics
— AK (@ak92501) November 9, 2020
pdf: https://t.co/MIPPHRurws
abs: https://t.co/lrTcHdfImA
video: https://t.co/bKQp2KoYBm pic.twitter.com/Cew1Nv8I2M
10. Efficient quantum algorithm for dissipative nonlinear differential equations
Jin-Peng Liu, Herman Øie Kolden, Hari K. Krovi, Nuno F. Loureiro, Konstantina Trivisa, Andrew M. Childs
- retweets: 20, favorites: 36 (11/10/2020 09:24:41)
- quant-ph | math.NA | physics.plasm-ph
While there has been extensive previous work on efficient quantum algorithms for linear differential equations, analogous progress for nonlinear differential equations has been severely limited due to the linearity of quantum mechanics. Despite this obstacle, we develop a quantum algorithm for initial value problems described by dissipative quadratic $n$-dimensional ordinary differential equations. Assuming $R < 1$, where $R$ is a parameter characterizing the ratio of the nonlinearity to the linear dissipation, this algorithm has complexity $T^2 \mathrm{poly}(\log T, \log n)/\epsilon$, where $T$ is the evolution time and $\epsilon$ is the allowed error in the output quantum state. This is an exponential improvement over the best previous quantum algorithms, whose complexity is exponential in $T$. We achieve this improvement using the method of Carleman linearization, for which we give an improved convergence theorem. This method maps a system of nonlinear differential equations to an infinite-dimensional system of linear differential equations, which we discretize, truncate, and solve using the forward Euler method and the quantum linear system algorithm. We also provide a lower bound on the worst-case complexity of quantum algorithms for general quadratic differential equations, showing that the problem is intractable for $R \ge \sqrt{2}$. Finally, we discuss potential applications of this approach to problems arising in biology as well as in fluid and plasma dynamics.
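The Carleman step can be sketched classically on a one-dimensional example. This is an illustration under assumptions, not the paper's algorithm: it uses the scalar equation du/dt = a*u + b*u^2 with a < 0, introduces the variables y_j = u^j (which satisfy dy_j/dt = j*a*y_j + j*b*y_{j+1}), truncates at order `n_trunc`, and applies forward Euler directly instead of a quantum linear system solver. The parameter values and the function names are hypothetical.

```python
import math

def carleman_euler(a, b, u0, t_end, n_trunc, steps):
    """Forward Euler on the truncated Carleman system for du/dt = a*u + b*u**2."""
    h = t_end / steps
    # y[j-1] holds the approximation of u**j for j = 1..n_trunc.
    y = [u0 ** j for j in range(1, n_trunc + 1)]
    for _ in range(steps):
        new_y = []
        for j in range(1, n_trunc + 1):
            # Coupling to y_{j+1}; the truncation sets y_{n_trunc+1} to zero.
            y_next = y[j] if j < n_trunc else 0.0
            new_y.append(y[j - 1] + h * (j * a * y[j - 1] + j * b * y_next))
        y = new_y
    return y[0]  # first component approximates u(t_end)

def exact_solution(a, b, u0, t):
    """Closed form of the same ODE (a Bernoulli equation)."""
    s = math.exp(a * t)
    return a * u0 * s / (a + b * u0 * (1.0 - s))

# Assumed test problem: dissipation dominates the nonlinearity (|b*u0| < |a|).
a, b, u0, t_end = -1.0, 0.5, 0.5, 2.0
approx = carleman_euler(a, b, u0, t_end, n_trunc=6, steps=2000)
exact = exact_solution(a, b, u0, t_end)
print("Carleman + Euler:", approx, "exact:", exact)
```

With weak nonlinearity relative to dissipation, the dropped higher-order terms stay small, so a low truncation order already tracks the closed-form solution closely; the paper's contribution concerns solving the corresponding high-dimensional linear system on a quantum computer.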
New paper with @JinPengLiu__Sky, Kolden, Krovi, Loureiro, and Trivisa gives an efficient quantum algorithm for nonlinear differential equations with strong enough dissipation, shows hardness for weak dissipation. [1/2] https://t.co/RguERPF8H4
— Andrew Childs (@andrewmchilds) November 9, 2020