1. Scale invariant robot behavior with fractals
Sam Kriegman, Amir Mohammadi Nasab, Douglas Blackiston, Hannah Steele, Michael Levin, Rebecca Kramer-Bottiglio, Josh Bongard
Robots deployed at orders of magnitude different size scales, and that retain the same desired behavior at any of those scales, would greatly expand the environments in which the robots could operate. However, it is currently not known whether such robots exist, and, if they do, how to design them. Since self similar structures in nature often exhibit self similar behavior at different scales, we hypothesize that there may exist robot designs that have the same property. Here we demonstrate that this is indeed the case for some, but not all, modular soft robots: there are robot designs that exhibit a desired behavior at a small size scale, and if copies of that robot are attached together to realize the same design at higher scales, those larger robots exhibit similar behavior. We show how to find such designs in simulation using an evolutionary algorithm. Further, when fractal attachment is not assumed and attachment geometries must thus be evolved along with the design of the base robot unit, scale invariant behavior is not achieved, demonstrating that structural self similarity, when combined with appropriate designs, is a useful path to realizing scale invariant robot behavior. We validate our findings by demonstrating successful transferal of self similar structure and behavior to pneumatically-controlled soft robots. Finally, we show that biobots can spontaneously exhibit self similar attachment geometries, thereby suggesting that self similar behavior via self similar structure may be realizable across a wide range of robot platforms in the future.
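The designs are found with an evolutionary algorithm in simulation. As a rough illustration of that kind of search loop (not the authors' simulator, encoding, or fitness), here is a minimal (mu+lambda) evolutionary sketch over binary voxel designs; `simulate_displacement` is a hypothetical stand-in for the soft-body physics rollout, and the fitness simply rewards designs that behave both well and similarly at two scales.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_displacement(design, scale):
    """Stand-in for a soft-body physics rollout (toy surrogate only).
    In the paper this would be the displacement of the (possibly fractally
    tiled) robot in simulation."""
    active = design.mean()
    return 4.0 * active * (1.0 - active) / scale ** 0.1

def fitness(design):
    """Reward designs whose behavior is large AND similar across scales."""
    b1 = simulate_displacement(design, scale=1)   # base unit
    b2 = simulate_displacement(design, scale=8)   # fractal copy of copies
    return min(b1, b2) - abs(b1 - b2)

def mutate(design, p=0.05):
    flips = rng.random(design.shape) < p
    return np.logical_xor(design, flips)

# (mu + lambda) evolutionary loop over 4x4x4 binary voxel designs.
mu, lam, generations = 8, 16, 50
population = [rng.random((4, 4, 4)) < 0.5 for _ in range(mu)]
for _ in range(generations):
    offspring = [mutate(population[rng.integers(mu)]) for _ in range(lam)]
    population = sorted(population + offspring, key=fitness, reverse=True)[:mu]

print("best fitness:", fitness(population[0]))
```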
Q: Can we design tiny robots that when combined with copies of themselves fractally, provide utility at multiple length scales?
— Sam Kriegman (@Kriegmerica) March 9, 2021
A: https://t.co/gVO2soVsCy pic.twitter.com/2gCvJiW77T
"Scale invariant robot behavior with fractals”
— Vermont Complex Systems Center @ UVM (@uvmcomplexity) March 9, 2021
New preprint from recent PhD graduate @Kriegmerica w/faculty member @DoctorJosh & team https://t.co/mxyOrbSc3T pic.twitter.com/pVqQUV26IP
2. Reinforcement Learning, Bit by Bit
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen
Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We develop concepts and establish a regret bound that together offer principled guidance. The bound sheds light on questions of what information to seek, how to seek that information, and what information to retain. To illustrate these concepts, we design simple agents that build on them and present computational results that demonstrate improvements in data efficiency.
Reinforcement Learning, Bit by Bit
— AK (@ak92501) March 9, 2021
pdf: https://t.co/r32UxWG9OW
abs: https://t.co/b0BZN1qQxj pic.twitter.com/tIeDcp1xm6
3. The Hintons in your Neural Network: a Quantum Field Theory View of Deep Learning
Roberto Bondesan, Max Welling
In this work we develop a quantum field theory formalism for deep learning, where input signals are encoded in Gaussian states, a generalization of Gaussian processes which encode the agent’s uncertainty about the input signal. We show how to represent linear and non-linear layers as unitary quantum gates, and interpret the fundamental excitations of the quantum model as particles, dubbed “Hintons”. On top of opening a new perspective and techniques for studying neural networks, the quantum formulation is well suited for optical quantum computing, and provides quantum deformations of neural networks that can be run efficiently on those devices. Finally, we discuss a semi-classical limit of the quantum deformed models which is amenable to classical simulation.
Hinton has finally become a particle https://t.co/JH0yGUlPl4
— kenjikun (@kenjikun__) March 9, 2021
The Hintons in your Neural Network: a Quantum Field Theory View of Deep Learning https://t.co/YMGpOwn4Ya
— ワクワクさん(ミジンコ) (@mosko_mule) March 9, 2021
Heidi: "Tell me, why can neural networks learn?"
Grandfather: "Because there are Hintons inside."
4. Parser-Free Virtual Try-on via Distilling Appearance Flows
Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo
Image virtual try-on aims to fit a garment image (target clothes) to a person image. Prior methods rely heavily on human parsing, so even slightly wrong segmentation results mislead them into producing unrealistic try-on images with large artifacts. A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing: try-on images produced by a parser-based method are used as supervision to train a “student” network that does not rely on segmentation, making the student mimic the try-on ability of the parser-based model. However, the image quality of the student is bounded by the parser-based model. To address this problem, we propose a novel approach, “teacher-tutor-student” knowledge distillation, which produces highly photo-realistic images without human parsing and possesses several appealing advantages over prior arts. (1) Unlike existing work, our approach treats the fake images produced by the parser-based method as “tutor knowledge”, whose artifacts can be corrected by real “teacher knowledge” extracted from real person images in a self-supervised way. (2) Beyond using real images as supervision, we formulate knowledge distillation in the try-on problem as distilling the appearance flows between the person image and the garment image, enabling us to find accurate dense correspondences between them and produce high-quality results. (3) Extensive evaluations show the large superiority of our method (see Fig. 1).
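One way to read the setup: the parser-based model's imperfect output serves as "tutor knowledge" on the input side, the real photo supplies the "teacher" supervision, and an additional term distills appearance flows rather than pixels. A loose PyTorch sketch of that loss structure; the module interfaces (`student` returning an image and a flow, `TinyStudent`) are hypothetical stand-ins, not the paper's networks.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, real_person, garment, tutor_fake, tutor_flow,
                      flow_weight=0.25):
    """One 'teacher-tutor-student' training step (loose sketch).

    real_person : (B,3,H,W) real photo -- the 'teacher knowledge' supervision
    garment     : (B,3,H,W) target clothes image
    tutor_fake  : (B,3,H,W) try-on image produced by the parser-based tutor
    tutor_flow  : (B,2,H,W) appearance flow predicted by the tutor
    """
    pred_person, student_flow = student(tutor_fake, garment)

    # Teacher term: the parser-free student is supervised by the real photo,
    # so the tutor's artifacts are not imitated.
    recon_loss = F.l1_loss(pred_person, real_person)

    # Flow term: distill dense appearance flows rather than raw pixels.
    flow_loss = F.l1_loss(student_flow, tutor_flow)

    return recon_loss + flow_weight * flow_loss

class TinyStudent(torch.nn.Module):
    """Toy stand-in for the parser-free student network."""
    def __init__(self):
        super().__init__()
        self.img_head = torch.nn.Conv2d(6, 3, kernel_size=3, padding=1)
        self.flow_head = torch.nn.Conv2d(6, 2, kernel_size=3, padding=1)
    def forward(self, person, garment):
        x = torch.cat([person, garment], dim=1)
        return self.img_head(x), self.flow_head(x)

loss = distillation_step(TinyStudent(),
                         torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                         torch.rand(2, 3, 64, 64), torch.rand(2, 2, 64, 64))
print(loss.item())
```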
Parser-Free Virtual Try-on via Distilling Appearance Flows
— AK (@ak92501) March 9, 2021
pdf: https://t.co/2r7K4T9PR4
abs: https://t.co/c3dqg5g8Xz
github: https://t.co/xS37Jdm07X pic.twitter.com/DIm9D7U6xp
5. High Perceptual Quality Image Denoising with a Posterior Sampling CGAN
Guy Ohayon, Theo Adrai, Gregory Vaksman, Michael Elad, Peyman Milanfar
The vast work in Deep Learning (DL) has led to a leap in image denoising research. Most DL solutions for this task have focused their efforts on the denoiser’s architecture while maximizing distortion performance. However, distortion-driven solutions lead to blurry results with sub-optimal perceptual quality, especially at high noise levels. In this paper we propose a different perspective, aiming to produce sharp and visually pleasing denoised images that are still faithful to their clean sources. Formally, our goal is to achieve high perceptual quality with acceptable distortion. This is attained by a stochastic denoiser that samples from the posterior distribution, trained as a generator in the framework of conditional generative adversarial networks (CGANs). Contrary to distortion-based regularization terms that conflict with perceptual quality, we introduce into the CGAN objective a theoretically founded penalty term that does not force a distortion requirement on individual samples, but rather on their mean. We showcase our proposed method with a novel denoiser architecture that achieves the reformed denoising goal and produces vivid and diverse outcomes even at high noise levels.
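The distinguishing ingredient is that the distortion penalty constrains the mean of the posterior samples toward the clean image rather than each sample individually, so the adversarial term can keep individual samples sharp. A minimal PyTorch sketch of that loss shape; the toy generator/discriminator, the `z_dim` attribute, and the number of samples K are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def stochastic_denoiser_loss(G, D, noisy, clean, K=8, penalty_weight=1.0):
    """Loss for a posterior-sampling denoiser (loose sketch).

    G(noisy, z) -> one denoised sample; D scores realism of (denoised, noisy).
    The distortion penalty acts on the MEAN of K samples, not on each sample,
    so individual samples stay sharp while their average stays faithful.
    """
    B = noisy.shape[0]
    samples = torch.stack(
        [G(noisy, torch.randn(B, G.z_dim, device=noisy.device)) for _ in range(K)]
    )                                                  # (K, B, C, H, W)

    # Non-saturating adversarial term on every individual sample.
    adv = -torch.log(torch.sigmoid(D(samples.flatten(0, 1),
                                     noisy.repeat(K, 1, 1, 1))) + 1e-8).mean()

    # Distortion penalty on the sample mean only.
    mean_penalty = F.mse_loss(samples.mean(dim=0), clean)

    return adv + penalty_weight * mean_penalty

class ToyG(torch.nn.Module):
    z_dim = 16
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 3, 3, padding=1)
        self.z_proj = torch.nn.Linear(self.z_dim, 3)
    def forward(self, noisy, z):
        return self.net(noisy) + self.z_proj(z)[:, :, None, None]

class ToyD(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(6, 1, 3, padding=1)
    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=1)).mean(dim=(1, 2, 3))

loss = stochastic_denoiser_loss(ToyG(), ToyD(),
                                torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32))
```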
High perceptual quality denoising is obtained by sampling a posterior trained as a generator in a CGAN framework. We don’t force a distortion on individual samples but instead on their mean. The denoiser produces vivid & diverse images even in high noise https://t.co/jc6hNwzGUP pic.twitter.com/237jsRNvYG
— Peyman Milanfar (@docmilanfar) March 9, 2021
High Perceptual Quality Image Denoising with a Posterior Sampling CGAN
— AK (@ak92501) March 9, 2021
pdf: https://t.co/krViSxIxjI
abs: https://t.co/6p2tBtDSGr pic.twitter.com/TT00dMJzb8
6. Multimodal Representation Learning via Maximization of Local Mutual Information
Ruizhi Liao, Daniel Moyer, Miriam Cha, Keegan Quigley, Seth Berkowitz, Steven Horng, Polina Golland, William M. Wells
We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Our method learns image and text encoders by encouraging the resulting representations to exhibit high local mutual information. We make use of recent advances in mutual information estimation with neural network discriminators. We argue that, typically, the sum of local mutual information is a lower bound on the global mutual information. Our experimental results on downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning.
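The method maximizes a neural estimate of mutual information between local image features and text features, with the sum of local terms lower-bounding the global MI. A compact sketch of one common estimator of this kind, an InfoNCE-style critic scored at every spatial location; this illustrates the general technique, not the paper's exact estimator.

```python
import torch
import torch.nn.functional as F

def local_infonce(local_img_feats, text_feats, temperature=0.1):
    """InfoNCE-style lower bound on MI between local image features and text.

    local_img_feats : (B, D, H, W) spatial feature map from the image encoder
    text_feats      : (B, D) pooled report embedding from the text encoder
    Each location is a positive with the text of the same item; texts from
    other items in the batch serve as negatives. Averaging over locations
    mirrors the 'sum of local MI lower-bounds global MI' argument.
    """
    B, D, H, W = local_img_feats.shape
    img = F.normalize(local_img_feats, dim=1).permute(0, 2, 3, 1).reshape(B, H * W, D)
    txt = F.normalize(text_feats, dim=1)                       # (B, D)

    # scores[i, l, j] = similarity of location l of image i with text j
    scores = torch.einsum('bld,kd->blk', img, txt) / temperature

    # For every location, the matching text index is the image's own index.
    target = torch.arange(B, device=scores.device)[:, None].expand(B, H * W)
    return F.cross_entropy(scores.reshape(B * H * W, B), target.reshape(-1))

loss = local_infonce(torch.randn(4, 128, 7, 7), torch.randn(4, 128))
```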
Multimodal Representation Learning via Maximization of Local Mutual Information
— AK (@ak92501) March 9, 2021
pdf: https://t.co/Re3zrepXI8
abs: https://t.co/iMKG3E5UvX pic.twitter.com/ogCtrCF9sD
7. Repurposing GANs for One-shot Semantic Part Segmentation
Nontawat Tritrong, Pitchaporn Rewatbowornwong, Supasorn Suwajanakorn
While GANs have shown success in realistic image generation, the idea of using GANs for other tasks unrelated to synthesis is underexplored. Do GANs learn meaningful structural parts of objects during their attempt to reproduce those objects? In this work, we test this hypothesis and propose a simple and effective approach based on GANs for semantic part segmentation that requires as few as one label example along with an unlabeled dataset. Our key idea is to leverage a trained GAN to extract pixel-wise representation from the input image and use it as feature vectors for a segmentation network. Our experiments demonstrate that GANs representation is “readily discriminative” and produces surprisingly good results that are comparable to those from supervised baselines trained with significantly more labels. We believe this novel repurposing of GANs underlies a new class of unsupervised representation learning that is applicable to many other tasks. More results are available at https://repurposegans.github.io/.
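The core recipe: run a latent through a trained GAN generator, collect activations from several layers, upsample them all to the output resolution, and use the stacked maps as per-pixel features for a small segmentation head trained on as little as one labeled example. A toy PyTorch sketch of that feature-extraction step, using a small random generator as a stand-in for a pretrained GAN; the real pipeline also needs GAN inversion to obtain latents for query images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGenerator(nn.Module):
    """Stand-in for a pretrained GAN generator."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.ConvTranspose2d(z_dim, 128, 4),                      # 1x1 -> 4x4
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),    # 4 -> 8
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),     # 8 -> 16
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),      # 16 -> 32
        ])
    def forward(self, z):
        feats, x = [], z[:, :, None, None]
        for block in self.blocks:
            x = torch.relu(block(x))
            feats.append(x)
        return x, feats                          # image, per-layer activations

def pixelwise_features(feats, size):
    """Upsample every intermediate map to `size` and stack along channels."""
    up = [F.interpolate(f, size=size, mode='bilinear', align_corners=False)
          for f in feats]
    return torch.cat(up, dim=1)                  # (B, sum(C_i), H, W)

G = ToyGenerator()
image, feats = G(torch.randn(1, 64))
features = pixelwise_features(feats, size=image.shape[-2:])   # (1, 227, 32, 32)

# One-shot supervision: a single labeled mask trains a per-pixel classifier.
num_parts = 5
head = nn.Conv2d(features.shape[1], num_parts, kernel_size=1)
mask = torch.randint(0, num_parts, (1, 32, 32))               # the one annotation
loss = F.cross_entropy(head(features), mask)
loss.backward()
```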
Repurposing GANs for One-shot Semantic Part Segmentation
— AK (@ak92501) March 9, 2021
pdf: https://t.co/BsslMf2fQ0
abs: https://t.co/6dTIHL5Nne
project page: https://t.co/8Sb4PIgAu5 pic.twitter.com/b71pYPIGEB
8. Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory
Jason Ramapuram, Yan Wu, Alexandros Kalousis
Episodic and semantic memory are critical components of the human memory model. The theory of complementary learning systems (McClelland et al., 1995) suggests that the compressed representation produced by a serial event (episodic memory) is later restructured to build a more generalized form of reusable knowledge (semantic memory). In this work we develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory via a hierarchical latent variable model. We take inspiration from traditional heap allocation and extend the idea of locally contiguous memory to the Kanerva Machine, enabling a novel differentiable block allocated latent memory. In contrast to the Kanerva Machine, we simplify the process of memory writing by treating it as a fully feed-forward deterministic process, relying on the stochasticity of the read key distribution to disperse information within the memory. We demonstrate that this allocation scheme improves performance in conditional image generation, resulting in new state-of-the-art likelihood values on binarized MNIST (≤41.58 nats/image), binarized Omniglot (≤66.24 nats/image), as well as presenting competitive performance on CIFAR10, DMLab Mazes, Celeb-A and ImageNet32x32.
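The architectural twist is "locally contiguous" memory: a write lands in a contiguous block of slots whose start address comes from a stochastic key distribution, while the write itself is deterministic and feed-forward. The toy below illustrates only that block-addressing idea, not the paper's hierarchical latent-variable model; all shapes and the categorical address distribution are illustrative assumptions.

```python
import torch

class BlockMemory:
    """Toy slot memory with locally contiguous ('heap-like') block allocation."""
    def __init__(self, num_slots=64, slot_dim=8, block_size=4):
        self.M = torch.zeros(num_slots, slot_dim)
        self.block_size = block_size
        self.num_starts = num_slots - block_size + 1

    def write(self, code, start_logits):
        """Deterministic feed-forward write into one contiguous block.
        The start address is sampled from a key distribution, which is what
        disperses information across the memory over many writes."""
        start = torch.distributions.Categorical(logits=start_logits).sample().item()
        self.M[start:start + self.block_size] += code.view(self.block_size, -1)
        return start

    def read(self, start):
        """Read back the contiguous block written at `start`."""
        return self.M[start:start + self.block_size].reshape(-1)

mem = BlockMemory()
code = torch.randn(32)                       # block_size * slot_dim = 4 * 8
start = mem.write(code, start_logits=torch.zeros(mem.num_starts))
print(torch.allclose(mem.read(start), code))
```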
Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory: our recent ICLR 2021 paper in collaboration with Yan Wu & Alexandros Kalousis. https://t.co/Y1N39soQmh
— Jason Ramapuram (@jramapuram) March 9, 2021
We propose a novel memory model inspired by traditional heap allocation pic.twitter.com/ricu33bGlk
9. Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models
Sam Bond-Taylor, Adam Leach, Yang Long, Chris G. Willcocks
Deep generative modelling is a class of techniques that train deep neural networks to model the distribution of training samples. Research has fragmented into various interconnected approaches, each of which makes trade-offs including run-time, diversity, and architectural restrictions. In particular, this compendium covers energy-based models, variational autoencoders, generative adversarial networks, autoregressive models, and normalizing flows, in addition to numerous hybrid approaches. These techniques are drawn together under a single cohesive framework, comparing and contrasting them to explain the premises behind each, while reviewing current state-of-the-art advances and implementations.
[R] Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models - @sambondtaylor at @comp_sci_durham https://t.co/IA9Qp1HBWh https://t.co/MAT1PggitO #MachineLearning #ai #artificialintelligence #deeplearning #science pic.twitter.com/qf7hBFPhy3
— Chris Willcocks (@cwkx) March 9, 2021
10. Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees
Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, Yunhai Tong
Pre-trained language models like BERT achieve superior performance on various NLP tasks without explicit consideration of syntactic information. Meanwhile, syntactic information has been shown to be crucial for the success of NLP applications. However, how to incorporate syntax trees effectively and efficiently into pre-trained Transformers is still unsettled. In this paper, we address this problem by proposing a novel framework named Syntax-BERT. This framework works in a plug-and-play mode and is applicable to an arbitrary pre-trained checkpoint based on the Transformer architecture. Experiments on various natural language understanding datasets verify the effectiveness of syntax trees and show consistent improvements over multiple pre-trained models, including BERT, RoBERTa, and T5.
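The plug-and-play idea is to bias a pre-trained Transformer's self-attention with masks derived from a syntax tree, for example letting a token attend only to tokens within a given distance of it in the parse tree, with no change to the pre-trained weights. A small sketch of building such a tree-distance mask and applying it to attention scores; the distance threshold and the dependency-head encoding are illustrative choices, not necessarily the paper's exact mask family.

```python
import torch

def tree_distance_mask(heads, max_dist):
    """Boolean (L, L) mask: True where two tokens are within `max_dist` edges
    of each other in the dependency tree given by `heads` (heads[i] is the
    parent of token i; the root points to itself)."""
    L = len(heads)
    adj = [[] for _ in range(L)]                 # undirected tree adjacency
    for child, parent in enumerate(heads):
        if parent != child:
            adj[child].append(parent)
            adj[parent].append(child)
    mask = torch.zeros(L, L, dtype=torch.bool)
    for src in range(L):                         # BFS up to max_dist from each token
        dist, frontier = {src: 0}, [src]
        while frontier:
            nxt = []
            for u in frontier:
                if dist[u] == max_dist:
                    continue
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            frontier = nxt
        for tgt in dist:
            mask[src, tgt] = True
    return mask

# Example: "the cat sat" with heads (the -> cat), (cat -> sat), root sat.
mask = tree_distance_mask(heads=[1, 2, 2], max_dist=1)

# Apply to attention scores of shape (batch, heads, L, L).
scores = torch.randn(1, 4, 3, 3)
attn = torch.softmax(scores.masked_fill(~mask, float('-inf')), dim=-1)
```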
Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees
— AK (@ak92501) March 9, 2021
pdf: https://t.co/8BxmlpZAAh
abs: https://t.co/D3FWo679ik pic.twitter.com/hsZosTfh9f
11. End-to-End Human Object Interaction Detection with HOI Transformer
Cheng Zou, Bohan Wang, Yue Hu, Junqi Liu, Qian Wu, Yu Zhao, Boxun Li, Chenguang Zhang, Chi Zhang, Yichen Wei, Jian Sun
We propose HOI Transformer to tackle human-object interaction (HOI) detection in an end-to-end manner. Current approaches either decouple the HOI task into separate stages of object detection and interaction classification or introduce a surrogate interaction problem. In contrast, our method, named HOI Transformer, streamlines the HOI pipeline by eliminating the need for many hand-designed components. HOI Transformer reasons about the relations of objects and humans from global image context and directly predicts HOI instances in parallel. A quintuple matching loss is introduced to supervise HOI predictions in a unified way. Our method is conceptually much simpler and demonstrates improved accuracy. Without bells and whistles, HOI Transformer achieves strong results on HICO-DET and V-COCO, surpassing previous methods while being much simpler. We hope our approach will serve as a simple and effective alternative for HOI tasks. Code is available at https://github.com/bbepoch/HoiTransformer.
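As in DETR-style detection, a fixed set of queries predicts HOI instances in parallel, and training first solves a bipartite matching between predictions and ground truth before applying the loss. A condensed sketch of such a matching cost over (human box, object box, object class, verb class) predictions using the Hungarian algorithm; the cost weights and tensor layout are illustrative, not the paper's exact quintuple formulation.

```python
import torch
from scipy.optimize import linear_sum_assignment

def match_hoi(pred_obj_logits, pred_verb_logits, pred_h_boxes, pred_o_boxes,
              gt_obj, gt_verb, gt_h_boxes, gt_o_boxes, w_cls=1.0, w_box=5.0):
    """Bipartite matching between N predicted HOI instances and M ground truths.

    pred_obj_logits : (N, C_obj)   pred_verb_logits : (N, C_verb)
    pred_h_boxes    : (N, 4)       pred_o_boxes     : (N, 4)   (cx, cy, w, h)
    gt_obj, gt_verb : (M,) class indices; gt_*_boxes : (M, 4)
    Returns (pred_idx, gt_idx) arrays pairing predictions with targets.
    """
    obj_prob = pred_obj_logits.softmax(-1)
    verb_prob = pred_verb_logits.softmax(-1)

    cost_cls = -(obj_prob[:, gt_obj] + verb_prob[:, gt_verb])          # (N, M)
    cost_box = (torch.cdist(pred_h_boxes, gt_h_boxes, p=1) +
                torch.cdist(pred_o_boxes, gt_o_boxes, p=1))            # (N, M)
    cost = w_cls * cost_cls + w_box * cost_box

    pred_idx, gt_idx = linear_sum_assignment(cost.detach().numpy())
    return pred_idx, gt_idx

# Toy usage with 100 queries and 2 ground-truth interactions; classification
# and box losses are then applied per matched pair, and unmatched queries are
# pushed toward a 'no interaction' class.
p, g = match_hoi(torch.randn(100, 81), torch.randn(100, 29),
                 torch.rand(100, 4), torch.rand(100, 4),
                 torch.tensor([1, 5]), torch.tensor([3, 7]),
                 torch.rand(2, 4), torch.rand(2, 4))
```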
End-to-End Human Object Interaction Detection with HOI Transformer
— AK (@ak92501) March 9, 2021
pdf: https://t.co/Srg4bQUCqQ
abs: https://t.co/m0IzHeUm0B pic.twitter.com/ogz0p3z573
12. Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei, Chelsea Finn
A video prediction model that generalizes to diverse scenes would enable intelligent agents such as robots to perform a variety of tasks via planning with the model. However, while existing video prediction models have produced promising results on small datasets, they suffer from severe underfitting when trained on large and diverse datasets. To address this underfitting challenge, we first observe that the ability to train larger video prediction models is often bottlenecked by the memory constraints of GPUs or TPUs. In parallel, deep hierarchical latent variable models can produce higher quality predictions by capturing the multi-level stochasticity of future observations, but end-to-end optimization of such models is notably difficult. Our key insight is that greedy and modular optimization of hierarchical autoencoders can simultaneously address both the memory constraints and the optimization challenges of large-scale video prediction. We introduce Greedy Hierarchical Variational Autoencoders (GHVAEs), a method that learns high-fidelity video predictions by greedily training each level of a hierarchical autoencoder. In comparison to state-of-the-art models, GHVAEs provide 17-55% gains in prediction performance on four video datasets, a 35-40% higher success rate on real robot tasks, and can improve performance monotonically by simply adding more modules.
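The training insight reduces to: optimize one level of the hierarchy at a time, freezing lower levels, so each module fits in accelerator memory and avoids the instability of end-to-end hierarchical training. A deliberately simplified sketch with two deterministic autoencoder levels on random vectors; the real GHVAE uses stochastic variational modules on video, so this only illustrates the greedy/modular schedule.

```python
import torch
import torch.nn as nn

def make_level(in_dim, code_dim):
    enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU(),
                        nn.Linear(code_dim, code_dim))
    dec = nn.Sequential(nn.Linear(code_dim, in_dim))
    return enc, dec

def train_level(enc, dec, data, steps=200, lr=1e-3):
    """Train one level in isolation (greedy, modular optimization)."""
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(steps):
        loss = ((dec(enc(data)) - data) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

data = torch.randn(256, 64)

# Level 1: fit the raw observations, then freeze it.
enc1, dec1 = make_level(64, 32)
train_level(enc1, dec1, data)
for p in list(enc1.parameters()) + list(dec1.parameters()):
    p.requires_grad_(False)

# Level 2: fit the codes produced by the frozen level below. Adding levels this
# way grows capacity without re-optimizing (or storing gradients for) everything
# beneath, which is the memory/optimization win the abstract describes.
codes = enc1(data).detach()
enc2, dec2 = make_level(32, 16)
train_level(enc2, dec2, codes)
```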
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
— AK (@ak92501) March 9, 2021
pdf: https://t.co/50xCTdKy4t
abs: https://t.co/oGKYgukzPD
project page: https://t.co/jor3Ohaf2R pic.twitter.com/yMEiXGZUQq
13. Learning a State Representation and Navigation in Cluttered and Dynamic Environments
David Hoeller, Lorenz Wellhausen, Farbod Farshidian, Marco Hutter
In this work, we present a learning-based pipeline to realise local navigation with a quadrupedal robot in cluttered environments with static and dynamic obstacles. Given high-level navigation commands, the robot is able to safely locomote to a target location based on frames from a depth camera, without any explicit mapping of the environment. First, the sequence of images and the current trajectory of the camera are fused to form a model of the world using state representation learning. The output of this lightweight module is then directly fed into a target-reaching and obstacle-avoiding policy trained with reinforcement learning. We show that decoupling the pipeline into these components results in a sample-efficient policy learning stage that can be fully trained in simulation in just a dozen minutes. The key part is the state representation, which is trained not only to estimate the hidden state of the world in an unsupervised fashion, but also to help bridge the reality gap, enabling successful sim-to-real transfer. In our experiments with the quadrupedal robot ANYmal in simulation and in reality, we show that our system can handle noisy depth images, avoid dynamic obstacles unseen during training, and is endowed with local spatial awareness.
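The pipeline separates perception from control: a lightweight state-representation module compresses the depth-image stream (trained in an unsupervised fashion), and a separate policy consumes only that compact latent plus the navigation command. A structural sketch of that decoupling in PyTorch; the network sizes and heads are illustrative placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DepthStateEncoder(nn.Module):
    """Compresses a depth frame into a compact latent state (perception stage).
    In the paper this module also fuses the camera trajectory and is trained
    unsupervised; a simple conv encoder stands in here."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim))
    def forward(self, depth):
        return self.net(depth)

class NavigationPolicy(nn.Module):
    """Target-reaching / obstacle-avoiding policy (control stage), trained with
    RL on top of the compact latent -- the decoupling that keeps policy
    learning sample efficient."""
    def __init__(self, latent_dim=64, cmd_dim=3, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + cmd_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim))
    def forward(self, latent, command):
        return self.net(torch.cat([latent, command], dim=-1))

encoder, policy = DepthStateEncoder(), NavigationPolicy()
depth = torch.rand(1, 1, 64, 64)                 # one depth frame
command = torch.tensor([[1.0, 0.0, 0.0]])        # e.g. a "go forward" command
action = policy(encoder(depth).detach(), command)
```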
Learning a navigation policy in dynamic environments in 10 minutes and deploying it sim-to-real is not as hard as it sounds.
— Robotic Systems Lab (@leggedrobotics) March 9, 2021
See our RA-L paper https://t.co/oSJy8JMDfH with @HoellerDavid, @schlurry, Farbod Farshidian, Marco Hutter, https://t.co/EALPqsMcPn
⬇️ Key ingredients ⬇️
14. LOHO: Latent Optimization of Hairstyles via Orthogonalization
Rohit Saha, Brendan Duke, Florian Shkurti, Graham W. Taylor, Parham Aarabi
Hairstyle transfer is challenging due to hair structure differences in the source and target hair. Therefore, we propose Latent Optimization of Hairstyles via Orthogonalization (LOHO), an optimization-based approach using GAN inversion to infill missing hair structure details in latent space during hairstyle transfer. Our approach decomposes hair into three attributes: perceptual structure, appearance, and style, and includes tailored losses to model each of these attributes independently. Furthermore, we propose two-stage optimization and gradient orthogonalization to enable disentangled latent space optimization of our hair attributes. Using LOHO for latent space manipulation, users can synthesize novel photorealistic images by manipulating hair attributes either individually or jointly, transferring the desired attributes from reference hairstyles. LOHO achieves a superior FID compared with the current state-of-the-art (SOTA) for hairstyle transfer. Additionally, LOHO preserves the subject’s identity comparably well according to PSNR and SSIM when compared to SOTA image embedding pipelines.
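The "orthogonalization" in the title refers to gradient orthogonalization during latent optimization: the gradient of a secondary attribute loss is projected so that its component along the primary loss's gradient is removed, keeping the attributes disentangled. A minimal sketch of that projection step on flattened gradients; the toy losses and the single-step update are stand-ins, not the paper's two-stage schedule or objectives.

```python
import torch

def orthogonalize(grad_secondary, grad_primary, eps=1e-12):
    """Remove from `grad_secondary` its component along `grad_primary`,
    so the secondary update cannot undo progress on the primary objective."""
    g_s, g_p = grad_secondary.flatten(), grad_primary.flatten()
    proj = (g_s @ g_p) / (g_p @ g_p + eps) * g_p
    return (g_s - proj).view_as(grad_secondary)

# Toy latent optimization with two competing losses on the same latent code.
latent = torch.randn(1, 512, requires_grad=True)
structure_loss = (latent ** 2).mean()            # stand-in for the primary loss
style_loss = (latent - 1.0).abs().mean()         # stand-in for the secondary loss

g_primary, = torch.autograd.grad(structure_loss, latent, retain_graph=True)
g_secondary, = torch.autograd.grad(style_loss, latent)

update = g_primary + orthogonalize(g_secondary, g_primary)
with torch.no_grad():
    latent -= 0.1 * update                       # one gradient step on the latent
```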
LOHO: Latent Optimization of Hairstyles via Orthogonalization
— AK (@ak92501) March 9, 2021
pdf: https://t.co/DYM65dVpbd
abs: https://t.co/Saba4xytch
github: https://t.co/4QHxRbvwfM pic.twitter.com/7q5varYOJu
15. WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, Junjie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, Jie Zhou
In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) as training data, as well as an elaborately designed time-constrained evaluation protocol. Firstly, we collect a 4M name list and download 260M faces from the Internet. Then, a Cleaning Automatically utilizing Self-Training (CAST) pipeline is devised to purify the enormous WebFace260M, which is efficient and scalable. To the best of our knowledge, the cleaned WebFace42M is the largest public face recognition training set, and we expect it to close the data gap between academia and industry. Reflecting practical scenarios, a Face Recognition Under Inference Time conStraint (FRUITS) protocol and a test set are constructed to comprehensively evaluate face matchers. Equipped with this benchmark, we delve into million-scale face recognition problems. A distributed framework is developed to train face recognition models efficiently without compromising performance. Empowered by WebFace42M, we reduce the relative failure rate on the challenging IJB-C set by 40% and rank 3rd among 430 entries on NIST-FRVT. Even 10% of the data (WebFace4M) shows superior performance compared with public training sets. Furthermore, comprehensive baselines are established on our rich-attribute test set under the FRUITS-100ms/500ms/1000ms protocols, including the MobileNet, EfficientNet, AttentionNet, ResNet, SENet, ResNeXt and RegNet families. The benchmark website is https://www.face-benchmark.org.
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
— AK (@ak92501) March 9, 2021
pdf: https://t.co/XxjF3WQ4kY
abs: https://t.co/JsPvSIu4Qj pic.twitter.com/iXh1WInYJi