1. Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
Avi Singh, Huihan Liu, Gaoyue Zhou, Albert Yu, Nicholas Rhinehart, Sergey Levine
Reinforcement learning provides a general framework for flexible decision making and control, but requires extensive data collection for each new task that an agent needs to learn. In other machine learning fields, such as natural language processing or computer vision, pre-training on large, previously collected datasets to bootstrap learning for new tasks has emerged as a powerful paradigm to reduce data requirements when learning a new task. In this paper, we ask the following question: how can we enable similarly useful pre-training for RL agents? We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials from a wide range of previously seen tasks, and we show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent’s ability to try out novel behaviors. We demonstrate the effectiveness of our approach in challenging robotic manipulation domains involving image observations and sparse reward functions, where our method outperforms prior works by a substantial margin.
RL agents explore randomly. Humans explore by trying potential good behaviors, because we have a prior on what might be useful. Can robots get such behavioral priors? That's the idea in Parrot.
— Sergey Levine (@svlevine) November 20, 2020
arxiv https://t.co/COMGTmCInG
web https://t.co/1o1T6rsiHb
vid https://t.co/2MU2V8VNGa pic.twitter.com/cZSxqTHafl
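The core mechanism described in the abstract, a behavioral prior that reshapes the action space without blocking new behaviors, can be pictured with a small sketch. The code below is a minimal, hypothetical version and not the authors' implementation: a single observation-conditioned invertible affine map stands in for the normalizing flow in the paper, it is fit by maximum likelihood to (observation, action) pairs from successful trials, and the downstream RL policy then acts in the latent space of that map.

```python
import torch
import torch.nn as nn

# Minimal sketch of a behavioral prior (not the authors' code): an invertible,
# observation-conditioned map f(z; s) -> a is pre-trained on successful trials,
# and the RL agent later outputs z while the environment receives a = f(z; s).

class ConditionalAffineFlow(nn.Module):
    """Invertible, state-conditioned affine map between latent z and action a."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * act_dim))

    def forward(self, z, obs):
        log_scale, shift = self.net(obs).chunk(2, dim=-1)
        return z * log_scale.exp() + shift            # a = f(z; obs)

    def inverse(self, a, obs):
        log_scale, shift = self.net(obs).chunk(2, dim=-1)
        z = (a - shift) * (-log_scale).exp()          # z = f^{-1}(a; obs)
        return z, -log_scale.sum(dim=-1)              # log |det dz/da|

prior = ConditionalAffineFlow(obs_dim=8, act_dim=2)
opt = torch.optim.Adam(prior.parameters(), lr=1e-3)
base = torch.distributions.Normal(0.0, 1.0)

# Pre-training: maximize the likelihood of (obs, action) pairs from successful
# trials (random tensors stand in for the offline dataset here).
for _ in range(100):
    obs, act = torch.randn(256, 8), torch.randn(256, 2)
    z, log_det = prior.inverse(act, obs)
    log_prob = base.log_prob(z).sum(-1) + log_det     # change of variables
    loss = -log_prob.mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At RL time, the policy outputs z and the environment executes a = f(z; obs):
with torch.no_grad():
    a = prior(torch.randn(1, 2), torch.randn(1, 8))
```

Because the map is invertible, the agent can still reach any action, which is the property the abstract highlights ("without impeding the RL agent's ability to try out novel behaviors").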
2. Creative Sketch Generation
Songwei Ge, Vedanuj Goswami, C. Lawrence Zitnick, Devi Parikh
Sketching or doodling is a popular creative activity that people engage in. However, most existing work in automatic sketch understanding or generation has focused on sketches that are quite mundane. In this work, we introduce two datasets of creative sketches — Creative Birds and Creative Creatures — containing 10k sketches each along with part annotations. We propose DoodlerGAN — a part-based Generative Adversarial Network (GAN) — to generate unseen compositions of novel part appearances. Quantitative evaluations as well as human studies demonstrate that sketches generated by our approach are more creative and of higher quality than existing approaches. In fact, in Creative Birds, subjects prefer sketches generated by DoodlerGAN over those drawn by humans! Our code can be found at https://github.com/facebookresearch/DoodlerGAN and a demo can be found at http://doodlergan.cloudcv.org.
Very excited about our work on creative sketching!
— Devi Parikh (@deviparikh) November 20, 2020
Two datasets of ~10k sketches with part annotations
DoodlerGAN: A part-based GAN
(Super fun!) Web demo: https://t.co/6XzYd777lb
Paper + code: https://t.co/rIqW6hj9az
Work led by Songwei Ge. With @vedanujg and Larry Zitnick. pic.twitter.com/Zl2nvoiOTJ
Creative Sketch Generation
— AK (@ak92501) November 20, 2020
pdf: https://t.co/Y5O1Ainvqp
abs: https://t.co/k1vCfyAUdp
github: https://t.co/l1EC0WKmdy
demo: https://t.co/ImZ9xbwXq8 pic.twitter.com/UBZvxe1IA6
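The part-by-part generation loop can be illustrated with a toy sketch. Everything below is an untrained stand-in rather than the released DoodlerGAN code (see the linked repo for that): a part selector looks at the current canvas and picks which part to draw next, and a part generator renders strokes for that part; the part labels are illustrative, not the datasets' exact annotation set.

```python
import torch
import torch.nn as nn

# Rough sketch of a part-based generation loop in the spirit of DoodlerGAN,
# with untrained stand-in modules.

PARTS = ["body", "head", "beak", "eye", "wings", "legs", "tail"]  # illustrative labels

class PartSelector(nn.Module):
    def __init__(self, n_parts):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1 + n_parts, 16, 4, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, n_parts))
    def forward(self, canvas):
        return self.net(canvas).softmax(-1)          # distribution over the next part

class PartGenerator(nn.Module):
    def __init__(self, n_parts, z_dim=32, size=64):
        super().__init__()
        self.size = size
        self.net = nn.Sequential(nn.Linear(z_dim + n_parts, size * size), nn.Sigmoid())
    def forward(self, z, part_onehot):
        x = torch.cat([z, part_onehot], dim=-1)
        return self.net(x).view(-1, 1, self.size, self.size)   # strokes for this part

selector, generator = PartSelector(len(PARTS)), PartGenerator(len(PARTS))
canvas = torch.zeros(1, 1 + len(PARTS), 64, 64)      # stroke channel + per-part channels
drawn = torch.zeros(len(PARTS), dtype=torch.bool)

with torch.no_grad():
    for _ in range(len(PARTS)):
        probs = selector(canvas).squeeze(0).masked_fill(drawn, 0.0)
        part = int(probs.argmax())                   # next part to draw
        drawn[part] = True
        onehot = torch.zeros(1, len(PARTS)); onehot[0, part] = 1.0
        strokes = generator(torch.randn(1, 32), onehot)
        canvas[:, :1] = torch.clamp(canvas[:, :1] + strokes, 0, 1)  # composite strokes
        canvas[:, 1 + part:2 + part] = strokes                      # record the part channel
```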
3. Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Won Jang, Dan Lim, Jaesam Yoon
We propose Universal MelGAN, a vocoder that synthesizes high-fidelity speech in multiple domains. To preserve sound quality when the MelGAN-based structure is trained on a dataset of hundreds of speakers, we add multi-resolution spectrogram discriminators to sharpen the spectral resolution of the generated waveforms. This enables the model to generate realistic waveforms for many speakers by alleviating the over-smoothing problem in the high-frequency band of the large-footprint model. By discriminating both the waveform and the spectrogram during training, our structure generates signals close to the ground truth without reducing inference speed. The model achieved the best mean opinion score (MOS) in most scenarios when using ground-truth mel-spectrograms as input. In particular, it showed superior performance on unseen domains with regard to speaker, emotion, and language. Moreover, in a multi-speaker text-to-speech scenario using mel-spectrograms generated by a transformer model, it synthesized high-fidelity speech with an MOS of 4.22. These results, achieved without external domain information, highlight the potential of the proposed model as a universal vocoder.
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
— AK (@ak92501) November 20, 2020
pdf: https://t.co/xsTHdY30ov
abs: https://t.co/Skm2eVnrYS
project page: https://t.co/ZVsuQSwbl7 pic.twitter.com/T8JHtjFcGf
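The multi-resolution spectrogram discriminator idea lends itself to a short sketch. The code below is a simplified reading of that component, not the paper's architecture: log-magnitude STFTs at several resolutions are each scored by a small CNN, and a hinge loss (one common GAN choice, assumed here) is summed over resolutions.

```python
import torch
import torch.nn as nn

# Simplified multi-resolution spectrogram discriminator (not the authors' model).

RESOLUTIONS = [(512, 128), (1024, 256), (2048, 512)]   # (n_fft, hop_length)

class SpecDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, kernel_size=3, padding=1))

    def forward(self, wav, n_fft, hop):
        window = torch.hann_window(n_fft, device=wav.device)
        spec = torch.stft(wav, n_fft, hop_length=hop, window=window, return_complex=True)
        logmag = spec.abs().clamp(min=1e-5).log().unsqueeze(1)   # (B, 1, freq, time)
        return self.net(logmag)                                  # patch-wise real/fake scores

discriminators = nn.ModuleList(SpecDiscriminator() for _ in RESOLUTIONS)

def multi_res_d_loss(real_wav, fake_wav):
    """Hinge-style discriminator loss summed over STFT resolutions."""
    loss = 0.0
    for disc, (n_fft, hop) in zip(discriminators, RESOLUTIONS):
        d_real = disc(real_wav, n_fft, hop)
        d_fake = disc(fake_wav.detach(), n_fft, hop)
        loss = loss + torch.relu(1 - d_real).mean() + torch.relu(1 + d_fake).mean()
    return loss

loss = multi_res_d_loss(torch.randn(2, 16000), torch.randn(2, 16000))
```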
4. Randomized Self Organizing Map
Nicolas P. Rougier, Georgios Is. Detorakis
We propose a variation of the self-organizing map algorithm that considers the random placement of neurons on a two-dimensional manifold, following a blue-noise distribution from which various topologies can be derived. These topologies possess random (but controllable) discontinuities that allow for more flexible self-organization, especially with high-dimensional data. The proposed algorithm is tested on one-, two-, and three-dimensional tasks as well as on the MNIST handwritten digits dataset, and is validated using spectral analysis and topological data analysis tools. We also demonstrate the ability of the randomized self-organizing map to gracefully reorganize itself in case of neural lesion and/or neurogenesis.
Randomised Self Organising Map #preprint available at https://t.co/GQkIDOquhK. We revisited Kohonen SOM using random topologies with discontinuities. All figures done using the almighty @matplotlib library. pic.twitter.com/L8oQJHZGnn
— Nicolas P. Rougier (@NPRougier) November 20, 2020
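A compact sketch of the idea (not the authors' implementation): neurons sit at quasi-random 2D positions instead of on a regular grid, and the SOM neighborhood is defined by distances between those positions. Mitchell's best-candidate sampling below is a cheap stand-in for a proper blue-noise distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def best_candidate_positions(n, k=32):
    """Approximate blue-noise placement: keep the candidate farthest from existing points."""
    pts = [rng.random(2)]
    for _ in range(n - 1):
        cand = rng.random((k, 2))
        d = np.linalg.norm(cand[:, None, :] - np.array(pts)[None, :, :], axis=-1).min(axis=1)
        pts.append(cand[d.argmax()])
    return np.array(pts)

n_neurons, dim = 256, 3
positions = best_candidate_positions(n_neurons)        # random 2D placement of neurons
weights = rng.random((n_neurons, dim))                  # codebook vectors (e.g. RGB colors)
pairwise = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)

data = rng.random((5000, dim))
for t, x in enumerate(data):
    frac = t / len(data)
    lr = 0.5 * (0.01 / 0.5) ** frac                     # decaying learning rate
    sigma = 0.3 * (0.01 / 0.3) ** frac                  # decaying neighborhood radius
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    h = np.exp(-pairwise[bmu] ** 2 / (2 * sigma ** 2))  # neighborhood on the random layout
    weights += lr * h[:, None] * (x - weights)
```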
5. Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators?
Yunfan Liu, Qi Li, Zhenan Sun, Tieniu Tan
Generative Adversarial Networks (GANs) with style-based generators (e.g. StyleGAN) successfully enable semantic control over image synthesis, and recent studies have also revealed that interpretable image translations can be obtained by modifying the latent code. However, in terms of low-level image content, traveling in the latent space leads to "spatially entangled changes" in the corresponding images, which is undesirable in many real-world applications where local editing is required. To solve this problem, we analyze properties of the "style space" and explore the possibility of controlling local translations with pre-trained style-based generators. Concretely, we propose "Style Intervention", a lightweight optimization-based algorithm that can adapt to arbitrary input images and render natural translation effects under flexible objectives. We verify the performance of the proposed framework on facial attribute editing of high-resolution images, where both photo-realism and consistency are required. Extensive qualitative results demonstrate the effectiveness of our method, and quantitative measurements show that the proposed algorithm outperforms state-of-the-art benchmarks in various aspects.
Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators?
— AK (@ak92501) November 20, 2020
pdf: https://t.co/Zpays6u54e
abs: https://t.co/E8cH1hS3UV pic.twitter.com/BGLOsBpkcy
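A hedged sketch of what an optimization-based "style intervention" could look like (the paper's actual objective and use of the style space differ): starting from the style code of an input image, an offset is optimized so that a target attribute changes while pixels outside a spatial mask stay fixed. The toy generator and attribute scorer below are placeholders for a pre-trained StyleGAN and classifier.

```python
import torch
import torch.nn as nn

class ToyStyleGenerator(nn.Module):
    """Placeholder for a pre-trained style-based generator."""
    def __init__(self, style_dim=64, size=32):
        super().__init__()
        self.size = size
        self.net = nn.Sequential(nn.Linear(style_dim, 3 * size * size), nn.Tanh())
    def forward(self, s):
        return self.net(s).view(-1, 3, self.size, self.size)

G = ToyStyleGenerator()
attribute_scorer = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))  # stand-in classifier
for p in list(G.parameters()) + list(attribute_scorer.parameters()):
    p.requires_grad_(False)                              # both networks stay frozen

s0 = torch.randn(1, 64)                                  # style code of the input (assumed known)
x0 = G(s0)                                               # reconstruction of the input
mask = torch.zeros(1, 1, 32, 32); mask[..., 8:24, 8:24] = 1.0   # editable region

delta = torch.zeros_like(s0, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)

for _ in range(200):
    x = G(s0 + delta)
    attr_loss = -attribute_scorer(x).mean()              # push the target attribute up
    preserve = ((1 - mask) * (x - x0)).abs().mean()      # keep pixels outside the mask
    loss = attr_loss + 10.0 * preserve
    opt.zero_grad(); loss.backward(); opt.step()
```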
6. Impact of Accuracy on Model Interpretations
Brian Liu, Madeleine Udell
Model interpretations are often used in practice to extract real world insights from machine learning models. These interpretations have a wide range of applications; they can be presented as business recommendations or used to evaluate model bias. It is vital for a data scientist to choose trustworthy interpretations to drive real world impact. Doing so requires an understanding of how the accuracy of a model impacts the quality of standard interpretation tools. In this paper, we will explore how a model’s predictive accuracy affects interpretation quality. We propose two metrics to quantify the quality of an interpretation and design an experiment to test how these metrics vary with model accuracy. We find that for datasets that can be modeled accurately by a variety of methods, simpler methods yield higher quality interpretations. We also identify which interpretation method works the best for lower levels of model accuracy.
"Impact of Accuracy on Model Interpretations" -- was just reading this interesting preprint by Liu and Udell. As always, simulations on real data can be messy, but the conclusions are interesting https://t.co/1Ei4GFt2og 1/2
— Sebastian Raschka (@rasbt) November 20, 2020
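The experimental question is easy to mimic at toy scale. The snippet below is not the paper's protocol or its metrics; it simply fits models of differing accuracy on synthetic data and measures how well each model's permutation-importance scores line up with the known informative features, which is the kind of accuracy-versus-interpretation-quality comparison the abstract describes.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# With shuffle=False, the first n_informative columns carry the signal.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
true_signal = np.array([1] * 4 + [0] * 6)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf_shallow": RandomForestClassifier(max_depth=2, random_state=0),
    "rf_deep": RandomForestClassifier(max_depth=None, random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    agree = spearmanr(imp.importances_mean, true_signal).correlation
    print(f"{name:12s} accuracy={acc:.3f} importance-vs-signal spearman={agree:.3f}")
```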
7. Multi-Plane Program Induction with 3D Box Priors
Yikai Li, Jiayuan Mao, Xiuming Zhang, Bill Freeman, Josh Tenenbaum, Noah Snavely, Jiajun Wu
We consider two important aspects of understanding and editing images: modeling regular, program-like textures or patterns in 2D planes, and the 3D posing of these planes in the scene. Unlike prior work on image-based program synthesis, which assumes the image contains a single visible 2D plane, we present Box Program Induction (BPI), which infers a program-like scene representation that simultaneously models repeated structure on multiple 2D planes, the 3D position and orientation of the planes, and camera parameters, all from a single image. Our model assumes a box prior, i.e., that the image captures either an inner view or an outer view of a box in 3D. It uses neural networks to infer visual cues such as vanishing points and wireframe lines, which guide a search-based algorithm to find the program that best explains the image. Such a holistic, structured scene representation enables 3D-aware interactive image editing operations such as inpainting missing pixels, changing camera parameters, and extrapolating the image contents.
Multi-Plane Program Induction with 3D Box Priors
— AK (@ak92501) November 20, 2020
pdf: https://t.co/g5hLvul10N
abs: https://t.co/1i1E3DM6oe
project page: https://t.co/GKnPqwcy48 pic.twitter.com/yOR9pkKukP
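The "program that best explains the image" idea can be miniaturized. The snippet below is a heavily simplified, hypothetical analogue of BPI's search step: a lattice "program" generates repeated element positions on a single plane, and a brute-force search picks the spacing that best explains noisy detections (standing in for the visual cues inferred by BPI's networks). The real system additionally infers 3D plane pose and camera parameters.

```python
import numpy as np

def lattice_program(origin, step_u, step_v, reps_u, reps_v):
    """Positions generated by a for-loop style 2D lattice program."""
    return np.array([origin + i * step_u + j * step_v
                     for i in range(reps_u) for j in range(reps_v)])

def program_fit(positions, detections):
    """Negative mean distance from each detection to its nearest program position."""
    d = np.linalg.norm(detections[:, None, :] - positions[None, :, :], axis=-1)
    return -d.min(axis=1).mean()

# Fake "detections" on a noisy 3x4 grid, standing in for detected repeated elements.
rng = np.random.default_rng(0)
gt = lattice_program(np.array([10.0, 20.0]),
                     np.array([30.0, 0.0]), np.array([0.0, 25.0]), 3, 4)
detections = gt + rng.normal(scale=1.0, size=gt.shape)

# Search over candidate spacings, keeping the best-scoring program.
best = max(((su, sv, program_fit(lattice_program(np.array([10.0, 20.0]),
                                                 np.array([float(su), 0.0]),
                                                 np.array([0.0, float(sv)]), 3, 4),
                                 detections))
            for su in range(20, 41, 5) for sv in range(15, 36, 5)),
           key=lambda t: t[2])
print("best spacing (u, v):", best[:2])
```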
8. FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang
As deep reinforcement learning (DRL) has been recognized as an effective approach in quantitative finance, getting hands-on experience is attractive to beginners. However, training a practical DRL trading agent that decides where to trade, at what price, and in what quantity involves error-prone and arduous development and debugging. In this paper, we introduce FinRL, a DRL library that helps beginners get exposure to quantitative finance and develop their own stock trading strategies. Along with easily reproducible tutorials, the FinRL library allows users to streamline their own developments and to compare them with existing schemes. Within FinRL, virtual environments are configured with stock market datasets, trading agents are trained with neural networks, and trading performance is analyzed via extensive backtesting. Moreover, it incorporates important trading constraints such as transaction cost, market liquidity, and the investor's degree of risk aversion. FinRL features completeness, hands-on tutorials, and reproducibility that favor beginners: (i) at multiple levels of time granularity, FinRL simulates trading environments across various stock markets, including NASDAQ-100, DJIA, S&P 500, HSI, SSE 50, and CSI 300; (ii) organized in a layered architecture with a modular structure, FinRL provides fine-tuned state-of-the-art DRL algorithms (DQN, DDPG, PPO, SAC, A2C, TD3, etc.), commonly used reward functions, and standard evaluation baselines to alleviate debugging workloads and promote reproducibility; and (iii) being highly extendable, FinRL reserves a complete set of user-import interfaces. Furthermore, we include three application demonstrations: single stock trading, multiple stock trading, and portfolio allocation. The FinRL library is available on GitHub at https://github.com/AI4Finance-LLC/FinRL-Library.
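To make the environment/agent/backtest loop concrete, here is a minimal gym-style trading environment in the same spirit. It is explicitly not FinRL's API; class and parameter names are invented for illustration, and the reward is the change in portfolio value net of a transaction cost, as described in the abstract.

```python
import gym
import numpy as np
from gym import spaces

class ToyTradingEnv(gym.Env):
    """Illustrative single-stock environment (not FinRL's actual interface)."""
    def __init__(self, prices, cost_pct=0.001, initial_cash=10_000.0):
        super().__init__()
        self.prices, self.cost_pct, self.initial_cash = prices, cost_pct, initial_cash
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32)

    def _obs(self):
        return np.array([self.cash, self.shares, self.prices[self.t]], dtype=np.float32)

    def reset(self):
        self.t, self.cash, self.shares = 0, self.initial_cash, 0.0
        return self._obs()

    def step(self, action):
        price = self.prices[self.t]
        value_before = self.cash + self.shares * price
        trade = float(action[0]) * 10                            # signed shares, scaled
        trade = np.clip(trade, -self.shares, self.cash / (price * (1 + self.cost_pct)))
        self.cash -= trade * price + abs(trade) * price * self.cost_pct
        self.shares += trade
        self.t += 1
        reward = (self.cash + self.shares * self.prices[self.t]) - value_before
        done = self.t == len(self.prices) - 1
        return self._obs(), reward, done, {}

prices = 100 + np.cumsum(np.random.default_rng(0).normal(0, 1, 500))
env = ToyTradingEnv(prices)
obs, done = env.reset(), False
while not done:                                                  # random-policy roll-out
    obs, reward, done, _ = env.step(env.action_space.sample())
```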
9. Learning Deep Video Stabilization without Optical Flow
Muhammad Kashif Ali, Sangjoon Yu, Tae Hyun Kim
Learning the necessary high-level reasoning for video stabilization without the help of optical flow has proved to be one of the most challenging tasks in the field of computer vision. In this work, we present an iterative frame interpolation strategy to generate a novel dataset that is diverse enough to formulate video stabilization as a supervised learning problem, unassisted by optical flow. A major benefit of treating video stabilization as a purely RGB-based generative task over conventional optical-flow-assisted approaches is the preservation of content and resolution, which is usually compromised in the latter. To this end, we provide a new video stabilization dataset and train an efficient network that produces competitive stabilization results in a fraction of the time taken by recent iterative frame interpolation schemes. Our method provides qualitatively and quantitatively better results than those generated by state-of-the-art video stabilization methods. To the best of our knowledge, this is the only work that demonstrates the importance of perspective in formulating video stabilization as a deep learning problem instead of replacing it with an inter-frame motion measure.
Learning Deep Video Stabilization without Optical Flow
— AK (@ak92501) November 20, 2020
pdf: https://t.co/OLPrv5gFtH
abs: https://t.co/1XBQLVqkoK pic.twitter.com/peyKRPm0ir
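A loose sketch of the optical-flow-free, supervised formulation (not the authors' network): a fully convolutional model takes a short window of shaky frames and regresses the corresponding stable frame directly in RGB space, trained with a pixel reconstruction loss. Random tensors stand in for the proposed dataset.

```python
import torch
import torch.nn as nn

WINDOW = 5   # number of neighboring unstable frames fed to the network (assumed)

class StabilizerNet(nn.Module):
    def __init__(self, window=WINDOW):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * window, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())        # predicted stable frame

    def forward(self, frames):                                    # frames: (B, window, 3, H, W)
        b, w, c, h, wd = frames.shape
        return self.net(frames.view(b, w * c, h, wd))

model = StabilizerNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
l1 = nn.L1Loss()

for _ in range(10):                                               # placeholder training loop
    shaky = torch.rand(2, WINDOW, 3, 128, 128)                    # window of unstable frames
    stable = torch.rand(2, 3, 128, 128)                           # ground-truth stable frame
    loss = l1(model(shaky), stable)
    opt.zero_grad(); loss.backward(); opt.step()
```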
10. Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP
John Chen, Ian Berlot-Atwell, Safwan Hossain, Xindi Wang, Frank Rudzicz
Clinical machine learning is increasingly multimodal, with data collected in both structured tabular formats and unstructured forms such as free text. We propose a novel task of exploring fairness on a multimodal clinical dataset, adopting equalized odds for the downstream medical prediction tasks. To this end, we investigate a modality-agnostic fairness algorithm, equalized odds post-processing, and compare it to a text-specific fairness algorithm, debiased clinical word embeddings. Despite the fact that debiased word embeddings do not explicitly address equalized odds across protected groups, we show that a text-specific approach to fairness may simultaneously achieve a good balance of performance and classical notions of fairness. We hope that our paper inspires future contributions at the critical intersection of clinical NLP and fairness. The full source code is available here: https://github.com/johntiger1/multimodal_fairness
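Equalized odds post-processing is the modality-agnostic baseline mentioned above. The sketch below is a simplified, threshold-only illustration on synthetic scores (the full Hardt et al. procedure optimizes randomized group-specific decision rules, and the paper's clinical setup is far richer): per-group thresholds are chosen so that group true-positive rates land near a shared target.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)                    # protected attribute
label = rng.integers(0, 2, n)
# synthetic scores: one group gets a systematically shifted score distribution
score = label * 0.6 + group * 0.15 + rng.normal(0, 0.3, n)

def rates(pred, y):
    return pred[y == 1].mean(), pred[y == 0].mean()   # (TPR, FPR)

# Reference operating point: overall TPR at a single global threshold of 0.5.
target_tpr, _ = rates(score > 0.5, label)

thresholds = {}
for g in (0, 1):
    ys, ss = label[group == g], score[group == g]
    cands = np.quantile(ss, np.linspace(0.01, 0.99, 99))
    # choose the threshold whose group TPR is closest to the shared target
    thresholds[g] = min(cands, key=lambda t: abs((ss[ys == 1] > t).mean() - target_tpr))

pred = np.where(group == 0, score > thresholds[0], score > thresholds[1])
for g in (0, 1):
    tpr, fpr = rates(pred[group == g], label[group == g])
    print(f"group {g}: threshold={thresholds[g]:.3f} TPR={tpr:.3f} FPR={fpr:.3f}")
```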
11. Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video
Ben Saunders, Necati Cihan Camgoz, Richard Bowden
To be truly understandable and accepted by Deaf communities, an automatic Sign Language Production (SLP) system must generate a photo-realistic signer. Prior approaches based on graphical avatars have proven unpopular, whereas recent neural SLP works that produce skeleton pose sequences have been shown to be not understandable to Deaf viewers. In this paper, we propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language. We employ a transformer architecture with a Mixture Density Network (MDN) formulation to handle the translation from spoken language to skeletal pose. A pose-conditioned human synthesis model is then introduced to generate a photo-realistic sign language video from the skeletal pose sequence. This allows the photo-realistic production of sign videos directly translated from written text. We further propose a novel keypoint-based loss function, which significantly improves the quality of synthesized hand images, operating in the keypoint space to avoid issues caused by motion blur. In addition, we introduce a method for controllable video generation, enabling training on large, diverse sign language datasets and providing the ability to control the signer appearance at inference. Using a dataset of eight different sign language interpreters extracted from broadcast footage, we show that SignGAN significantly outperforms all baseline methods for quantitative metrics and human perceptual studies.
Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video
— AK (@ak92501) November 20, 2020
pdf: https://t.co/zvHdFqraZ6
abs: https://t.co/Pxiv0zpsI2 pic.twitter.com/7HCK05f1Wj
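A hedged sketch of a keypoint-space hand loss in the spirit of SignGAN's proposal (not the paper's exact formulation): a frozen keypoint detector is applied to the generated and ground-truth hand crops, and the loss is taken between detected keypoints rather than between blurry pixels. The tiny detector here is a stand-in for a pre-trained hand keypoint model.

```python
import torch
import torch.nn as nn

N_KEYPOINTS = 21   # typical hand-keypoint count, used only for this illustration

class TinyKeypointDetector(nn.Module):
    def __init__(self, n_kp=N_KEYPOINTS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2 * n_kp))
    def forward(self, crop):
        return self.net(crop).view(-1, N_KEYPOINTS, 2)   # (x, y) per keypoint

detector = TinyKeypointDetector().eval()
for p in detector.parameters():
    p.requires_grad_(False)                              # keypoint model stays frozen

def keypoint_loss(fake_hand_crop, real_hand_crop):
    """L1 distance between keypoints detected on generated vs. real hand regions."""
    return (detector(fake_hand_crop) - detector(real_hand_crop)).abs().mean()

fake_crop = torch.rand(4, 3, 64, 64, requires_grad=True)  # crops from generated frames
real_crop = torch.rand(4, 3, 64, 64)
loss = keypoint_loss(fake_crop, real_crop)
loss.backward()                                            # gradients flow back toward the generator
```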