Hot Papers 2020-09-04

1. Flow-edge Guided Video Completion

Chen Gao, Ayush Saraf, Jia-Bin Huang, Johannes Kopf

retweets: 237, favorites: 947 (09/05/2020 10:53:00)
links: abs | pdf
cs.CV

We present a new flow-based video completion algorithm. Previous flow completion methods are often unable to retain the sharpness of motion boundaries. Our method first extracts and completes motion edges, and then uses them to guide piecewise-smooth flow completion with sharp edges. Existing methods propagate colors among local flow connections between adjacent frames. However, not all missing regions in a video can be reached in this way because the motion boundaries form impenetrable barriers. Our method alleviates this problem by introducing non-local flow connections to temporally distant frames, enabling propagating video content over motion boundaries. We validate our approach on the DAVIS dataset. Both visual and quantitative results show that our method compares favorably against the state-of-the-art algorithms.
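The limitation described in the abstract, that some missing pixels cannot be reached through adjacent-frame flow alone, can be illustrated with a toy 1D example. This is a minimal sketch under stated assumptions, not the paper's pipeline: frames are lists with `None` marking missing pixels, each flow is an integer displacement per pixel, and the function name `fill_from_frames` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Optional[int]]

def fill_from_frames(target: Frame, sources: Sequence[Tuple[Frame, Sequence[int]]]) -> List[Optional[int]]:
    """Fill missing pixels of `target` by following flow into source frames.

    `sources` holds (frame, flow) pairs, nearest frame first; flow[x] is the
    displacement from pixel x in the target to its match in that frame.
    """
    filled = list(target)
    for x, value in enumerate(target):
        if value is not None:
            continue  # pixel is known, nothing to do
        for frame, flow in sources:
            src = x + flow[x]
            # Accept the first source whose matched pixel exists and is known.
            if 0 <= src < len(frame) and frame[src] is not None:
                filled[x] = frame[src]
                break
    return filled

target = [1, None, None, 4]
adjacent = ([1, None, 3, 4], [0, 0, 0, 0])   # local connection (neighboring frame)
distant = ([0, 1, 7, 3], [1, 1, 1, 0])       # non-local connection (far frame)

print(fill_from_frames(target, [adjacent]))           # [1, None, 3, 4]
print(fill_from_frames(target, [adjacent, distant]))  # [1, 7, 3, 4]
```

With only the adjacent frame, pixel 1 stays empty because its flow match is itself missing; adding the temporally distant frame as a fallback fills it.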

Super excited to share our work on video completion at #ECCV2020! 🤩 Our method seamlessly removes objects, watermarks, or expands field-of-view from casually captured videos.

Paper: https://t.co/JwT9bYN85V
Project: https://t.co/7raLq7jVQx

With @gaochen315, Ayush, and @JPKopf pic.twitter.com/eLGjINlbMC
— Jia-Bin Huang (@jbhuang0604) September 4, 2020

Flow-edge Guided Video Completion
pdf: https://t.co/likKCfvwPN
abs: https://t.co/QtQChOReKI
project page: https://t.co/bLjY2v5AAr
webpage: https://t.co/EyvqZYOIHA pic.twitter.com/olZIgluzQN
— AK (@ak92501) September 4, 2020

2. Grounded Language Learning Fast and Slow

Felix Hill, Olivier Tieleman, Tamara von Glehn, Nathaniel Wong, Hamza Merzic, Stephen Clark

retweets: 125, favorites: 589 (09/05/2020 10:53:00)
links: abs | pdf
cs.AI

Recent work has shown that large text-based neural language models, trained with conventional supervised learning objectives, acquire a surprising propensity for few- and one-shot learning. Here, we show that an embodied agent situated in a simulated 3D world, and endowed with a novel dual-coding external memory, can exhibit similar one-shot word learning when trained with conventional reinforcement learning algorithms. After a single introduction to a novel object via continuous visual perception and a language prompt (“This is a dax”), the agent can re-identify the object and manipulate it as instructed (“Put the dax on the bed”). In doing so, it seamlessly integrates short-term, within-episode knowledge of the appropriate referent for the word “dax” with long-term lexical and motor knowledge acquired across episodes (i.e. “bed” and “putting”). We find that, under certain training conditions and with a particular memory writing mechanism, the agent’s one-shot word-object binding generalizes to novel exemplars within the same ShapeNet category, and is effective in settings with unfamiliar numbers of objects. We further show how dual-coding memory can be exploited as a signal for intrinsic motivation, stimulating the agent to seek names for objects that may be useful for later executing instructions. Together, the results demonstrate that deep neural networks can exploit meta-learning, episodic memory and an explicitly multi-modal environment to account for ‘fast-mapping’, a fundamental pillar of human cognitive development and a potentially transformative capacity for agents that interact with human users.
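One way to picture the one-shot word-object binding is a memory that stores a language embedding next to the visual embedding seen at the same moment, and is later queried by the word alone. The sketch below assumes exactly that; the abstract does not spell out the mechanism, so the class name, method names, and the hard top-1 dot-product lookup are illustrative assumptions rather than the authors' design.

```python
from typing import List, Sequence

class DualCodingMemory:
    """Toy episodic memory pairing a language embedding with a visual embedding."""

    def __init__(self) -> None:
        self.language_keys: List[List[float]] = []
        self.visual_values: List[List[float]] = []

    def write(self, language_vec: Sequence[float], visual_vec: Sequence[float]) -> None:
        # Store both codes together, e.g. when the agent hears "This is a dax".
        self.language_keys.append(list(language_vec))
        self.visual_values.append(list(visual_vec))

    def read_by_language(self, query: Sequence[float]) -> List[float]:
        # Return the visual code whose language key best matches the query.
        if not self.language_keys:
            raise ValueError("memory is empty")
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.language_keys]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.visual_values[best]

memory = DualCodingMemory()
memory.write([1.0, 0.0], [0.9, 0.1, 0.0])  # hypothetical embedding for "dax" and its object
memory.write([0.0, 1.0], [0.0, 0.0, 1.0])  # hypothetical embedding for a second word
print(memory.read_by_language([0.8, 0.2]))  # [0.9, 0.1, 0.0]
```

A single write is enough for later retrieval, which is the sense in which such a memory supports learning a word within one episode without any weight update.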

#GPT3 from @OpenAI showed an emergent ability in large neural language models for rapidly acquiring and using new words.

We develop an agent that does this in a simulated 3D environment. https://t.co/VnaDJTJEhM

1/N
— Felix Hill (@FelixHill84) September 4, 2020

https://t.co/qvnHOItZPB
This paper (by @FelixHill84 et al) is really an "It's all coming together" moment for @DeepMind I feel.

Let me try to describe my takeaways from my first readthrough.

1/14
— Connor Leahy (@NPCollapse) September 4, 2020

DeepMind trained a Transformer in a 3D environment to recognize and manipulate specific objects it learns about on the fly. It has an external memory, meaning it can remember new objects with no training.

"Grounded Language Learning Fast and Slow"

arxiv: https://t.co/FydjDhELKY pic.twitter.com/9WQ7sNNSq8
— Shawn Presser (@theshawwn) September 4, 2020

3. A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning

Martin Mundt, Yong Won Hong, Iuliia Pliushch, Visvanathan Ramesh

retweets: 75, favorites: 339 (09/05/2020 10:53:00)
links: abs | pdf
cs.LG | stat.ML

Current deep learning research is dominated by benchmark evaluation. A method is regarded as favorable if it empirically performs well on the dedicated test set. This mentality is seamlessly reflected in the resurfacing area of continual learning, where consecutively arriving sets of benchmark data are investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten due to the iterative parameter updates. However, comparison of individual methods is nevertheless treated in isolation from real world application and typically judged by monitoring accumulated test set performance. The closed world assumption remains predominant. It is assumed that during deployment a model is guaranteed to encounter data that stems from the same distribution as used for training. This poses a massive challenge as neural networks are well known to provide overconfident false predictions on unknown instances and break down in the face of corrupted data. In this work we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning and open set recognition in deep neural networks. Our results show that this not only benefits each individual paradigm, but highlights the natural synergies in a common framework. We empirically demonstrate improvements when alleviating catastrophic forgetting, querying data in active learning, selecting task orders, while exhibiting robust open world application where previously proposed methods fail.

A Wholistic View of Continual Learning: https://t.co/DnOrnfpAYu

If you're interested in Continual Learning and Active Learning, this paper comes with a neat taxonomy! pic.twitter.com/WJYWoynMvQ
— Denny Britz (@dennybritz) September 4, 2020

Been quiet for a while, so I'm happy to share a progress update.

Our new big paper: "A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning" just came online at https://t.co/3JaXmP9e5i 1/3 pic.twitter.com/ySGademKmu
— Martin Mundt (@mundt_martin) September 4, 2020

4. The Sound of Silence: Mining Security Vulnerabilities from Secret Integration Channels in Open-Source Projects

Ralf Ramsauer, Lukas Bulwahn, Daniel Lohmann, Wolfgang Mauerer

retweets: 74, favorites: 137 (09/05/2020 10:53:01)
links: abs | pdf
cs.SE

Public development processes are a key characteristic of open source projects. However, fixes for vulnerabilities are usually discussed privately among a small group of trusted maintainers, and integrated without prior public involvement. This is supposed to prevent early disclosure, and cope with embargo and non-disclosure agreement (NDA) rules. While regular development activities leave publicly available traces, fixes for vulnerabilities that bypass the standard process do not. We present a data-mining based approach to detect code fragments that arise from such infringements of the standard process. By systematically mapping public development artefacts to source code repositories, we can exclude regular process activities, and infer irregularities that stem from non-public integration channels. For the Linux kernel, the most crucial component of many systems, we apply our method to a period of seven months before the release of Linux 5.4. We find 29 commits that address 12 vulnerabilities. For these vulnerabilities, our approach provides a temporal advantage of 2 to 179 days to design exploits before public disclosure takes place, and fixes are rolled out. Established responsible disclosure approaches in open development processes are supposed to limit premature visibility of security vulnerabilities. However, our approach shows that, instead, they open additional possibilities to uncover such changes that thwart the very premise. We conclude by discussing implications and partial countermeasures.
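The core idea, excluding commits that have a public trace and flagging the rest, can be sketched as a set difference. The abstract does not say how patches are matched to commits, so exact hashing of the changed lines is a simplification swapped in here for illustration; the function names are hypothetical.

```python
import hashlib
from typing import Dict, List

def patch_fingerprint(diff_text: str) -> str:
    """Hash only the added/removed lines, ignoring context, headers and indentation."""
    changed = [
        line[0] + line[1:].strip()
        for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    return hashlib.sha256("\n".join(changed).encode("utf-8")).hexdigest()

def commits_without_public_trace(commits: Dict[str, str], public_patches: List[str]) -> List[str]:
    """Return ids of commits whose diff matches no publicly posted patch.

    `commits` maps a commit id to its diff; `public_patches` holds diffs
    collected from public channels such as mailing lists.
    """
    public = {patch_fingerprint(patch) for patch in public_patches}
    return sorted(cid for cid, diff in commits.items() if patch_fingerprint(diff) not in public)
```

Commits returned by `commits_without_public_trace` are only candidates: a real pipeline would need fuzzy matching to tolerate patches that were revised between posting and integration, and a follow-up review to tell silent security fixes from other off-list changes.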

Formalizing the methods we've used for years to spot silently-fixed Linux kernel vulnerabilities: https://t.co/VGZBA1Ijp9
— grsecurity (@grsecurity) September 4, 2020

"The Sound of Silence: Mining Security Vulnerabilities from
Secret Integration Channels in Open-Source Projects" https://t.co/GBOItqVUM9 pic.twitter.com/ZkomiPsZRF
— John Regehr (@johnregehr) September 4, 2020

5. Learning to summarize from human feedback

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

retweets: 9, favorites: 69 (09/05/2020 10:53:01)
links: abs | pdf
cs.CL | cs.AI | cs.LG

As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about: summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.
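Training a model to predict the human-preferred summary is commonly done with a pairwise logistic loss on the difference between two scalar scores. The sketch below assumes that formulation; the function name and the toy scores are hypothetical and not taken from the authors' code.

```python
import math

def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-probability that the preferred summary outranks the rejected one.

    Computes -log(sigmoid(score_preferred - score_rejected)) in a form that
    avoids overflow for large positive or negative margins.
    """
    margin = score_preferred - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss is small when the reward model agrees with the human label
# and large when it ranks the pair the wrong way round.
print(pairwise_preference_loss(2.0, 0.0) < pairwise_preference_loss(0.0, 2.0))  # True
```

Once trained this way, the scalar score can serve as the reward signal for the reinforcement learning step described in the abstract.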

Learning to summarize from human feedback
pdf: https://t.co/eYEQNpKEX5
abs: https://t.co/myE8Gs41hZ
github: https://t.co/62qVBL4YiC pic.twitter.com/BH2iyNQSqv
— AK (@ak92501) September 4, 2020

Learning to Summarize From Human Feedback

Achieves super human-level summarization on TL;DR dataset by training a reward function on human feedback and fine-tuning a generator with PPO. https://t.co/S5DxLB79OS pic.twitter.com/dIWJxFbm0R
— Aran Komatsuzaki (@arankomatsuzaki) September 4, 2020

6. HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu

retweets: 8, favorites: 51 (09/05/2020 10:53:01)
links: abs | pdf
eess.AS | cs.CL | cs.LG | cs.SD

High-fidelity singing voices usually require a higher sampling rate (e.g., 48kHz) to convey expression and emotion. However, a higher sampling rate produces a wider frequency band and longer waveform sequences, which poses challenges for singing voice synthesis (SVS) in both the frequency and time domains. Conventional SVS systems that adopt a low sampling rate cannot address these challenges well. In this paper, we develop HiFiSinger, an SVS system towards high-fidelity singing voice. HiFiSinger consists of a FastSpeech based acoustic model and a Parallel WaveGAN based vocoder to ensure fast training and inference and also high voice quality. To tackle the difficulty of singing modeling caused by the high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and vocoder to improve singing modeling. Specifically, 1) To handle the larger range of frequencies caused by the higher sampling rate, we propose a novel sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full 80-dimensional mel-frequency into multiple sub-bands and models each sub-band with a separate discriminator. 2) To model the longer waveform sequences caused by the higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform generation to model different lengths of waveform sequences with separate discriminators. 3) We also introduce several additional designs and findings in HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch) and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate window/hop size for mel-spectrogram, and increasing the receptive field in vocoder for long vowel modeling. Experiment results show that HiFiSinger synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44 MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.
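The sub-band split behind the sub-frequency GAN can be sketched as plain slicing of each 80-bin mel frame. The abstract gives the 80-dimensional input but not the band boundaries, so the three overlapping ranges below are an assumption made for illustration, as is the function name.

```python
from typing import List, Sequence, Tuple

def split_mel_bands(
    frames: Sequence[Sequence[float]],
    bands: Sequence[Tuple[int, int]],
) -> List[List[List[float]]]:
    """Slice every mel frame into sub-bands; returns one list of frames per band.

    `frames` is a (time, mel_bins) spectrogram; each (low, high) pair in
    `bands` selects mel bins low..high-1.
    """
    return [[list(frame[low:high]) for frame in frames] for low, high in bands]

# Hypothetical low / middle / high bands over 80 mel bins.
ASSUMED_BANDS = [(0, 40), (20, 60), (40, 80)]

mel = [[float(b) for b in range(80)] for _ in range(3)]  # 3 dummy frames
low, mid, high = split_mel_bands(mel, ASSUMED_BANDS)
print(len(low), len(low[0]), mid[0][0], high[0][-1])  # 3 40 20.0 79.0
```

In an SF-GAN-style setup each returned band would be passed to its own discriminator, so that no single discriminator has to judge the full frequency range at once.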

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
pdf: https://t.co/NVk1eKLElL
abs: https://t.co/rcY5pnUl9J
samples: https://t.co/uBwTY23SH4 pic.twitter.com/GhsiFe2HwW
— AK (@ak92501) September 4, 2020

Published 5 Sep 2020

Tatsuya Shirakawa, ML Lead at Beatrust (https://beatrust.com)