All Articles

Hot Papers 2020-10-26

1. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow

  • retweets: 198, favorites: 82 (10/27/2020 09:10:20)
  • links: abs | pdf
  • cs.CL | cs.LG

Current developments in natural language processing offer challenges and opportunities for low-resource languages and domains. Deep neural networks are known for requiring large amounts of training data, which might not be available in resource-lean scenarios. However, there is also a growing body of work on improving performance in low-resource settings. Motivated by fundamental changes towards neural models and the currently popular pre-train and fine-tune paradigm, we give an overview of promising approaches for low-resource natural language processing. After discussing the definition of low-resource scenarios and the different dimensions of data availability, we examine methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data, such as data augmentation and distant supervision, as well as transfer learning settings that reduce the need for target supervision. The survey closes with a brief look into methods suggested in non-NLP machine learning communities, which might be beneficial for NLP in low-resource scenarios.
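Among the data-creation mechanisms the survey covers, data augmentation is the easiest to picture. As a toy illustration (not from the paper), the sketch below augments a labeled sentence by random synonym substitution; the synonym table and sentence are made up for the example:

```python
# Minimal sketch of token-level data augmentation for a low-resource
# classification task: randomly replace tokens with synonyms drawn from a
# hand-crafted dictionary. The dictionary and sentences are toy examples.
import random

SYNONYMS = {  # hypothetical synonym table; in practice use WordNet, etc.
    "movie": ["film", "picture"],
    "great": ["excellent", "wonderful"],
    "bad": ["poor", "terrible"],
}

def augment(tokens, p=0.3, rng=random.Random(0)):
    """Return a copy of `tokens` with each known word replaced w.p. `p`."""
    return [
        rng.choice(SYNONYMS[t]) if t in SYNONYMS and rng.random() < p else t
        for t in tokens
    ]

labeled = [(["the", "movie", "was", "great"], "pos")]
augmented = [(augment(toks), label) for toks, label in labeled for _ in range(3)]
print(augmented)  # several label-preserving variants of the one example
```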

2. Self-Learning Transformations for Improving Gaze and Head Redirection

Yufeng Zheng, Seonwook Park, Xucong Zhang, Shalini De Mello, Otmar Hilliges

  • retweets: 144, favorites: 66 (10/27/2020 09:10:20)
  • links: abs | pdf
  • cs.CV

Many computer vision tasks rely on labeled data. Rapid progress in generative modeling has led to the ability to synthesize photorealistic images. However, controlling specific aspects of the generation process such that the data can be used for supervision of downstream tasks remains challenging. In this paper we propose a novel generative model for face images that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles. This requires disentangling many appearance-related factors, including not only gaze and head orientation but also lighting, hue, etc. We propose a novel architecture which learns to discover, disentangle and encode these extraneous variations in a self-learned manner. We further show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation. A novel evaluation scheme shows that our method improves upon the state of the art in redirection accuracy and in disentanglement between gaze direction and head orientation changes. Furthermore, we show that in the presence of limited amounts of real-world training data, our method allows for improvements in the downstream task of semi-supervised cross-dataset gaze estimation. Please check our project page at: https://ait.ethz.ch/projects/2020/STED-gaze/

3. Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz

We propose a novel lightweight generative adversarial network for efficient image manipulation using natural language descriptions. To achieve this, a new word-level discriminator is proposed, which provides the generator with fine-grained training feedback at the word level. This facilitates training a lightweight generator that has a small number of parameters but can still correctly focus on specific visual attributes of an image and edit them without affecting other content not described in the text. Furthermore, thanks to the explicit training signal related to each word, the discriminator can also be simplified to a lightweight structure. Compared with the state of the art, our method has a much smaller number of parameters but still achieves competitive manipulation performance. Extensive experimental results demonstrate that our method can better disentangle different visual attributes and correctly map them to the corresponding semantic words, thus achieving more accurate image modification using natural language descriptions.
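To make the idea of word-level feedback concrete, here is a toy PyTorch sketch of a discriminator that lets each word attend over image region features and assigns it an individual real/fake score; the layer sizes and attention scheme are assumptions, not the paper's architecture:

```python
# Toy sketch (not the paper's design) of a word-level discriminator: each
# word attends over image region features and receives its own real/fake
# logit, giving the generator per-word training feedback.
import torch
import torch.nn as nn

class WordLevelDiscriminator(nn.Module):
    def __init__(self, word_dim=256, region_dim=256):
        super().__init__()
        self.proj = nn.Linear(region_dim, word_dim)
        self.score = nn.Linear(word_dim * 2, 1)  # one logit per word

    def forward(self, words, regions):
        # words:   (B, T, word_dim)   text embeddings
        # regions: (B, R, region_dim) image region features
        keys = self.proj(regions)                                    # (B, R, D)
        attn = torch.softmax(words @ keys.transpose(1, 2), dim=-1)   # (B, T, R)
        context = attn @ keys                                        # (B, T, D)
        return self.score(torch.cat([words, context], -1)).squeeze(-1)

d = WordLevelDiscriminator()
logits = d(torch.randn(2, 7, 256), torch.randn(2, 49, 256))
print(logits.shape)  # torch.Size([2, 7]) -- one score per word
```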

4. Dynamics and Domain Randomized Gait Modulation with Bezier Curves for Sim-to-Real Legged Locomotion

Maurice Rahme, Ian Abraham, Matthew L. Elwin, Todd D. Murphey

  • retweets: 42, favorites: 63 (10/27/2020 09:10:21)
  • links: abs | pdf
  • cs.RO

We present a sim-to-real framework that uses dynamics and domain randomized offline reinforcement learning to enhance open-loop gaits for legged robots, allowing them to traverse uneven terrain without sensing foot impacts. Our approach, D^2-Randomized Gait Modulation with Bezier Curves (D^2-GMBC), uses augmented random search with randomized dynamics and terrain to train, in simulation, a policy that modifies the parameters and output of an open-loop Bezier curve gait generator for quadrupedal robots. The policy, using only inertial measurements, enables the robot to traverse unknown rough terrain, even when the robot's physical parameters do not match the open-loop model. We compare the resulting policy to hand-tuned Bezier curve gaits and to policies trained without randomization, both in simulation and on a real quadrupedal robot. With D^2-GMBC, across a variety of experiments on unobserved and unknown uneven terrain, the robot walks significantly farther than with either hand-tuned gaits or gaits learned without domain randomization. Additionally, using D^2-GMBC, the robot can walk laterally and rotate while on the rough terrain, even though it was trained only for forward walking.
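For readers unfamiliar with Bezier-curve gaits, the sketch below evaluates a swing-phase foot trajectory from control points via the Bernstein basis; the control points and the residual added to them (standing in for the policy's modulation) are invented for illustration:

```python
# Sketch of the kind of open-loop Bezier-curve foot trajectory that
# D^2-GMBC modulates. Control points here are made up; in the paper a
# learned policy shifts gait parameters and the generator's output.
import numpy as np
from math import comb

def bezier(control_points, n_samples=100):
    """Evaluate a Bezier curve from its control points via the Bernstein basis."""
    pts = np.asarray(control_points, dtype=float)
    n = len(pts) - 1
    t = np.linspace(0.0, 1.0, n_samples)
    basis = np.stack(
        [comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1)], axis=1
    )                                  # (n_samples, n+1) Bernstein weights
    return basis @ pts                 # (n_samples, 2) points on the curve

# Swing-phase foot path in the x-z plane (forward, height), in meters.
swing = np.array([(-0.05, 0.0), (-0.06, 0.05), (0.06, 0.05), (0.05, 0.0)])
residual = np.array([0.0, 0.01])       # e.g., a policy-output height shift
path = bezier(swing + residual)
print(path[:3])
```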

5. Factor Graph Grammars

David Chiang, Darcey Riley

  • retweets: 56, favorites: 45 (10/27/2020 09:10:21)
  • links: abs | pdf
  • cs.LG

We propose the use of hyperedge replacement graph grammars for factor graphs, or factor graph grammars (FGGs) for short. FGGs generate sets of factor graphs and can describe a more general class of models than plate notation, dynamic graphical models, case-factor diagrams, and sum-product networks can. Moreover, inference can be done on FGGs without enumerating all the generated factor graphs. For finite variable domains (but possibly infinite sets of graphs), a generalization of variable elimination to FGGs allows exact and tractable inference in many situations. For finite sets of graphs (but possibly infinite variable domains), an FGG can be converted to a single factor graph amenable to standard inference techniques.
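As a reminder of the base operation the paper lifts from single graphs to grammars, here is variable elimination on a tiny fixed factor graph (a chain over three binary variables), computing an unnormalized marginal without materializing the full joint:

```python
# Variable elimination on a tiny fixed factor graph -- the operation the
# paper generalizes to the (possibly infinite) set of graphs an FGG
# generates. Chain A - B - C with binary variables.
import numpy as np

fAB = np.array([[0.9, 0.1],
                [0.2, 0.8]])   # factor over (A, B)
fBC = np.array([[0.7, 0.3],
                [0.4, 0.6]])   # factor over (B, C)

# Eliminate A, then B, leaving an unnormalized marginal over C.
mB = np.einsum("ab->b", fAB)         # sum out A -> message over B
mC = np.einsum("b,bc->c", mB, fBC)   # sum out B -> message over C
print(mC / mC.sum())                 # marginal P(C)
```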

6. Noise2Same: Optimizing A Self-Supervised Bound for Image Denoising

Yaochen Xie, Zhengyang Wang, Shuiwang Ji

  • retweets: 42, favorites: 45 (10/27/2020 09:10:21)
  • links: abs | pdf
  • cs.CV | cs.LG

Self-supervised frameworks that learn denoising models from individual noisy images alone have shown strong capability and promising performance in various image denoising tasks. Existing self-supervised denoising frameworks are mostly built upon the same theoretical foundation, where the denoising models are required to be J-invariant. However, our analyses indicate that the current theory and the J-invariance requirement may lead to denoising models with reduced performance. In this work, we introduce Noise2Same, a novel self-supervised denoising framework. In Noise2Same, a new self-supervised loss is proposed by deriving a self-supervised upper bound of the typical supervised loss. In particular, Noise2Same requires neither J-invariance nor extra information about the noise model, and can be used in a wider range of denoising applications. We analyze our proposed Noise2Same both theoretically and experimentally. The experimental results show that our Noise2Same remarkably outperforms previous self-supervised denoising methods in terms of denoising performance and training efficiency. Our code is available at https://github.com/divelab/Noise2Same.
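A simplified reading of such a loss, with the masking strategy and weighting chosen here as assumptions rather than the paper's exact recipe, might look like this in PyTorch:

```python
# Simplified sketch of a Noise2Same-style self-supervised loss: a plain
# reconstruction term plus an invariance term comparing the model's output
# on masked vs. unmasked input at the masked positions. The masking scheme
# and weighting are assumptions, not the paper's exact formulation.
import torch

def noise2same_loss(model, x, mask_frac=0.01, lam=1.0):
    # x: (B, C, H, W) noisy images; no clean targets are used.
    mask = (torch.rand_like(x) < mask_frac).float()
    x_masked = x * (1 - mask) + torch.randn_like(x) * mask  # replace masked px
    out = model(x)
    out_masked = model(x_masked)
    rec = ((out - x) ** 2).mean()                           # reconstruction
    m = mask.sum().clamp(min=1.0)
    inv = (((out - out_masked) * mask) ** 2).sum() / m      # invariance term
    return rec + lam * torch.sqrt(inv)

# usage: loss = noise2same_loss(net, noisy_batch); loss.backward()
```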

7. The nanoPU: Redesigning the CPU-Network Interface to Minimize RPC Tail Latency

Stephen Ibanez, Alex Mallery, Serhat Arslan, Theo Jepsen, Muhammad Shahbaz, Nick McKeown, Changhoon Kim

  • retweets: 42, favorites: 44 (10/27/2020 09:10:21)
  • links: abs | pdf
  • cs.AR | cs.NI

The nanoPU is a new networking-optimized CPU designed to minimize tail latency for RPCs. By bypassing the cache and memory hierarchy, the nanoPU directly places arriving messages into the CPU register file. The wire-to-wire latency through the application is just 65ns, about 13x faster than the current state-of-the-art. The nanoPU moves key functions from software to hardware: reliable network transport, congestion control, core selection, and thread scheduling. It also supports a unique feature to bound the tail latency experienced by high-priority applications. Our prototype nanoPU is based on a modified RISC-V CPU; we evaluate its performance using cycle-accurate simulations of 324 cores on AWS FPGAs, including real applications (MICA and chain replication).

8. Toward Expressive Singing Voice Correction: On Perceptual Validity of Evaluation Metrics for Vocal Melody Extraction

Yin-Jyun Luo, Yuen-Jen Lin, Li Su

Singing voice correction (SVC) is an appealing application for amateur singers. Commercial products automate SVC by snapping pitch contours to equal-tempered scales, which can lead to deadpan modifications. Because rhythmic errors are also neglected, extensive manual correction is still necessary. In this paper, we present a streamlined system to automate expressive SVC for both pitch and rhythmic errors. In particular, we extend a previous work by integrating advanced techniques for singing voice separation (SVS) and vocal melody extraction. SVC is achieved by temporally aligning the source-target pair, then replacing the pitch and rhythm of the source with those of the target. We evaluate the framework through a comparative study of melody extraction involving both subjective and objective evaluations, in which we investigate the perceptual validity of the standard metrics through the lens of SVC. The results suggest that high pitch accuracy according to the metrics does not guarantee good perceptual scores.
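The align-then-replace step can be sketched with a plain dynamic-time-warping alignment between F0 contours; the contours below are synthetic stand-ins for what a vocal melody extractor would produce:

```python
# Sketch of the align-then-replace step: DTW between a source and target F0
# contour, then copying the target's pitch onto the source's timeline. The
# contours are synthetic; real ones would come from a melody extractor.
import numpy as np

def dtw_path(a, b):
    """Classic O(len(a)*len(b)) DTW over 1-D sequences; returns the warp path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i-1] - b[j-1]) + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    path, (i, j) = [], (n, m)
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min([(i-1, j), (i, j-1), (i-1, j-1)], key=lambda p: D[p])
    return path[::-1]

src = 220 * 2 ** (np.round(np.linspace(0, 5, 80)) / 12)   # quantized "deadpan" F0
tgt = 220 * 2 ** (np.linspace(0, 5, 100) / 12)            # expressive target F0
corrected = src.copy()
for i, j in dtw_path(src, tgt):
    corrected[i] = tgt[j]                                  # take target pitch
```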

9. Language Models are Open Knowledge Graphs

Chenguang Wang, Xiao Liu, Dawn Song

This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3) without human supervision. Popular KGs (e.g., Wikidata, NELL) are built in either a supervised or semi-supervised manner, requiring humans to create knowledge. Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training. The stored knowledge has enabled the language models to improve downstream NLP tasks, e.g., answering questions and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs are constructed with a single forward pass of the pre-trained language models (without fine-tuning) over the corpora. We demonstrate the quality of the constructed KGs by comparing them to two KGs (Wikidata, TAC KBP) created by humans. Our KGs also provide open factual knowledge that is not present in existing KGs. Our code and KGs will be made publicly available.
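As a toy probe in this spirit (a single forward pass, no fine-tuning), one can inspect a pre-trained LM's attention to see which tokens link two entity mentions; this is only a rough stand-in for the paper's extraction method, and it assumes each mention is a single wordpiece:

```python
# Toy probe, not the paper's algorithm: use a pre-trained LM's attention
# from one forward pass to rank which tokens connect two entity mentions,
# a rough stand-in for extracting (head, relation, tail) candidates.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "Dylan was born in Minnesota and moved to New York."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# Average attention over all layers and heads: a (T, T) matrix.
attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
head_i, tail_i = tokens.index("dylan"), tokens.index("minnesota")

# Rank tokens between the two mentions by attention received from the tail.
for i in range(head_i + 1, tail_i):
    print(f"{tokens[i]:>10s}  {attn[tail_i, i].item():.3f}")
```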

10. Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Arthur Delarue, Ross Anderson, Christian Tjandraatmadja

Value-function-based methods have long played an important role in reinforcement learning. However, finding the best next action given a value function of arbitrary complexity is nontrivial when the action space is too large for enumeration. We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space, in which the action selection problem is explicitly formulated as a mixed-integer optimization problem. As a motivating example, we present an application of this framework to the capacitated vehicle routing problem (CVRP), a combinatorial optimization problem in which a set of locations must be covered by a single vehicle with limited capacity. On each instance, we model an action as the construction of a single route, and consider a deterministic policy which is improved through a simple policy iteration algorithm. Our approach is competitive with other reinforcement learning methods and achieves an average gap of 1.7% with state-of-the-art OR methods on standard library instances of medium size.
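The paper's key move is embedding the value function itself in a mixed-integer program; the sketch below keeps only the action-selection-as-MIP pattern, substituting a made-up linear value proxy for the neural network and using the open-source PuLP modeler:

```python
# Action selection as a mixed-integer program, in the spirit of the paper's
# framework but with a linear value proxy in place of the neural network the
# authors embed. One action = one route (a subset of customers) respecting
# vehicle capacity.
import pulp

demand = {"a": 3, "b": 5, "c": 4, "d": 2}
value  = {"a": 1.0, "b": 2.5, "c": 1.8, "d": 0.7}   # stand-in for Q(s, .)
CAPACITY = 9

prob = pulp.LpProblem("best_route", pulp.LpMaximize)
x = {i: pulp.LpVariable(f"x_{i}", cat="Binary") for i in demand}
prob += pulp.lpSum(value[i] * x[i] for i in demand)               # objective
prob += pulp.lpSum(demand[i] * x[i] for i in demand) <= CAPACITY  # capacity
prob.solve(pulp.PULP_CBC_CMD(msg=False))

route = [i for i in demand if x[i].value() == 1]
print("selected route:", route)   # ['b', 'c'], covering demand 9
```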

11. Show and Speak: Directly Synthesize Spoken Description of Images

Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg

  • retweets: 26, favorites: 27 (10/27/2020 09:10:22)
  • links: abs | pdf
  • cs.CV | cs.CL

This paper proposes a new model, referred to as the show and speak (SAS) model, that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech that describes this image. The final speech audio is obtained from the predicted spectrogram via WaveNet. Extensive experiments on the public benchmark database Flickr8k demonstrate that the proposed SAS is able to synthesize natural spoken descriptions for images, indicating that synthesizing spoken descriptions for images while bypassing text and phonemes is feasible.
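A shape-level sketch of such an image-to-spectrogram encoder-decoder might look as follows; the layer sizes, the repeated-conditioning decoder, and the absence of attention are all assumptions, not the SAS architecture:

```python
# Minimal shape-level sketch of an image-to-spectrogram encoder-decoder in
# the spirit of SAS; all layer choices here are assumptions.
import torch
import torch.nn as nn

class ImageToSpectrogram(nn.Module):
    def __init__(self, n_mels=80, frames=200):
        super().__init__()
        self.encoder = nn.Sequential(              # image -> feature vector
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.decoder = nn.GRU(64, 256, batch_first=True)
        self.out = nn.Linear(256, n_mels)          # mel bins per frame
        self.frames = frames

    def forward(self, images):
        z = self.encoder(images)                            # (B, 64)
        steps = z.unsqueeze(1).expand(-1, self.frames, -1)  # (B, T, 64)
        h, _ = self.decoder(steps)
        return self.out(h)                                  # (B, T, n_mels)

mel = ImageToSpectrogram()(torch.randn(2, 3, 64, 64))
print(mel.shape)  # torch.Size([2, 200, 80]); a vocoder like WaveNet follows
```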

12. Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, Qin Jin

Neural network (NN) based singing voice synthesis (SVS) systems require sufficient data to train well. However, due to the high cost of data acquisition and annotation, we often encounter data limitations when building SVS systems, and NN-based models are prone to over-fitting under such data scarcity. In this work, we propose a Perceptual Entropy (PE) loss derived from a psycho-acoustic hearing model to regularize the network. With a one-hour open-source singing voice database, we explore the impact of the PE loss on various mainstream sequence-to-sequence models, including RNN-based, transformer-based, and conformer-based models. Our experiments show that the PE loss can mitigate the over-fitting problem and significantly improve the synthesized singing quality, as reflected in both objective and subjective evaluations. Furthermore, incorporating the PE loss in model training is shown to improve F0-contour and high-frequency-band spectrum prediction.
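The abstract does not spell out the PE formulation, so the sketch below shows only the general pattern of adding a perceptually weighted spectral term to a reconstruction loss; the inverse-envelope weighting is a crude masking proxy of our own, not the paper's psycho-acoustic derivation:

```python
# Not the paper's PE loss: a crude perceptually weighted spectral
# regularizer, shown only to illustrate the pattern of adding a
# psychoacoustically motivated term to the usual reconstruction loss.
import torch
import torch.nn.functional as F

def perceptual_reg(pred_spec, target_spec, eps=1e-5):
    # pred_spec, target_spec: (B, T, F) magnitude spectrograms.
    # Smooth over the frequency axis as a rough masking-envelope proxy.
    envelope = F.avg_pool1d(target_spec, 9, stride=1, padding=4)
    weight = 1.0 / (envelope + eps)   # errors in quiet regions cost more
    return (weight * (pred_spec - target_spec) ** 2).mean()

def total_loss(pred, target, lam=0.1):
    return F.l1_loss(pred, target) + lam * perceptual_reg(pred, target)
```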

13. Adaptive Gradient Quantization for Data-Parallel SGD

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel Roy, Ali Ramezani-Kebrya

Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
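A toy version of the adapt-then-quantize loop, with a uniform grid whose scale tracks a running gradient statistic in place of the paper's fitted level placements, might look like this:

```python
# Toy adaptive quantizer, not ALQ/AMQ themselves: stochastically round each
# gradient to a uniform grid whose scale tracks a running statistic of the
# gradients (here, their standard deviation). The real schemes instead fit
# level placements from sufficient statistics of a parametric distribution.
import torch

class AdaptiveQuantizer:
    def __init__(self, n_levels=16, momentum=0.9):
        self.scale, self.n, self.m = None, n_levels, momentum

    def __call__(self, grad):
        std = grad.std()
        self.scale = std if self.scale is None else self.m * self.scale + (1 - self.m) * std
        step = 4 * self.scale / self.n             # grid covers ~[-2*sigma, 2*sigma]
        scaled = grad / step
        low = scaled.floor()
        p_up = scaled - low                        # stochastic rounding
        q = low + (torch.rand_like(grad) < p_up).float()
        return q.clamp(-self.n // 2, self.n // 2) * step   # dequantized gradient

qz = AdaptiveQuantizer()
g = torch.randn(1000) * 0.01
print((qz(g) - g).abs().mean())   # quantization error shrinks with more levels
```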