Hot Papers 2020-08-04

1. Multiple Descent: Design Your Own Generalization Curve

Lin Chen, Yifei Min, Mikhail Belkin, Amin Karbasi

This paper explores the generalization loss of linear regression in variably parameterized families of models, both under-parameterized and over-parameterized. We show that the generalization curve can have an arbitrary number of peaks, and moreover, the locations of those peaks can be explicitly controlled. Our results highlight the fact that both the classical U-shaped generalization curve and the recently observed double descent curve are not intrinsic properties of the model family. Instead, their emergence is due to the interaction between the properties of the data and the inductive biases of learning algorithms.
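
The paper itself is theoretical, but the phenomenon it generalizes is easy to reproduce. The following NumPy sketch (ours, not the authors' code) shows the classic double descent of minimum-norm least squares: test error peaks as the number of features crosses the number of training samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 50, 1000, 200          # samples and ambient dimension
w_true = rng.normal(size=d) / np.sqrt(d)    # ground-truth linear model

X_tr = rng.normal(size=(n_train, d))
X_te = rng.normal(size=(n_test, d))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)
y_te = X_te @ w_true

for p in [10, 25, 45, 50, 55, 100, 200]:    # number of features used
    # Minimum-norm least-squares fit on the first p features.
    w_hat = np.linalg.pinv(X_tr[:, :p]) @ y_tr
    err = np.mean((X_te[:, :p] @ w_hat - y_te) ** 2)
    print(f"p={p:4d}  test MSE={err:8.3f}")  # error peaks near p == n_train
```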

2. Memory-augmented Dense Predictive Coding for Video Representation Learning

Tengda Han, Weidi Xie, Andrew Zisserman

  • retweets: 42, favorites: 137 (08/05/2020 09:11:34)
  • links: abs | pdf
  • cs.CV

The objective of this paper is self-supervised learning from video, in particular for representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task. It is trained with a predictive attention mechanism over a set of compressed memories, such that any future state can always be constructed as a convex combination of the condensed representations, allowing multiple hypotheses to be made efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representation on four different downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all cases, we demonstrate state-of-the-art or comparable performance over other approaches with orders of magnitude less training data.
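
The core mechanism is easy to state: a future state is predicted as a convex combination of learned memory slots, with attention weights computed from past context. A minimal PyTorch sketch of that step (our simplification, not the authors' code):

```python
import torch
import torch.nn.functional as F

class MemoryPredictor(torch.nn.Module):
    """Predict a future state as a convex combination of memory slots."""
    def __init__(self, dim=256, n_slots=1024):
        super().__init__()
        self.memory = torch.nn.Parameter(torch.randn(n_slots, dim))  # compressed memories
        self.to_query = torch.nn.Linear(dim, dim)

    def forward(self, context):                  # context: (batch, dim)
        q = self.to_query(context)               # attention query from past context
        logits = q @ self.memory.t()             # (batch, n_slots) similarities
        weights = F.softmax(logits, dim=-1)      # convex weights: non-negative, sum to 1
        return weights @ self.memory             # predicted future state

pred = MemoryPredictor()(torch.randn(8, 256))    # (8, 256) hypothesized futures
```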

3. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or

  • retweets: 25, favorites: 111 (08/05/2020 09:11:34)
  • links: abs | pdf
  • cs.CV

We present a generic image-to-image translation framework, Pixel2Style2Pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+ with no additional optimization. We further introduce a dedicated identity loss which is shown to achieve improved performance in the reconstruction of an input image. We demonstrate pSp to be a simple architecture that, by leveraging a well-trained, fixed generator network, can be easily applied to a wide range of image-to-image translation tasks. Solving these tasks through the style representation results in a global approach that does not rely on a local pixel-to-pixel correspondence and further supports multi-modal synthesis via the resampling of styles. Notably, we demonstrate that pSp can be trained to align a face image to a frontal pose without any labeled data, generate multi-modal results for ambiguous tasks such as conditional face generation from segmentation maps, and construct high-resolution images from corresponding low-resolution images.
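
In essence, the encoder replaces per-image latent optimization with a single forward pass that emits one style vector per generator layer. A toy PyTorch sketch of that interface (ours; the real pSp encoder uses a ResNet feature pyramid with per-style heads):

```python
import torch

class StyleEncoder(torch.nn.Module):
    """Map an image to a stack of style vectors, one per generator layer."""
    def __init__(self, n_styles=18, w_dim=512):
        super().__init__()
        self.backbone = torch.nn.Sequential(       # stand-in for pSp's feature pyramid
            torch.nn.Conv2d(3, 64, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        )
        self.heads = torch.nn.ModuleList(          # one head per style vector
            [torch.nn.Linear(64, w_dim) for _ in range(n_styles)])

    def forward(self, img):                        # img: (batch, 3, H, W)
        feat = self.backbone(img)
        return torch.stack([h(feat) for h in self.heads], dim=1)  # (batch, n_styles, w_dim)

codes = StyleEncoder()(torch.randn(2, 3, 256, 256))  # W+ codes for a frozen StyleGAN
```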

4. Bringing UMAP Closer to the Speed of Light with GPU Acceleration

Corey J. Nolet, Victor Lafargue, Edward Raff, Thejaswi Nanditale, Tim Oates, John Zedlewski, Joshua Patterson

The Uniform Manifold Approximation and Projection (UMAP) algorithm has become widely popular for its ease of use, quality of results, and support for exploratory, unsupervised, supervised, and semi-supervised learning. While many algorithms can be ported to a GPU in a simple and direct fashion, such efforts have resulted in inefficient and inaccurate versions of UMAP. We show a number of techniques that can be used to make a faster and more faithful GPU version of UMAP, and obtain speedups of up to 100x in practice. Many of these design choices/lessons are general purpose and may inform the conversion of other graph and manifold learning algorithms to use GPUs. Our implementation has been made publicly available as part of the open source RAPIDS cuML library (https://github.com/rapidsai/cuml).
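
For readers who want to try it, the GPU implementation mirrors the umap-learn estimator API. A minimal usage sketch (assuming a working RAPIDS install and an NVIDIA GPU):

```python
# Requires a RAPIDS installation and an NVIDIA GPU.
import numpy as np
from cuml.manifold import UMAP

X = np.random.rand(100_000, 128).astype(np.float32)

# Same estimator interface as umap-learn, executed on the GPU.
embedding = UMAP(n_neighbors=15, n_components=2).fit_transform(X)
print(embedding.shape)  # (100000, 2)
```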

5. DeLighT: Very Deep and Light-weight Transformer

Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi

  • retweets: 20, favorites: 81 (08/05/2020 09:11:34)
  • links: abs | pdf
  • cs.LG | cs.CL

We introduce a very deep and light-weight transformer, DeLighT, that delivers similar or better performance than transformer-based models with significantly fewer parameters. DeLighT allocates parameters more efficiently both (1) within each Transformer block, using DExTra, a deep and light-weight transformation, and (2) across blocks, using block-wise scaling, which allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models and yet have fewer parameters and operations. Experiments on machine translation and language modeling tasks show that DeLighT matches the performance of baseline Transformers with significantly fewer parameters. On the high-resource WMT’14 En-Fr dataset, DeLighT requires 1.8 times fewer parameters and 2 times fewer operations and achieves better performance (+0.4 BLEU) than baseline transformers. On the low-resource WMT’16 En-Ro dataset, DeLighT delivers similar performance with 2.8 times fewer parameters than baseline transformers.
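
Block-wise scaling boils down to giving each block its own depth, growing from input to output. An illustrative Python sketch of one plausible linear scaling rule (ours; the paper's exact formula may differ in its details):

```python
def blockwise_depths(n_blocks: int, d_min: int = 4, d_max: int = 8) -> list[int]:
    """Linearly scale per-block depth from d_min (input side) to d_max (output side).

    Illustrative only: DeLighT's actual scaling rule may differ.
    """
    if n_blocks == 1:
        return [d_min]
    return [round(d_min + (d_max - d_min) * b / (n_blocks - 1)) for b in range(n_blocks)]

print(blockwise_depths(6))  # [4, 5, 6, 6, 7, 8]: shallow near input, deep near output
```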

6. The Amazing Power of Randomness: NP=RP

András Faragó

We (claim to) prove the extremely surprising fact that NP=RP. It is achieved by creating a Fully Polynomial-Time Randomized Approximation Scheme (FPRAS) for approximately counting the number of independent sets in bounded-degree graphs, with any fixed degree bound, which is known to imply NP=RP. While our method is rooted in the well-known Markov Chain Monte Carlo (MCMC) approach, we overcome the notorious problem of slow mixing by a new idea for generating a random sample from among the independent sets. A key tool that enables the result is a solution to a novel sampling task that we call Subset Sampling. In its basic form, a stationary sample from the (exponentially large) state space of a Markov chain is given as input, and we want to transform it into another stationary sample that is conditioned on falling into a given subset, which is still exponentially large. In general, Subset Sampling can be both harder and easier than stationary sampling from a Markov chain. It can be harder, due to the conditioning on a subset, which may have a more complex structure than the original state space. But it may also be easier, since a stationary sample is already given, which, in a sense, already encompasses “most of the hardness” of such sampling tasks, being already in the stationary distribution, which is hard to reach in a slowly mixing chain. We show that it is possible to efficiently balance the two sides: we can capitalize on already having a stationary sample from the original space, so that the complexity of confining it to a subset is mitigated. We prove that an efficient approximation is possible for the considered sampling task, and then apply it recursively to create the FPRAS.
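
To see why Subset Sampling is non-trivial, consider the naive baseline: draw stationary samples and reject those outside the subset. The sketch below (ours, purely illustrative) is correct but needs about 1/pi(S) draws on average, which is exponential when the subset's stationary mass is exponentially small, precisely the regime the paper must handle more cleverly.

```python
import random

def naive_subset_sample(draw_stationary, in_subset, max_tries=10_000):
    """Rejection sampling: keep drawing stationary samples until one lands in S.

    Correct (the accepted sample is stationary conditioned on S), but the
    expected number of draws is 1/pi(S), which blows up when S has
    exponentially small stationary mass.
    """
    for _ in range(max_tries):
        x = draw_stationary()
        if in_subset(x):
            return x
    raise RuntimeError("subset too rare for naive rejection")

# Toy demo: uniform stationary distribution over bitstrings; S = strings starting with 1.
sample = naive_subset_sample(
    draw_stationary=lambda: [random.randint(0, 1) for _ in range(20)],
    in_subset=lambda x: x[0] == 1,
)
```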

7. The Rate-Distortion-Accuracy Tradeoff: JPEG Case Study

Xiyang Luo, Hossein Talebi, Feng Yang, Michael Elad, Peyman Milanfar

Handling digital images almost always involves lossy compression to facilitate efficient transmission and storage. This introduces an unavoidable tension between the allocated bit budget (rate) and the faithfulness of the resulting image to the original one (distortion). An additional complicating consideration is the effect of the compression on recognition performance by given classifiers (accuracy). This work aims to explore this rate-distortion-accuracy tradeoff. As a case study, we focus on the design of the quantization tables in the JPEG compression standard. We offer a novel optimal tuning of these tables via continuous optimization, leveraging a differentiable implementation of both the JPEG encoder-decoder and an entropy estimator. This enables us to offer a unified framework that considers the interplay between rate, distortion, and classification accuracy. On all these fronts, we report a substantial boost in performance by a simple and easily implemented modification of these tables.
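
Continuous optimization over quantization tables needs gradients through rounding. One standard workaround (our sketch, not necessarily the paper's exact implementation) is a straight-through estimator, which rounds in the forward pass but passes gradients through unchanged:

```python
import torch

def quantize_st(coeffs: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    """Quantize DCT coefficients with a learnable 8x8 table.

    The forward pass uses true rounding; the backward pass treats round()
    as the identity (straight-through), so gradients reach `table`.
    """
    scaled = coeffs / table
    rounded = scaled + (torch.round(scaled) - scaled).detach()
    return rounded * table  # dequantized coefficients

table = torch.nn.Parameter(torch.full((8, 8), 16.0))        # quantization table to tune
coeffs = torch.randn(8, 8) * 100                            # stand-in DCT block
loss = ((quantize_st(coeffs, table) - coeffs) ** 2).mean()  # distortion term
loss.backward()                                             # gradient reaches the table
```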

8. Blind Face Restoration via Deep Multi-scale Component Dictionaries

Xiaoming Li, Chaofeng Chen, Shangchen Zhou, Xianhui Lin, Wangmeng Zuo, Lei Zhang

  • retweets: 10, favorites: 45 (08/05/2020 09:11:35)
  • links: abs | pdf
  • cs.CV

Recent reference-based face restoration methods have received considerable attention due to their great capability in recovering high-frequency details on real low-quality images. However, most of these methods require a high-quality reference image of the same identity, making them applicable only in limited scenes. To address this issue, this paper proposes a deep face dictionary network (termed DFDNet) to guide the restoration process of degraded observations. To begin with, we use K-means to generate deep dictionaries for perceptually significant face components (i.e., left/right eyes, nose, and mouth) from high-quality images. Next, with the degraded input, we match and select the most similar component features from their corresponding dictionaries and transfer the high-quality details to the input via the proposed dictionary feature transfer (DFT) block. In particular, component AdaIN is leveraged to eliminate the style diversity between the input and dictionary features (e.g., illumination), and a confidence score is proposed to adaptively fuse the dictionary feature into the input. Finally, multi-scale dictionaries are adopted in a progressive manner to enable coarse-to-fine restoration. Experiments show that our proposed method achieves plausible performance in both quantitative and qualitative evaluation, and more importantly, generates realistic and promising results on real degraded images without requiring a same-identity reference. The source code and models are available at https://github.com/csxmli2016/DFDNet.
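
Component AdaIN is the standard adaptive instance normalization applied per face component: the dictionary feature is re-normalized to match the statistics of the input's component feature. A minimal sketch of AdaIN itself (ours):

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: give `content` the per-channel
    mean/std of `style`. Tensors are (batch, channels, H, W)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean

# In DFDNet's setting, `content` would be the dictionary feature and `style`
# the degraded input's component feature, aligning e.g. illumination.
out = adain(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```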

9. From Design Draft to Real Attire: Unaligned Fashion Image Translation

Yu Han, Shuai Yang, Wenjing Wang, Jiaying Liu

  • retweets: 11, favorites: 41 (08/05/2020 09:11:36)
  • links: abs | pdf
  • cs.CV | cs.MM

Fashion manipulation has attracted growing interest due to its great application value, inspiring much research on fashion images. However, little attention has been paid to fashion design drafts. In this paper, we study a new unaligned translation problem between design drafts and real fashion items, whose main challenge lies in the huge misalignment between the two modalities. We first collect paired design drafts and real fashion item images without pixel-wise alignment. To solve the misalignment problem, our main idea is to train a sampling network that adaptively adjusts the input to an intermediate state structurally aligned with the output. Moreover, built upon the sampling network, we present a design-draft-to-real-fashion-item translation network (D2RNet), in which two separate translation streams, focusing on texture and shape respectively, are combined to obtain the benefits of both. D2RNet is able to generate realistic garments with both texture and shape consistency with their design drafts. We show that this idea can be effectively applied to the reverse translation problem and present R2DNet accordingly. Extensive experiments on unaligned fashion design translation demonstrate the superiority of our method over state-of-the-art methods. Our project website is available at https://victoriahy.github.io/MM2020/.
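
The sampling network is the key idea: rather than forcing the translator to handle misaligned pairs, warp the input toward the output's structure first. A toy PyTorch sketch of that pattern using a predicted offset field and grid sampling (ours, not the authors' architecture):

```python
import torch
import torch.nn.functional as F

class SamplingNetwork(torch.nn.Module):
    """Predict a dense offset field and warp the input toward the target layout."""
    def __init__(self):
        super().__init__()
        self.to_flow = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)  # toy flow head

    def forward(self, img):                        # img: (batch, 3, H, W)
        b, _, h, w = img.shape
        # Identity sampling grid in [-1, 1] coordinates.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        flow = self.to_flow(img).permute(0, 2, 3, 1)   # predicted per-pixel offsets
        return F.grid_sample(img, grid + 0.1 * flow, align_corners=True)

warped = SamplingNetwork()(torch.randn(2, 3, 64, 64))  # structure-aligned intermediate
```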

10. One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

Tomáš Nekvinda, Ondřej Dušek

We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches. Our model is based on Tacotron 2 with a fully convolutional input text encoder whose weights are predicted by a separate parameter generator network. To boost voice cloning, the model uses an adversarial speaker classifier with a gradient reversal layer that removes speaker-specific information from the encoder. We conducted two experiments to compare our model with baselines using various levels of cross-lingual parameter sharing, evaluating: (1) stability and performance when training on small amounts of data, and (2) pronunciation accuracy and voice quality of code-switching synthesis. For training, we used the CSS10 dataset and our new small dataset based on Common Voice recordings in five languages. Our model is shown to effectively share information across languages and, according to a subjective evaluation test, it produces more natural and accurate code-switching speech than the baselines.
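
The gradient reversal layer is a standard domain-adversarial trick: it is the identity in the forward pass and negates gradients in the backward pass, so the encoder is trained to remove whatever the speaker classifier can still predict. A minimal PyTorch version (ours):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; flips the sign of the gradient on the way back, so the
    encoder learns to *remove* whatever the downstream classifier can predict."""
    @staticmethod
    def forward(ctx, x, lam: float = 1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

features = torch.randn(4, 128, requires_grad=True)
speaker_logits = GradReverse.apply(features).sum()   # stand-in speaker classifier
speaker_logits.backward()
print(features.grad[0, 0])  # gradient arrives negated (-1 here)
```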