
Hot Papers 2020-08-13

1. Sampling using SU(N) gauge equivariant flows

Denis Boyda, Gurtej Kanwar, Sébastien Racanière, Danilo Jimenez Rezende, Michael S. Albergo, Kyle Cranmer, Daniel C. Hackett, Phiala E. Shanahan

We develop a flow-based sampling algorithm for SU(N) lattice gauge theories that is gauge-invariant by construction. Our key contribution is constructing a class of flows on an SU(N) variable (or on a U(N) variable by a simple alternative) that respect matrix conjugation symmetry. We apply this technique to sample distributions of single SU(N) variables and to construct flow-based samplers for SU(2) and SU(3) lattice gauge theory in two dimensions.
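The abstract does not spell the construction out, but the matrix conjugation symmetry it refers to is easy to illustrate: a map that transforms only the eigenvalue angles of a unitary variable automatically commutes with conjugation. A minimal NumPy sketch under that reading (the function and the toy `angle_transform` are our illustration, not the paper's coupling layers):

```python
import numpy as np

def spectral_flow(u, angle_transform):
    """Apply a flow to a unitary matrix by transforming only its eigenvalue
    angles. Because the spectrum is invariant under conjugation, this map
    satisfies f(x u x^-1) = x f(u) x^-1 for any unitary x, as long as
    angle_transform is permutation-equivariant."""
    eigvals, v = np.linalg.eig(u)            # u = v @ diag(eigvals) @ v^-1
    angles = np.angle(eigvals)               # eigenvalues lie on the unit circle
    new_eigvals = np.exp(1j * angle_transform(angles))
    return v @ np.diag(new_eigvals) @ np.linalg.inv(v)

# Toy usage on an SU(2) element. Note: a true SU(N) flow must additionally
# keep the angles summing to 0 mod 2*pi so that det stays 1; this sketch
# does not enforce that constraint in general.
theta = 0.7
u = np.diag([np.exp(1j * theta), np.exp(-1j * theta)])
u_flowed = spectral_flow(u, lambda a: a + 0.1 * np.sin(a))
```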

2. Audio- and Gaze-driven Facial Animation of Codec Avatars

Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

  • retweets: 31, favorites: 118 (08/14/2020 13:07:36)
  • links: abs | pdf
  • cs.CV

Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video. In this paper we describe the first approach to animate these parametric models in real time that could be deployed on commodity virtual reality hardware, using audio and/or eye tracking. Our goal is to display expressive conversations between individuals that exhibit important social signals such as laughter and excitement solely from latent cues in our lossy input signals. To this end we collected over 5 hours of high frame rate 3D face scans across three participants, including traditional neutral speech as well as expressive and conversational speech. We investigate a multimodal fusion approach that dynamically identifies which sensor encoding should animate which parts of the face at any time. See the supplemental video, which demonstrates our ability to generate full face motion far beyond the typically neutral lip articulations seen in competing work: https://research.fb.com/videos/audio-and-gaze-driven-facial-animation-of-codec-avatars/
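A fusion module that "dynamically identifies which sensor encoding should animate which parts of the face" can be sketched as per-region gating between modalities. The PyTorch module below is a hypothetical simplification (all names and dimensions are ours, not the paper's architecture):

```python
import torch
import torch.nn as nn

class GatedModalityFusion(nn.Module):
    """Fuse audio and gaze encodings with learned per-region gates, so the
    model can decide at each step which modality drives which face region.
    Hypothetical simplification of the paper's fusion approach."""
    def __init__(self, audio_dim=128, gaze_dim=32, n_regions=4, region_dim=64):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, n_regions * region_dim)
        self.gaze_proj = nn.Linear(gaze_dim, n_regions * region_dim)
        self.gate = nn.Sequential(
            nn.Linear(audio_dim + gaze_dim, n_regions), nn.Sigmoid())
        self.n_regions, self.region_dim = n_regions, region_dim

    def forward(self, audio_feat, gaze_feat):
        a = self.audio_proj(audio_feat).view(-1, self.n_regions, self.region_dim)
        g = self.gaze_proj(gaze_feat).view(-1, self.n_regions, self.region_dim)
        w = self.gate(torch.cat([audio_feat, gaze_feat], dim=-1)).unsqueeze(-1)
        return w * a + (1.0 - w) * g    # per-region convex blend of modalities

fusion = GatedModalityFusion()
fused = fusion(torch.rand(8, 128), torch.rand(8, 32))   # (8, 4, 64) features
```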

3. Hypergraph reconstruction from network data

Jean-Gabriel Young, Giovanni Petri, Tiago P. Peixoto

Networks can describe the structure of a wide variety of complex systems by specifying how pairs of nodes interact. This choice of representation is flexible, but not necessarily appropriate when joint interactions between groups of nodes are needed to explain empirical phenomena. Networks remain the de facto standard, however, as relational datasets often fail to record higher-order interactions. To address this gap, we here introduce a Bayesian approach to reconstruct the higher-order interactions from pairwise network data. Our method is based on the principle of parsimony and does not reconstruct higher-order structures when there is scant statistical evidence. We demonstrate that our approach successfully uncovers higher-order interactions in synthetic and empirical network data.
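As a toy illustration of the parsimony idea (explain the observed edges with as few higher-order interactions as the evidence justifies), the sketch below promotes maximal cliques of the pairwise network to hyperedges only when they cover otherwise-unexplained edges. The real method replaces this greedy rule with Bayesian inference; the function and its names are ours:

```python
import networkx as nx

def parsimonious_hyperedges(graph):
    """Toy stand-in for the paper's Bayesian reconstruction: treat maximal
    cliques as candidate hyperedges and keep a clique only if it explains
    edges no previously kept hyperedge covers."""
    uncovered = {frozenset(e) for e in graph.edges()}
    hyperedges = []
    for clique in sorted(nx.find_cliques(graph), key=len, reverse=True):
        members = set(clique)
        covered = {e for e in uncovered if e <= members}
        if covered:                      # clique explains at least one new edge
            hyperedges.append(frozenset(members))
            uncovered -= covered
        if not uncovered:
            break
    return hyperedges

g = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])   # a triangle plus one edge
print(parsimonious_hyperedges(g))                # [{0, 1, 2}, {2, 3}]
```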

4. Learning to Caricature via Semantic Shape Transform

Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Yu-Ting Chang, Yijun Li, Deng Cai, Ming-Hsuan Yang

  • retweets: 16, favorites: 74 (08/14/2020 13:07:37)
  • links: abs | pdf
  • cs.CV

A caricature is an artistic drawing created to abstract or exaggerate the facial features of a person. Rendering visually pleasing caricatures is a difficult task that requires professional skills, so it is of great interest to design a method that automatically generates such drawings. To deal with large shape changes, we propose an algorithm based on a semantic shape transform to produce diverse and plausible shape exaggerations. Specifically, we predict pixel-wise semantic correspondences and perform image warping on the input photo to achieve dense shape transformation. We show that the proposed framework is able to render visually pleasing shape exaggerations while maintaining their facial structures. In addition, our model allows users to manipulate the shape via the semantic map. We demonstrate the effectiveness of our approach on a large photograph-caricature benchmark dataset with comparisons to state-of-the-art methods.
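The dense warping step (predict pixel-wise correspondences, then warp the photo) corresponds to standard backward warping. A generic sketch with `torch.nn.functional.grid_sample` (not the paper's exact pipeline; a real model would predict the grid from the semantic maps rather than use the identity grid below):

```python
import torch
import torch.nn.functional as F

def warp_with_correspondences(photo, grid):
    """Backward-warp a photo with a dense correspondence field.
    photo: (B, C, H, W); grid: per-pixel source coordinates normalized
    to [-1, 1], shape (B, H, W, 2)."""
    return F.grid_sample(photo, grid, mode='bilinear',
                         padding_mode='border', align_corners=True)

B, C, H, W = 1, 3, 64, 64
photo = torch.rand(B, C, H, W)
# Identity grid as a placeholder correspondence field.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing='ij')
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)
warped = warp_with_correspondences(photo, grid)  # equals photo up to interpolation
```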

5. Short Shor-style syndrome sequences

Nicolas Delfosse, Ben W. Reichardt

We optimize fault-tolerant quantum error correction to reduce the number of syndrome bit measurements. Speeding up error correction will also speed up an encoded quantum computation, and should reduce its effective error rate. We give both code-specific and general methods, using a variety of techniques and in a variety of settings. We design new quantum error-correcting codes specifically for efficient error correction, e.g., allowing single-shot error correction. For codes with multiple logical qubits, we give methods for combining error correction with partial logical measurements. There are tradeoffs in choosing a code and error-correction technique. While to date most work has concentrated on optimizing the syndrome-extraction procedure, we show that there are also substantial benefits to optimizing how the measured syndromes are chosen and used. As an example, we design single-shot measurement sequences for fault-tolerant quantum error correction with the 16-qubit extended Hamming code. Our scheme uses 10 syndrome bit measurements, compared to 40 measurements with the Shor scheme. We design single-shot logical measurements as well: any logical Z measurement can be made together with fault-tolerant error correction using only 11 measurements. For comparison, using the Shor scheme a basic implementation of such a non-destructive logical measurement uses 63 measurements. We also offer ten open problems, the solutions of which could lead to substantial improvements of fault-tolerant error correction.
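For orientation, the syndrome bits being counted are checks of an underlying classical code. Below is the parity-check matrix of the classical [16, 11, 4] extended Hamming code and a single-error syndrome; this is only our illustration of what one syndrome bit is, not the paper's fault-tolerant measurement sequence:

```python
import numpy as np

def extended_hamming_parity_check(r=4):
    """Parity-check matrix of the [2^r, 2^r - r - 1, 4] extended Hamming
    code: the r binary address bits of each column plus an overall-parity
    row. A CSS code built from this classical code would measure such
    checks in both the X and Z bases."""
    n = 2 ** r
    cols = [[(i >> b) & 1 for b in range(r)] for i in range(n)]
    h = np.array(cols, dtype=int).T                    # r address-bit rows
    return np.vstack([h, np.ones((1, n), dtype=int)])  # plus parity row

H = extended_hamming_parity_check()    # 5 x 16 matrix over GF(2)
error = np.zeros(16, dtype=int)
error[9] = 1                           # a single bit flip at position 9
syndrome = H @ error % 2               # five syndrome bits locate the flip
print(syndrome)                        # -> [1 0 0 1 1]: parity 1, address 9
```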

6. Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions

Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou

Recent advancements in deep learning have led to human-level performance in single-speaker speech synthesis. However, there are still limitations in speech quality when generalizing those systems to multi-speaker models, especially for unseen speakers and unseen recording qualities. For instance, conventional neural vocoders are adjusted to the training speaker and have poor generalization capabilities to unseen speakers. In this work, we propose a variant of WaveRNN, referred to as speaker conditional WaveRNN (SC-WaveRNN). We target the development of an efficient universal vocoder even for unseen speakers and recording conditions. In contrast to standard WaveRNN, SC-WaveRNN exploits additional information given in the form of speaker embeddings. Using publicly available data for training, SC-WaveRNN achieves significantly better performance than baseline WaveRNN on both subjective and objective metrics. In MOS, SC-WaveRNN achieves an improvement of about 23% for seen speakers and seen recording conditions and up to 95% for unseen speakers and unseen conditions. Finally, we extend our work by implementing a multi-speaker text-to-speech (TTS) synthesis system similar to zero-shot speaker adaptation. In terms of performance, our system was preferred over the baseline TTS system by 60% vs. 15.5% for seen speakers and by 60.9% vs. 32.6% for unseen speakers.
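The key architectural change, conditioning the vocoder on a speaker embedding, can be sketched in a few lines of PyTorch. This is a minimal stand-in (layer names and sizes are ours), not the actual SC-WaveRNN:

```python
import torch
import torch.nn as nn

class SpeakerConditionedRNN(nn.Module):
    """Minimal sketch in the spirit of SC-WaveRNN: a speaker embedding is
    concatenated with the acoustic features at every frame, so a single
    vocoder can adapt its output to unseen speakers."""
    def __init__(self, mel_dim=80, spk_dim=256, hidden=512):
        super().__init__()
        self.rnn = nn.GRU(mel_dim + spk_dim + 1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 256)   # logits over 8-bit mu-law samples

    def forward(self, mels, prev_samples, spk_embedding):
        # mels: (B, T, mel_dim); prev_samples: (B, T, 1); spk: (B, spk_dim)
        spk = spk_embedding.unsqueeze(1).expand(-1, mels.size(1), -1)
        x = torch.cat([mels, prev_samples, spk], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                  # per-step sample distribution

model = SpeakerConditionedRNN()
logits = model(torch.rand(2, 100, 80), torch.rand(2, 100, 1), torch.rand(2, 256))
```

Swapping the speaker embedding at inference time is what lets the same network serve as a universal vocoder, and reusing those embeddings for TTS gives the zero-shot-style adaptation described above.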