All Articles

Hot Papers 2021-05-07

1. Reliability Testing for Natural Language Processing Systems

Samson Tan, Shafiq Joty, Kathy Baxter, Araz Taeihagh, Gregory A. Bennett, Min-Yen Kan

  • retweets: 650, favorites: 97 (05/08/2021 09:16:31)
  • links: abs | pdf
  • cs.LG | cs.CL

Questions of fairness, robustness, and transparency are paramount to address before deploying NLP systems. Central to these concerns is the question of reliability: Can NLP systems reliably treat different demographics fairly and function correctly in diverse and noisy environments? To address this, we argue for the need for reliability testing and contextualize it among existing work on improving accountability. We show how adversarial attacks can be reframed for this goal, via a framework for developing reliability tests. We argue that reliability testing — with an emphasis on interdisciplinary collaboration — will enable rigorous and targeted testing, and aid in the enactment and enforcement of industry standards.

2. Neural Algorithmic Reasoning

Petar Veličković, Charles Blundell

Algorithms have been fundamental to recent global technological advances and, in particular, they have been the cornerstone of technical advances in one field rapidly being applied to another. We argue that algorithms possess fundamentally different qualities to deep learning methods, and this strongly suggests that, were deep learning methods better able to mimic algorithms, generalisation of the sort seen with algorithms would become possible with deep learning — something far out of the reach of current machine learning methods. Furthermore, by representing elements in a continuous space of learnt algorithms, neural networks are able to adapt known algorithms more closely to real-world problems, potentially finding more efficient and pragmatic solutions than those proposed by human computer scientists. Here we present neural algorithmic reasoning — the art of building neural networks that are able to execute algorithmic computation — and provide our opinion on its transformative potential for running classical algorithms on inputs previously considered inaccessible to them.

3. PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

Kehong Gong, Jianfeng Zhang, Jiashi Feng

  • retweets: 272, favorites: 121 (05/08/2021 09:16:31)
  • links: abs | pdf
  • cs.CV

Existing 3D human pose estimators suffer poor generalization performance to new datasets, largely due to the limited diversity of 2D-3D pose pairs in the training data. To address this problem, we present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards a greater diversity and thus improve generalization of the trained 2D-to-3D pose estimator. Specifically, PoseAug introduces a novel pose augmentor that learns to adjust various geometry factors (e.g., posture, body size, view point and position) of a pose through differentiable operations. With such differentiable capacity, the augmentor can be jointly optimized with the 3D pose estimator and take the estimation error as feedback to generate more diverse and harder poses in an online manner. Moreover, PoseAug introduces a novel part-aware Kinematic Chain Space for evaluating local joint-angle plausibility and develops a discriminative module accordingly to ensure the plausibility of the augmented poses. These elaborate designs enable PoseAug to generate more diverse yet plausible poses than existing offline augmentation methods, and thus yield better generalization of the pose estimator. PoseAug is generic and easy to be applied to various 3D pose estimators. Extensive experiments demonstrate that PoseAug brings clear improvements on both intra-scenario and cross-scenario datasets. Notably, it achieves 88.6% 3D PCK on MPI-INF-3DHP under cross-dataset evaluation setup, improving upon the previous best data augmentation based method by 9.1%. Code can be found at: https://github.com/jfzhang95/PoseAug.

4. Training Quantum Embedding Kernels on Near-Term Quantum Computers

Thomas Hubregtsen, David Wierichs, Elies Gil-Fuster, Peter-Jan H. S. Derks, Paul K. Faehrmann, Johannes Jakob Meyer

Kernel methods are a cornerstone of classical machine learning. The idea of using quantum computers to compute kernels has recently attracted attention. Quantum embedding kernels (QEKs) constructed by embedding data into the Hilbert space of a quantum computer are a particular quantum kernel technique that allows to gather insights into learning problems and that are particularly suitable for noisy intermediate-scale quantum devices. In this work, we first provide an accessible introduction to quantum embedding kernels and then analyze the practical issues arising when realizing them on a noisy near-term quantum computer. We focus on quantum embedding kernels with variational parameters. These variational parameters are optimized for a given dataset by increasing the kernel-target alignment, a heuristic connected to the achievable classification accuracy. We further show under which conditions noise from device imperfections influences the predicted kernel and provide a strategy to mitigate these detrimental effects which is tailored to quantum embedding kernels. We also address the influence of finite sampling and derive bounds that put guarantees on the quality of the kernel matrix. We illustrate our findings by numerical experiments and tests on actual hardware.

5. Animatable Neural Radiance Fields for Human Body Modeling

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Hujun Bao, Xiaowei Zhou

  • retweets: 196, favorites: 78 (05/08/2021 09:16:31)
  • links: abs | pdf
  • cs.CV

This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a dynamic scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as translational vector field or SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce neural blend weight fields to produce the deformation fields. Based on the skeleton-driven deformation, blend weight fields are used with 3D human skeletons to generate observation-to-canonical and canonical-to-observation correspondences. Since 3D human skeletons are more observable, they can regularize the learning of deformation fields. Moreover, the learned blend weight fields can be combined with input skeletal motions to generate new deformation fields to animate the human model. Experiments show that our approach significantly outperforms recent human synthesis methods. The code will be available at https://zju3dv.github.io/animatable_nerf/.

6. A Guide for New Program Committee Members at Theoretical Computer Science Conferences

Yfke Dulek, Stacey Jeffery, Christian Majenz, Christian Schaffner, Florian Speelman, Ronald de Wolf

  • retweets: 196, favorites: 42 (05/08/2021 09:16:31)
  • links: abs | pdf
  • cs.GL

In theoretical computer science, conferences play an important role in the scientific process. The decisions whether to accept or reject articles is taken by the program committee (PC) members. Serving on a PC for the first time can be a daunting experience. This guide will help new program-committee members to understand how the system works, and provide useful tips and guidelines. It discusses every phase of the paper-selection process, and the tasks associated to it.

7. A Unifying and Canonical Description of Measure-Preserving Diffusions

Alessandro Barp, So Takao, Michael Betancourt, Alexis Arnaudon, Mark Girolami

A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework. In this paper, we develop a geometric theory that improves and generalises this construction to any manifold. We thereby demonstrate that the completeness result is a direct consequence of the topology of the underlying manifold and the geometry induced by the target measure PP; there is no need to introduce other structures such as a Riemannian metric, local coordinates, or a reference measure. Instead, our framework relies on the intrinsic geometry of PP and in particular its canonical derivative, the deRham rotationnel, which allows us to parametrise the Fokker—Planck currents of measure-preserving diffusions using potentials. The geometric formalism can easily incorporate constraints and symmetries, and deliver new important insights, for example, a new complete recipe of Langevin-like diffusions that are suited to the construction of samplers. We also analyse the reversibility and dissipative properties of the diffusions, the associated deterministic flow on the space of measures, and the geometry of Langevin processes. Our article connects ideas from various literature and frames the theory of measure-preserving diffusions in its appropriate mathematical context.

8. TABBIE: Pretrained Representations of Tabular Data

Hiroshi Iida, Dung Thai, Varun Manjunatha, Mohit Iyyer

  • retweets: 132, favorites: 62 (05/08/2021 09:16:32)
  • links: abs | pdf
  • cs.CL

Existing work on tabular representation learning jointly models tables and associated text using self-supervised objective functions derived from pretrained language models such as BERT. While this joint pretraining improves tasks involving paired tables and text (e.g., answering questions about tables), we show that it underperforms on tasks that operate over tables without any associated text (e.g., populating missing cells). We devise a simple pretraining objective (corrupt cell detection) that learns exclusively from tabular data and reaches the state-of-the-art on a suite of table based prediction tasks. Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures (cells, rows, and columns), and it also requires far less compute to train. A qualitative analysis of our model’s learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.

9. Capturing the diversity of multilingual societies

Thomas Louf, David Sanchez, Jose J. Ramasco

Cultural diversity encoded within languages of the world is at risk, as many languages have become endangered in the last decades in a context of growing globalization. To preserve this diversity, it is first necessary to understand what drives language extinction, and which mechanisms might enable coexistence. Here, we consider the processes at work in language shift through a conjunction of theoretical and data-driven perspectives. A large-scale empirical study of spatial patterns of languages in multilingual societies using Twitter and census data yields a wide diversity. It ranges from an almost complete mixing of language speakers, including multilinguals, to segregation with a neat separation of the linguistic domains and with multilinguals mainly at their boundaries. To understand how these different states can emerge and, especially, become stable, we propose a model in which coexistence of languages may be reached when learning the other language is facilitated and when bilinguals favor the use of the endangered language. Simulations carried out in a metapopulation framework highlight the importance of spatial interactions arising from people mobility to explain the stability of a mixed state or the presence of a boundary between two linguistic regions. Changes in the parameters regulating the relation between the languages can destabilize a system, which undergoes global transitions. According to our model, the evolution of the system once it undergoes a transition is highly history-dependent. It is easy to change the status quo but going back to a previous state may not be simple or even possible.

10. Unsupervised Visual Representation Learning by Tracking Patches in Video

Guangting Wang, Yizhou Zhou, Chong Luo, Wenxuan Xie, Wenjun Zeng, Zhiwei Xiong

  • retweets: 44, favorites: 52 (05/08/2021 09:16:32)
  • links: abs | pdf
  • cs.CV

Inspired by the fact that human eyes continue to develop tracking ability in early and middle childhood, we propose to use tracking as a proxy task for a computer vision system to learn the visual representations. Modelled on the Catch game played by the children, we design a Catch-the-Patch (CtP) game for a 3D-CNN model to learn visual representations that would help with video-related tasks. In the proposed pretraining framework, we cut an image patch from a given video and let it scale and move according to a pre-set trajectory. The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame. We discover that using multiple image patches simultaneously brings clear benefits. We further increase the difficulty of the game by randomly making patches invisible. Extensive experiments on mainstream benchmarks demonstrate the superior performance of CtP against other video pretraining methods. In addition, CtP-pretrained features are less sensitive to domain gaps than those trained by a supervised action recognition task. When both trained on Kinetics-400, we are pleasantly surprised to find that CtP-pretrained representation achieves much higher action classification accuracy than its fully supervised counterpart on Something-Something dataset. Code is available online: github.com/microsoft/CtP.

11. CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints

Anselm Paulus, Michal Rolínek, Vít Musil, Brandon Amos, Georg Martius

  • retweets: 56, favorites: 22 (05/08/2021 09:16:32)
  • links: abs | pdf
  • cs.LG

Bridging logical and algorithmic reasoning with modern machine learning techniques is a fundamental challenge with potentially transformative impact. On the algorithmic side, many NP-hard problems can be expressed as integer programs, in which the constraints play the role of their “combinatorial specification”. In this work, we aim to integrate integer programming solvers into neural network architectures as layers capable of learning both the cost terms and the constraints. The resulting end-to-end trainable architectures jointly extract features from raw data and solve a suitable (learned) combinatorial problem with state-of-the-art integer programming solvers. We demonstrate the potential of such layers with an extensive performance analysis on synthetic data and with a demonstration on a competitive computer vision keypoint matching benchmark.

12. Inverting Generative Adversarial Renderer for Face Reconstruction

Jingtan Piao, Keqiang Sun, Kwanyee Lin, Hongshneg Li

  • retweets: 36, favorites: 28 (05/08/2021 09:16:32)
  • links: abs | pdf
  • cs.CV

Given a monocular face image as input, 3D face geometry reconstruction aims to recover a corresponding 3D face mesh. Recently, both optimization-based and learning-based face reconstruction methods have taken advantage of the emerging differentiable renderer and shown promising results. However, the differentiable renderer, mainly based on graphics rules, simplifies the realistic mechanism of the illumination, reflection, \etc, of the real world, thus cannot produce realistic images. This brings a lot of domain-shift noise to the optimization or training process. In this work, we introduce a novel Generative Adversarial Renderer (GAR) and propose to tailor its inverted version to the general fitting pipeline, to tackle the above problem. Specifically, the carefully designed neural renderer takes a face normal map and a latent code representing other factors as inputs and renders a realistic face image. Since the GAR learns to model the complicated real-world image, instead of relying on the simplified graphics rules, it is capable of producing realistic images, which essentially inhibits the domain-shift noise in training and optimization. Equipped with the elaborated GAR, we further proposed a novel approach to predict 3D face parameters, in which we first obtain fine initial parameters via Renderer Inverting and then refine it with gradient-based optimizers. Extensive experiments have been conducted to demonstrate the effectiveness of the proposed generative adversarial renderer and the novel optimization-based face reconstruction framework. Our method achieves state-of-the-art performances on multiple face reconstruction datasets.

13. DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis

Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Peng Liu, Zhou Zhao

Singing voice synthesis (SVS) system is built to synthesize high-quality and expressive singing voice, in which the acoustic model generates the acoustic features (e.g., mel-spectrogram) given a music score. Previous singing acoustic models adopt simple loss (e.g., L1 and L2) or generative adversarial network (GAN) to reconstruct the acoustic features, while they suffer from over-smoothing and unstable training issues respectively, which hinder the naturalness of synthesized singing. In this work, we propose DiffSinger, an acoustic model for SVS based on the diffusion probabilistic model. DiffSinger is a parameterized Markov chain which iteratively converts the noise into mel-spectrogram conditioned on the music score. By implicitly optimizing variational bound, DiffSinger can be stably trained and generates realistic outputs. To further improve the voice quality, we introduce a \textbf{shallow diffusion mechanism} to make better use of the prior knowledge learned by the simple loss. Specifically, DiffSinger starts generation at a shallow step smaller than the total number of diffusion steps, according to the intersection of the diffusion trajectories of the ground-truth mel-spectrogram and the one predicted by a simple mel-spectrogram decoder. Besides, we train a boundary prediction network to locate the intersection and determine the shallow step adaptively. The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work with a notable margin (0.11 MOS gains). Our extensional experiments also prove the generalization of DiffSinger on text-to-speech task.

14. ACORN: Adaptive Coordinate Networks for Neural Scene Representation

Julien N. P. Martel, David B. Lindell, Connor Z. Lin, Eric R. Chan, Marco Monteiro, Gordon Wetzstein

Neural representations have emerged as a new paradigm for applications in rendering, imaging, geometric modeling, and simulation. Compared to traditional representations such as meshes, point clouds, or volumes they can be flexibly incorporated into differentiable learning-based pipelines. While recent improvements to neural representations now make it possible to represent signals with fine details at moderate resolutions (e.g., for images and 3D shapes), adequately representing large-scale or complex scenes has proven a challenge. Current neural representations fail to accurately represent images at resolutions greater than a megapixel or 3D scenes with more than a few hundred thousand polygons. Here, we introduce a new hybrid implicit-explicit network architecture and training strategy that adaptively allocates resources during training and inference based on the local complexity of a signal of interest. Our approach uses a multiscale block-coordinate decomposition, similar to a quadtree or octree, that is optimized during training. The network architecture operates in two stages: using the bulk of the network parameters, a coordinate encoder generates a feature grid in a single forward pass. Then, hundreds or thousands of samples within each block can be efficiently evaluated using a lightweight feature decoder. With this hybrid implicit-explicit network architecture, we demonstrate the first experiments that fit gigapixel images to nearly 40 dB peak signal-to-noise ratio. Notably this represents an increase in scale of over 1000x compared to the resolution of previously demonstrated image-fitting experiments. Moreover, our approach is able to represent 3D shapes significantly faster and better than previous techniques; it reduces training times from days to hours or minutes and memory requirements by over an order of magnitude.

15. De Finetti’s Theorem in Categorical Probability

Tobias Fritz, Tomáš Gonda, Paolo Perrone

We present a novel proof of de Finetti’s Theorem characterizing permutation-invariant probability measures of infinite sequences of variables, so-called exchangeable measures. The proof is phrased in the language of Markov categories, which provide an abstract categorical framework for probability and information flow. This abstraction allows for multiple versions of the original theorem to arise as consequences merely by interpreting the categorical result in different Markov categories. Moreover, the diagrammatic and abstract nature of the arguments makes the proof intuitive and easy to follow.