All Articles

Hot Papers 2021-03-02

1. Narratives and Counternarratives on Data Sharing in Africa

Rediet Abebe, Kehinde Aruleba, Abeba Birhane, Sara Kingsley, George Obaido, Sekou L. Remy, Swathi Sadagopan

  • retweets: 6032, favorites: 199 (03/03/2021 08:27:03)
  • links: abs | pdf
  • cs.CY

As machine learning and data science applications grow ever more prevalent, there is an increased focus on data sharing and open data initiatives, particularly in the context of the African continent. Many argue that data sharing can support research and policy design to alleviate poverty, inequality, and derivative effects in Africa. Despite the fact that the datasets in question are often extracted from African communities, conversations around the challenges of accessing and sharing African data are too often driven by non-African stakeholders. These perspectives frequently employ deficit narratives, often focusing on lack of education, training, and technological resources in the continent as the leading causes of friction in the data ecosystem. We argue that these narratives obfuscate and distort the full complexity of the African data sharing landscape. In particular, we use storytelling via fictional personas built from a series of interviews with African data experts to complicate dominant narratives and to provide counternarratives. Coupling these personas with research on data practices within the continent, we identify recurring barriers to data sharing as well as inequities in the distribution of data sharing benefits. In particular, we discuss issues arising from power imbalances resulting from the legacies of colonialism, ethno-centrism, and slavery, disinvestment in building trust, lack of acknowledgement of historical and present-day extractive practices, and Western-centric policies that are ill-suited to the African context. After outlining these problems, we discuss avenues for addressing them when sharing data generated in the continent.

2. Generative Adversarial Transformers

Drew A. Hudson, C. Lawrence Zitnick

We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, and can readily scale to high-resolution synthesis. It iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network. We demonstrate the model’s strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, showing it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data efficiency. Further qualitative and quantitative experiments offer insight into the model’s inner workings, revealing improved interpretability and stronger disentanglement, and illustrating the benefits and efficacy of our approach. An implementation of the model is available at github.com/dorarad/gansformer.
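
The abstract's bipartite structure amounts to cross-attention between a small set of latent variables and the (much larger) grid of image features, in both directions. Below is a minimal sketch of such a bipartite attention step in PyTorch; the class name, dimensions, and residual update are illustrative assumptions, not the GANsformer's exact layer.

```python
# Minimal sketch of bipartite cross-attention between k latent variables and
# n image features (illustrative; not the exact GANsformer layer).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BipartiteAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, queries, context):
        # queries: (B, nq, dim), context: (B, nc, dim)
        q, k, v = self.to_q(queries), self.to_k(context), self.to_v(context)
        attn = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return queries + attn @ v  # residual update of the query side

latents = torch.randn(2, 16, 64)          # a small set of latent variables
features = torch.randn(2, 32 * 32, 64)    # flattened image features
latents_to_img = BipartiteAttention(64)   # features attend to the latents
img_to_latents = BipartiteAttention(64)   # latents attend to the features
features = latents_to_img(features, latents)   # propagate latents -> features
latents = img_to_latents(latents, features)    # and back: features -> latents
```

Because every attention map is between n image positions and only k latents, each direction costs O(n·k) rather than O(n²), which is the source of the linear scaling the abstract refers to.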

3. Transformer in Transformer

Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang

  • retweets: 2525, favorites: 423 (03/03/2021 08:27:04)
  • links: abs | pdf
  • cs.CV | cs.AI

The Transformer is a type of self-attention-based neural network originally applied to NLP tasks. Recently, pure transformer-based models have been proposed to solve computer vision problems. These visual transformers usually view an image as a sequence of patches, but ignore the intrinsic structural information inside each patch. In this paper, we propose a novel Transformer-iN-Transformer (TNT) model for modeling both patch-level and pixel-level representations. In each TNT block, an outer transformer block processes patch embeddings, and an inner transformer block extracts local features from pixel embeddings. The pixel-level features are projected to the space of the patch embedding by a linear transformation layer and then added into the patch embedding. By stacking the TNT blocks, we build the TNT model for image recognition. Experiments on the ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the proposed TNT architecture. For example, our TNT achieves 81.3% top-1 accuracy on ImageNet, which is 1.5% higher than that of DeiT with similar computational cost. The code will be available at https://github.com/huawei-noah/noah-research/tree/master/TNT.
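
To make the block structure concrete, here is a rough sketch of a single TNT block following the description above: an inner transformer over the pixel embeddings of each patch, a linear projection folding them back into the patch embedding, and an outer transformer over patches. The dimensions and layer choices are assumptions for illustration; consult the linked repository for the real implementation.

```python
# Rough sketch of a single TNT block (illustrative shapes and names, not the
# official implementation at the linked repository).
import torch
import torch.nn as nn

class TNTBlock(nn.Module):
    def __init__(self, patch_dim=384, pixel_dim=24, pixels_per_patch=16):
        super().__init__()
        self.inner = nn.TransformerEncoderLayer(pixel_dim, nhead=4, batch_first=True)
        self.outer = nn.TransformerEncoderLayer(patch_dim, nhead=6, batch_first=True)
        self.proj = nn.Linear(pixel_dim * pixels_per_patch, patch_dim)

    def forward(self, patch_emb, pixel_emb):
        # patch_emb: (B, num_patches, patch_dim)
        # pixel_emb: (B, num_patches, pixels_per_patch, pixel_dim)
        B, N, P, D = pixel_emb.shape
        pixel_emb = self.inner(pixel_emb.reshape(B * N, P, D)).reshape(B, N, P, D)
        # fold the pixel-level features back into the patch embeddings
        patch_emb = patch_emb + self.proj(pixel_emb.reshape(B, N, P * D))
        return self.outer(patch_emb), pixel_emb

block = TNTBlock()
patches = torch.randn(2, 196, 384)        # 14x14 patches
pixels = torch.randn(2, 196, 16, 24)      # 4x4 "pixel" tokens per patch
patches, pixels = block(patches, pixels)
```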

4. Training Generative Adversarial Networks in One Stage

Chengchao Shen, Youtan Yin, Xinchao Wang, Xubin LI, Jie Song, Mingli Song

Generative Adversarial Networks (GANs) have demonstrated unprecedented success in various image generation tasks. The encouraging results, however, come at the price of a cumbersome training process, during which the generator and discriminator are alternately updated in two stages. In this paper, we investigate a general training scheme that enables training GANs efficiently in only one stage. Based on the adversarial losses of the generator and discriminator, we categorize GANs into two classes, Symmetric GANs and Asymmetric GANs, and introduce a novel gradient decomposition method to unify the two, allowing us to train both classes in one stage and hence alleviate the training effort. Computational analysis and experimental results on several datasets and various network architectures demonstrate that the proposed one-stage training scheme yields a solid 1.5× acceleration over conventional training schemes, regardless of the network architectures of the generator and discriminator. Furthermore, we show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation. Our source code will be published soon.
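
For intuition only, here is a toy sketch of the one-stage idea in the easiest (Symmetric) case, where the generator's loss is exactly the negative of the discriminator's: a single backward pass fills gradients for both networks, and the generator's gradients are simply sign-flipped before stepping. This illustrates the general principle under that assumption; the paper's gradient decomposition also handles Asymmetric GANs and is not reproduced here.

```python
# Sketch of one-stage training for a *symmetric* (minimax) GAN, where the
# generator loss is the negative of the discriminator loss. Illustrative only.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    real = torch.randn(64, 2) * 0.5 + 1.0   # toy "real" data
    fake = G(torch.randn(64, 16))           # NOTE: not detached on purpose
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))

    opt_g.zero_grad(); opt_d.zero_grad()
    d_loss.backward()                        # one backward fills grads of D *and* G
    for p in G.parameters():                 # G plays the opposite (minimax) objective,
        if p.grad is not None:               # so its gradients are simply negated
            p.grad.neg_()
    opt_d.step(); opt_g.step()
```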

5. Persistent Message Passing

Heiko Strathmann, Mohammadamin Barekatain, Charles Blundell, Petar Veličković

Graph neural networks (GNNs) are a powerful inductive bias for modelling algorithmic reasoning procedures and data structures. Their prowess was mainly demonstrated on tasks featuring Markovian dynamics, where querying any associated data structure depends only on its latest state. For many tasks of interest, however, it may be highly beneficial to support efficient data structure queries dependent on previous states. This requires tracking the data structure’s evolution through time, placing significant pressure on the GNN’s latent representations. We introduce Persistent Message Passing (PMP), a mechanism which endows GNNs with capability of querying past state by explicitly persisting it: rather than overwriting node representations, it creates new nodes whenever required. PMP generalises out-of-distribution to more than 2x larger test inputs on dynamic temporal range queries, significantly outperforming GNNs which overwrite states.
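
As a purely schematic illustration of the persistence idea (not of PMP's learned mechanism): node states are versioned rather than overwritten, so a query about an earlier timestep remains answerable.

```python
# Toy illustration of persistence: node states are versioned instead of
# overwritten, so queries about earlier timesteps remain answerable.
# (Schematic only; PMP learns which nodes to persist and how to update them.)
nodes = {0: [("v0", [1.0, 0.0])], 1: [("v0", [0.0, 1.0])]}  # node_id -> version history

def update(node_id, new_state, step):
    # persist: append a new version, keep every previous state reachable
    nodes[node_id].append((f"v{step}", new_state))

def query(node_id, step):
    # return the latest state written at or before `step`
    history = nodes[node_id]
    return next(s for v, s in reversed(history) if int(v[1:]) <= step)

update(0, [0.9, 0.1], step=1)
update(0, [0.5, 0.5], step=2)
print(query(0, 1))   # -> [0.9, 0.1], the state as of step 1
```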

6. OmniNet: Omnidirectional Representations from Transformers

Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

This paper proposes Omnidirectional Representations from Transformers (OmniNet). In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network. This process can also be interpreted as a form of extreme or intensive attention mechanism that has the receptive field of the entire width and depth of the network. To this end, the omnidirectional attention is learned via a meta-learner, which is essentially another self-attention based model. In order to mitigate the computational cost of full receptive-field attention, we leverage efficient self-attention models such as kernel-based (Choromanski et al.), low-rank attention (Wang et al.) and/or Big Bird (Zaheer et al.) as the meta-learner. Extensive experiments are conducted on autoregressive language modeling (LM1B, C4), machine translation, Long Range Arena (LRA), and image recognition. The experiments show that OmniNet achieves considerable improvements across these tasks, including achieving state-of-the-art performance on LM1B, WMT’14 En-De/En-Fr, and Long Range Arena. Moreover, using omnidirectional representation in Vision Transformers leads to significant improvements on image recognition tasks in both few-shot learning and fine-tuning setups.
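
A minimal way to picture "attending to the entire width and depth": collect the token representations from every layer of a standard transformer and let a second (meta) attention module attend over the full layers-by-tokens grid. The sketch below uses plain quadratic attention for clarity; OmniNet uses efficient variants (Performer, Linformer, BigBird) as the meta-learner, and the exact wiring here is an assumption.

```python
# Sketch of omnidirectional attention: a meta-attention module attends over
# the token states of *all* layers, not just the last one. (Illustrative.)
import torch
import torch.nn as nn

layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(4)
)
meta = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

x = torch.randn(2, 32, 64)            # (batch, tokens, dim)
all_states = []
for layer in layers:
    x = layer(x)
    all_states.append(x)              # keep every layer's token states

memory = torch.cat(all_states, dim=1)     # (batch, layers*tokens, dim): full width x depth
omni, _ = meta(x, memory, memory)         # each final token attends everywhere
x = x + omni
```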

7. M6: A Chinese Multimodal Pretrainer

Junyang Lin, Rui Men, An Yang, Chang Zhou, Ming Ding, Yichang Zhang, Peng Wang, Ang Wang, Le Jiang, Xianyan Jia, Jie Zhang, Jianwei Zhang, Xu Zou, Zhikang Li, Xiaodong Deng, Jie Liu, Jinbao Xue, Huiling Zhou, Jianxin Ma, Jin Yu, Yong Li, Wei Lin, Jingren Zhou, Jie Tang, Hongxia Yang

  • retweets: 357, favorites: 129 (03/03/2021 08:27:05)
  • links: abs | pdf
  • cs.CL

In this work, we construct the largest dataset for multimodal pretraining in Chinese, which consists of over 1.9TB of images and 292GB of text covering a wide range of domains. We propose a cross-modal pretraining method called M6, referring to Multi-Modality to Multi-Modality Multitask Mega-transformer, for unified pretraining on data of single and multiple modalities. We scale the model size up to 10 billion and 100 billion parameters, and build the largest pretrained model in Chinese. We apply the model to a series of downstream applications, and demonstrate its outstanding performance in comparison with strong baselines. Furthermore, we specifically design a downstream task of text-guided image generation, and show that the finetuned M6 can create high-quality images with high resolution and abundant details.

8. Computing the Information Content of Trained Neural Networks

Jeremy Bernstein, Yisong Yue

  • retweets: 374, favorites: 108 (03/03/2021 08:27:06)
  • links: abs | pdf
  • cs.LG | cs.NE

How much information does a learning algorithm extract from the training data and store in a neural network’s weights? Too much, and the network would overfit to the training data. Too little, and the network would not fit to anything at all. Naively, the amount of information the network stores should scale in proportion to the number of trainable weights. This raises the question: how can neural networks with vastly more weights than training data still generalise? A simple resolution to this conundrum is that the number of weights is usually a bad proxy for the actual amount of information stored. For instance, typical weight vectors may be highly compressible. Then another question occurs: is it possible to compute the actual amount of information stored? This paper derives both a consistent estimator and a closed-form upper bound on the information content of infinitely wide neural networks. The derivation is based on an identification between neural information content and the negative log probability of a Gaussian orthant. This identification yields bounds that analytically control the generalisation behaviour of the entire solution space of infinitely wide networks. The bounds have a simple dependence on both the network architecture and the training data. Corroborating the findings of Valle-Pérez et al. (2019), who conducted a similar analysis using approximate Gaussian integration techniques, the bounds are found to be both non-vacuous and correlated with the empirical generalisation behaviour at finite width.
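
In symbols, the identification described in the abstract can be sketched as follows (notation mine, not the paper's): for an infinitely wide network drawn from its Gaussian-process prior with kernel K, the information stored about a binary-labelled training set is the negative log probability that a random draw already fits the labels, which is a Gaussian orthant probability.

```latex
% Schematic form of the identification (notation is an assumption):
% f ~ GP(0, K) is the infinite-width prior; the orthant event is
% "a random draw already fits the binary labels y_i".
\mathcal{I}(\mathcal{D}) \;\approx\; -\log \, \mathbb{P}_{f \sim \mathcal{GP}(0,\,K)}
\bigl[\, y_i \, f(x_i) > 0 \;\; \text{for all } i \,\bigr]
```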

9. Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly

Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang

Training generative adversarial networks (GANs) with limited data generally results in deteriorated performance and collapsed models. To conquer this challenge, we are inspired by the latest observation of Kalibhat et al. (2020); Chen et al. (2021d), that one can discover independently trainable and highly sparse subnetworks (a.k.a., lottery tickets) from GANs. Treating this as an inductive prior, we decompose the data-hungry GAN training into two sequential sub-problems: (i) identifying the lottery ticket from the original GAN; then (ii) training the found sparse subnetwork with aggressive data and feature augmentations. Both sub-problems re-use the same small training set of real images. Such a coordinated framework enables us to focus on lower-complexity and more data-efficient sub-problems, effectively stabilizing training and improving convergence. Comprehensive experiments endorse the effectiveness of our proposed ultra-data-efficient training framework, across various GAN architectures (SNGAN, BigGAN, and StyleGAN2) and diverse datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet). Besides, our training framework also displays powerful few-shot generalization ability, i.e., generating high-fidelity images by training from scratch with just 100 real images, without any pre-training. Codes are available at: https://github.com/VITA-Group/Ultra-Data-Efficient-GAN-Training.
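
A hedged sketch of sub-problem (i), finding a sparse "ticket", via global magnitude pruning of a network's weights; the thresholding strategy and sparsity level below are assumptions, and the paper's full pipeline then retrains the masked subnetwork with aggressive data and feature augmentation (sub-problem (ii), omitted here).

```python
# Step (i) sketch: derive a sparse mask ("ticket") by global magnitude pruning.
# Illustrative: the pruning criterion and 90% sparsity are assumptions.
import torch
import torch.nn as nn

def global_magnitude_masks(model: nn.Module, sparsity: float = 0.9):
    weights = [p.detach().abs().flatten() for p in model.parameters() if p.dim() > 1]
    threshold = torch.cat(weights).quantile(sparsity)     # prune the smallest 90%
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def apply_masks(model: nn.Module, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])   # zero out pruned weights (re-apply after each update)

generator = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 784))
masks = global_magnitude_masks(generator, sparsity=0.9)
apply_masks(generator, masks)
```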

10. NeuTex: Neural Texture Mapping for Volumetric Neural Rendering

Fanbo Xiang, Zexiang Xu, Miloš Hašan, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Hao Su

  • retweets: 256, favorites: 69 (03/03/2021 08:27:06)
  • links: abs | pdf
  • cs.CV

Recent work has demonstrated that volumetric scene representations combined with differentiable volume rendering can enable photo-realistic rendering for challenging scenes that mesh reconstruction fails on. However, these methods entangle geometry and appearance in a “black-box” volume that cannot be edited. Instead, we present an approach that explicitly disentangles geometry—represented as a continuous 3D volume—from appearance—represented as a continuous 2D texture map. We achieve this by introducing a 3D-to-2D texture mapping (or surface parameterization) network into volumetric representations. We constrain this texture mapping network using an additional 2D-to-3D inverse mapping network and a novel cycle consistency loss to make 3D surface points map to 2D texture points that map back to the original 3D points. We demonstrate that this representation can be reconstructed using only multi-view image supervision and generates high-quality rendering results. More importantly, by separating geometry and texture, we allow users to edit appearance by simply editing 2D texture maps.
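
The cycle-consistency constraint described above can be sketched in a few lines: a forward network maps 3D surface points to 2D texture coordinates, an inverse network maps them back, and the loss penalizes the round-trip error. The network sizes and names here are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of the 3D -> 2D -> 3D cycle-consistency loss (names and MLP sizes
# are illustrative, not the paper's exact networks).
import torch
import torch.nn as nn

tex_map = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 2))   # 3D point -> UV
inv_map = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 3))   # UV -> 3D point

points_3d = torch.rand(1024, 3)                       # surface points (placeholder samples)
uv = tex_map(points_3d)                               # texture coordinates
cycle_loss = ((inv_map(uv) - points_3d) ** 2).mean()  # round trip should land on the same point
```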

11. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

Timo Schick, Sahana Udupa, Hinrich Schütze

  • retweets: 210, favorites: 84 (03/03/2021 08:27:06)
  • links: abs | pdf
  • cs.CL

When trained on large, unfiltered crawls from the internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: they often generate racist, sexist, violent or otherwise toxic language. As large models often require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we investigate whether pretrained language models at least know when they exhibit some undesirable bias or produce toxic content. Based on our findings, we propose a decoding algorithm that reduces the probability of a model producing problematic text given only a textual description of the undesired behavior. This algorithm does not rely on manually curated word lists, nor does it require any training data or changes to the model’s parameters. While our approach by no means eliminates the issue of language models generating biased text, we believe it to be an important step in this direction.
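
The decoding rule compares next-token distributions with and without a textual description of the undesired behaviour and down-weights tokens that the description makes more likely. The sketch below is one plausible instantiation operating on raw logit tensors; the specific rescaling function, the constant, and the prompt prefix are assumptions, not the paper's exact algorithm.

```python
# One plausible instantiation of the decoding idea: tokens whose probability
# rises when the model is prompted with a description of the undesired
# behaviour get suppressed. The rescaling function here is an assumption.
import torch

def debiased_distribution(logits_plain, logits_with_description, strength=50.0):
    p_plain = torch.softmax(logits_plain, dim=-1)
    p_biased = torch.softmax(logits_with_description, dim=-1)
    delta = p_biased - p_plain                      # >0: the description encourages the token
    scale = torch.exp(-strength * torch.clamp(delta, min=0.0))
    p = p_plain * scale                             # suppress encouraged (undesired) tokens
    return p / p.sum(dim=-1, keepdim=True)

# logits_plain come from the original prompt; logits_with_description from the
# same prompt prefixed with a description such as "The following text is
# sexist: ..." (hypothetical prefix).
```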

12. Single-Shot Motion Completion with Transformer

Yinglin Duan, Tianyang Shi, Zhengxia Zou, Yenan Lin, Zhehui Qian, Bohan Zhang, Yi Yuan

  • retweets: 100, favorites: 68 (03/03/2021 08:27:06)
  • links: abs | pdf
  • cs.CV | cs.GR

Motion completion is a challenging and long-discussed problem, which is of great significance in film and game applications. For different motion completion scenarios (in-betweening, in-filling, and blending), most previous methods deal with the completion problem with case-by-case designs. In this work, we propose a simple but effective method that solves multiple motion completion problems under a unified framework and achieves new state-of-the-art accuracy under multiple evaluation settings. Inspired by the recent success of attention-based models, we consider completion as a sequence-to-sequence prediction problem. Our method consists of two modules: a standard transformer encoder with self-attention that learns long-range dependencies of input motions, and a trainable mixture embedding module that models temporal information and discriminates key-frames. Our method can run in a non-autoregressive manner and predict multiple missing frames within a single forward propagation in real time. We finally show the effectiveness of our method in music-dance applications.
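
A hedged sketch of the non-autoregressive setup: missing frames are replaced with a learnable placeholder embedding, a key-frame indicator is added, and a transformer encoder predicts every frame in one forward pass. Pose dimension, embedding sizes, and the simple embedding scheme are assumptions; the paper's mixture embedding module is more elaborate.

```python
# Non-autoregressive completion sketch: fill missing frames with a learned
# placeholder, mark key-frames, and predict the whole sequence in one pass.
# (Schematic; the paper's mixture embedding module is more elaborate.)
import torch
import torch.nn as nn

T, D = 60, 63                                 # frames, pose dimension (illustrative)
pose_in = nn.Linear(D, 128)
missing_token = nn.Parameter(torch.zeros(1, 1, 128))
keyframe_emb = nn.Embedding(2, 128)           # 0 = missing frame, 1 = known key-frame
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(128, 4, batch_first=True), 3)
pose_out = nn.Linear(128, D)

poses = torch.randn(1, T, D)
known = torch.zeros(1, T, dtype=torch.long)
known[:, ::10] = 1                            # every 10th frame is a key-frame

x = torch.where(known.unsqueeze(-1).bool(), pose_in(poses), missing_token.expand(1, T, 128))
x = x + keyframe_emb(known)
completed = pose_out(encoder(x))              # all missing frames predicted at once
```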

13. Dynamic Stochastic Blockmodel Regression for Network Data: Application to International Militarized Conflicts

Santiago Olivella, Tyler Pratt, Kosuke Imai

A primary goal of social science research is to understand how latent group memberships predict the dynamic process of network evolution. In the modeling of international conflicts, for example, scholars hypothesize that membership in geopolitical coalitions shapes the decision to engage in militarized conflict. Such theories explain the ways in which nodal and dyadic characteristics affect the evolution of relational ties over time via their effects on group memberships. To aid the empirical testing of these arguments, we develop a dynamic model of network data by combining a hidden Markov model with a mixed-membership stochastic blockmodel that identifies latent groups underlying the network structure. Unlike existing models, we incorporate covariates that predict node membership in latent groups as well as the direct formation of edges between dyads. While prior substantive research often assumes the decision to engage in militarized conflict is independent across states and static over time, we demonstrate that conflict patterns are driven by states’ evolving membership in geopolitical blocs. Changes in monadic covariates like democracy shift states between coalitions, generating heterogeneous effects on conflict over time and across states. The proposed methodology, which relies on a variational approximation to a collapsed posterior distribution as well as stochastic optimization for scalability, is implemented through an open-source software package.
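
In symbols, a schematic of the model class described above (my notation, not the paper's exact specification): a hidden Markov process moves each node's mixed membership between latent blocs as a function of monadic covariates, and edge formation depends on the pair's bloc assignments plus dyadic covariates.

```latex
% Schematic only; notation is an assumption, not the paper's parameterization.
\pi_{i,t} \sim \mathrm{HMM}\bigl(\pi_{i,t-1},\; x_{i,t}^{\top}\beta\bigr)
\qquad
z_{i \to j, t} \sim \mathrm{Categorical}(\pi_{i,t})
\qquad
\Pr\bigl(y_{ijt} = 1\bigr) \;=\; \mathrm{logit}^{-1}\!\bigl(B_{z_{i \to j,t},\, z_{j \to i,t}} + d_{ijt}^{\top}\gamma\bigr)
```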

14. A Survey on Stance Detection for Mis- and Disinformation Identification

Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein

  • retweets: 55, favorites: 31 (03/03/2021 08:27:07)
  • links: abs | pdf
  • cs.CL | cs.SI

Detecting attitudes expressed in texts, also known as stance detection, has become an important task for the detection of false information online, be it misinformation (unintentionally false) or disinformation (intentionally false, spread deliberately with malicious intent). Stance detection has been framed in different ways, including: (a) as a component of fact-checking, rumour detection, and detecting previously fact-checked claims; or (b) as a task in its own right. While there have been prior efforts to contrast stance detection with other related social media tasks such as argumentation mining and sentiment analysis, there is no survey examining the relationship between stance detection and mis- and disinformation detection from a holistic viewpoint, which is the focus of this survey. We review and analyse existing work in this area, before discussing lessons learnt and future challenges.

15. Categorical Depth Distribution Network for Monocular 3D Object Detection

Cody Reading, Ali Harakeh, Julia Chae, Steven L. Waslander

  • retweets: 64, favorites: 18 (03/03/2021 08:27:07)
  • links: abs | pdf
  • cs.CV

Monocular 3D object detection is a key problem for autonomous vehicles, as it provides a solution with simple configuration compared to typical multi-sensor systems. The main challenge in monocular 3D detection lies in accurately predicting object depth, which must be inferred from object and scene cues due to the lack of direct range measurement. Many methods attempt to directly estimate depth to assist in 3D detection, but show limited performance as a result of depth inaccuracy. Our proposed solution, Categorical Depth Distribution Network (CaDDN), uses a predicted categorical depth distribution for each pixel to project rich contextual feature information to the appropriate depth interval in 3D space. We then use the computationally efficient bird’s-eye-view projection and single-stage detector to produce the final output bounding boxes. We design CaDDN as a fully differentiable end-to-end approach for joint depth estimation and object detection. We validate our approach on the KITTI 3D object detection benchmark, where we rank 1st among published monocular methods. We also provide the first monocular 3D detection results on the newly released Waymo Open Dataset. The source code for CaDDN will be made publicly available before publication.
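
The core operation described above reduces to a per-pixel outer product: each pixel's image feature vector is weighted by its predicted categorical depth distribution, producing frustum features that can then be projected to a bird's-eye-view grid. The sketch below shows only this step, with illustrative shapes; the BEV projection and detection head are omitted.

```python
# Core CaDDN idea in one einsum: weight each pixel's feature vector by its
# predicted categorical depth distribution to build frustum features.
# (Shapes are illustrative; the BEV projection and detector are omitted.)
import torch

B, C, D, H, W = 2, 64, 80, 94, 311         # batch, channels, depth bins, height, width
features = torch.randn(B, C, H, W)          # per-pixel image features
depth_logits = torch.randn(B, D, H, W)      # per-pixel depth-bin scores
depth_dist = torch.softmax(depth_logits, dim=1)

# frustum[b, c, d, h, w] = features[b, c, h, w] * depth_dist[b, d, h, w]
frustum = torch.einsum("bchw,bdhw->bcdhw", features, depth_dist)
```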

16. Detecting Abusive Language on Online Platforms: A Critical Analysis

Preslav Nakov, Vibha Nayak, Kyle Dent, Ameya Bhatawdekar, Sheikh Muhammad Sarwar, Momchil Hardalov, Yoan Dinkov, Dimitrina Zlatkova, Guillaume Bouchard, Isabelle Augenstein

  • retweets: 46, favorites: 32 (03/03/2021 08:27:07)
  • links: abs | pdf
  • cs.CL | cs.SI

Abusive language on online platforms is a major societal problem, often leading to serious harms such as the marginalisation of underrepresented minorities. There are many different forms of abusive language such as hate speech, profanity, and cyber-bullying, and online platforms seek to moderate it in order to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users. Within the field of Natural Language Processing, researchers have developed different methods for automatically detecting abusive language, often focusing on specific subproblems or on narrow communities, as what is considered abusive language very much differs by context. We argue that there is currently a dichotomy between what types of abusive language online platforms seek to curb, and what research efforts there are to automatically detect abusive language. We thus survey existing methods as well as content moderation policies by online platforms in this light, and we suggest directions for future work.

17. Gender Typicality of Behavior Predicts Success on Creative Platforms

Orsolya Vasarhelyi, Balazs Vedres

  • retweets: 56, favorites: 17 (03/03/2021 08:27:07)
  • links: abs | pdf
  • cs.SI

Collaboration platforms on the Internet have become crucial tools for independent creative workers, facilitating connections with collaborators, users, and buyers. Such platforms carried the promise of better opportunities for women and other underrepresented groups to access markets and collaborators, but the evidence is mounting that they rather perpetuate existing biases and inequalities. In previous work, we found that the majority of women’s disadvantage in success and survival on GitHub stems from what they do (the gender typicality of their behavior in open source programming) rather than from categorical discrimination based on their gender. In this article, we replicate our findings on another platform with a markedly different focus: Behance, a community for graphic artists. We also study attention as a new outcome on both platforms. We found that female typicality of behavior is a significant negative predictor of attention, success, and survival on creative platforms, while the impact of categorical gender varies by outcome and field. We found support for the visibility paradox of women in technical fields: while female typicality of behavior is negatively related to attention, being female predicts a higher level of attention. We quantified the indirect impact of gender homophily on success via gendered behavior, which accounts for 37 percent of the disadvantage of women in success. Our findings suggest that the negative impact of the gender typicality of behavior is a more general phenomenon than our first study indicated, underlining the scope of the challenge of countering unconscious gender bias in the platform economy.

18. Practical and Private (Deep) Learning without Sampling or Shuffling

Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu

  • retweets: 38, favorites: 29 (03/03/2021 08:27:07)
  • links: abs | pdf
  • cs.CR | cs.LG

We consider training models with differential privacy (DP) using mini-batch gradients. The existing state-of-the-art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in important practical scenarios, particularly federated learning (FL). We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification.
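
The key ingredient that lets an FTRL-style update avoid privacy amplification is releasing noisy cumulative gradient sums through a binary-tree mechanism, so each step sees a prefix sum whose noise grows only logarithmically. The sketch below is a hedged illustration of tree-aggregated prefix sums plus a simplified FTRL-style update; gradient clipping, noise calibration, and the paper's exact update rule are omitted or assumed.

```python
# Sketch of tree-aggregated noisy prefix sums, the ingredient that lets an
# FTRL-style update use cumulative gradients without privacy amplification by
# sampling. Clipping, noise calibration, and the exact update are omitted.
import numpy as np

class TreePrefixSum:
    def __init__(self, dim, noise_std, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim, self.noise_std = dim, noise_std
        self.levels = []   # levels[k]: (true sum of 2**k gradients, its noise) or None

    def add(self, grad):
        node_sum = np.asarray(grad, dtype=float)
        node_noise = self.rng.normal(0.0, self.noise_std, self.dim)
        k = 0
        while k < len(self.levels) and self.levels[k] is not None:
            node_sum = self.levels[k][0] + node_sum                          # merge two completed nodes
            node_noise = self.rng.normal(0.0, self.noise_std, self.dim)      # fresh noise for the parent
            self.levels[k] = None
            k += 1
        if k == len(self.levels):
            self.levels.append(None)
        self.levels[k] = (node_sum, node_noise)

    def noisy_prefix_sum(self):
        # sum of the O(log t) maximal dyadic nodes, each carrying one noise draw
        total = np.zeros(self.dim)
        for entry in self.levels:
            if entry is not None:
                total += entry[0] + entry[1]
        return total

tree = TreePrefixSum(dim=10, noise_std=0.1)
w = np.zeros(10)
for t in range(32):
    grad = np.random.randn(10) * 0.01       # stand-in for a clipped minibatch gradient
    tree.add(grad)
    w = -0.1 * tree.noisy_prefix_sum()      # simplified FTRL-style update around w0 = 0
```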

19. Perspectives on individual animal identification from biology and computer vision

Maxime Vidal, Nathan Wolf, Beth Rosenberg, Bradley P. Harris, Alexander Mathis

Identifying individual animals is crucial for many biological investigations. In response to some of the limitations of current identification methods, new automated computer vision approaches have emerged with strong performance. Here, we review current advances of computer vision identification techniques to provide both computer scientists and biologists with an overview of the available tools and discuss their applications. We conclude by offering recommendations for starting an animal identification project, illustrate current limitations and propose how they might be addressed in the future.

20. Transformers with Competitive Ensembles of Independent Mechanisms

Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio

  • retweets: 44, favorites: 18 (03/03/2021 08:27:07)
  • links: abs | pdf
  • cs.LG | cs.AI

An important development in deep learning from the earliest MLPs has been a move towards architectures with structural inductive biases which enable the model to keep distinct sources of information and routes of processing well-separated. This structure is linked to the notion of independent mechanisms from the causality literature, in which a mechanism is able to retain the same processing as irrelevant aspects of the world are changed. For example, convnets enable separation over positions, while attention-based architectures (especially Transformers) learn which combination of positions to process dynamically. In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation. This potentially throws unrelated sources of information together, and limits the Transformer’s ability to capture independent mechanisms. To address this, we propose Transformers with Independent Mechanisms (TIM), a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention. Additionally, we propose a competition mechanism which encourages these mechanisms to specialize over time steps, and thus be more independent. We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance.
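
To make the layer structure concrete, here is a hedged sketch of a TIM-style layer: the hidden vector is split into M mechanism slots with separate parameters, the slots exchange information only through attention, and a softmax competition over mechanism scores gates each update. Dimensions, the gating form, and the exact wiring are assumptions, not the authors' implementation.

```python
# Sketch of a TIM-style layer: the hidden vector is split into M mechanisms
# with separate parameters; they exchange information only through attention,
# and a softmax competition gates how strongly each mechanism updates.
import torch
import torch.nn as nn

class TIMLayer(nn.Module):
    def __init__(self, dim=256, n_mech=4, nhead=4):
        super().__init__()
        self.n_mech, self.d = n_mech, dim // n_mech
        self.ffn = nn.ModuleList(nn.Sequential(nn.Linear(self.d, self.d), nn.ReLU(),
                                               nn.Linear(self.d, self.d)) for _ in range(n_mech))
        self.score = nn.ModuleList(nn.Linear(self.d, 1) for _ in range(n_mech))
        self.comm = nn.MultiheadAttention(self.d, nhead, batch_first=True)

    def forward(self, x):                        # x: (B, T, dim)
        B, T, _ = x.shape
        slots = x.view(B, T, self.n_mech, self.d)
        scores = torch.cat([self.score[m](slots[:, :, m]) for m in range(self.n_mech)], dim=-1)
        gate = torch.softmax(scores, dim=-1)     # competition across mechanisms
        updated = torch.stack([self.ffn[m](slots[:, :, m]) for m in range(self.n_mech)], dim=2)
        slots = slots + gate.unsqueeze(-1) * updated
        # mechanisms exchange information only through attention over the slot axis
        mixed, _ = self.comm(slots.reshape(B * T, self.n_mech, self.d),
                             slots.reshape(B * T, self.n_mech, self.d),
                             slots.reshape(B * T, self.n_mech, self.d))
        return (slots + mixed.view(B, T, self.n_mech, self.d)).reshape(B, T, -1)

layer = TIMLayer()
out = layer(torch.randn(2, 10, 256))
```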

21. Accelerated Jarzynski Estimator with Deterministic Virtual Trajectories

Nobumasa Ishida, Yoshihiko Hasegawa

The Jarzynski estimator is a powerful tool that uses nonequilibrium statistical physics to numerically obtain partition functions of probability distributions. The estimator reconstructs partition functions with trajectories of simulated Langevin dynamics through the Jarzynski equality. However, the original estimator suffers from its slow convergence because it depends on rare trajectories of stochastic dynamics. In this paper we present a method to significantly accelerate the convergence by introducing deterministic virtual trajectories generated in augmented state space under Hamiltonian dynamics. We theoretically show that our approach achieves second-order acceleration compared to a naive estimator with Langevin dynamics and zero variance estimation on harmonic potentials. Moreover, we conduct numerical experiments on three multimodal distributions where the proposed method outperforms the conventional method, and provide theoretical explanations.
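
For reference, the Jarzynski equality underlying the estimator: averaging the exponentiated nonequilibrium work over trajectories recovers the free-energy difference, and hence the ratio of partition functions.

```latex
e^{-\beta \Delta F} \;=\; \bigl\langle e^{-\beta W} \bigr\rangle
\quad\Longrightarrow\quad
\frac{Z_1}{Z_0} \;=\; \bigl\langle e^{-\beta W} \bigr\rangle
\;\approx\; \frac{1}{N} \sum_{k=1}^{N} e^{-\beta W_k}
```

Here W_k is the work accumulated along the k-th simulated trajectory; the paper's contribution is to generate these trajectories deterministically with Hamiltonian dynamics in an augmented state space so that this average converges much faster.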

22. Generative chemical transformer: attention makes neural machine learn molecular geometric structures via text

Hyunseung Kim, Jonggeol Na, Won Bo Lee

Chemical formulas form an artificial language that expresses molecules as text. Neural machines that have learned this chemical language can be used as tools for inverse molecular design. Here, we propose a neural machine that creates molecules meeting desired conditions based on a deep understanding of chemical language (generative chemical Transformer, GCT). The attention mechanism in GCT allows a deeper understanding of molecular structures, beyond the limitations of the chemical language itself that cause semantic discontinuity, by attending sparsely to characters. We investigate the significance of language models for inverse molecular design problems by quantitatively evaluating the quality of the generated molecules. GCT generates highly realistic chemical strings that satisfy both chemical rules and the grammar of the language. Molecules parsed from the generated strings simultaneously satisfy multiple target properties and are diverse for a single condition set. GCT generates de novo molecules in a short time that human experts cannot match. These advances will contribute to improving the quality of human life by accelerating the process of discovering desired materials.

23. Sim-to-Real Transfer for Robotic Manipulation with Tactile Sensory

Zihan Ding, Ya-Yen Tsai, Wang Wei Lee, Bidan Huang

  • retweets: 12, favorites: 43 (03/03/2021 08:27:07)
  • links: abs | pdf
  • cs.RO

Reinforcement Learning (RL) methods have been widely applied for robotic manipulations via sim-to-real transfer, typically with proprioceptive and visual information. However, the incorporation of tactile sensing into RL for contact-rich tasks lacks investigation. In this paper, we model a tactile sensor in simulation and study the effects of its feedback in RL-based robotic control via a zero-shot sim-to-real approach with domain randomization. We demonstrate that learning and controlling with feedback from tactile sensor arrays at the gripper, both in simulation and reality, can enhance grasping stability, which leads to a significant improvement in robotic manipulation performance for a door opening task. In real-world experiments, the door open angle was increased by 45% on average for transferred policies with tactile sensing over those without it.
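
The domain-randomization ingredient of the zero-shot transfer amounts to resampling physics and sensor parameters every training episode so the policy does not overfit to one simulator configuration. The sketch below is purely illustrative; the parameter names, ranges, and the simulator factory are placeholders, not the paper's setup.

```python
# Domain randomization sketch: resample simulator and tactile-sensor
# parameters every episode. Names and ranges are placeholders.
import random

def sample_sim_params():
    return {
        "friction":      random.uniform(0.5, 1.5),
        "door_mass":     random.uniform(1.0, 4.0),    # kg
        "tactile_gain":  random.uniform(0.8, 1.2),    # sensor response scaling
        "tactile_noise": random.uniform(0.0, 0.05),   # additive sensor noise std
        "latency_steps": random.randint(0, 3),        # control/sensing delay
    }

for episode in range(1000):
    params = sample_sim_params()
    # env = make_door_env(**params)   # hypothetical simulator factory
    # ... run one RL episode with tactile observations under these parameters
```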

24. The Mathematics Behind Spectral Clustering And The Equivalence To PCA

T Shen

Spectral clustering is a popular algorithm that clusters points using the eigenvalues and eigenvectors of Laplacian matrices derived from the data. For years, however, its inner workings have remained somewhat mysterious. This paper explains spectral clustering by dividing it into two cases based on whether the graph Laplacian is fully connected or not. For a fully connected graph, the paper explains the dimension-reduction step by offering an objective function: the covariance between the original data points’ similarities and the mapped data points’ similarities. For a multi-connected graph, the paper proves that with a proper k, the first k eigenvectors are the indicators of the connected components. The paper also proves that there is an equivalence between spectral embedding and PCA.
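
For readers who want the pipeline the paper analyses in front of them, here is a compact sketch: build an affinity matrix, form the graph Laplacian, take the first k eigenvectors, and run k-means on the embedded points. The Gaussian affinity, the unnormalized Laplacian, and the bandwidth are choices made for illustration.

```python
# Standard spectral clustering pipeline (sketch):
# affinity -> graph Laplacian -> first k eigenvectors -> k-means.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))          # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    L = np.diag(d) - W                                # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)              # ascending eigenvalues
    embedding = eigvecs[:, :k]                        # first k eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
labels = spectral_clustering(X, k=2)
```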

25. AdaSpeech: Adaptive Text to Speech for Custom Voice

Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu

Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims to adapt a source TTS model to synthesize a personal voice for a target speaker using only a small amount of speech data. Custom voice presents two unique challenges for TTS adaptation: 1) to support diverse customers, the adaptation model needs to handle diverse acoustic conditions that could be very different from the source speech data, and 2) to support a large number of customers, the adaptation parameters need to be small enough for each target speaker to reduce memory usage while maintaining high voice quality. In this work, we propose AdaSpeech, an adaptive TTS system for high-quality and efficient customization of new voices. We design several techniques in AdaSpeech to address the two challenges in custom voice: 1) To handle different acoustic conditions, we use two acoustic encoders to extract an utterance-level vector and a sequence of phoneme-level vectors from the target speech during training; in inference, we extract the utterance-level vector from a reference speech and use an acoustic predictor to predict the phoneme-level vectors. 2) To better trade off the adaptation parameters and voice quality, we introduce conditional layer normalization in the mel-spectrogram decoder of AdaSpeech, and fine-tune this part in addition to the speaker embedding for adaptation. We pre-train the source TTS model on the LibriTTS dataset and fine-tune it on the VCTK and LJSpeech datasets (with different acoustic conditions from LibriTTS) using little adaptation data, e.g., 20 sentences (about 1 minute of speech). Experiment results show that AdaSpeech achieves much better adaptation quality than baseline methods, with only about 5K speaker-specific parameters for each speaker, which demonstrates its effectiveness for custom voice. Audio samples are available at https://speechresearch.github.io/adaspeech/.
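
Conditional layer normalization, the second technique above, can be sketched as a layer norm whose scale and bias are predicted from the speaker embedding by two small linear layers, so adapting to a new voice only touches those projections plus the embedding. The sizes and module layout below are illustrative assumptions.

```python
# Sketch of conditional layer normalization: the LayerNorm scale and bias are
# predicted from a speaker embedding, so adapting to a new voice only requires
# fine-tuning these small projections plus the embedding. (Illustrative sizes.)
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    def __init__(self, hidden, speaker_dim):
        super().__init__()
        self.norm = nn.LayerNorm(hidden, elementwise_affine=False)
        self.to_scale = nn.Linear(speaker_dim, hidden)
        self.to_bias = nn.Linear(speaker_dim, hidden)

    def forward(self, x, speaker_emb):
        # x: (B, T, hidden), speaker_emb: (B, speaker_dim)
        scale = self.to_scale(speaker_emb).unsqueeze(1)
        bias = self.to_bias(speaker_emb).unsqueeze(1)
        return self.norm(x) * scale + bias

cln = ConditionalLayerNorm(hidden=256, speaker_dim=64)
out = cln(torch.randn(2, 100, 256), torch.randn(2, 64))
```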

26. High-productivity, high-performance workflow for virus-scale electrostatic simulations with Bempp-Exafmm

Tingyu Wang, Christopher D. Cooper, Timo Betcke, Lorena A. Barba

Biomolecular electrostatics is key in protein function and the chemical processes affecting it. Implicit-solvent models expressed by the Poisson-Boltzmann (PB) equation can provide insights with less computational power than full atomistic models, making large-system studies — at the scale of viruses, for example — accessible to more researchers. This paper presents a high-productivity and high-performance PB solver based on Exafmm, a fast multipole method (FMM) library, and Bempp, a Galerkin boundary element method (BEM) package. Bempp-Exafmm tightly integrates an easy-to-use Python interface with well-optimized computational kernels that are written in compiled languages. Thanks to Python’s rich ecosystem in scientific computing, users can perform PB simulations interactively via Jupyter notebooks, which opens up the possibility of faster prototyping and analysis. We provide results showcasing the capability of our software, confirming correctness, and evaluating its performance with problem sizes between 8,000 and 2 million boundary elements. A small study comparing two variants of the boundary integral formulation with regard to algebraic conditioning showcases the power of this interactive computing platform to give useful answers with just a few lines of code. As a form of solution verification, mesh refinement studies with a spherical geometry as well as with a real biological structure (5PTI) confirm convergence at the expected 1/N rate, for N boundary elements. Performance results include timings, breakdowns, and computational complexity. Exafmm offers evaluation speeds of just a few seconds for tens of millions of points, and O(N) scaling. This allowed computing the solvation free energy of a Zika virus, represented by 1.6 million atoms and 10 million boundary elements, at 80-min runtime on a single compute node (dual 20-core Intel Xeon Gold 6148).