All Articles

Hot Papers 2021-08-23

1. Fastformer: Additive Attention is All You Need

Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

  • retweets: 4276, favorites: 393 (08/24/2021 07:55:14)
  • links: abs | pdf
  • cs.CL

Transformer is a powerful model for text understanding. However, it is inefficient due to its quadratic complexity to input sequence length. Although there are many methods on Transformer acceleration, they are still either inefficient on long sequences or not effective enough. In this paper, we propose Fastformer, which is an efficient Transformer model based on additive attention. In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use additive attention mechanism to model global contexts, and then further transform each token representation based on its interaction with global context representations. In this way, Fastformer can achieve effective context modeling with linear complexity. Extensive experiments on five datasets show that Fastformer is much more efficient than many existing Transformer models and can meanwhile achieve comparable or even better long text modeling performance.

2. Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer

Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

  • retweets: 156, favorites: 90 (08/24/2021 07:55:15)
  • links: abs | pdf
  • cs.CL

Transformer has achieved great success in NLP. However, the quadratic complexity of the self-attention mechanism in Transformer makes it inefficient in handling long sequences. Many existing works explore to accelerate Transformers by computing sparse self-attention instead of a dense one, which usually attends to tokens at certain positions or randomly selected tokens. However, manually selected or random tokens may be uninformative for context modeling. In this paper, we propose Smart Bird, which is an efficient and effective Transformer with learnable sparse attention. In Smart Bird, we first compute a sketched attention matrix with a single-head low-dimensional Transformer, which aims to find potential important interactions between tokens. We then sample token pairs based on their probability scores derived from the sketched attention matrix to generate different sparse attention index matrices for different attention heads. Finally, we select token embeddings according to the index matrices to form the input of sparse attention networks. Extensive experiments on six benchmark datasets for different tasks validate the efficiency and effectiveness of Smart Bird in text modeling.

3. An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions

Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri

  • retweets: 169, favorites: 60 (08/24/2021 07:55:15)
  • links: abs | pdf
  • cs.CR | cs.AI

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described `AI pair programmer’, GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot’s code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITRE’s “Top 25” list). We explore Copilot’s performance on three distinct code generation axes — examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, we found approximately 40% to be vulnerable.

4. Towards Photorealistic Colorization by Imagination

Chenyang Lei, Yue Wu, Qifeng Chen

  • retweets: 140, favorites: 71 (08/24/2021 07:55:15)
  • links: abs | pdf
  • cs.CV

We present a novel approach to automatic image colorization by imitating the imagination process of human experts. Our imagination module is designed to generate color images that are context-correlated with black-and-white photos. Given a black-and-white image, our imagination module firstly extracts the context information, which is then used to synthesize colorful and diverse images using a conditional image synthesis network (e.g., semantic image synthesis model). We then design a colorization module to colorize the black-and-white images with the guidance of imagination for photorealistic colorization. Experimental results show that our work produces more colorful and diverse results than state-of-the-art image colorization methods. Our source codes will be publicly available.

5. Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

Jianmo Ni, Gustavo Hernández {Á}brego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang

  • retweets: 132, favorites: 76 (08/24/2021 07:55:15)
  • links: abs | pdf
  • cs.CL

We provide the first exploration of text-to-text transformers (T5) sentence embeddings. Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as sequence-to-sequence mapping problems, it is unclear how to produce sentence embeddings from encoder-decoder models. We investigate three methods for extracting T5 sentence embeddings: two utilize only the T5 encoder and one uses the full T5 encoder-decoder model. Our encoder-only models outperforms BERT-based sentence embeddings on both transfer tasks and semantic textual similarity (STS). Our encoder-decoder method achieves further improvement on STS. Scaling up T5 from millions to billions of parameters is found to produce consistent improvements on downstream tasks. Finally, we introduce a two-stage contrastive learning approach that achieves a new state-of-art on STS using sentence embeddings, outperforming both Sentence BERT and SimCSE.

6. GAN Inversion for Out-of-Range Images with Geometric Transformations

Kyoungkook Kang, Seongtae Kim, Sunghyun Cho

  • retweets: 72, favorites: 64 (08/24/2021 07:55:15)
  • links: abs | pdf
  • cs.CV

For successful semantic editing of real images, it is critical for a GAN inversion method to find an in-domain latent code that aligns with the domain of a pre-trained GAN model. Unfortunately, such in-domain latent codes can be found only for in-range images that align with the training images of a GAN model. In this paper, we propose BDInvert, a novel GAN inversion approach to semantic editing of out-of-range images that are geometrically unaligned with the training images of a GAN model. To find a latent code that is semantically editable, BDInvert inverts an input out-of-range image into an alternative latent space than the original latent space. We also propose a regularized inversion method to find a solution that supports semantic editing in the alternative space. Our experiments show that BDInvert effectively supports semantic editing of out-of-range images with geometric transformations.

7. Uniformity Testing in the Shuffle Model: Simpler, Better, Faster

Clément L. Canonne, Hongyi Lyu

Uniformity testing, or testing whether independent observations are uniformly distributed, is the prototypical question in distribution testing. Over the past years, a line of work has been focusing on uniformity testing under privacy constraints on the data, and obtained private and data-efficient algorithms under various privacy models such as central differential privacy (DP), local privacy (LDP), pan-privacy, and, very recently, the shuffle model of differential privacy. In this work, we considerably simplify the analysis of the known uniformity testing algorithm in the shuffle model, and, using a recent result on “privacy amplification via shuffling,” provide an alternative algorithm attaining the same guarantees with an elementary and streamlined argument.