1. Fastformer: Additive Attention is All You Need
Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
Transformer is a powerful model for text understanding. However, it is inefficient due to its quadratic complexity with respect to input sequence length. Although there are many methods for Transformer acceleration, they are still either inefficient on long sequences or not effective enough. In this paper, we propose Fastformer, an efficient Transformer model based on additive attention. In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use an additive attention mechanism to model global contexts, and then further transform each token representation based on its interaction with the global context representations. In this way, Fastformer achieves effective context modeling with linear complexity. Extensive experiments on five datasets show that Fastformer is much more efficient than many existing Transformer models while achieving comparable or even better long-text modeling performance.
pdf: https://t.co/HelF2hT4Te
abs: https://t.co/ch8O4kG6oA
a Transformer variant based on additive attention that can handle long sequences efficiently with linear complexity
— AK (@ak92501) August 23, 2021
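To make the linear-complexity recipe concrete, here is a minimal single-head PyTorch sketch of an additive-attention block in the spirit of Fastformer; the single-head simplification, layer sizes, and residual connection are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveAttentionPool(nn.Module):
    """Pool a whole sequence into one global vector with additive attention."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one learned score per token: O(n), not O(n^2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        alpha = torch.softmax(self.score(x), dim=1)       # (batch, seq, 1)
        return (alpha * x).sum(dim=1, keepdim=True)       # (batch, 1, dim)

class FastformerBlockSketch(nn.Module):
    """Single-head sketch: a global query modulates keys, a global key modulates values."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.pool_q = AdditiveAttentionPool(dim)
        self.pool_k = AdditiveAttentionPool(dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        global_q = self.pool_q(q)   # summarize all queries into one global context vector
        p = k * global_q            # element-wise interaction: stays linear in seq length
        global_k = self.pool_k(p)   # summarize the query-aware keys
        u = v * global_k            # propagate global context to every value
        return self.out(u) + q      # output transform plus query residual (assumed)

# usage: FastformerBlockSketch(64)(torch.randn(2, 128, 64)) has shape (2, 128, 64)
```

Note how no (seq, seq) matrix is ever formed; every step is a per-token operation or a single pooled vector, which is where the linear complexity comes from.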
2. Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer
Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
Transformer has achieved great success in NLP. However, the quadratic complexity of its self-attention mechanism makes it inefficient at handling long sequences. Many existing works accelerate Transformers by computing sparse self-attention instead of dense self-attention, usually attending to tokens at fixed positions or to randomly selected tokens. However, manually selected or random tokens may be uninformative for context modeling. In this paper, we propose Smart Bird, an efficient and effective Transformer with learnable sparse attention. In Smart Bird, we first compute a sketched attention matrix with a single-head, low-dimensional Transformer, which aims to find potentially important interactions between tokens. We then sample token pairs based on probability scores derived from the sketched attention matrix to generate different sparse attention index matrices for different attention heads. Finally, we select token embeddings according to these index matrices to form the input of the sparse attention networks. Extensive experiments on six benchmark datasets for different tasks validate the efficiency and effectiveness of Smart Bird in text modeling.
abs: https://t.co/bu8NDj4Buc
propose an efficient and effective Transformer variant named Smart Bird, which can smartly attend to important token pairs based on a learnable sparse attention mechanism
— AK (@ak92501) August 23, 2021
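As a rough illustration of the sampling step, the sketch below scores all token pairs with a cheap low-dimensional projection and then draws a different sparse index matrix for each head from the resulting probabilities; the random projections, dimensions, and sample counts are hypothetical stand-ins for the paper's learned single-head sketch Transformer.

```python
import torch

def sample_sparse_indices(x: torch.Tensor, proj_dim: int = 16,
                          num_heads: int = 4, pairs_per_token: int = 8):
    """Sketch of the index-sampling idea: score pairs cheaply, sample per head.

    x: (seq, dim) token embeddings. Returns a list of (seq, pairs_per_token)
    index matrices, one per attention head.
    """
    seq, dim = x.shape
    # 1. A cheap low-dimensional projection stands in for the paper's learned
    #    single-head, low-dimensional sketch Transformer.
    wq = torch.randn(dim, proj_dim) / dim ** 0.5
    wk = torch.randn(dim, proj_dim) / dim ** 0.5
    scores = (x @ wq) @ (x @ wk).T / proj_dim ** 0.5  # (seq, seq) sketched attention
    probs = torch.softmax(scores, dim=-1)             # per-row sampling distribution
    # 2. Each head samples its own key positions for every query token, so
    #    different heads attend to different, likely informative, token pairs.
    return [torch.multinomial(probs, pairs_per_token) for _ in range(num_heads)]

# usage: indices = sample_sparse_indices(torch.randn(512, 64))
# indices[h][i] lists the key positions that head h attends to for query token i.
```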
3. An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions
Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri
There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these is the first self-described 'AI pair programmer', GitHub Copilot, a language model trained on open-source GitHub code. However, code often contains bugs, and so, given the vast quantity of unvetted code that Copilot has processed, the language model will certainly have learned from exploitable, buggy code. This raises concerns about the security of Copilot's code contributions. In this work, we systematically investigate the prevalence of insecure code suggestions and the conditions that can cause GitHub Copilot to recommend them. To perform this analysis, we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g., those from MITRE's "Top 25" list). We explore Copilot's performance on three distinct code-generation axes, examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, yielding 1,692 programs. Of these, we found approximately 40% to be vulnerable.
pdf: https://t.co/HQpCOxkkqX
abs: https://t.co/k8TSAB6FLi
produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, found approximately 40% to be vulnerable
— AK (@ak92501) August 23, 2021
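For flavor, here is a hypothetical completion scenario of the kind the paper describes, built around CWE-89 (SQL injection); this specific snippet is my own illustration, not one of the paper's 89 scenarios.

```python
# Hypothetical prompt: the docstring and signature are given, and the model
# must complete the body. A vulnerable vs. safe continuation is shown.
import sqlite3

def get_user(db: sqlite3.Connection, username: str):
    """Return the row for the given username."""
    # A code model continuing from here might emit the vulnerable,
    # string-built query:
    #   db.execute(f"SELECT * FROM users WHERE name = '{username}'")
    # The safe completion parameterizes the user-controlled value instead:
    return db.execute("SELECT * FROM users WHERE name = ?", (username,))
```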
4. Towards Photorealistic Colorization by Imagination
Chenyang Lei, Yue Wu, Qifeng Chen
We present a novel approach to automatic image colorization that imitates the imagination process of human experts. Our imagination module is designed to generate color images that are context-correlated with black-and-white photos. Given a black-and-white image, the imagination module first extracts its context information, which is then used to synthesize colorful and diverse images with a conditional image synthesis network (e.g., a semantic image synthesis model). We then design a colorization module that colorizes the black-and-white image under the guidance of the imagined images, yielding photorealistic colorization. Experimental results show that our method produces more colorful and diverse results than state-of-the-art image colorization methods. Our source code will be publicly available.
pdf: https://t.co/4NPefmh6FA
abs: https://t.co/06Qig0CqZr
— AK (@ak92501) August 23, 2021
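The two-module pipeline can be summarized in a few lines; in the sketch below, context_net, synthesis_net, and color_net are hypothetical placeholders for the paper's context extractor, conditional synthesis network, and colorization network.

```python
def colorize_with_imagination(gray_image, context_net, synthesis_net, color_net,
                              num_references: int = 3):
    """Imagine diverse color references for a grayscale image, then colorize
    under their guidance. All three networks are placeholder callables."""
    context = context_net(gray_image)                  # e.g. a semantic layout
    # Imagination module: synthesize several context-correlated color images.
    references = [synthesis_net(context) for _ in range(num_references)]
    # Colorization module: produce a photorealistic result guided by them.
    return color_net(gray_image, references)
```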
5. Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang
We provide the first exploration of sentence embeddings from text-to-text transformers (T5). Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as sequence-to-sequence mapping problems, it is unclear how to produce sentence embeddings from encoder-decoder models. We investigate three methods for extracting T5 sentence embeddings: two utilize only the T5 encoder, and one uses the full T5 encoder-decoder model. Our encoder-only models outperform BERT-based sentence embeddings on both transfer tasks and semantic textual similarity (STS). Our encoder-decoder method achieves further improvement on STS. Scaling up T5 from millions to billions of parameters produces consistent improvements on downstream tasks. Finally, we introduce a two-stage contrastive learning approach that achieves a new state of the art on STS using sentence embeddings, outperforming both Sentence-BERT and SimCSE.
pdf: https://t.co/NbGb2W1tfk
abs: https://t.co/q4eJor0Zpd
Scaling up T5 from millions to billions of parameters is found to produce consistent improvements on downstream tasks
— AK (@ak92501) August 23, 2021
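The three extraction strategies are easy to sketch with the Hugging Face transformers API; the pooling details and the choice of t5-base below are assumptions for illustration, not the paper's precise recipe.

```python
import torch
from transformers import T5Tokenizer, T5Model

tok = T5Tokenizer.from_pretrained("t5-base")      # model size chosen for illustration
model = T5Model.from_pretrained("t5-base")

enc = tok(["A sentence to embed."], return_tensors="pt")
enc_out = model.encoder(**enc).last_hidden_state  # (batch, seq, dim)

first_token = enc_out[:, 0]    # encoder-only method 1: first token state
mean_pool = enc_out.mean(dim=1)  # encoder-only method 2: mean pooling
                                 # (a padding mask is needed for real batches)

# Encoder-decoder method: feed only the start token to the decoder and take
# its first output state as the sentence embedding.
dec_in = torch.full((1, 1), model.config.decoder_start_token_id)
enc_dec = model(**enc, decoder_input_ids=dec_in).last_hidden_state[:, 0]
```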
6. GAN Inversion for Out-of-Range Images with Geometric Transformations
Kyoungkook Kang, Seongtae Kim, Sunghyun Cho
For successful semantic editing of real images, it is critical for a GAN inversion method to find an in-domain latent code that aligns with the domain of a pre-trained GAN model. Unfortunately, such in-domain latent codes can be found only for in-range images that align with the training images of the GAN model. In this paper, we propose BDInvert, a novel GAN inversion approach for semantic editing of out-of-range images that are geometrically unaligned with the training images of a GAN model. To find a latent code that is semantically editable, BDInvert inverts an input out-of-range image into an alternative latent space rather than the original latent space. We also propose a regularized inversion method to find a solution that supports semantic editing in this alternative space. Our experiments show that BDInvert effectively supports semantic editing of out-of-range images with geometric transformations.
pdf: https://t.co/2X9R9Cd9O1
abs: https://t.co/PHAPmxHlee
— AK (@ak92501) August 23, 2021
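The general shape of regularized, optimization-based inversion looks like the loop below; BDInvert's actual alternative latent space and its specific regularizer are more involved, so treat this as a generic template with an assumed L2 penalty standing in for the paper's regularized inversion.

```python
import torch
import torch.nn.functional as F

def invert(generator, target, latent_init, steps=500, lr=0.05, reg_weight=0.1):
    """Generic regularized GAN inversion loop (a template, not BDInvert itself)."""
    latent = latent_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        recon = generator(latent)              # decode the current latent code
        loss = F.mse_loss(recon, target)       # reconstruct the input image
        # Assumed stand-in regularizer: keep the code in a well-behaved region
        # of the latent space so the result remains semantically editable.
        loss = loss + reg_weight * latent.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latent.detach()
```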
7. Uniformity Testing in the Shuffle Model: Simpler, Better, Faster
Clément L. Canonne, Hongyi Lyu
Uniformity testing, or testing whether independent observations are uniformly distributed, is the prototypical question in distribution testing. Over the past years, a line of work has focused on uniformity testing under privacy constraints on the data, and has obtained private and data-efficient algorithms under various privacy models such as central differential privacy (DP), local differential privacy (LDP), pan-privacy, and, very recently, the shuffle model of differential privacy. In this work, we considerably simplify the analysis of the known uniformity testing algorithm in the shuffle model and, using a recent result on "privacy amplification via shuffling," provide an alternative algorithm attaining the same guarantees with an elementary and streamlined argument.
My first foray into shuffle #privacy, spearheaded by my impressive winter research intern Hongyi Lyu (maths undergrad @UniMelb), who managed to learn about DP + shuffle DP + distribution testing, all this in ~6 weeks!
Comments welcome! 📝 https://t.co/MuhjHQA7f8
— Clément Canonne (@ccanonne_) August 23, 2021
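For background, the classical non-private primitive in this area is the collision-based uniformity tester, sketched below; the shuffle-model algorithms add a privatization layer that this sketch omits, and the acceptance threshold here is illustrative rather than tuned.

```python
import itertools
import random

def collision_uniformity_test(samples, domain_size, slack=0.5):
    """Accept 'uniform' iff the empirical collision rate is close to 1/domain_size.

    For a distribution p over k elements, Pr[two samples collide] = ||p||_2^2
    = 1/k + ||p - uniform||_2^2: exactly 1/k under uniformity, larger otherwise.
    """
    pairs = itertools.combinations(samples, 2)
    n_pairs = len(samples) * (len(samples) - 1) // 2
    collision_rate = sum(a == b for a, b in pairs) / n_pairs
    return collision_rate <= (1 + slack) / domain_size  # illustrative threshold

samples = [random.randrange(100) for _ in range(1000)]
print(collision_uniformity_test(samples, domain_size=100))  # True w.h.p.: data is uniform
```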