1. Relative Positional Encoding for Transformers with Linear Complexity
Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard
- retweets: 797, favorites: 136 (05/20/2021 10:03:17)
- links: abs | pdf
- cs.LG | cs.CL | cs.SD | eess.AS | stat.ML
Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists of exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
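To see why RPE is incompatible with linear Transformers, it helps to look at the computation: linear attention never materializes the T×T attention matrix, so there is no place to add a per-entry relative bias. A minimal NumPy sketch (the feature map `phi` here is an illustrative choice, not the one from any particular paper):

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear attention: out = phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1).

    The T x T attention matrix is never formed, so a relative positional
    bias on its entries has nowhere to go -- the gap that Stochastic
    Positional Encoding is designed to close.
    """
    Qp, Kp = phi(Q), phi(K)           # (T, d) feature-mapped queries/keys
    S = Kp.T @ V                      # (d, d) global key/value summary
    Z = Qp @ Kp.sum(axis=0)           # (T,) normalizer
    return (Qp @ S) / Z[:, None]      # O(T d^2) instead of O(T^2 d)

rng = np.random.default_rng(0)
T, d = 16, 4
Q, K, V = rng.normal(size=(3, T, d))
fast = linear_attention(Q, K, V)

# Sanity check: identical result to explicitly forming the attention matrix.
phi = lambda x: np.maximum(x, 0.0) + 1e-6
A = phi(Q) @ phi(K).T                 # the T x T matrix linear attention avoids
slow = (A @ V) / A.sum(axis=1, keepdims=True)
```

Because the matrix `A` never exists in the fast path, RPE has to be smuggled in through the queries and keys themselves, which is what the proposed stochastic encoding does.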
Relative Positional Encoding for Transformers with Linear Complexity
— AK (@ak92501) May 19, 2021
pdf: https://t.co/9pcRuWsYWU
abs: https://t.co/XlR6XupOBF
project page: https://t.co/YDr3QHo38N
way to generate PE that can be used as a replacement to the classical additive PE and provably behaves like RPE
2. BookSum: A Collection of Datasets for Long-form Narrative Summarization
Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, Dragomir Radev
The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays and stories, and includes highly abstractive, human written summaries on three levels of granularity of increasing difficulty: paragraph-, chapter-, and book-level. The domain and structure of our dataset poses a unique set of challenges for summarization systems, which include: processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.
BookSum: A Collection of Datasets for Long-form Narrative Summarization
— AK (@ak92501) May 19, 2021
pdf: https://t.co/vGGWkKY6Ji
abs: https://t.co/LG8gIuHhWN
github: https://t.co/YFKHW0hx1e
includes annotations on three levels of granularity of increasing difficulty: paragraph, chapter, and full-book
Excited to share our new paper “BookSum: A Collection of Datasets for Long-form Narrative Summarization" 📚📖🤖
— Wojciech Kryściński (@iam_wkr) May 19, 2021
w/ @nazneenrajani, @jigsaw2212, @CaimingXiong, and Dragomir Radev
Paper: https://t.co/NpiG04OwVT
Code: https://t.co/6YjxNfKV9w
Thread 🧵:
3. Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency
Kyra Yee, Uthaipon Tantipongpipat, Shubhanshu Mishra
Twitter uses machine learning to crop images, where crops are centered around the part predicted to be the most salient. In fall 2020, Twitter users raised concerns that the automated image cropping system on Twitter favored light-skinned over dark-skinned individuals, as well as concerns that the system favored cropping women’s bodies instead of their heads. In order to address these concerns, we conduct an extensive analysis using formalized group fairness metrics. We find systematic disparities in cropping and identify contributing factors, including the fact that the cropping based on the single most salient point can amplify the disparities. However, we demonstrate that formalized fairness metrics and quantitative analysis on their own are insufficient for capturing the risk of representational harm in automatic cropping. We suggest the removal of saliency-based cropping in favor of a solution that better preserves user agency. To develop a new solution that sufficiently addresses concerns related to representational harm, our critique motivates a combination of quantitative and qualitative methods that include human-centered design.
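The group fairness metrics in question boil down to comparing outcome rates across groups. A generic demographic-parity-style computation (illustrative toy data, not Twitter's actual analysis code or labels):

```python
import numpy as np

# Toy records: group membership and whether the saliency crop "favored"
# the person (e.g., kept them centered in frame). Purely illustrative.
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
favored = np.array([1, 1, 1, 0, 1, 0, 0, 0])

def parity_gap(group, favored, g1, g2):
    """Demographic-parity gap: P(favored | g1) - P(favored | g2)."""
    p1 = favored[group == g1].mean()
    p2 = favored[group == g2].mean()
    return p1 - p2

gap = parity_gap(group, favored, "A", "B")  # 0.75 - 0.25 = 0.5
```

The paper's point is precisely that a number like `gap` quantifies systematic disparity but cannot, on its own, capture representational harm.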
As part of our commitment to transparency, we’ve also published our analysis on ArXiv and are sharing our source code so you can reproduce and better our analysis.
— Twitter Engineering (@TwitterEng) May 19, 2021
Paper: https://t.co/Fn4BW5wYBk
Code: https://t.co/MqE9LgFEDO
4. Finding an Unsupervised Image Segmenter in Each of Your Deep Generative Models
Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi
Recent research has shown that numerous human-interpretable directions exist in the latent space of GANs. In this paper, we develop an automatic procedure for finding directions that lead to foreground-background image separation, and we use these directions to train an image segmentation model without human supervision. Our method is generator-agnostic, producing strong segmentation results with a wide range of different GAN architectures. Furthermore, by leveraging GANs pretrained on large datasets such as ImageNet, we are able to segment images from a range of domains without further training or finetuning. Evaluating our method on image segmentation benchmarks, we compare favorably to prior work while using neither human supervision nor access to the training data. Broadly, our results demonstrate that automatically extracting foreground-background structure from pretrained deep generative models can serve as a remarkably effective substitute for human supervision.
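The core mechanism — shift the latent along a foreground direction, then threshold the pixel difference to get a mask — can be shown with a stand-in generator (a real GAN would replace `toy_generator`; everything here is a hypothetical illustration, not the paper's procedure in detail):

```python
import numpy as np

def toy_generator(z):
    """Stand-in for a GAN generator: z[0] controls foreground brightness,
    the background stays fixed. Not a real GAN, just the geometry of the idea."""
    img = np.full((8, 8), 0.1)        # constant background
    img[2:6, 2:6] += z[0]             # "foreground" square tied to z[0]
    return img

z = np.array([0.5, 0.3])
direction = np.array([1.0, 0.0])      # a found "foreground" latent direction

# Shifting z along the direction changes only foreground pixels;
# thresholding the difference yields a segmentation mask for free.
diff = np.abs(toy_generator(z + direction) - toy_generator(z))
mask = diff > 0.5
```

Masks produced this way can then serve as free supervision for training a standalone segmentation network, which is what makes the method generator-agnostic.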
Finding an Unsupervised Image Segmenter in Each of Your Deep Generative Models
— AK (@ak92501) May 19, 2021
pdf: https://t.co/5aw1Z3ufzK
abs: https://t.co/3rbmfBuMPn
project page: https://t.co/GpkEiiKbQJ
5. Identifying Undercompensated Groups Defined By Multiple Attributes in Risk Adjustment
Anna Zink, Sherri Rose
Risk adjustment in health care aims to redistribute payments to insurers based on costs. However, risk adjustment formulas are known to underestimate costs for some groups of patients. This undercompensation makes these groups unprofitable to insurers and creates incentives for insurers to discriminate. We develop a machine learning method for “group importance” to identify unprofitable groups defined by multiple attributes, improving on the arbitrary nature of existing evaluations. This procedure was designed to evaluate the risk adjustment formulas used in the U.S. health insurance Marketplaces as well as Medicare, and we find a number of previously unidentified undercompensated groups. Our work provides policy makers with new information on potential targets of discrimination in the health care system and a path towards more equitable health coverage.
New preprint on identifying undercompensated groups defined by multiple attributes (w/Anna Zink) https://t.co/IvJOUI83ho
— Sherri Rose (@sherrirose) May 19, 2021
We construct a group importance measure & find previously unidentified groups at risk of discrimination in the healthcare system
Code https://t.co/9BDPDquS5j
6. Blind Bipedal Stair Traversal via Sim-to-Real Reinforcement Learning
Jonah Siekmann, Kevin Green, John Warila, Alan Fern, Jonathan Hurst
Accurate and precise terrain estimation is a difficult problem for robot locomotion in real-world environments. Thus, it is useful to have systems that do not depend on accurate estimation to the point of fragility. In this paper, we explore the limits of such an approach by investigating the problem of traversing stair-like terrain without any external perception or terrain models on a bipedal robot. For such blind bipedal platforms, the problem appears difficult (even for humans) due to the surprise elevation changes. Our main contribution is to show that sim-to-real reinforcement learning (RL) can achieve robust locomotion over stair-like terrain on the bipedal robot Cassie using only proprioceptive feedback. Importantly, this only requires modifying an existing flat-terrain training RL framework to include stair-like terrain randomization, without any changes in reward function. To our knowledge, this is the first controller for a bipedal, human-scale robot capable of reliably traversing a variety of real-world stairs and other stair-like disturbances using only proprioception.
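The paper's only change to the existing flat-terrain RL framework is adding stair-like terrain randomization at each episode reset. A sketch of what per-episode randomization looks like (the rise/run ranges below are illustrative guesses, not the paper's settings):

```python
import numpy as np

def sample_stair_terrain(rng, n_steps=10,
                         rise_range=(0.05, 0.20), run_range=(0.25, 0.40)):
    """Per-episode stair randomization: each reset draws fresh rise/run
    values, so the policy never sees the same staircase twice.
    Ranges here are assumptions for illustration only."""
    rise = rng.uniform(*rise_range, size=n_steps)      # step heights (m)
    run = rng.uniform(*run_range, size=n_steps)        # step depths (m)
    heights = np.concatenate([[0.0], np.cumsum(rise)]) # terrain profile
    return run, heights

rng = np.random.default_rng(42)
run, heights = sample_stair_terrain(rng)
```

Training against this distribution of terrains — with the reward function untouched — is what forces the policy to rely on proprioception alone.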
Blind Bipedal Stair Traversal via Sim-to-Real Reinforcement Learning
https://t.co/ivFzTBvbCU
https://t.co/qCEMxlglL9
https://t.co/vmIY6qw5Xn
— sim2real (@sim2realAIorg) May 19, 2021
Blind Bipedal Stair Traversal via Sim-to-Real Reinforcement Learning
— AK (@ak92501) May 19, 2021
pdf: https://t.co/RSStkxOL3d
abs: https://t.co/doWahODv4u
7. Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics
Vivek Jayaram, John Thickstun
- retweets: 132, favorites: 43 (05/20/2021 10:03:18)
- links: abs | pdf
- cs.LG | cs.SD | eess.AS | stat.ML
This paper introduces an alternative approach to sampling from autoregressive models. Autoregressive models are typically sampled sequentially, according to the transition dynamics defined by the model. Instead, we propose a sampling procedure that initializes a sequence with white noise and follows a Markov chain defined by Langevin dynamics on the global log-likelihood of the sequence. This approach parallelizes the sampling process and generalizes to conditional sampling. Using an autoregressive model as a Bayesian prior, we can steer the output of a generative model using a conditional likelihood or constraints. We apply these techniques to autoregressive models in the visual and audio domains, with competitive results for audio source separation, super-resolution, and inpainting.
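The Langevin update is simple to state: initialize with noise and repeatedly move along the gradient of the log-likelihood plus injected noise. A toy sketch with a standard-normal target (in the paper the gradient would come from an autoregressive model's global log-likelihood, not a closed form):

```python
import numpy as np

def langevin_sample(grad_log_p, n_chains=10000, n_steps=500, eta=0.05, seed=0):
    """Unadjusted Langevin dynamics: start from white noise, follow
    x <- x + eta * grad log p(x) + sqrt(2 * eta) * noise.
    Every coordinate updates in parallel -- no sequential decoding."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_chains)     # initialize with white noise
    for _ in range(n_steps):
        x = x + eta * grad_log_p(x) + np.sqrt(2 * eta) * rng.normal(size=n_chains)
    return x

# Toy target: standard normal, so grad log p(x) = -x.
samples = langevin_sample(lambda x: -x)
```

Conditioning then amounts to adding the gradient of a conditional log-likelihood or constraint term to `grad_log_p`, which is what makes the approach flexible for separation, super-resolution, and inpainting.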
Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics
— AK (@ak92501) May 19, 2021
pdf: https://t.co/40kwXydB2f
abs: https://t.co/vHH7vNcCRc
project page: https://t.co/mrx3K3p73I
github: https://t.co/Fj7ZWiNTOI
8. A Measure of Research Taste
Vladlen Koltun, David Hafner
Researchers are often evaluated by citation-based metrics. Such metrics can inform hiring, promotion, and funding decisions. Concerns have been expressed that popular citation-based metrics incentivize researchers to maximize the production of publications. Such incentives may not be optimal for scientific progress. Here we present a citation-based measure that rewards both productivity and taste: the researcher’s ability to focus on impactful contributions. The presented measure, CAP, balances the impact of publications and their quantity, thus incentivizing researchers to consider whether a publication is a useful addition to the literature. CAP is simple, interpretable, and parameter-free. We analyze the characteristics of CAP for highly-cited researchers in biology, computer science, economics, and physics, using a corpus of millions of publications and hundreds of millions of citations with yearly temporal granularity. CAP produces qualitatively plausible outcomes and has a number of advantages over prior metrics. Results can be explored at https://cap-measure.org/
A Measure of Research Taste
— AK (@ak92501) May 19, 2021
pdf: https://t.co/XhtoezVihm
abs: https://t.co/R6eXxUhz27
project page: https://t.co/E8Nnk0VC0t
a citation-based measure that rewards both productivity and taste, balances impact of publications and their quantity
9. Finding a Needle in a Haystack: Tiny Flying Object Detection in 4K Videos using a Joint Detection-and-Tracking Approach
Ryota Yoshihashi, Rei Kawakami, Shaodi You, Tu Tuan Trinh, Makoto Iida, Takeshi Naemura
Detecting tiny objects in a high-resolution video is challenging because little visual information is available and what there is is unreliable. Specifically, the challenges include very low resolution of the objects, MPEG artifacts due to compression, and a large search area with many hard negatives. Tracking is equally difficult because of unreliable appearance and unreliable motion estimation. We find, however, that combining these two challenging tasks yields mutual benefits. Following this idea, we present a neural network model called the Recurrent Correlational Network, where detection and tracking are jointly performed over a multi-frame representation learned through a single, trainable, end-to-end network. The framework exploits a convolutional long short-term memory network for learning informative appearance changes for detection, while the learned representation is shared in tracking for enhancing its performance. In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements in detection performance over deep single-frame detectors and existing motion-based detectors. Furthermore, our network performs as well as state-of-the-art generic object trackers when evaluated as a tracker on a bird image dataset.
Finding a Needle in a Haystack: Tiny Flying Object Detection in 4K Videos using a Joint Detection-and-Tracking Approach
— AK (@ak92501) May 19, 2021
pdf: https://t.co/6fbzvZW7yp
abs: https://t.co/egUWswFSgq
project page: https://t.co/kigfpt4wBR
10. Exemplar-Based Open-Set Panoptic Segmentation Network
Jaedong Hwang, Seoung Wug Oh, Joon-Young Lee, Bohyung Han
We extend panoptic segmentation to the open-world and introduce an open-set panoptic segmentation (OPS) task. This task requires performing panoptic segmentation for not only known classes but also unknown ones that have not been acknowledged during training. We investigate the practical challenges of the task and construct a benchmark on top of an existing dataset, COCO. In addition, we propose a novel exemplar-based open-set panoptic segmentation network (EOPSN) inspired by exemplar theory. Our approach identifies a new class based on exemplars, which are identified by clustering and employed as pseudo-ground-truths. The size of each class increases by mining new exemplars based on the similarities to the existing ones associated with the class. We evaluate EOPSN on the proposed benchmark and demonstrate the effectiveness of our proposals. The primary goal of our work is to draw the attention of the community to the recognition in the open-world scenarios. The implementation of our algorithm is available on the project webpage: https://cv.snu.ac.kr/research/EOPSN.
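The exemplar mechanic — join an existing pseudo-class if you are similar enough to one of its exemplars, otherwise open a new one — can be sketched greedily. This is a much-simplified stand-in for EOPSN's clustering-based procedure, with made-up embeddings and threshold:

```python
import numpy as np

def mine_exemplars(embeddings, thresh=0.8):
    """Greedy exemplar mining: assign an embedding to an existing
    pseudo-class if its cosine similarity to some exemplar exceeds
    `thresh`, otherwise start a new pseudo-class."""
    exemplars, labels = [], []
    for e in embeddings:
        e = e / np.linalg.norm(e)
        if exemplars:
            sims = np.array([ex @ e for ex in exemplars])  # cosine similarity
            best = int(sims.argmax())
        if exemplars and sims[best] >= thresh:
            cls = labels[best]                   # join the matching class
        else:
            cls = max(labels, default=-1) + 1    # open a new pseudo-class
        exemplars.append(e)   # the class grows by mining new exemplars
        labels.append(cls)
    return labels

# Two well-separated clusters of "unknown" segment embeddings.
embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]])
pseudo = mine_exemplars(embs)
```

The resulting pseudo-labels play the role of ground truth for the unknown classes during training.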
Exemplar-Based Open-Set Panoptic Segmentation Network
— AK (@ak92501) May 19, 2021
pdf: https://t.co/0l4JQALked
abs: https://t.co/HkBRNiZvDr
project page: https://t.co/jwtZNvBaXm
github: https://t.co/PnswRgWfA9
11. Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Animashree Anandkumar
In real-world multiagent systems, agents with different capabilities may join or leave without altering the team’s overarching goals. Coordinating teams with such dynamic composition is challenging: the optimal team strategy varies with the composition. We propose COPA, a coach-player framework to tackle this problem. We assume the coach has a global view of the environment and coordinates the players, who only have partial views, by distributing individual strategies. Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players. We validate our methods on a resource collection task, a rescue game, and the StarCraft micromanagement tasks. We demonstrate zero-shot generalization to new team compositions. Our method achieves comparable or better performance than the setting where all players have a full view of the environment. Moreover, we see that the performance remains high even when the coach communicates as little as 13% of the time using the adaptive communication strategy.
Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
— AK (@ak92501) May 19, 2021
pdf: https://t.co/lxRxH35vFO
abs: https://t.co/ZoWypMtnby
COPA achieves strong zero-shot generalization performance with relatively low communication frequency
12. Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation
Mathias Müller, Rico Sennrich
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search — the de facto standard inference algorithm in NMT — and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
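MBR decoding replaces "pick the highest-probability string" with "pick the candidate with the highest expected utility against model samples". A minimal sketch — the toy Jaccard utility below stands in for the real MT metrics (the choice of which, the paper argues, is exactly what reintroduces length and frequency biases):

```python
import numpy as np

def utility(hyp, ref):
    """Toy utility: Jaccard overlap of token sets. Real MBR would use an
    MT metric (e.g., a BLEU or ChrF variant) as the utility function."""
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / len(h | r)

def mbr_decode(candidates, samples):
    """Pick the candidate with the highest expected utility against
    model samples, instead of the highest model probability."""
    scores = [np.mean([utility(c, s) for s in samples]) for c in candidates]
    return candidates[int(np.argmax(scores))]

samples = ["the cat sat", "the cat sat down", "a cat sat"]
best = mbr_decode(samples, samples)  # samples reused as the candidate set
```

The "most central" sample wins, which is also why MBR is robust to hallucinated outliers: a hallucination has low utility against every other sample.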
#ACL2021NLP paper by @bricksdont: Understanding the Properties of MBR Decoding in NMT
— Rico Sennrich (@RicoSennrich) May 19, 2021
How much is beam search to blame for deficiencies in MT output? MBR decoding still has biases, but is more robust to common failures (copy mode, hallucination). https://t.co/QZl0ppwWBW #NLProc
13. Fast and Slow Learning of Recurrent Independent Mechanisms
Kanika Madan, Nan Rosemary Ke, Anirudh Goyal, Bernhard Schölkopf, Yoshua Bengio
Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning agent interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for being able to generalize in a systematic manner to out-of-distribution changes. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks. An attention mechanism dynamically selects which modules can be adapted to the current task, and the parameters of the selected modules are allowed to change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanisms act as stable, slowly changing, meta-parameters. We focus on pieces of knowledge captured by an ensemble of modules sparsely communicating with each other via a bottleneck of attention. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster adaptation in a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. We also find that reversing the role of parameters and meta-parameters does not work nearly as well, suggesting a particular role for fast adaptation of the dynamically selected modules.
Fast and Slow Learning of Recurrent Independent Mechanisms
— AK (@ak92501) May 19, 2021
pdf: https://t.co/rV9BG4LftF
abs: https://t.co/VfYPLBTIyZ
14. Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks
Michael Scherbela, Rafael Reisenhofer, Leon Gerard, Philipp Marquetand, Philipp Grohs
- retweets: 40, favorites: 49 (05/20/2021 10:03:19)
- links: abs | pdf
- physics.comp-ph | cs.LG | physics.chem-ph
Accurate numerical solutions for the Schrödinger equation are of utmost importance in quantum chemistry. However, the computational cost of current high-accuracy methods scales poorly with the number of interacting particles. Combining Monte Carlo methods with unsupervised training of neural networks has recently been proposed as a promising approach to overcome the curse of dimensionality in this setting and to obtain accurate wavefunctions for individual molecules at a moderately scaling computational cost. These methods currently do not exploit the regularity exhibited by wavefunctions with respect to their molecular geometries. Inspired by recent successful applications of deep transfer learning in machine translation and computer vision tasks, we attempt to leverage this regularity by introducing a weight-sharing constraint when optimizing neural network-based models for different molecular geometries. That is, we restrict the optimization process such that up to 95 percent of weights in a neural network model are in fact equal across varying molecular geometries. We find that this technique can accelerate optimization when considering sets of nuclear geometries of the same molecule by an order of magnitude and that it opens a promising route towards pre-trained neural network wavefunctions that yield high accuracy even across different molecules.
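The weight-sharing constraint can be shown in miniature: one shared parameter whose gradient accumulates over all geometries, plus a small geometry-specific parameter per geometry. This toy least-squares fit is a hypothetical illustration, not the paper's wavefunction model:

```python
import numpy as np

# Weight sharing across geometries, in miniature: one shared slope w,
# one geometry-specific offset b[g] per "nuclear geometry". The real
# models share up to 95% of network weights in the same spirit.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
geoms = {0: 2.0 * x + 0.3, 1: 2.0 * x - 0.5}  # same slope, different offsets

w = 0.0                        # shared parameter
b = {0: 0.0, 1: 0.0}           # geometry-specific parameters
lr = 0.1
for _ in range(500):
    grad_w = 0.0
    for g, y in geoms.items():
        err = w * x + b[g] - y
        grad_w += (err * x).mean()  # shared gradient sums over geometries
        b[g] -= lr * err.mean()     # per-geometry update
    w -= lr * grad_w
```

Because every geometry contributes gradient signal to the shared weights, optimization for a whole set of geometries converges far faster than fitting each one from scratch.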
'Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks' - Promising work from @marquetand and co-workers from @univienna_vda https://t.co/v8oNMuB53t
— Prof von Lilienfeld (@ProfvLilienfeld) May 19, 2021
Should be of interest to @jhrmnn @FrankNoeBerlin @pfau #compchem
15. Learning and Certification under Instance-targeted Poisoning
Ji Gao, Amin Karbasi, Mohammad Mahmoody
In this paper, we study PAC learnability and certification under instance-targeted poisoning attacks, where the adversary may change a fraction of the training set with the goal of fooling the learner at a specific target instance. Our first contribution is to formalize the problem in various settings, and explicitly discussing subtle aspects such as learner’s randomness and whether (or not) adversary’s attack can depend on it. We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable. In contrast, when the adversary’s budget grows linearly with the sample complexity, the adversary can potentially drive up the expected 0-1 loss to one. We further extend our results to distribution-specific PAC learning in the same attack model and show that proper learning with certification is possible for learning halfspaces under Gaussian distribution. Finally, we empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets, and test them against targeted-poisoning attacks. Our experimental results show that many models, especially state-of-the-art neural networks, are indeed vulnerable to these strong attacks. Interestingly, we observe that methods with high standard accuracy might be more vulnerable to instance-targeted poisoning attacks.
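For k-NN, the instance-targeted attack is especially direct: with k=1, flipping the label of the single training point nearest to the target flips the prediction. A minimal demonstration (toy 1-D data, not the paper's experimental setup):

```python
import numpy as np

def knn_predict(X, y, query, k=1):
    """Plain k-NN prediction (majority vote among the k nearest points)."""
    idx = np.argsort(np.abs(X - query))[:k]
    return int(round(y[idx].mean()))

# Clean training set: class 0 near x=0, class 1 near x=10.
X = np.array([0.0, 0.5, 1.0, 9.0, 9.5, 10.0])
y = np.array([0, 0, 0, 1, 1, 1])
target = 1.2                               # the attacker's chosen instance

clean_pred = knn_predict(X, y, target)     # class 0

# Instance-targeted poisoning: flip the label of the training point
# nearest to the target. For 1-NN, a single poisoned point suffices.
y_poisoned = y.copy()
y_poisoned[np.argmin(np.abs(X - target))] = 1
poisoned_pred = knn_predict(X, y_poisoned, target)
```

The paper's sublinear-vs-linear budget dichotomy generalizes this intuition: certification is only possible when the adversary cannot afford to corrupt the neighborhood that decides the target.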
In this work, just accepted to UAI 2021, we study PAC learnability and certification under poisoning attacks. We provide sharp results for when learning is possible/impossible and when the predictions can be certified. https://t.co/vCP5Wt6140
— Amin Karbasi (@aminkarbasi) May 19, 2021