All Articles

Hot Papers 2021-02-03

1. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. However, because of this moving target, new models are often still evaluated on divergent, Anglo-centric corpora with well-established but flawed metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of corpora and evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper describes the initial release, for which we are organizing a shared task at our ACL 2021 Workshop, and we invite the entire NLG community to participate.
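
For readers who want to poke at the data, a minimal sketch of loading one GEM task is shown below. The "gem" dataset name and the "common_gen" config on the HuggingFace hub are assumptions about how the release is distributed, not details stated in the abstract; check the GEM website for the actual task list.

```python
# Minimal sketch of loading one GEM task, assuming the benchmark data is
# published on the HuggingFace hub under the "gem" dataset name with a
# "common_gen" config (an assumption; consult the GEM website).
from datasets import load_dataset

common_gen = load_dataset("gem", "common_gen", split="validation")
print(common_gen[0])  # inspect the fields of one input/reference pair
```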

2. Scaling Laws for Transfer

Danny Hernandez, Jared Kaplan, Tom Henighan, Sam McCandlish

  • retweets: 2912, favorites: 131 (02/04/2021 09:36:17)
  • links: abs | pdf
  • cs.LG

We study empirical scaling laws for transfer learning between distributions in an unsupervised, fine-tuning setting. When we train increasingly large neural networks from scratch on a fixed-size dataset, they eventually become data-limited and stop improving in performance (cross-entropy loss). When we do the same for models pre-trained on a large language dataset, the slope in performance gains is merely reduced rather than going to zero. We calculate the effective data “transferred” from pre-training by determining how much data a transformer of the same size would have required to achieve the same loss when training from scratch. In other words, we focus on units of data while holding everything else fixed. We find that the effective data transferred is described well in the low-data regime by a power law of parameter count and fine-tuning dataset size. We believe the exponents in these power laws correspond to measures of the generality of a model and the proximity of distributions (in a directed rather than symmetric sense). We find that pre-training effectively multiplies the fine-tuning dataset size. Transfer, like overall performance, scales predictably in terms of parameters, data, and compute.
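
The fitted quantity is easy to state as a formula: the effective data transferred, D_T, is modeled as a power law k · D_F^α · N^β in the fine-tuning dataset size D_F and the parameter count N. A minimal sketch with illustrative constants (not the paper's fitted values):

```python
# Sketch of the power-law form for effective data transferred, D_T.
# Total effective data seen during fine-tuning is then D_F + D_T.
# The constants below are placeholders for illustration, not the paper's fits.
def effective_data_transferred(d_finetune, n_params, k=1.0e4, alpha=0.2, beta=0.4):
    """Effective extra data (in tokens) that pre-training is 'worth'."""
    return k * d_finetune**alpha * n_params**beta

d_f = 1_000_000      # fine-tuning tokens
n = 100_000_000      # model parameters
d_t = effective_data_transferred(d_f, n)
print(f"pre-training acts like {d_t:,.0f} extra tokens "
      f"({d_t / d_f:.1f}x the fine-tuning set)")
```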

3. Deep Online Fused Video Stabilization

Zhenmei Shi, Fuhao Shi, Wei-Sheng Lai, Chia-Kai Liang, Yingyu Liang

  • retweets: 1058, favorites: 153 (02/04/2021 09:36:17)
  • links: abs | pdf
  • cs.CV

We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning. The network fuses optical flow with real/virtual camera pose histories into a joint motion representation. Next, an LSTM block infers the new virtual camera pose, and this virtual pose is used to generate a warping grid that stabilizes the frame. A novel relative motion representation as well as a multi-stage training process are presented to optimize our model without any supervision. To the best of our knowledge, this is the first DNN solution that adopts both sensor data and image content for stabilization. We validate the proposed framework through ablation studies and demonstrate that the proposed method outperforms state-of-the-art alternative solutions via quantitative evaluations and a user study.
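
A rough sketch of the fusion-and-prediction stage described above, in PyTorch. The feature dimensions, pose format (quaternions), and layer choices are assumptions for illustration rather than the paper's actual architecture:

```python
# Rough sketch: fuse pooled optical-flow features with real/virtual pose
# histories, and let an LSTM predict the next virtual camera pose, which
# would then be turned into a warping grid. Shapes and sizes are assumed.
import torch
import torch.nn as nn

class FusedStabilizer(nn.Module):
    def __init__(self, flow_dim=64, pose_dim=4, hidden=128):
        super().__init__()
        self.flow_enc = nn.Linear(flow_dim, hidden)       # encodes pooled optical flow
        self.pose_enc = nn.Linear(2 * pose_dim, hidden)   # real + virtual pose history
        self.lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)           # next virtual pose (quaternion)

    def forward(self, flow_feat, real_pose, virt_pose, state=None):
        # flow_feat: (B, T, flow_dim); poses: (B, T, pose_dim)
        x = torch.cat([self.flow_enc(flow_feat),
                       self.pose_enc(torch.cat([real_pose, virt_pose], dim=-1))], dim=-1)
        out, state = self.lstm(x, state)
        next_virt_pose = self.head(out[:, -1])            # pose used to build the warp grid
        return next_virt_pose, state

model = FusedStabilizer()
pose, _ = model(torch.randn(1, 8, 64), torch.randn(1, 8, 4), torch.randn(1, 8, 4))
print(pose.shape)  # torch.Size([1, 4])
```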

4. Metrics and continuity in reinforcement learning

Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro

In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies it induces, are thus of crucial importance, as they will directly affect the performance of the algorithms. Indeed, a number of recent works introduce algorithms assuming the existence of “well-behaved” neighbourhoods, but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
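
To make the role of a state metric concrete, here is a toy sketch on a small random MDP. The particular distance used (reward gap plus a total-variation term over transition distributions) is a simplified stand-in, not one of the metrics analyzed in the paper:

```python
# Toy illustration of comparing states with a metric on a small random MDP.
# |reward gap| + gamma * total variation is a crude stand-in for the
# bisimulation-style metrics discussed in the paper, for illustration only.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
R = rng.random((n_states, n_actions))                             # R[s, a]
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']

d = np.zeros((n_states, n_states))
for s in range(n_states):
    for t in range(n_states):
        d[s, t] = max(abs(R[s, a] - R[t, a])
                      + gamma * 0.5 * np.abs(P[s, a] - P[t, a]).sum()
                      for a in range(n_actions))

print(np.round(d, 3))  # small d[s, t]: states that can safely share estimates
```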

5. The Privatization of AI Research(-ers): Causes and Potential Consequences — From university-industry interaction to public research brain-drain?

Roman Jurowetzki, Daniel Hain, Juan Mateos-Garcia, Konstantinos Stathoulopoulos

  • retweets: 462, favorites: 71 (02/04/2021 09:36:18)
  • links: abs | pdf
  • cs.CY

In this paper, we analyze the causes and discuss potential consequences of perceived privatization of AI research, particularly the transition of AI researchers from academia to industry. We explore the scale of the phenomenon by quantifying transition flows between industry and academia, and providing a descriptive account and exploratory analysis of characteristics of industry transition. Here we find that industry researchers and those transitioning into industry produce more impactful research as measured by citations. Using a survival regression approach we identify mechanisms that trigger these university-industry transitions focusing on researcher characteristics, performance, and research field as documented in bibliographic data. We find that researchers working within the field of deep learning as well as those with higher average impact tend to transition into industry. These findings highlight the importance of strengthening academic research in public organizations within AI to balance a potential dominance of private companies and to maintain public supervision of the development and application of this technology.

6. Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

Federico A. Galatolo, Mario G.C.A. Cimino, Gigliola Vaglini

In this work we present GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings. Conversely, GLaSS takes a caption (or an image) as input and generates the image (or the caption) whose CLIP embedding is most similar to the input one. This optimal image (or caption) is produced via a generative network after an exploration by a genetic algorithm. Promising results are shown, based on experiments with the image generators BigGAN and StyleGAN2 and the text generator GPT2.
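
A minimal sketch of the search idea in the caption-to-image direction, using the openai CLIP package for scoring. The generator here is a random linear stand-in (the paper uses BigGAN or StyleGAN2), CLIP preprocessing is simplified, and the genetic operators are reduced to selection plus Gaussian mutation, so treat this as an illustration of the loop rather than the authors' implementation:

```python
# Sketch of CLIP-guided genetic search over a generator's latent space.
# The "generator" is a fixed random projection standing in for BigGAN/StyleGAN2.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

W = torch.randn(128, 3 * 224 * 224)              # fixed stand-in "generator" weights
def generate(z):                                 # latents (B, 128) -> images (B, 3, 224, 224)
    return torch.sigmoid(z @ W).view(-1, 3, 224, 224)

text = clip.tokenize(["a red car parked on a beach"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(text)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

pop = torch.randn(32, 128)                       # population of candidate latents
for _ in range(20):
    with torch.no_grad():
        img_emb = model.encode_image(generate(pop).to(device))
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        fitness = (img_emb @ text_emb.T).squeeze(1)   # CLIP similarity to the caption
    elite = pop[fitness.topk(8).indices.cpu()]        # keep the fittest latents
    pop = torch.cat([elite, elite.repeat(3, 1) + 0.3 * torch.randn(24, 128)])  # mutate

best_image = generate(pop[:1])                   # decode the best latent found
```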

7. Neural Data Augmentation via Example Extrapolation

Kenton Lee, Kelvin Guu, Luheng He, Tim Dozat, Hyung Won Chung

  • retweets: 225, favorites: 75 (02/04/2021 09:36:18)
  • links: abs | pdf
  • cs.CL | cs.AI

In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such “few-shot” cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and complexity of real examples. We propose a data augmentation approach that performs neural Example Extrapolation (Ex2). Given a handful of exemplars sampled from some distribution, Ex2 synthesizes new examples that also belong to the same distribution. The Ex2 model is learned by simulating the example generation procedure on data-rich slices of the data, and it is applied to underrepresented, few-shot slices. We apply Ex2 to a range of language understanding tasks and significantly improve over state-of-the-art methods on multiple few-shot learning benchmarks, including for relation extraction (FewRel) and intent classification + slot filling (SNIPS).
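
The core of the data construction is easy to sketch. Below is a hedged illustration of building extrapolation pairs from a data-rich slice; the separator token, the toy slice, and the eventual seq2seq model are assumptions for illustration, not details from the abstract:

```python
# Sketch of Ex2-style data construction: on data-rich slices, a model is
# trained to map a handful of exemplars to a held-out example from the same
# slice; at augmentation time the same procedure is run on few-shot slices.
import random

def make_extrapolation_pairs(slice_examples, k=4, n_pairs=100, sep=" [EX] "):
    """Build (input, target) pairs: k exemplars in, one new example out."""
    pairs = []
    for _ in range(n_pairs):
        sample = random.sample(slice_examples, k + 1)
        exemplars, target = sample[:k], sample[k]
        pairs.append((sep.join(exemplars), target))
    return pairs

rich_slice = [f"book a table for {n} people at {t}"
              for n in range(2, 9) for t in ("7pm", "8pm")]
for src, tgt in make_extrapolation_pairs(rich_slice, n_pairs=2):
    print(src, "->", tgt)

# A seq2seq model (e.g. T5) fine-tuned on such pairs can then be fed the few
# exemplars of an underrepresented slice to generate synthetic examples.
```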

8. Reinforcement Learning for Decision-Making and Control in Power Systems: Tutorial, Review, and Vision

Xin Chen, Guannan Qu, Yujie Tang, Steven Low, Na Li

With large-scale integration of renewable generation and ubiquitous distributed energy resources (DERs), modern power systems confront a series of new challenges in operation and control, such as growing complexity, increasing uncertainty, and worsening volatility. The upside is that more and more data are available owing to widely deployed smart meters, smart sensors, and upgraded communication networks. As a result, data-driven control techniques, especially reinforcement learning (RL), have attracted surging attention in recent years. In this paper, we focus on RL and aim to provide a tutorial on various RL techniques and how they can be applied to decision-making and control in power systems. In particular, we select three key applications, namely frequency regulation, voltage control, and energy management, for illustration, and present the typical ways to model and tackle them with RL methods. We conclude by emphasizing two critical issues in the application of RL, namely safety and scalability. Several potential future directions are discussed as well.
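
To make the framing concrete, here is a toy sketch of frequency regulation cast as an RL environment; the linearized swing dynamics, constants, and reward below are illustrative assumptions, not a model taken from the paper:

```python
# Toy RL environment for frequency regulation: the state is the frequency
# deviation, the action is a power adjustment, and the reward penalizes
# deviation and control effort. Dynamics and constants are illustrative.
import numpy as np

class FrequencyRegulationEnv:
    def __init__(self, inertia=10.0, damping=1.0, dt=0.1):
        self.M, self.D, self.dt = inertia, damping, dt
        self.reset()

    def reset(self):
        self.delta_f = np.random.uniform(-0.5, 0.5)     # frequency deviation (Hz)
        self.load_change = np.random.uniform(-1.0, 1.0)  # unknown load disturbance
        return np.array([self.delta_f])

    def step(self, action):
        # Linearized swing equation: M * d(delta_f)/dt = -D*delta_f + action - load_change
        d_dot = (-self.D * self.delta_f + action - self.load_change) / self.M
        self.delta_f += self.dt * d_dot
        reward = -(self.delta_f ** 2) - 0.01 * action ** 2
        done = abs(self.delta_f) > 2.0
        return np.array([self.delta_f]), reward, done

env = FrequencyRegulationEnv()
state = env.reset()
state, reward, done = env.step(action=0.5)
```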

9. Occluded Video Instance Segmentation

Jiyang Qi, Yan Gao, Xiaoyu Liu, Yao Hu, Xinggang Wang, Xiang Bai, Philip H.S. Torr, Serge Belongie, Alan Yuille, Song Bai

  • retweets: 112, favorites: 38 (02/04/2021 09:36:18)
  • links: abs | pdf
  • cs.CV

Can our video understanding systems perceive objects when heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While the human vision system can understand such occluded instances through contextual reasoning and association, our experiments suggest that current video understanding systems are not yet satisfactory. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 14.4, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario. Moreover, to complement the object cues missing due to occlusion, we propose a plug-and-play module called temporal feature calibration. Built upon MaskTrack R-CNN and SipMask, it achieves an AP of 15.2 and 15.0, respectively. The OVIS dataset is released at http://songbai.site/ovis , and the project code will be available soon.
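
The abstract does not spell out the module, so the sketch below is only one plausible reading of "temporal feature calibration": features from an unoccluded reference frame are gated and fused into the current frame's features. All layer choices are assumptions for illustration, not the paper's design:

```python
# Generic sketch of a plug-and-play temporal calibration block: a gate
# predicted from both frames decides where to trust the reference frame,
# and the gated reference features are fused into the current frame.
import torch
import torch.nn as nn

class TemporalFeatureCalibration(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_cur, feat_ref):
        g = self.gate(torch.cat([feat_cur, feat_ref], dim=1))  # where to trust the reference
        calibrated_ref = g * feat_ref
        return self.fuse(torch.cat([feat_cur, calibrated_ref], dim=1))

module = TemporalFeatureCalibration()
out = module(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 256, 32, 32])
```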

10. Capacity and quantum geometry of parametrized quantum circuits

Tobias Haug, Kishor Bharti, M. S. Kim

To harness the potential of noisy intermediate-scale quantum devices, it is paramount to find the best type of circuits to run hybrid quantum-classical algorithms. Key candidates are parametrized quantum circuits that can be effectively implemented on current devices. Here, we evaluate the capacity and trainability of these circuits using the geometric structure of the parameter space via the effective quantum dimension, which reveals the expressive power of circuits in general as well as of particular initialization strategies. We assess the representation power of various popular circuit types and find striking differences depending on the type of entangling gates used. Particular circuits are characterized by scaling laws in their expressiveness. We identify a transition in the quantum geometry of the parameter space, which leads to a decay of the quantum natural gradient for deep circuits. For shallow circuits, the quantum natural gradient can be orders of magnitude larger in value compared to the regular gradient; however, both of them can suffer from vanishing gradients. By tuning a fixed set of circuit parameters to randomized ones, we find a region where the circuit is expressive, but does not suffer from barren plateaus, hinting at a good way to initialize circuits. Our results enhance the understanding of parametrized quantum circuits for improving variational quantum algorithms.
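
Since the key object is the metric on parameter space, here is a small numerical sketch: the quantum geometric tensor of a toy two-qubit ansatz is computed by finite differences and its rank is read off as an effective dimension. The ansatz and the finite-difference scheme are illustrative choices, not the circuits studied in the paper:

```python
# Numerical sketch of the quantum geometric tensor (Fubini-Study metric)
# for a toy two-qubit parametrized circuit, via finite differences.
import numpy as np

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]], dtype=complex)

def state(theta):
    psi = np.zeros(4, dtype=complex); psi[0] = 1.0        # |00>
    psi = np.kron(ry(theta[0]), ry(theta[1])) @ psi       # layer 1: RY on each qubit
    psi = CNOT @ psi                                       # entangling gate
    psi = np.kron(ry(theta[2]), ry(theta[3])) @ psi       # layer 2
    return psi

def quantum_geometric_tensor(theta, eps=1e-5):
    psi = state(theta)
    d = []
    for i in range(len(theta)):
        t_plus = theta.copy(); t_plus[i] += eps
        d.append((state(t_plus) - psi) / eps)              # approximate d_i |psi>
    G = np.zeros((len(theta), len(theta)))
    for i in range(len(theta)):
        for j in range(len(theta)):
            G[i, j] = np.real(np.vdot(d[i], d[j]) - np.vdot(d[i], psi) * np.vdot(psi, d[j]))
    return G

theta = np.random.uniform(0, 2 * np.pi, size=4)
G = quantum_geometric_tensor(theta)
# Loose tolerance absorbs the finite-difference noise in the small singular values.
print("effective dimension (rank of G):", np.linalg.matrix_rank(G, tol=1e-4))
```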

11. Machine-Learned Phase Diagrams of Generalized Kitaev Honeycomb Magnets

Nihal Rao, Ke Liu, Marc Machaczek, Lode Pollet

We use a recently developed interpretable and unsupervised machine-learning method, the tensorial kernel support vector machine (TK-SVM), to investigate the low-temperature classical phase diagram of a generalized Heisenberg-Kitaev-Γ (J-K-Γ) model on a honeycomb lattice. Aside from reproducing phases reported by previous quantum and classical studies, our machine finds a hitherto missed nested zigzag-stripy order and establishes the robustness of a recently identified modulated S_3 × Z_3 phase, which emerges through the competition between the Kitaev and Γ spin liquids, against Heisenberg interactions. The results imply that, in the restricted parameter space spanned by the three primary exchange interactions J, K, and Γ, the representative Kitaev material α-RuCl_3 lies close to the interface of several phases, including a simple ferromagnet and the unconventional S_3 × Z_3 and nested zigzag-stripy magnets. A zigzag order is stabilized by a finite Γ′ and/or J_3 term, whereas the four magnetic orders may compete in particular if Γ′ is anti-ferromagnetic.

12. Generative Spoken Language Modeling from Raw Audio

Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux

  • retweets: 19, favorites: 34 (02/04/2021 09:36:18)
  • links: abs | pdf
  • cs.CL

Generative spoken language modeling involves jointly learning the acoustic and linguistic characteristics of a language from raw audio only (without text or labels). We introduce metrics to automatically evaluate the generated output in terms of acoustic and linguistic quality for two associated end-to-end tasks: speech resynthesis (repeating the speech input using the system’s own voice) and speech generation (producing novel speech outputs conditioned on a spoken prompt, or unconditionally), and we validate these metrics with human judgment. We test baseline systems consisting of a discrete speech encoder (returning discrete, low-bitrate, pseudo-text units), a generative language model (trained on pseudo-text units), and a speech decoder (generating a waveform from pseudo-text). By comparing three state-of-the-art unsupervised speech encoders (Contrastive Predictive Coding (CPC), wav2vec 2.0, and HuBERT) and varying the number of discrete units (50, 100, 200), we investigate how generative performance depends on the quality of the learned units as measured by unsupervised metrics (zero-shot probe tasks). We will open-source our evaluation stack and baseline models.
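
A heavily simplified sketch of the encoder → discrete units → unit language model pipeline is given below. Random features and a bigram model stand in for the CPC/wav2vec 2.0/HuBERT encoders and the transformer language model, and the waveform decoder is omitted; this only illustrates the shape of the pipeline, not the paper's systems:

```python
# Minimal illustration of the discrete-unit pipeline: frame-level acoustic
# features are quantized into "pseudo-text" units, and a unit-level language
# model is fit on the unit sequences. A separate vocoder (not shown) would
# map generated units back to a waveform.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 39))   # stand-in frame-level acoustic features
n_units = 50                              # size of the discrete unit inventory

units = KMeans(n_clusters=n_units, n_init=10, random_state=0).fit_predict(features)

# Bigram "unit language model": counts of unit -> next-unit transitions.
counts = np.ones((n_units, n_units))      # add-one smoothing
for prev, nxt in zip(units[:-1], units[1:]):
    counts[prev, nxt] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Unconditional "speech generation" at the unit level: sample a unit sequence.
seq = [int(units[0])]
for _ in range(20):
    seq.append(int(rng.choice(n_units, p=probs[seq[-1]])))
print(seq)
```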