All Articles

Hot Papers 2021-02-10

1. Introduction to Machine Learning for the Sciences

Titus Neupert, Mark H Fischer, Eliska Greplova, Kenny Choo, Michael Denner

This is an introductory machine learning course specifically developed with STEM students in mind. We discuss supervised, unsupervised, and reinforcement learning. The notes start with an exposition of machine learning methods without neural networks, such as principal component analysis, t-SNE, and linear regression. We continue with an introduction to both basic and advanced neural network structures such as convolutional neural networks, (variational) autoencoders, generative adversarial networks, restricted Boltzmann machines, and recurrent neural networks. Questions of interpretability are discussed using the examples of dreaming and adversarial attacks.
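
To give a flavour of the pre-neural-network part of the notes, here is a minimal principal component analysis sketch in Python. It is not taken from the lecture notes themselves; the toy data and dimensionality are invented for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """Project data onto its leading principal components.

    X: array of shape (n_samples, n_features).
    Returns the projected coordinates and the principal directions.
    """
    X_centered = X - X.mean(axis=0)                 # remove the mean of each feature
    # Singular value decomposition of the centered data matrix
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                  # leading principal directions
    projected = X_centered @ components.T           # coordinates in the reduced space
    return projected, components

# Toy usage with random data
X = np.random.randn(200, 10)
Z, dirs = pca(X, n_components=2)
print(Z.shape)  # (200, 2)
```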

2. Reverb: A Framework For Experience Replay

Albin Cassirer, Gabriel Barth-Maron, Eugene Brevdo, Sabela Ramos, Toby Boyd, Thibault Sottiaux, Manuel Kroiss

A central component of training in Reinforcement Learning (RL) is experience: the data used for training. The mechanisms used to generate and consume this data have an important effect on the performance of RL algorithms. In this paper, we introduce Reverb: an efficient, extensible, and easy-to-use system designed specifically for experience replay in RL. Reverb is designed to work efficiently in distributed configurations with up to thousands of concurrent clients. The flexible API provides users with the tools to easily and accurately configure the replay buffer. It includes strategies for selecting and removing elements from the buffer, as well as options for controlling the ratio between sampled and inserted elements. This paper presents the core design of Reverb, gives examples of how it can be applied, and provides empirical results on Reverb’s performance characteristics.
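
The paper accompanies the open-source dm-reverb library. The sketch below follows the basic pattern from its public documentation (a table with a sampler, a remover, and a rate limiter, plus a client that inserts and samples); exact names and signatures may differ between releases, so treat it as an illustration rather than a verbatim API reference.

```python
import reverb

# One replay table: uniform sampling, FIFO eviction, and a rate limiter that
# blocks sampling until at least one item has been inserted.
server = reverb.Server(tables=[
    reverb.Table(
        name='replay_buffer',
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=100_000,
        rate_limiter=reverb.rate_limiters.MinSize(1),
    ),
])

client = reverb.Client(f'localhost:{server.port}')

# Insert a few toy transitions (observation, action, reward), each with priority 1.0.
for step in range(3):
    client.insert([float(step), step % 2, 0.5], priorities={'replay_buffer': 1.0})

# Sample a couple of items back from the table.
for sample in client.sample('replay_buffer', num_samples=2):
    print(sample)
```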

3. SwiftNet: Real-time Video Object Segmentation

Haochen Wang, Xiaolong Jiang, Haibing Ren, Yao Hu, Song Bai

  • retweets: 362, favorites: 64 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.CV

In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed. We achieve this by elaborately compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations. Spatially, PAM selectively performs memory update and match on dynamic pixels while ignoring static ones, significantly reducing redundant computation wasted on segmentation-irrelevant pixels. To promote efficient reference encoding, a light-aggregation encoder deploying reversed sub-pixel operations is also introduced in SwiftNet. We hope SwiftNet can set a strong and efficient baseline for real-time VOS and facilitate its application in mobile vision.
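
The pixel-adaptive gating idea can be illustrated with a toy sketch. This is a schematic reconstruction from the abstract, not the authors’ implementation; the thresholds, feature shapes, and update rule are invented for illustration.

```python
import numpy as np

def pam_update(memory, features, pixel_thresh=0.2, frame_thresh=0.05):
    """Toy pixel-adaptive memory update.

    memory, features: arrays of shape (H, W, C) holding per-pixel embeddings.
    Only pixels whose features changed noticeably are written back to memory,
    and the write happens only on frames where enough pixels changed.
    """
    # Per-pixel variation between the current frame and the stored memory.
    variation = np.linalg.norm(features - memory, axis=-1)
    dynamic = variation > pixel_thresh             # mask of "dynamic" pixels

    # Skip the update entirely for frames without noteworthy change.
    if dynamic.mean() < frame_thresh:
        return memory, dynamic                     # static frame: keep memory as-is

    updated = memory.copy()
    updated[dynamic] = features[dynamic]           # write back only the dynamic pixels
    return updated, dynamic

# Toy usage
H, W, C = 4, 4, 8
memory = np.zeros((H, W, C))
frame = np.random.randn(H, W, C) * 0.1
memory, mask = pam_update(memory, frame)
```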

4. Bayesian Transformer Language Models for Speech Recognition

Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

  • retweets: 310, favorites: 101 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.CL

State-of-the-art neural language models (LMs) represented by Transformers are highly complex. Their use of fixed, deterministic parameter estimates fails to account for model uncertainty and leads to over-fitting and poor generalization when given limited training data. To address these issues, this paper proposes a full Bayesian learning framework for Transformer LM estimation. Efficient variational-inference-based approaches are used to estimate the latent parameter posterior distributions associated with different parts of the Transformer architecture, including the multi-head self-attention, feed-forward, and embedding layers. Statistically significant word error rate (WER) reductions of up to 0.5% absolute (3.18% relative) and consistent perplexity gains were obtained over the baseline Transformer LMs on state-of-the-art LF-MMI factored TDNN systems with i-Vector speaker adaptation trained on the Switchboard corpus. Performance improvements were also obtained on a cross-domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.
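
As a rough illustration of the variational treatment of network parameters, here is a mean-field Gaussian layer trained with the reparameterization trick. This is a generic sketch of the technique, not the authors’ exact parameterization of the Transformer sub-layers; the prior and initialization values are placeholders.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights."""

    def __init__(self, in_features, out_features, prior_std=1.0):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logvar = nn.Parameter(torch.full((out_features, in_features), -6.0))
        self.prior_std = prior_std

    def forward(self, x):
        # Reparameterization trick: sample weights from N(mu, sigma^2) per forward pass.
        sigma = torch.exp(0.5 * self.w_logvar)
        weight = self.w_mu + sigma * torch.randn_like(sigma)
        return F.linear(x, weight)

    def kl(self):
        # KL divergence between the Gaussian posterior and a N(0, prior_std^2) prior;
        # added to the training loss alongside the usual cross-entropy term.
        var = torch.exp(self.w_logvar)
        return 0.5 * torch.sum(
            (var + self.w_mu ** 2) / self.prior_std ** 2
            - 1.0
            - self.w_logvar
            + 2.0 * math.log(self.prior_std)
        )
```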

5. Transparency to hybrid open access through publisher-provided metadata: An article-level study of Elsevier

Najko Jahn, Lisa Matthias, Mikael Laakso

  • retweets: 344, favorites: 29 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.DL

With the growth of open access (OA), the financial flows in scholarly journal publishing have become increasingly complex, but comprehensive data and transparency into these flows are still lacking. The opaqueness is especially concerning for hybrid OA, where subscription-based journals publish individual articles as OA if an optional fee is paid. This study addresses the lack of transparency by leveraging Elsevier article metadata and provides the first publisher-level study of hybrid OA uptake and invoicing. Our results show that Elsevier’s hybrid OA uptake grew steadily but slowly from 2015 to 2019, doubling the number of hybrid OA articles published per year and increasing the share of OA articles in Elsevier’s hybrid journals from 2.6% to 3.7% of all articles. Further, we find that most hybrid OA articles were invoiced directly to authors, followed by articles invoiced through agreements with research funders, institutions, or consortia, with only a few funding bodies driving hybrid OA uptake. As such, our findings point to the role of publishing agreements and OA policies in hybrid OA publishing. Our results further demonstrate the value of publisher-provided metadata for improving transparency in scholarly publishing by linking invoicing data to bibliometrics.

6. Bootstrapping Relation Extractors using Syntactic Search by Examples

Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg

  • retweets: 56, favorites: 25 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.CL

The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
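
Schematically, the bootstrapping loop looks something like the sketch below. `search_by_example` stands in for a syntactic-graph search engine such as the one of Shlain et al. (2020) and is entirely hypothetical here, as are the match attributes; the sketch only illustrates the data-collection step described in the abstract.

```python
def bootstrap_training_set(example_sentences, search_by_example, label):
    """Collect positive examples for one relation via syntactic search.

    example_sentences: a handful of user-written sentences expressing the relation.
    search_by_example: callable returning corpus sentences whose syntactic
                       structure matches an example (hypothetical interface).
    """
    positives = []
    for example in example_sentences:
        for match in search_by_example(example):
            positives.append({
                "text": match.sentence,      # matched corpus sentence
                "head": match.head_span,     # span of the first argument
                "tail": match.tail_span,     # span of the second argument
                "label": label,              # the relation being bootstrapped
            })
    return positives

# The resulting examples are then mixed with negatives (e.g. random entity pairs
# from the same sentences) and used to train a standard supervised relation classifier.
```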

7. When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?

Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations of the ReLU, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses.
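
For reference, Swish is x * sigmoid(x), and the Huberized ReLU replaces the kink at zero with a quadratic piece so the activation is continuously differentiable. The sketch below uses one common form of the Huberized ReLU; the exact definition and smoothing parameter used in the paper are an assumption here.

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x), a smooth approximation to the ReLU."""
    return x / (1.0 + np.exp(-x))

def huberized_relu(x, h=1.0):
    """Huberized ReLU: zero for x <= 0, quadratic on (0, h], linear beyond h.

    The quadratic piece matches the linear piece in value and slope at x = h,
    removing the ReLU's kink at zero.
    """
    return np.where(x <= 0, 0.0,
           np.where(x <= h, x ** 2 / (2 * h), x - h / 2))
```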

8. Quantum Technologies: A Review of the Patent Landscape

Mathew Alex

Quantum Technologies is a term that is getting broader with every passing year. Nanotechnology and electronics operate in this realm. With the invention of industry-disrupting algorithms like Shor’s algorithm, which can break RSA encryption on a quantum computer, and Quantum Key Distribution, which offers unconditional security in theory, investment is pouring in. Here we taxonomize and analyze 48,577 patents in this area from 2015 to the present, captured with a comprehensive query in Relecura’s patent database. The author’s subject experience, along with the company’s AI-based tools and scholarly literature, was utilized to make this highly subjective choice of taxonomy. Though most Patent Landscape Analysis Reports consider a single technology, geography, or company, we have tried to give a holistic overview of these technologies as a whole due to their collaborative and intertwined nature. The physics of each technology and its role in the industry are briefly explained where possible.

9. MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Juntang Zhuang, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan

  • retweets: 30, favorites: 29 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.LG

Neural ordinary differential equations (Neural ODEs) are a new family of deep-learning models with continuous depth. However, the numerical estimation of the gradient in the continuous case is not well solved: existing implementations of the adjoint method suffer from inaccuracy in the reverse-time trajectory, while the naive method and the adaptive checkpoint adjoint method (ACA) have a memory cost that grows with integration time. In this project, based on the asynchronous leapfrog (ALF) solver, we propose the Memory-efficient ALF Integrator (MALI), which has a constant memory cost with respect to the number of solver steps in integration, similar to the adjoint method, and guarantees accuracy in the reverse-time trajectory (hence accuracy in gradient estimation). We validate MALI on various tasks: on image recognition tasks, to our knowledge, MALI is the first to enable feasible training of a Neural ODE on ImageNet and to outperform a well-tuned ResNet, while existing methods fail due to either heavy memory burden or inaccuracy; for time series modeling, MALI significantly outperforms the adjoint method; and for continuous generative models, MALI achieves new state-of-the-art performance.
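
The reverse accuracy comes from using a solver step that can be inverted algebraically, so states along the trajectory can be reconstructed exactly during the backward pass instead of being stored. Below is a toy sketch of such an invertible leapfrog-style step; it simplifies the ALF update in the paper, and the augmented "velocity" state and step size are illustrative.

```python
import numpy as np

def alf_step(f, z, v, t, h):
    """One leapfrog-style step on the augmented state (z, v)."""
    z_half = z + 0.5 * h * v                   # half-step on the state
    v_new = 2.0 * f(t + 0.5 * h, z_half) - v   # reflect the velocity through f
    z_new = z_half + 0.5 * h * v_new           # complete the step
    return z_new, v_new

def alf_step_inverse(f, z_new, v_new, t, h):
    """Exactly undo alf_step, so intermediate states need not be stored."""
    z_half = z_new - 0.5 * h * v_new
    v = 2.0 * f(t + 0.5 * h, z_half) - v_new
    z = z_half - 0.5 * h * v
    return z, v

# Sanity check on dz/dt = -z: one step forward, then one step back.
f = lambda t, z: -z
z0 = np.array([1.0])
v0 = f(0.0, z0)
z1, v1 = alf_step(f, z0, v0, 0.0, 0.1)
z_back, v_back = alf_step_inverse(f, z1, v1, 0.0, 0.1)
assert np.allclose(z_back, z0) and np.allclose(v_back, v0)
```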

10. What we are is more than what we do

Larissa Albantakis, Giulio Tononi

If we take the subjective character of consciousness seriously, consciousness becomes a matter of “being” rather than “doing”. Because “doing” can be dissociated from “being”, functional criteria alone are insufficient to decide whether a system possesses the necessary requirements for being a physical substrate of consciousness. The dissociation between “being” and “doing” is most salient in artificial general intelligence, which may soon replicate any human capacity: computers can perform complex functions (in the limit resembling human behavior) in the absence of consciousness. Complex behavior becomes meaningless if it is not performed by a conscious being.