All Articles

Hot Papers 2021-02-10

1. Introduction to Machine Learning for the Sciences

Titus Neupert, Mark H Fischer, Eliska Greplova, Kenny Choo, Michael Denner

This is an introductory machine learning course specifically developed with STEM students in mind. We discuss supervised, unsupervised, and reinforcement learning. The notes start with an exposition of machine learning methods without neural networks, such as principal component analysis, t-SNE, and linear regression. We continue with an introduction to both basic and advanced neural network structures such as convolutional neural networks, (variational) autoencoders, generative adversarial networks, restricted Boltzmann machines, and recurrent neural networks. Questions of interpretability are discussed using the examples of dreaming and adversarial attacks.
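
To give a flavour of the pre-neural-network part of the notes, here is a minimal principal component analysis sketch in Python. It is not taken from the lecture notes themselves; the toy data and dimensionality are invented for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """Project data onto its leading principal components.

    X: array of shape (n_samples, n_features).
    Returns the projected coordinates and the principal directions.
    """
    X_centered = X - X.mean(axis=0)                 # remove the mean of each feature
    # Singular value decomposition of the centered data matrix
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                  # leading principal directions
    projected = X_centered @ components.T           # coordinates in the reduced space
    return projected, components

# Toy usage with random data
X = np.random.randn(200, 10)
Z, dirs = pca(X, n_components=2)
print(Z.shape)  # (200, 2)
```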

2. Reverb: A Framework For Experience Replay

Albin Cassirer, Gabriel Barth-Maron, Eugene Brevdo, Sabela Ramos, Toby Boyd, Thibault Sottiaux, Manuel Kroiss

A central component of training in Reinforcement Learning (RL) is experience: the data used for training. The mechanisms used to generate and consume this data have an important effect on the performance of RL algorithms. In this paper, we introduce Reverb: an efficient, extensible, and easy-to-use system designed specifically for experience replay in RL. Reverb is designed to work efficiently in distributed configurations with up to thousands of concurrent clients. The flexible API provides users with the tools to easily and accurately configure the replay buffer. It includes strategies for selecting and removing elements from the buffer, as well as options for controlling the ratio between sampled and inserted elements. This paper presents the core design of Reverb, gives examples of how it can be applied, and provides empirical results on Reverb’s performance characteristics.
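
The paper accompanies the open-source dm-reverb library. The sketch below follows the basic pattern from its public documentation (a table with a sampler, a remover, and a rate limiter, plus a client that inserts and samples); exact names and signatures may differ between releases, so treat it as an illustration rather than a verbatim API reference.

```python
import reverb

# One replay table: uniform sampling, FIFO eviction, and a rate limiter that
# blocks sampling until at least one item has been inserted.
server = reverb.Server(tables=[
    reverb.Table(
        name='replay_buffer',
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=100_000,
        rate_limiter=reverb.rate_limiters.MinSize(1),
    ),
])

client = reverb.Client(f'localhost:{server.port}')

# Insert a few toy transitions (observation, action, reward), each with priority 1.0.
for step in range(3):
    client.insert([float(step), step % 2, 0.5], priorities={'replay_buffer': 1.0})

# Sample a couple of items back from the table.
for sample in client.sample('replay_buffer', num_samples=2):
    print(sample)
```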

3. SwiftNet: Real-time Video Object Segmentation

Haochen Wang, Xiaolong Jiang, Haibing Ren, Yao Hu, Song Bai

  • retweets: 362, favorites: 64 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.CV

In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed. We achieve this by elaborately compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations. Spatially, PAM selectively performs memory update and match on dynamic pixels while ignoring static ones, significantly reducing redundant computation wasted on segmentation-irrelevant pixels. To promote efficient reference encoding, a light-aggregation encoder deploying reversed sub-pixel operations is also introduced in SwiftNet. We hope SwiftNet can set a strong and efficient baseline for real-time VOS and facilitate its application in mobile vision.
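
The pixel-adaptive gating idea can be illustrated with a toy sketch. This is a schematic reconstruction from the abstract, not the authors’ implementation; the thresholds, feature shapes, and update rule are invented for illustration.

```python
import numpy as np

def pam_update(memory, features, pixel_thresh=0.2, frame_thresh=0.05):
    """Toy pixel-adaptive memory update.

    memory, features: arrays of shape (H, W, C) holding per-pixel embeddings.
    Only pixels whose features changed noticeably are written back to memory,
    and the write happens only on frames where enough pixels changed.
    """
    # Per-pixel variation between the current frame and the stored memory.
    variation = np.linalg.norm(features - memory, axis=-1)
    dynamic = variation > pixel_thresh             # mask of "dynamic" pixels

    # Skip the update entirely for frames without noteworthy change.
    if dynamic.mean() < frame_thresh:
        return memory, dynamic                     # static frame: keep memory as-is

    updated = memory.copy()
    updated[dynamic] = features[dynamic]           # write back only the dynamic pixels
    return updated, dynamic

# Toy usage
H, W, C = 4, 4, 8
memory = np.zeros((H, W, C))
frame = np.random.randn(H, W, C) * 0.1
memory, mask = pam_update(memory, frame)
```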

4. Bayesian Transformer Language Models for Speech Recognition

Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

  • retweets: 310, favorites: 101 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.CL

State-of-the-art neural language models (LMs) represented by Transformers are highly complex. Their use of fixed, deterministic parameter estimates fails to account for model uncertainty and leads to over-fitting and poor generalization when given limited training data. To address these issues, this paper proposes a full Bayesian learning framework for Transformer LM estimation. Efficient variational-inference-based approaches are used to estimate the latent parameter posterior distributions associated with different parts of the Transformer architecture, including the multi-head self-attention, feed-forward, and embedding layers. Statistically significant word error rate (WER) reductions of up to 0.5% absolute (3.18% relative) and consistent perplexity gains were obtained over the baseline Transformer LMs on state-of-the-art LF-MMI factored TDNN systems with i-Vector speaker adaptation trained on the Switchboard corpus. Performance improvements were also obtained on a cross-domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.
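
As a rough illustration of the variational treatment of network parameters, here is a mean-field Gaussian layer trained with the reparameterization trick. This is a generic sketch of the technique, not the authors’ exact parameterization of the Transformer sub-layers; the prior and initialization values are placeholders.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights."""

    def __init__(self, in_features, out_features, prior_std=1.0):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logvar = nn.Parameter(torch.full((out_features, in_features), -6.0))
        self.prior_std = prior_std

    def forward(self, x):
        # Reparameterization trick: sample weights from N(mu, sigma^2) per forward pass.
        sigma = torch.exp(0.5 * self.w_logvar)
        weight = self.w_mu + sigma * torch.randn_like(sigma)
        return F.linear(x, weight)

    def kl(self):
        # KL divergence between the Gaussian posterior and a N(0, prior_std^2) prior;
        # added to the training loss alongside the usual cross-entropy term.
        var = torch.exp(self.w_logvar)
        return 0.5 * torch.sum(
            (var + self.w_mu ** 2) / self.prior_std ** 2
            - 1.0
            - self.w_logvar
            + 2.0 * math.log(self.prior_std)
        )
```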

5. Transparency to hybrid open access through publisher-provided metadata: An article-level study of Elsevier

Najko Jahn, Lisa Matthias, Mikael Laakso

  • retweets: 344, favorites: 29 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.DL

With the growth of open access (OA), the financial flows in scholarly journal publishing have become increasingly complex, but comprehensive data and transparency into these flows are still lacking. The opaqueness is especially concerning for hybrid OA, where subscription-based journals publish individual articles as OA if an optional fee is paid. This study addresses the lack of transparency by leveraging Elsevier article metadata and provides the first publisher-level study of hybrid OA uptake and invoicing. Our results show that Elsevier’s hybrid OA uptake grew steadily but slowly from 2015 to 2019, doubling the number of hybrid OA articles published per year and increasing the share of OA articles in Elsevier’s hybrid journals from 2.6% to 3.7% of all articles. Further, we find that most hybrid OA articles were invoiced directly to authors, followed by articles invoiced through agreements with research funders, institutions, or consortia, with only a few funding bodies driving hybrid OA uptake. As such, our findings point to the role of publishing agreements and OA policies in hybrid OA publishing. Our results further demonstrate the value of publisher-provided metadata for improving transparency in scholarly publishing by linking invoicing data to bibliometrics.

6. Bootstrapping Relation Extractors using Syntactic Search by Examples

Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg

  • retweets: 56, favorites: 25 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.CL

The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
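
Schematically, the bootstrapping loop looks something like the sketch below. `search_by_example` stands in for a syntactic-graph search engine such as the one of Shlain et al. (2020) and is entirely hypothetical here, as are the match attributes; the sketch only illustrates the data-collection step described in the abstract.

```python
def bootstrap_training_set(example_sentences, search_by_example, label):
    """Collect positive examples for one relation via syntactic search.

    example_sentences: a handful of user-written sentences expressing the relation.
    search_by_example: callable returning corpus sentences whose syntactic
                       structure matches an example (hypothetical interface).
    """
    positives = []
    for example in example_sentences:
        for match in search_by_example(example):
            positives.append({
                "text": match.sentence,      # matched corpus sentence
                "head": match.head_span,     # span of the first argument
                "tail": match.tail_span,     # span of the second argument
                "label": label,              # the relation being bootstrapped
            })
    return positives

# The resulting examples are then mixed with negatives (e.g. random entity pairs
# from the same sentences) and used to train a standard supervised relation classifier.
```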

7. When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?

Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations of the ReLU, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses.
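
For reference, Swish is x * sigmoid(x), and the Huberized ReLU replaces the kink at zero with a quadratic piece so the activation is continuously differentiable. The sketch below uses one common form of the Huberized ReLU; the exact definition and smoothing parameter used in the paper are an assumption here.

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x), a smooth approximation to the ReLU."""
    return x / (1.0 + np.exp(-x))

def huberized_relu(x, h=1.0):
    """Huberized ReLU: zero for x <= 0, quadratic on (0, h], linear beyond h.

    The quadratic piece matches the linear piece in value and slope at x = h,
    removing the ReLU's kink at zero.
    """
    return np.where(x <= 0, 0.0,
           np.where(x <= h, x ** 2 / (2 * h), x - h / 2))
```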

8. Quantum Technologies: A Review of the Patent Landscape

Mathew Alex

Quantum Technologies is a term that is getting broader with every passing year. Nanotechnology and electronics operate in this realm. With the invention of industry-disrupting algorithms like Shor’s algorithm, which can break RSA encryption on a quantum computer, and Quantum Key Distribution, which offers unconditional security in theory, investment is pouring in. Here we taxonomize and analyze 48,577 patents in this area from 2015 to the present, captured with a comprehensive query in Relecura’s patent database. The author’s subject experience, along with the company’s AI-based tools and scholarly literature, was utilized to make this highly subjective choice of taxonomy. Though most Patent Landscape Analysis Reports consider a single technology, geography, or company, we have tried to give a holistic overview of these technologies as a whole due to their collaborative and intertwined nature. The physics of each technology and its role in the industry are briefly explained where possible.

9. MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Juntang Zhuang, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan

  • retweets: 30, favorites: 29 (02/12/2021 09:00:53)
  • links: abs | pdf
  • cs.LG

Neural ordinary differential equations (Neural ODEs) are a new family of deep-learning models with continuous depth. However, the numerical estimation of the gradient in the continuous case is not well solved: existing implementations of the adjoint method suffer from inaccuracy in the reverse-time trajectory, while the naive method and the adaptive checkpoint adjoint method (ACA) have a memory cost that grows with integration time. In this project, based on the asynchronous leapfrog (ALF) solver, we propose the Memory-efficient ALF Integrator (MALI), which has a constant memory cost with respect to the number of solver steps in integration, similar to the adjoint method, and guarantees accuracy in the reverse-time trajectory (hence accuracy in gradient estimation). We validate MALI on various tasks: on image recognition tasks, to our knowledge, MALI is the first to enable feasible training of a Neural ODE on ImageNet and to outperform a well-tuned ResNet, while existing methods fail due to either heavy memory burden or inaccuracy; for time series modeling, MALI significantly outperforms the adjoint method; and for continuous generative models, MALI achieves new state-of-the-art performance.
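
The reverse accuracy comes from using a solver step that can be inverted algebraically, so states along the trajectory can be reconstructed exactly during the backward pass instead of being stored. Below is a toy sketch of such an invertible leapfrog-style step; it simplifies the ALF update in the paper, and the augmented "velocity" state and step size are illustrative.

```python
import numpy as np

def alf_step(f, z, v, t, h):
    """One leapfrog-style step on the augmented state (z, v)."""
    z_half = z + 0.5 * h * v                   # half-step on the state
    v_new = 2.0 * f(t + 0.5 * h, z_half) - v   # reflect the velocity through f
    z_new = z_half + 0.5 * h * v_new           # complete the step
    return z_new, v_new

def alf_step_inverse(f, z_new, v_new, t, h):
    """Exactly undo alf_step, so intermediate states need not be stored."""
    z_half = z_new - 0.5 * h * v_new
    v = 2.0 * f(t + 0.5 * h, z_half) - v_new
    z = z_half - 0.5 * h * v
    return z, v

# Sanity check on dz/dt = -z: one step forward, then one step back.
f = lambda t, z: -z
z0 = np.array([1.0])
v0 = f(0.0, z0)
z1, v1 = alf_step(f, z0, v0, 0.0, 0.1)
z_back, v_back = alf_step_inverse(f, z1, v1, 0.0, 0.1)
assert np.allclose(z_back, z0) and np.allclose(v_back, v0)
```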

10. What we are is more than what we do

Larissa Albantakis, Giulio Tononi

If we take the subjective character of consciousness seriously, consciousness becomes a matter of “being” rather than “doing”. Because “doing” can be dissociated from “being”, functional criteria alone are insufficient to decide whether a system possesses the necessary requirements for being a physical substrate of consciousness. The dissociation between “being” and “doing” is most salient in artificial general intelligence, which may soon replicate any human capacity: computers can perform complex functions (in the limit resembling human behavior) in the absence of consciousness. Complex behavior becomes meaningless if it is not performed by a conscious being.