1. The Atlas for the Aspiring Network Scientist
Michele Coscia
- retweets: 2534, favorites: 153 (01/06/2021 10:15:20)
- links: abs | pdf
- cs.CY | cs.SI | physics.data-an | physics.soc-ph
Network science is the field dedicated to the investigation and analysis of complex systems via their representations as networks. We normally model such networks as graphs: sets of nodes connected by sets of edges, plus a number of node and edge attributes. This deceptively simple object is the starting point of never-ending complexity, due to its ability to represent almost every facet of reality: chemical interactions, protein pathways inside cells, neural connections inside the brain, scientific collaborations, financial relations, citations in art history, to name just a few examples. If we hope to make sense of complex networks, we need to master a large analytic toolbox: graph and probability theory, linear algebra, statistical physics, machine learning, combinatorics, and more. This book aims to provide a first point of access to all of these tools. It is intended as an “Atlas” because its goal is not to make you a specialist in any one of these techniques. Rather, after reading this book, you will have a general understanding of the existence and the mechanics of all these approaches, which you can use as the starting point of your own career in the field of network science. This has been, so far, an interdisciplinary endeavor. The founding fathers of this field come from many different backgrounds: mathematics, sociology, computer science, physics, history, digital humanities, and more. This Atlas charts your path to becoming something different from all of that: a pure network scientist.
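As a concrete illustration of the "graph" object the abstract describes (nodes, edges, and attributes on both), here is a minimal sketch using networkx. It is not from the book; the entities and attribute values are invented.

```python
import networkx as nx

# A tiny attributed graph: nodes and edges both carry metadata.
# Entities and weights are made up for illustration.
G = nx.Graph()
G.add_node("protein_A", kind="protein")
G.add_node("protein_B", kind="protein")
G.add_node("paper_1", kind="publication")
G.add_edge("protein_A", "protein_B", interaction="binds", weight=0.8)
G.add_edge("protein_A", "paper_1", interaction="cited_in", weight=1.0)

# Basic questions network science asks of such an object:
print(G.number_of_nodes(), G.number_of_edges())  # size of the graph
print(dict(G.degree()))                          # degree of each node
print(nx.density(G))                             # fraction of possible edges present
```

The same handful of lines scales to the analytic toolbox the abstract lists: centralities, community detection, and spectral methods all operate on this one data structure.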
The Atlas for the Aspiring Network Scientist. (arXiv:2101.00863v1 [https://t.co/umDFgVJac8]) https://t.co/ozPxYKDJns
— NetScience (@net_science) January 5, 2021
2. Cross-Document Language Modeling
Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan
We introduce a new pretraining approach for language models geared to support multi-document NLP tasks. Our cross-document language model (CD-LM) improves masked language modeling for these tasks with two key ideas. First, we pretrain with multiple related documents in a single input, via cross-document masking, which encourages the model to learn cross-document and long-range relationships. Second, extending the recent Longformer model, we pretrain with long contexts of several thousand tokens and introduce a new attention pattern that uses sequence-level global attention to predict masked tokens, while retaining the familiar local attention elsewhere. We show that our CD-LM sets new state-of-the-art results on several multi-text tasks, including cross-document event and entity coreference resolution, paper citation recommendation, and document plagiarism detection, while using a significantly smaller number of training parameters than prior works.
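A minimal sketch of the two ingredients the abstract names: related documents packed into one long input, and global attention placed on the masked tokens. It uses the off-the-shelf Longformer from Hugging Face transformers; the checkpoint name and the 15% masking rate are illustrative assumptions, and this is not the authors' released code.

```python
import torch
from transformers import AutoTokenizer, LongformerForMaskedLM

# Off-the-shelf Longformer; the exact CD-LM configuration is an assumption here.
name = "allenai/longformer-base-4096"
tokenizer = AutoTokenizer.from_pretrained(name)
model = LongformerForMaskedLM.from_pretrained(name)

# Two related documents packed into a single long input (cross-document context).
doc1 = "The company announced a merger on Tuesday ..."
doc2 = "Shares rose sharply after the merger announcement ..."
enc = tokenizer(doc1 + tokenizer.sep_token + doc2, return_tensors="pt")
input_ids = enc["input_ids"].clone()

# Mask ~15% of tokens (illustrative rate), sparing the special tokens.
labels = input_ids.clone()
mask = (torch.rand(input_ids.shape) < 0.15)
mask &= (input_ids != tokenizer.sep_token_id) & (input_ids != tokenizer.cls_token_id)
input_ids[mask] = tokenizer.mask_token_id
labels[~mask] = -100  # compute loss only at masked positions

# Key idea from the abstract: masked tokens get *global* attention so they can
# attend across document boundaries; all other tokens keep local attention.
global_attention_mask = mask.long()

loss = model(input_ids=input_ids,
             attention_mask=enc["attention_mask"],
             global_attention_mask=global_attention_mask,
             labels=labels).loss
loss.backward()  # an optimizer step would follow in a real pretraining loop
```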
Cross-Document Language Modeling
— Aran Komatsuzaki (@arankomatsuzaki) January 5, 2021
Achieves a substantial improvement in MLM perplexity (2.34 -> 1.76) by training an improved Longformer across similar documents concatenated together.
https://t.co/bywFQSZRnT pic.twitter.com/cYn4EkgQxu
3. KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
Yiran Xing, Zai Shi, Zhao Meng, Yunpu Ma, Roger Wattenhofer
We present Knowledge Enhanced Multimodal BART (KM-BART), a Transformer-based sequence-to-sequence model capable of reasoning about commonsense knowledge from multimodal inputs of images and text. We extend the popular BART architecture to a multimodal model. We design a new pretraining task to improve model performance on the Visual Commonsense Generation task. Our pretraining task improves Visual Commonsense Generation performance by leveraging knowledge from a large language model pretrained on an external knowledge graph. To the best of our knowledge, we are the first to propose a dedicated task for improving model performance on Visual Commonsense Generation. Experimental results show that, through this pretraining, our model reaches state-of-the-art performance on the Visual Commonsense Generation task.
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
— AK (@ak92501) January 5, 2021
pdf: https://t.co/uOkla7D4Ou
abs: https://t.co/Y1www5fOSf pic.twitter.com/IC9faUPhXQ
4. Jamming Attacks and Anti-Jamming Strategies in Wireless Networks: A Comprehensive Survey
Hossein Pirayesh, Huacheng Zeng
Wireless networks are a key component of the telecommunications infrastructure in our society, and wireless services have become increasingly important as the applications of wireless devices have penetrated every aspect of our lives. Although wireless technologies have advanced significantly in the past decades, most wireless networks are still vulnerable to radio jamming attacks due to the open nature of wireless channels, and progress in the design of jamming-resistant wireless networking systems remains limited. This stagnation can be attributed to the lack of practical physical-layer wireless technologies that can efficiently decode data packets in the presence of jamming attacks. This article surveys existing jamming attacks and anti-jamming strategies in wireless local area networks (WLANs), cellular networks, cognitive radio networks (CRNs), ZigBee networks, Bluetooth networks, vehicular networks, LoRa networks, RFID networks, and the GPS system, with the objective of offering a comprehensive knowledge landscape of existing jamming/anti-jamming strategies and stimulating more research efforts to secure wireless networks against jamming attacks. Unlike prior survey papers, this article conducts a comprehensive, in-depth review of jamming and anti-jamming strategies, casting insights on the design of jamming-resilient wireless networking systems. An outlook on promising anti-jamming techniques is offered at the end of this article to delineate important research directions.
5. Networks of Necessity: Simulating Strategies for COVID-19 Mitigation among Disabled People and Their Caregivers
Thomas E. Valles, Hannah Shoenhard, Joseph Zinski, Sarah Trick, Mason A. Porter, Michael R. Lindstrom
- retweets: 72, favorites: 19 (01/06/2021 10:15:22)
- links: abs | pdf
- cs.SI | physics.soc-ph
A major strategy to prevent the spread of COVID-19 is through the limiting of in-person contacts. However, for the many disabled people who live in the community and require caregivers to assist them with activities of daily living, limiting contacts is impractical or impossible. We seek to determine which interventions can prevent infections among disabled people and their caregivers. To accomplish this, we simulate COVID-19 transmission with a compartmental model on a network. The networks incorporate heterogeneity in the risks of different types of interactions, time-dependent lockdown and reopening measures, and interaction distributions for four different groups (caregivers, disabled people, essential workers, and the general population). Among these groups, we find the probability of becoming infected is highest for caregivers and second highest for disabled people. Our analysis of the network structure illustrates that caregivers have the largest modal eigenvector centrality among the four groups. We find that two interventions — contact-limiting by all groups and mask-wearing by disabled people and caregivers — particularly reduce cases among disabled people and caregivers. We also test which group most effectively spreads COVID-19 by seeding infections in a subset of each group and then comparing the total number of infections as the disease spreads. We find that caregivers are the most effective spreaders of COVID-19. We then test where limited vaccine doses could be used most effectively and we find that vaccinating caregivers better protects disabled people than vaccinating the general population, essential workers, or the disabled population itself. Our results highlight the potential effectiveness of mask-wearing, contact-limiting throughout society, and strategic vaccination for limiting the exposure of disabled people and their caregivers to COVID-19.
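The abstract describes a compartmental model on a network with group-dependent interactions and interventions. The toy sketch below simulates a plain discrete-time SIR process on a contact network with the paper's four groups and a mask-wearing knob for disabled people and caregivers; group sizes, rates, and the mask effect are invented stand-ins, not the authors' calibrated model.

```python
import random
import networkx as nx

random.seed(0)
groups = ["disabled", "caregiver", "essential", "general"]
G = nx.watts_strogatz_graph(2000, k=8, p=0.1)  # stand-in contact network
for v in G:
    G.nodes[v]["group"] = random.choices(groups, weights=[5, 5, 10, 80])[0]
    G.nodes[v]["state"] = "S"

beta, gamma = 0.05, 0.1   # per-contact infection prob., recovery prob. (assumed)
mask_factor = 0.4         # masks cut transmission by this factor (assumed)
masked = {"disabled", "caregiver"}

for v in random.sample(list(G), 10):  # seed infections
    G.nodes[v]["state"] = "I"

for t in range(120):
    new_inf, new_rec = [], []
    for v in G:
        if G.nodes[v]["state"] != "I":
            continue
        if random.random() < gamma:
            new_rec.append(v)
        for u in G[v]:
            if G.nodes[u]["state"] == "S":
                b = beta
                if G.nodes[u]["group"] in masked or G.nodes[v]["group"] in masked:
                    b *= mask_factor  # intervention: masks on these interactions
                if random.random() < b:
                    new_inf.append(u)
    for v in new_inf: G.nodes[v]["state"] = "I"
    for v in new_rec: G.nodes[v]["state"] = "R"

# Attack rate per group: who ended up infected over the epidemic?
for g in groups:
    members = [v for v in G if G.nodes[v]["group"] == g]
    hit = sum(G.nodes[v]["state"] in ("I", "R") for v in members)
    print(g, round(hit / len(members), 3))
```

Rerunning with different seeding groups or with vaccination (moving nodes directly to "R") reproduces, in miniature, the kinds of comparisons the paper makes across interventions.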
The academic paper following our short policy paper!
— Mason Porter (@masonporter) January 5, 2021
"Networks of Necessity: Simulating Strategies for COVID-19 Mitigation among Disabled People and Their Caregivers": https://t.co/BiqZOrDrHo
by @ThomasValles1, @HShoenhard, Joseph Zinski, @saraynet, MAP, Michael R. Lindstrom pic.twitter.com/JqChlZVRVh
6. EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu
Deep, heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks. However, their high model complexity requires enormous computational resources and extremely long training times for both pre-training and fine-tuning. Many works have studied model compression for large NLP models, but they focus only on reducing inference cost and time while still requiring an expensive training process. Other works use extremely large batch sizes to shorten the pre-training time at the expense of a high demand for computational resources. In this paper, inspired by the Early-Bird Lottery Tickets studied for computer vision tasks, we propose EarlyBERT, a general computationally-efficient training algorithm applicable to both pre-training and fine-tuning of large-scale language models. We are the first to identify structured winning tickets in the early stage of BERT training, and we use them for efficient training. Comprehensive pre-training and fine-tuning experiments on GLUE and SQuAD downstream tasks show that EarlyBERT achieves performance comparable to standard BERT with 35-45% less training time.
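In the early-bird lottery-ticket work this paper builds on, a ticket is declared "found" when the structured pruning masks drawn at successive checkpoints stop changing. The sketch below shows that detection loop for attention heads; the noisy-but-converging importance scores stand in for real measures (e.g. accumulated gradient magnitudes), and the keep ratio and tolerance are illustrative assumptions, not EarlyBERT's settings.

```python
import torch

def head_mask(importance, keep_ratio):
    """Structured pruning mask keeping the top-`keep_ratio` attention heads."""
    k = max(1, int(keep_ratio * importance.numel()))
    thresh = importance.flatten().topk(k).values.min()
    return (importance >= thresh).float()

def mask_distance(m1, m2):
    """Fraction of heads whose kept/pruned status differs between two masks."""
    return (m1 != m2).float().mean().item()

# Early-bird detection: after each epoch, score heads, draw the structured
# mask, and stop once consecutive masks barely change.
torch.manual_seed(0)
num_layers, num_heads, keep_ratio, tol = 12, 12, 0.6, 0.02  # BERT-base shape
true_importance = torch.rand(num_layers, num_heads)         # stand-in scores
prev = None
for epoch in range(1, 21):
    noise = torch.randn(num_layers, num_heads) * 0.5 / epoch  # scores stabilize
    cur = head_mask(true_importance + noise, keep_ratio)
    if prev is not None and mask_distance(prev, cur) < tol:
        print(f"early-bird ticket found at epoch {epoch}; prune, then train the subnetwork")
        break
    prev = cur
```

The efficiency claim follows from this structure: the expensive full-width training runs only until the mask stabilizes, and the rest of pre-training or fine-tuning proceeds on the pruned subnetwork.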
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
— AK (@ak92501) January 5, 2021
pdf: https://t.co/d9z2BXQ4S2
abs: https://t.co/g7xRcdAV36 pic.twitter.com/cX2OFEQbrR
7. VinVL: Making Visual Representations Matter in Vision-Language Models
Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao
This paper presents a detailed study of improving visual representations for vision-language (VL) tasks and develops an improved object detection model to provide object-centric representations of images. Compared to the most widely used bottom-up and top-down model [Anderson et al., 2018], the new model is bigger, better designed for VL tasks, and pre-trained on much larger training corpora that combine multiple public annotated object detection datasets. It can therefore generate representations of a richer collection of visual objects and concepts. While previous VL research focuses mainly on improving the vision-language fusion model and leaves the object detection model untouched, we show that visual features matter significantly in VL models. In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, OSCAR [Li et al., 2020], and utilize an improved approach, OSCAR+, to pre-train the VL model and fine-tune it on a wide range of downstream VL tasks. Our results show that the new visual features significantly improve the performance across all VL tasks, creating new state-of-the-art results on seven public benchmarks. We will release the new object detection model to the public.
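The pipeline the abstract describes is: object detector produces per-region features, which are fed alongside text tokens into a Transformer fusion model. This sketch shows only that interface, with projected region features concatenated to token embeddings; all dimensions and module choices are illustrative assumptions, not VinVL's released architecture.

```python
import torch
import torch.nn as nn

# Stand-in dimensions: vocabulary, fusion width, detector feature size,
# number of detected regions, and caption length are all illustrative.
vocab, d_model, d_region, n_regions, n_tokens = 30522, 768, 2048, 36, 20

text_emb = nn.Embedding(vocab, d_model)
region_proj = nn.Linear(d_region, d_model)  # map detector features into text space
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True),
    num_layers=2,
)

tokens = torch.randint(0, vocab, (1, n_tokens))  # stand-in caption token ids
regions = torch.randn(1, n_regions, d_region)    # stand-in detector output

# Fusion input: text embeddings and projected region features, concatenated.
fused_input = torch.cat([text_emb(tokens), region_proj(regions)], dim=1)
fused = encoder(fused_input)                     # joint vision-language representation
print(fused.shape)  # (1, n_tokens + n_regions, d_model)
```

The paper's point is that improving what goes into `regions` (the detector and its training corpora) moves all downstream VL results, even with the fusion model held fixed.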
VinVL: Making Visual Representations Matter in Vision-Language Models
— AK (@ak92501) January 5, 2021
pdf: https://t.co/w2VpHSi6Fg
abs: https://t.co/dzZiNAri15 pic.twitter.com/BSmJmHYx4w
8. GIS and Computational Notebooks
Geoff Boeing, Dani Arribas-Bel
Researchers and practitioners across many disciplines have recently adopted computational notebooks to develop, document, and share their scientific workflows - and the GIS community is no exception. This chapter introduces computational notebooks in the geographical context. It begins by explaining the computational paradigm and philosophy that underlie notebooks. Next it unpacks their architecture to illustrate a notebook user’s typical workflow. Then it discusses the main benefits notebooks offer GIS researchers and practitioners, including better integration with modern software, more natural access to new forms of data, and better alignment with the principles and benefits of open science. In this context, it identifies notebooks as the glue that binds together a broader ecosystem of open source packages and transferable platforms for computational geography. The chapter concludes with a brief illustration of using notebooks for a set of basic GIS operations. Compared to traditional desktop GIS, notebooks can make spatial analysis more nimble, extensible, and reproducible and have thus evolved into an important component of the geospatial science toolkit.
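As a flavor of the "basic GIS operations" the chapter concludes with, here is the kind of notebook cell it has in mind, using geopandas. The input file names are hypothetical; any polygon and point layers would do.

```python
import geopandas as gpd

# Hypothetical input layers.
tracts = gpd.read_file("census_tracts.geojson")      # polygons
stations = gpd.read_file("transit_stations.geojson")  # points

# Classic desktop-GIS operations, expressed as a few reproducible lines:
tracts = tracts.to_crs(epsg=3857)        # project to a metric CRS
stations = stations.to_crs(tracts.crs)
buffers = stations.buffer(500)           # 500 m service areas around stations

# Spatial join: which tracts intersect a station buffer?
served = gpd.sjoin(tracts, gpd.GeoDataFrame(geometry=buffers),
                   predicate="intersects")
print(len(served), "tract/buffer intersections")

tracts.plot()  # quick inline map, rendered directly in the notebook
```

The reproducibility argument of the chapter is visible even here: the whole analysis is a readable script that can be rerun, versioned, and shared, rather than a sequence of point-and-click steps.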
"GIS and Computational Notebooks" from @gboeing and @darribas: How jupyter notebooks are a key part of integrating GIS data and data science tools https://t.co/bhX22Wb8NM. The frontier of geodata science is fascinating. pic.twitter.com/BnwlDnmoIW
— Adam Lauretig (@lauretig) January 5, 2021
9. Faster Stochastic Trace Estimation with a Chebyshev Product Identity
Eric Hallman
Methods for stochastic trace estimation often require the repeated evaluation of expressions of the form z^T p_n(A) z, where A is a symmetric matrix and p_n is a degree-n polynomial written in the standard or Chebyshev basis. We show how to evaluate these expressions using only ⌈n/2⌉ matrix-vector products, thus substantially reducing the cost of existing trace estimation algorithms that use Chebyshev interpolation or Taylor series.
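The saving comes from the Chebyshev product identity 2 T_i T_j = T_{i+j} + T_{|i-j|}, which turns high-degree quadratic forms into inner products of low-degree Chebyshev vectors: z^T T_{i+j}(A) z = 2 (T_i(A)z)^T (T_j(A)z) - z^T T_{|i-j|}(A) z. The numpy sketch below implements that idea (it is my illustration, not the author's code) and checks it against the naive evaluation.

```python
import numpy as np

def cheb_quadratic_forms(A, z, n):
    """Return q[k] = z^T T_k(A) z for k = 0..n using only ceil(n/2)
    matrix-vector products, via 2*T_i*T_j = T_{i+j} + T_{|i-j|}."""
    m = (n + 1) // 2                   # highest Chebyshev vector needed
    w = [z]                            # w[k] = T_k(A) z
    if m >= 1:
        w.append(A @ z)
    for k in range(1, m):              # three-term recurrence, one matvec each
        w.append(2 * (A @ w[k]) - w[k - 1])
    q = np.empty(n + 1)
    for k in range(n + 1):
        i, j = k // 2, (k + 1) // 2    # i + j = k, with j - i in {0, 1}
        q[k] = 2 * (w[i] @ w[j]) - (z @ w[j - i])
    return q

# Sanity check against the naive evaluation, which needs n matvecs.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)); A = (A + A.T) / 2
z = rng.standard_normal(50)
n = 7
q = cheb_quadratic_forms(A, z, n)      # uses ceil(7/2) = 4 matvecs
t0, t1, naive = z, A @ z, [z @ z, z @ (A @ z)]
for _ in range(n - 1):
    t0, t1 = t1, 2 * (A @ t1) - t0
    naive.append(z @ t1)
print(np.allclose(q, naive))           # True
```

A Hutchinson-style estimate of tr(p_n(A)) then averages sum(c[k] * q[k]) over random sign vectors z, with the per-sample cost halved.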
those damn Chebyshev polynomials are back at it again!https://t.co/b2c4E2iuIb
— Sam Power (@sam_power_825) January 5, 2021
(here, for estimating expressions of the form Tr f(A) for matrices A, using only matrix-vector products)
10. A p-adaptive, implicit-explicit mixed finite element method for reaction-diffusion problems
Mebratu Wakeni, Ankush Aggarwal, Lukasz Kaczmarczyk, Andrew McBride, Ignatios Athanasiadis, Chris Pearce, Paul Steinmann
A new class of implicit-explicit (IMEX) methods combined with a p-adaptive mixed finite element formulation is proposed to simulate the diffusion of reacting species. Hierarchical polynomial functions are used to construct an H(div)-conforming base for the flux vectors and a non-conforming base for the mass concentration of the species. The mixed formulation captures the distinct nonlinearities associated with the constitutive flux equations and the reaction terms. The IMEX method conveniently treats these two sources of nonlinearity implicitly and explicitly, respectively, within a single time-stepping framework. The combination of the p-adaptive mixed formulation and the IMEX method delivers a robust and efficient algorithm. The proposed methods eliminate the coupled effect of mesh size and time step on the algorithmic stability. A residual-based a posteriori error estimate that provides an upper bound on the natural error norm is derived. The availability of such an estimate, which can be obtained with minimal computational effort, together with the hierarchical construction of the finite element spaces, allows for the formulation of an efficient p-adaptive algorithm. A series of numerical examples demonstrate the performance of the approach. It is shown that the method with the p-adaptive strategy accurately solves problems involving travelling waves, as well as those with discontinuities and singularities. The flexibility of the formulation is also illustrated via selected applications in pattern formation and electrophysiology.
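The IMEX split itself is easy to see in isolation: for u_t = D u_xx + f(u), treat the stiff diffusion implicitly and the reaction explicitly, so each step solves one linear system regardless of the reaction's nonlinearity. The sketch below does this on a 1D finite-difference grid; it illustrates only the time-splitting idea, not the paper's p-adaptive mixed FEM, and D, dt, and the Fisher-KPP reaction are illustrative choices.

```python
import numpy as np

# IMEX step for u_t = D u_xx + f(u):
#   (I - dt*D*Lap) u_{n+1} = u_n + dt * f(u_n)
N, L, D, dt, steps = 200, 1.0, 1e-3, 1e-2, 500
dx = L / (N - 1)
x = np.linspace(0, L, N)
u = np.exp(-200 * (x - 0.2) ** 2)      # initial pulse

f = lambda u: u * (1 - u)              # Fisher-KPP reaction (example choice)

# Backward-Euler diffusion matrix with homogeneous Neumann boundaries.
c = dt * D / dx**2
A = np.eye(N) * (1 + 2 * c)
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = -c
A[0, 1] *= 2; A[-1, -2] *= 2           # reflect the stencil at the boundaries

for _ in range(steps):
    u = np.linalg.solve(A, u + dt * f(u))  # implicit diffusion, explicit reaction

print(u.min(), u.max())                # travelling wave stays bounded in [0, 1]
```

Because the diffusion operator is handled implicitly, the time step is not tied to the spatial resolution, which is the decoupling of mesh size and time step the abstract refers to.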
A preprint of a paper with work done in MoFEM.
— Lukasz Kaczmarczyk (@LukaszKaczmarcz) January 5, 2021
"A p-adaptive, implicit-explicit mixed finite element method for reaction-diffusion problems"https://t.co/UeFlUeF1Xa pic.twitter.com/ZC6cJbNbIq
11. End-to-End Training of Neural Retrievers for Open-Domain Question Answering
Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, Bryan Catanzaro
Recent work on training neural retrievers for open-domain question answering (OpenQA) has employed both supervised and unsupervised approaches. However, it remains unclear how unsupervised and supervised methods can be used most effectively for neural retrievers. In this work, we systematically study retriever pre-training. We first propose an approach of unsupervised pre-training with the Inverse Cloze Task and masked salient spans, followed by supervised finetuning using question-context pairs. This approach leads to absolute gains of 2+ points over the previous best result in top-20 retrieval accuracy on the Natural Questions and TriviaQA datasets. We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models. In the first approach, the reader considers each retrieved document separately, while in the second approach, the reader considers all the retrieved documents together. Our experiments demonstrate the effectiveness of these approaches as we obtain new state-of-the-art results. On the Natural Questions dataset, we obtain a top-20 retrieval accuracy of 84%, an improvement of 5 points over the recent DPR model. In addition, we achieve good results on answer extraction, outperforming recent models like REALM and RAG by 3+ points. We further scale up end-to-end training to large models and show consistent gains in performance over smaller models.
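The Inverse Cloze Task mentioned in the abstract needs no labels: a sentence sampled from a passage acts as a pseudo-query, the remaining sentences form its positive context, and a dual encoder is trained with in-batch negatives. The sketch below shows that training signal; the tiny bag-of-words encoders and the temperature are stand-ins for the paper's BERT-scale setup.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

random.seed(0); torch.manual_seed(0)
vocab, dim = 5000, 128
query_enc = nn.EmbeddingBag(vocab, dim)  # stand-in query encoder
ctx_enc = nn.EmbeddingBag(vocab, dim)    # stand-in context encoder
opt = torch.optim.Adam(list(query_enc.parameters()) + list(ctx_enc.parameters()), lr=1e-3)

def ict_batch(passages):
    """Split each passage (a list of token-id sentences) into (pseudo-query, context)."""
    queries, contexts = [], []
    for sents in passages:
        i = random.randrange(len(sents))
        queries.append(sents[i])  # held-out sentence = pseudo-query
        contexts.append([t for j, s in enumerate(sents) if j != i for t in s])
    return queries, contexts

def encode(enc, seqs):
    flat = torch.tensor([t for s in seqs for t in s])
    offsets = torch.tensor([0] + [len(s) for s in seqs[:-1]]).cumsum(0)
    return F.normalize(enc(flat, offsets), dim=-1)

# Fake corpus: 8 passages of 4 "sentences" of random token ids.
passages = [[[random.randrange(vocab) for _ in range(6)] for _ in range(4)]
            for _ in range(8)]

q_seqs, c_seqs = ict_batch(passages)
q, c = encode(query_enc, q_seqs), encode(ctx_enc, c_seqs)
scores = q @ c.T / 0.05                   # similarity of every query to every context
loss = F.cross_entropy(scores, torch.arange(len(passages)))  # diagonal = positives
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```

In-batch negatives make every other context in the batch a free negative example, which is what keeps this pre-training cheap.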
End-to-End Training of Neural Retrievers for Open-Domain Question Answering
— Aran Komatsuzaki (@arankomatsuzaki) January 5, 2021
Achieves SotA in open-domain QA (e.g. a 5-point gain over DPR on NQ, and outperforms FiD) through end-to-end supervised training of the reader and retriever. https://t.co/Z1Y83MgG7s pic.twitter.com/7GfD01tvls
12. Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise
Spencer Frei, Yuan Cao, Quanquan Gu
We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent following an arbitrary initialization. We prove that stochastic gradient descent (SGD) produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. We conduct experiments which suggest that for some distributions our generalization bounds are nearly tight. This is the first result that shows that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.
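A toy version of the setting is easy to run: isotropic Gaussian data labeled by a ground-truth halfspace, a fraction of labels flipped, and a wide one-hidden-layer leaky ReLU network trained by SGD on the noisy labels, then evaluated on clean data. Width, noise rate, and step size below are illustrative, and the random label flipping is a simple stand-in for the adversarial noise in the theorem.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, n_train, n_test, width, noise = 20, 4000, 4000, 1000, 0.1
w_star = torch.randn(d); w_star /= w_star.norm()  # ground-truth halfspace

def sample(n, flip):
    X = torch.randn(n, d)                 # log-concave isotropic (Gaussian)
    y = torch.sign(X @ w_star)
    if flip > 0:
        idx = torch.rand(n) < flip        # flip a fraction of labels
        y[idx] = -y[idx]
    return X, y

Xtr, ytr = sample(n_train, noise)         # noisy training labels
Xte, yte = sample(n_test, 0.0)            # clean test labels

net = nn.Sequential(nn.Linear(d, width), nn.LeakyReLU(0.1), nn.Linear(width, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.01)
for epoch in range(5):
    for i in range(0, n_train, 32):       # plain minibatch SGD
        xb, yb = Xtr[i:i+32], ytr[i:i+32]
        loss = F.softplus(-yb * net(xb).squeeze(-1)).mean()  # logistic loss
        opt.zero_grad(); loss.backward(); opt.step()

acc = (torch.sign(net(Xte).squeeze(-1)) == yte).float().mean()
print(f"clean test accuracy (best halfspace achieves 1.0 here): {acc:.3f}")
```

Despite having far more than enough capacity to memorize the flipped labels, the trained network's clean accuracy should land near that of the best halfspace, which is the qualitative content of the theorem.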
New work on arXiv! We prove that SGD-trained 1-hidden-layer neural networks of any width and any initialization can generalize almost as well as the best linear classifier over the distribution for many distributions. https://t.co/JGLsmMh6Gi Joint w/ @QuanquanGu @_YuanCao_
— Spencer Frei (@sfrei_) January 5, 2021