
Hot Papers 2020-09-08

1. Measuring Massive Multitask Language Understanding

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

We propose a new test to measure a text model’s multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability. We find that while most recent models have near random-chance accuracy, the very largest GPT-3 model improves over random chance by almost 20 percentage points on average. However, on every one of the 57 tasks, the best models still need substantial improvements before they can reach human-level accuracy. Models also have lopsided performance and frequently do not know when they are wrong. Worse, they still have near-random accuracy on some socially important subjects such as morality and law. By comprehensively evaluating the breadth and depth of a model’s academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.
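
The benchmark is multiple-choice, so the evaluation reduces to scoring each candidate answer and taking the argmax, then averaging accuracy per task. A minimal sketch of such a loop is below; the dataset fields and the `score_choice` helper are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a multiple-choice evaluation loop in the style of the
# proposed benchmark. The dataset layout (task, question, choices, answer)
# and the scoring function are illustrative assumptions.
from collections import defaultdict

def score_choice(model, question, choice):
    """Hypothetical scorer: higher means the model prefers this answer."""
    return model.log_likelihood(f"{question}\nAnswer: {choice}")

def evaluate(model, examples):
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:  # ex: {"task": str, "question": str, "choices": [str], "answer": int}
        scores = [score_choice(model, ex["question"], c) for c in ex["choices"]]
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct[ex["task"]] += int(pred == ex["answer"])
        total[ex["task"]] += 1
    per_task = {t: correct[t] / total[t] for t in total}
    macro_avg = sum(per_task.values()) / len(per_task)  # average over the 57 tasks
    return per_task, macro_avg
```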

2. Non-exponentially weighted aggregation: regret bounds for unbounded loss functions

Pierre Alquier

We tackle the problem of online optimization with a general, possibly unbounded, loss function. It is well known that the exponentially weighted aggregation strategy (EWA) leads to a regret in $\sqrt{T}$ after $T$ steps, under the assumption that the loss is bounded. The online gradient algorithm (OGA) has a regret in $\sqrt{T}$ when the loss is convex and Lipschitz. In this paper, we study a generalized aggregation strategy, where the weights no longer necessarily depend exponentially on the losses. Our strategy can be interpreted as the minimization of the expected losses plus a penalty term. When the penalty term is the Kullback-Leibler divergence, we obtain EWA as a special case, but using alternative divergences leads to regret bounds for unbounded, not necessarily convex losses. However, the cost is a worse regret bound in some cases.
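
In notation commonly used for this kind of result (illustrative, not necessarily the paper's), the penalized-minimization view of the strategy and its EWA special case can be written as follows.

```latex
% Sketch of the penalized-aggregation viewpoint described in the abstract;
% the notation here is illustrative, not taken from the paper.
\[
  \hat{\rho}_t \;=\; \operatorname*{arg\,min}_{\rho}\;
  \left\{ \mathbb{E}_{\theta\sim\rho}\Bigl[\textstyle\sum_{s<t} \ell_s(\theta)\Bigr]
          \;+\; \frac{1}{\eta}\, D(\rho \,\|\, \pi) \right\},
\]
\[
  \text{with } D = \mathrm{KL}:\quad
  \hat{\rho}_t(\theta) \;\propto\; \pi(\theta)\,
  \exp\!\Bigl(-\eta \textstyle\sum_{s<t} \ell_s(\theta)\Bigr)
  \quad\text{(EWA as a special case).}
\]
```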

3. E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce

Denghui Zhang, Zixuan Yuan, Yanchi Liu, Fuzhen Zhuang, Hui Xiong

Pre-trained language models such as BERT have achieved great success in a broad range of natural language processing tasks. However, BERT cannot well support E-commerce related tasks due to the lack of two levels of domain knowledge, i.e., phrase-level and product-level. On one hand, many E-commerce tasks require an accurate understanding of domain phrases, whereas such fine-grained phrase-level knowledge is not explicitly modeled by BERT’s training objective. On the other hand, product-level knowledge like product associations can enhance the language modeling of E-commerce, but it is not factual knowledge, so using it indiscriminately may introduce noise. To tackle the problem, we propose a unified pre-training framework, namely, E-BERT. Specifically, to preserve phrase-level knowledge, we introduce Adaptive Hybrid Masking, which allows the model to adaptively switch from learning preliminary word knowledge to learning complex phrases, based on the fitting progress of the two modes. To utilize product-level knowledge, we introduce Neighbor Product Reconstruction, which trains E-BERT to predict a product’s associated neighbors with a denoising cross attention layer. Our investigation reveals promising results in four downstream tasks, i.e., review-based question answering, aspect extraction, aspect sentiment classification, and product classification.
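
To make the "adaptively switch from word masking to phrase masking" idea concrete, here is a minimal sketch of one way such a switch could work; the plateau-based rule, function names, and span format are assumptions for illustration, not the paper's actual Adaptive Hybrid Masking procedure.

```python
# Illustrative sketch of a word-vs-phrase masking switch: use ordinary
# token-level masking while the word-level loss is still improving, then
# move to masking whole domain phrases. Not the paper's method.
import random

def choose_masking_mode(word_loss_history, patience=3, tol=1e-3):
    """Return 'phrase' once the word-level loss has plateaued, else 'word'."""
    if len(word_loss_history) <= patience:
        return "word"
    recent = word_loss_history[-patience:]
    still_improving = (word_loss_history[-patience - 1] - min(recent)) > tol
    return "word" if still_improving else "phrase"

def mask_tokens(tokens, phrases, mode, mask_token="[MASK]", rate=0.15):
    """Mask random single tokens, or a whole phrase span when mode == 'phrase'."""
    tokens = list(tokens)
    if mode == "phrase" and phrases:
        start, length = random.choice(phrases)  # phrases: list of (start, length) spans
        for i in range(start, start + length):
            tokens[i] = mask_token
    else:
        for i in range(len(tokens)):
            if random.random() < rate:
                tokens[i] = mask_token
    return tokens
```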

4. Fast simulation of planar Clifford circuits

David Gosset, Daniel Grier, Alex Kerzner, Luke Schaeffer

A general quantum circuit can be simulated in exponential time on a classical computer. If it has a planar layout, then a tensor-network contraction algorithm due to Markov and Shi has a runtime exponential in the square root of its size, or more generally exponential in the treewidth of the underlying graph. Separately, Gottesman and Knill showed that if all gates are restricted to be Clifford, then there is a polynomial time simulation. We combine these two ideas and show that treewidth and planarity can be exploited to improve Clifford circuit simulation. Our main result is a classical algorithm with runtime scaling asymptotically as $n^{\omega/2} < n^{1.19}$ which samples from the output distribution obtained by measuring all $n$ qubits of a planar graph state in given Pauli bases. Here $\omega$ is the matrix multiplication exponent. We also provide a classical algorithm with the same asymptotic runtime which samples from the output distribution of any constant-depth Clifford circuit in a planar geometry. Our work improves known classical algorithms with cubic runtime. A key ingredient is a mapping which, given a tree decomposition of some graph $G$, produces a Clifford circuit with a structure that mirrors the tree decomposition and which emulates measurement of the quantum graph state corresponding to $G$. We provide a classical simulation of this circuit with the runtime stated above for planar graphs and otherwise $n t^{\omega-1}$ where $t$ is the width of the tree decomposition. The algorithm also incorporates a matrix-multiplication-time version of the Gottesman-Knill simulation of multi-qubit measurement on stabilizer states, which may be of independent interest.
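
For context, the Gottesman-Knill simulation the paper builds on tracks stabilizer generators as a binary tableau and updates it gate by gate. Below is a minimal sketch of the standard tableau update rules (Aaronson-Gottesman style, stabilizer rows only, measurement omitted); this is the generic simulation being accelerated, not the paper's planar or matrix-multiplication-time algorithm.

```python
# Textbook stabilizer-tableau updates for Clifford gates (H, S, CNOT).
# Shown only as background for the kind of simulation the paper speeds up.
import numpy as np

class StabilizerState:
    def __init__(self, n):
        self.n = n
        self.x = np.zeros((n, n), dtype=np.uint8)  # X-part of each stabilizer generator
        self.z = np.eye(n, dtype=np.uint8)         # Z-part: |0...0> is stabilized by Z_i
        self.r = np.zeros(n, dtype=np.uint8)       # sign bits

    def h(self, a):       # Hadamard on qubit a
        self.r ^= self.x[:, a] & self.z[:, a]
        self.x[:, a], self.z[:, a] = self.z[:, a].copy(), self.x[:, a].copy()

    def s(self, a):       # Phase gate on qubit a
        self.r ^= self.x[:, a] & self.z[:, a]
        self.z[:, a] ^= self.x[:, a]

    def cnot(self, a, b):  # CNOT with control a, target b
        self.r ^= self.x[:, a] & self.z[:, b] & (self.x[:, b] ^ self.z[:, a] ^ 1)
        self.x[:, b] ^= self.x[:, a]
        self.z[:, a] ^= self.z[:, b]

# Example: build a 2-qubit entangled (Bell) stabilizer state with H then CNOT.
state = StabilizerState(2)
state.h(0)
state.cnot(0, 1)
```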