1. Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde, Jared Kaplan, Harri Edwards, Yura Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, Will Guss, Alex Nichol, Igor Babuschkin, Suchir Balaji, Shantanu Jain, Andrew Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
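The repeated-sampling result is scored with the pass@k metric from the paper. A minimal sketch of the unbiased estimator, assuming n generations per problem of which c pass the unit tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    (without replacement) from n generations is correct, given that c of
    the n generations pass the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 samples per problem, 12 of them pass the tests.
print(pass_at_k(n=100, c=12, k=1))    # estimate of pass@1
print(pass_at_k(n=100, c=12, k=10))   # estimate of pass@10
```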
Codex paper is out! I'm grateful to have led the Safety and PL workstreams for Codex/Copilot, working along Policy @OpenAI. There's many questions about limitations and implications (BI section), a thread on some of our findings: https://t.co/kf8ig8szIQ
— Dr Heidy Khlaaf (هايدي خلاف) (@HeidyKhlaaf) July 8, 2021
Excited to finally share a paper on what a huge chunk of OpenAI has been working on lately: building a series of code generation models and assessing their capabilities and societal implications. 🧵 https://t.co/wed7Jj95Nl
— Miles Brundage (@Miles_Brundage) July 8, 2021
2. A Survey on Data Augmentation for Text Classification
Markus Bayer, Marc-André Kaufhold, Christian Reuter
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data, over regularizing the objective, to limiting the amount of data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
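As a concrete illustration of one simple, rule-based family of methods that falls within the survey's scope, here is a hedged sketch of token-level random-swap augmentation (in the spirit of "easy data augmentation" techniques); the function name and parameters are illustrative, not taken from the survey:

```python
import random

def random_swap(text, n_swaps=2, seed=None):
    """Return an augmented copy of `text` with `n_swaps` random token-pair swaps.
    A toy, label-preserving augmentation for text classification."""
    rng = random.Random(seed)
    tokens = text.split()
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)

print(random_swap("the film was surprisingly good despite its slow start", seed=0))
```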
Two great NLP survey papers this week:
— elvis (@omarsar0) July 8, 2021
1) Survey on Data Augmentation for Text Classification - https://t.co/do1nwCWLTc
2) Survey on Dialogue Summarization: Advances and New Frontiers - https://t.co/91EXH3Jpxp pic.twitter.com/N1xvKMI6oz
3. A Survey on Dialogue Summarization: Recent Advances and New Frontiers
Xiachong Feng, Xiaocheng Feng, Bing Qin
With the development of dialogue systems and natural language generation techniques, the resurgence of dialogue summarization has attracted significant research attention; the task aims to condense the original dialogue into a shorter version covering salient information. However, a comprehensive survey of this task is still lacking. To this end, we take the first step and present a thorough review of this research field. In detail, we provide an overview of publicly available research datasets, summarize existing works according to the domain of the input dialogue, and organize leaderboards under unified metrics. Furthermore, we discuss some future directions and give our thoughts. We hope that this first survey of dialogue summarization can give the community quick access to and a general picture of this task and motivate future research.
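Since the survey organizes leaderboards under unified metrics, here is a minimal sketch of ROUGE-1, the most common summarization metric family. This is a pure-Python approximation with no stemming or synonym handling, so it only illustrates the idea rather than reproducing standard toolkits:

```python
from collections import Counter

def rouge1(reference, summary):
    """Unigram-overlap ROUGE-1 precision, recall, and F1."""
    ref = Counter(reference.lower().split())
    hyp = Counter(summary.lower().split())
    overlap = sum((ref & hyp).values())
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(rouge1("alice and bob agree to meet at noon on friday",
             "alice and bob will meet friday at noon"))
```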
4. SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi
We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream relies on a model architecture composed of a fully convolutional encoder/decoder network and a residual vector quantizer, which are trained jointly end-to-end. Training leverages recent advances in text-to-speech and speech enhancement, which combine adversarial and reconstruction losses to allow the generation of high-quality audio content from quantized embeddings. By training with structured dropout applied to quantizer layers, a single model can operate across variable bitrates from 3 kbps to 18 kbps, with negligible quality loss compared with models trained at fixed bitrates. In addition, the model is amenable to a low-latency implementation, which supports streamable inference and runs in real time on a smartphone CPU. In subjective evaluations using audio at a 24 kHz sampling rate, SoundStream at 3 kbps outperforms Opus at 12 kbps and approaches EVS at 9.6 kbps. Moreover, we are able to perform joint compression and enhancement either at the encoder or at the decoder side with no additional latency, which we demonstrate through background noise suppression for speech.
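The residual vector quantizer (RVQ) is the discretization step: each stage quantizes the residual left over by the previous stage against its own codebook, so code rates can be varied by using more or fewer stages. A minimal numpy sketch under assumed shapes; the codebooks and vectors are random illustrations, not SoundStream's trained ones:

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization of one embedding vector `x`.
    `codebooks` is a list of (codebook_size, dim) arrays, one per stage.
    Returns the chosen code indices and the quantized reconstruction."""
    quantized = np.zeros_like(x)
    residual = x.copy()
    indices = []
    for cb in codebooks:
        # Pick the codeword closest to the current residual.
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        quantized += cb[idx]
        residual = x - quantized   # the next stage models what is still missing
    return indices, quantized

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]   # 4 stages, 16 codes each
x = rng.normal(size=8)
idx, x_hat = rvq_encode(x, codebooks)
print(idx, np.linalg.norm(x - x_hat))   # reconstruction error shrinks with more stages
```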
Check Soundstream, our neural audio codec:
— Neil Zeghidour (@neilzegh) July 8, 2021
* outperforms Opus & EVS on speech & music w/ up to 4x fewer bits
* scalable: 1 model for all bitrates
* runs real-time on 1 smartphone CPU
* controllable denoising
Paper: https://t.co/Lwap8acnYz
Samples 🔊 : https://t.co/32HqtCMDpI
1/5 pic.twitter.com/QjgmBWJTI2
SoundStream: An End-to-End Neural Audio Codec
— AK (@ak92501) July 8, 2021
pdf: https://t.co/UyRE6snXGW
abs: https://t.co/nWjWyQ3pTk
neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs pic.twitter.com/29GTuTyQ28
5. GLiT: Neural Architecture Search for Global and Local Image Transformer
Boyu Chen, Peixia Li, Chuming Li, Baopu Li, Lei Bai, Chen Lin, Ming Sun, Junjie yan, Wanli Ouyang
We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition. Recently, transformers without CNN-based backbones have been found to achieve impressive performance for image recognition. However, the transformer was designed for NLP tasks and can thus be sub-optimal when directly used for image recognition. In order to improve the visual representation ability of transformers, we propose a new search space and search algorithm. Specifically, we introduce a locality module that models the local correlations in images explicitly at a lower computational cost. With the locality module, our search space is defined to let the search algorithm freely trade off between global and local information, as well as optimize the low-level design choices in each module. To tackle the problem caused by the huge search space, a hierarchical neural architecture search method is proposed to search for the optimal vision transformer at two levels separately with an evolutionary algorithm. Extensive experiments on the ImageNet dataset demonstrate that our method can find more discriminative and efficient transformer variants than the ResNet family (e.g., ResNet101) and the baseline ViT for image classification.
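A hedged sketch of the evolutionary-search idea (mutate architecture encodings and keep the fittest); the genome fields and the stand-in fitness function below are illustrative and are not GLiT's actual search space or proxy evaluation:

```python
import random

# Hypothetical search space: mix of global attention and local (locality-module) heads.
SEARCH_SPACE = {
    "num_global_heads": [0, 1, 2, 4],
    "num_local_heads": [0, 1, 2, 4],
    "mlp_ratio": [2, 3, 4],
}

def mutate(arch):
    child = dict(arch)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def fitness(arch):
    # Stand-in for "train briefly / evaluate on a proxy task"; a dummy score here.
    return arch["num_global_heads"] + arch["num_local_heads"] - 0.1 * arch["mlp_ratio"]

population = [{k: random.choice(v) for k, v in SEARCH_SPACE.items()} for _ in range(8)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]                               # keep the fittest
    population = parents + [mutate(random.choice(parents)) for _ in range(4)]
print(max(population, key=fitness))
```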
GLiT: Neural Architecture Search for Global and Local Image Transformer
— AK (@ak92501) July 8, 2021
pdf: https://t.co/9EVQvoLvyX
method can find more discriminative and efficient transformer variants than the ResNet family (e.g., ResNet101) and the baseline ViT for image classification pic.twitter.com/fkFVfMdHfq
6. Deep Extrapolation for Attribute-Enhanced Generation
Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik
Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins, and propose GENhance, a generative framework that enhances attributes through a learned latent space. Trained on movie reviews and a computed protein stability dataset, GENhance can generate strongly-positive text reviews and highly stable protein sequences without being exposed to similar data during training. We release our benchmark tasks and models to contribute to the study of generative modeling extrapolation and data-driven design in biology and chemistry.
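A hedged sketch of the general latent-extrapolation idea: encode a sequence, nudge its latent code in the direction a learned scorer associates with the target attribute, then decode. The linear encoder, decoder, and scorer below are random stand-ins, not GENhance's trained modules or training objective:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_LATENT = 32, 8

# Illustrative stand-ins for a trained encoder, decoder, and latent attribute head.
W_enc = rng.normal(size=(D_LATENT, D_IN))
W_dec = rng.normal(size=(D_IN, D_LATENT))
w_score = rng.normal(size=D_LATENT)

def extrapolate(x, step=0.5, n_steps=5):
    z = W_enc @ x                                   # encode into the latent space
    direction = w_score / np.linalg.norm(w_score)   # direction of higher attribute score
    for _ in range(n_steps):
        z = z + step * direction
    return W_dec @ z                                # decode the "enhanced" latent

x = rng.normal(size=D_IN)
x_enhanced = extrapolate(x)
print(float(w_score @ (W_enc @ x)), float(w_score @ (W_enc @ x_enhanced)))
```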
Can generative AI learn to extrapolate? We explore how to generate sequences that enhance desired attributes-- beyond what was seen in training. Works pretty well in #NLP and #proteins!
— Ali Madani (@thisismadani) July 8, 2021
Blog: https://t.co/dUAgZuX0W5
Paper: https://t.co/kY4TruHUTu
Code: https://t.co/jFNccTxW0M pic.twitter.com/1v55MlMr3m
7. Structured Denoising Diffusion Models in Discrete State-Spaces
Jacob Austin, Daniel Johnson, Jonathan Ho, Danny Tarlow, Rianne van den Berg
Denoising diffusion probabilistic models (DDPMs) (Ho et al. 2020) have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models for discrete data that generalize the multinomial diffusion model of Hoogeboom et al. 2021, by going beyond corruption processes with uniform transition probabilities. This includes corruption with transition matrices that mimic Gaussian kernels in continuous space, matrices based on nearest neighbors in embedding space, and matrices that introduce absorbing states. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition matrix is an important design decision that leads to improved results in image and text domains. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation while scaling to large vocabularies on LM1B. On the image dataset CIFAR-10, our models approach the sample quality and exceed the log-likelihood of the continuous-space DDPM model.
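A minimal sketch of one forward corruption step with an absorbing-state transition matrix: each token either keeps its value or jumps to a [MASK] token with probability beta_t, which is the corruption process that connects D3PMs to mask-based generative models. The vocabulary size and schedule below are illustrative:

```python
import numpy as np

def absorbing_transition_matrix(vocab_size, beta_t):
    """Row-stochastic Q_t for absorbing-state corruption: a token keeps its value
    with probability 1 - beta_t and jumps to the [MASK] index (last column)
    with probability beta_t; the [MASK] state maps back to itself."""
    Q = (1.0 - beta_t) * np.eye(vocab_size)
    Q[:, -1] += beta_t              # the last index plays the role of [MASK]
    return Q

def corrupt_step(x_prev, Q, rng):
    """Sample x_t ~ Categorical(Q[x_{t-1}, :]) independently for each token."""
    return np.array([rng.choice(Q.shape[1], p=Q[tok]) for tok in x_prev])

rng = np.random.default_rng(0)
Q = absorbing_transition_matrix(vocab_size=6, beta_t=0.3)
x0 = np.array([2, 4, 1, 0, 3])
print(corrupt_step(x0, Q, rng))     # some tokens are replaced by the mask index 5
```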
Structured Denoising Diffusion Models in Discrete State-Spaces
— AK (@ak92501) July 8, 2021
pdf: https://t.co/OAc8ffsJCD
abs: https://t.co/mQHJTkXHqA
D3PMs, a class of models that improves diffusion models for discrete data by defining new kinds of discrete corruption processes pic.twitter.com/aYCovG2vyn
8. Big Data Information and Nowcasting: Consumption and Investment from Bank Transactions in Turkey
Ali B. Barlas, Seda Guler Mert, Berk Orkun Isa, Alvaro Ortiz, Tomasa Rodrigo, Baris Soybilgen, Ege Yazgan
We use aggregate information from individual-to-firm and firm-to-firm Garanti BBVA Bank transactions to mimic domestic private demand. In particular, we replicate the quarterly national accounts aggregates for consumption and investment (gross fixed capital formation) and its larger components (Machinery and Equipment, and Construction) in real time for the case of Turkey. In order to validate the usefulness of the information derived from these indicators, we test the ability of both indicators to nowcast Turkish GDP using different nowcasting models. The results are successful and confirm the usefulness of consumption and investment banking transactions for nowcasting purposes. The value of the big data information is most relevant at the beginning of the nowcasting process, when traditional hard data are scarce. This makes this information especially relevant for countries with longer statistical release lags, such as Emerging Markets.
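A hedged sketch of a bridge-equation style nowcast in this spirit: regress quarterly GDP growth on the transaction-based consumption and investment indicators over past quarters, then predict the current quarter from the already-observed indicators. The data below are synthetic placeholders, not Garanti BBVA figures, and the single linear regression stands in for the paper's fuller set of nowcasting models:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic quarterly history: transaction-based consumption/investment growth
# and the GDP growth they are meant to track.
X_hist = rng.normal(size=(24, 2))   # columns: [consumption indicator, investment indicator]
y_hist = 0.6 * X_hist[:, 0] + 0.3 * X_hist[:, 1] + rng.normal(scale=0.2, size=24)

bridge = LinearRegression().fit(X_hist, y_hist)

# Current quarter: the indicators are observed from bank transactions well before
# official national accounts are released, so GDP growth can be nowcast now.
x_now = np.array([[0.8, -0.2]])
print("nowcast GDP growth:", float(bridge.predict(x_now)[0]))
```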
Now you can check how we measure Turkish consumption and investment in real time and how we can introduce these data in traditional Nowcasting models ➡️ https://t.co/nxibEvcNFf @ali_hakan_kara @JuriMarcucci @RefetGurkaynak @SimdiTahmin pic.twitter.com/Me7mNr76A5
— Alvaro Ortiz (@alvaroortiz1968) July 8, 2021
9. A Survey of Uncertainty in Deep Neural Networks
Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, Muhammad Shahzad, Wen Yang, Richard Bamler, Xiao Xiang Zhu
As neural networks spread into more and more applications, confidence in their predictions has become increasingly important. However, basic neural networks do not deliver certainty estimates, or suffer from over- or under-confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network’s prediction. As a result, different types and sources of uncertainty have been identified, and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge of the field. A comprehensive introduction to the most crucial sources of uncertainty is given, along with their separation into reducible model uncertainty and irreducible data uncertainty. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensembles of neural networks, and test-time data augmentation approaches is introduced, and different branches of these fields as well as the latest developments are discussed. For practical applications, we discuss different measures of uncertainty and approaches for calibrating neural networks, and give an overview of existing baselines and implementations. Examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainty in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real-world applications are discussed, and an outlook on the next steps towards a broader usage of such methods is given.
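As a minimal illustration of one of the surveyed families (ensembles), here is a sketch that separates the two uncertainty types for regression: the spread of ensemble means approximates the reducible model uncertainty, while each member's predicted noise variance approximates the irreducible data uncertainty. The toy "members" below stand in for independently trained networks:

```python
import numpy as np

def predict_member(x, seed):
    """Stand-in for one trained network: returns (predicted mean, predicted noise variance)."""
    member_rng = np.random.default_rng(seed)
    bias = member_rng.normal(scale=0.1)             # member-to-member disagreement
    return np.sin(x) + bias, 0.05 + 0.02 * np.abs(np.cos(x))

x = np.linspace(0, 3, 5)
means, noise_vars = zip(*(predict_member(x, seed) for seed in range(5)))
means, noise_vars = np.stack(means), np.stack(noise_vars)

epistemic = means.var(axis=0)        # model (reducible) uncertainty: disagreement across members
aleatoric = noise_vars.mean(axis=0)  # data (irreducible) uncertainty: average predicted noise
total = epistemic + aleatoric
print(epistemic.round(4), aleatoric.round(4), total.round(4))
```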
A Survey of Uncertainty in Deep Neural Networks. (arXiv:2107.03342v1 [cs.LG]) https://t.co/lgAV7jJ6rU
— Stat.ML Papers (@StatMLPapers) July 8, 2021
10. The Geography of Open Source Software: Evidence from GitHub
Johannes Wachs, Mariusz Nitecki, William Schueller, Axel Polleres
Open Source Software plays an important role in the digital economy. Yet although software production is amenable to remote collaboration and its end products are easily shared across distances, software development seems to cluster geographically in places such as Silicon Valley, London, or Berlin. And while recent work indicates that positive effects of open source software production accrue locally through knowledge spillovers and information effects, up-to-date data on the geographic distribution of active open source developers remains limited. Here we analyze the geographic distribution of more than half a million active contributors to GitHub, geolocated in early 2021, at various spatial scales. Comparing our data with results from before 2010, we find a significant increase in the relative share of developers based in Asia, Latin America and Eastern Europe, suggesting a more even spread of OSS developers globally. Within countries, however, we find significant concentration in regions, exceeding by some margin the concentration of workers in high-tech fields. We relate OSS activity to a number of social and technological indicators at both scales using a multiple regression framework. Despite the potential of OSS as a distributed mode of collaborative work, the data suggest that OSS activity remains highly localized.
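A hedged sketch of one standard way to quantify the within-country concentration the authors report: a Gini coefficient over per-region developer counts. The counts below are made up for illustration:

```python
import numpy as np

def gini(counts):
    """Gini coefficient of a non-negative count vector
    (0 = perfectly even spread, close to 1 = everything in one region)."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if x.sum() == 0:
        return 0.0
    cum_share = np.cumsum(x) / x.sum()
    return (n + 1 - 2 * cum_share.sum()) / n

# Hypothetical developers per region in one country.
print(gini([12000, 800, 600, 400, 200, 150, 100]))   # highly concentrated
print(gini([2000, 1900, 2100, 1950, 2050]))          # nearly even spread
```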
We have a new preprint (https://t.co/CO54jx53yX, /w Mariusz Nitecki, @wschuell1, @AxelPolleres) on the geography of open source software devs. We geolocated and counted active devs on GitHub in countries and regions, comparing vs 10+ yrs ago. Get the data: https://t.co/NhVDxkYuGa pic.twitter.com/2c2PQTwyoQ
— Johannes Wachs (@johannes_wachs) July 8, 2021
11. Learning Latent Actions to Control Assistive Robots
Dylan P. Losey, Hong Jun Jeon, Mengxi Li, Krishnan Srinivasan, Ajay Mandlekar, Animesh Garg, Jeannette Bohg, Dorsa Sadigh
Assistive robot arms enable people with disabilities to conduct everyday tasks on their own. These arms are dexterous and high-dimensional; however, the interfaces people must use to control their robots are low-dimensional. Consider teleoperating a 7-DoF robot arm with a 2-DoF joystick. The robot is helping you eat dinner, and currently you want to cut a piece of tofu. Today’s robots assume a pre-defined mapping between joystick inputs and robot actions: in one mode the joystick controls the robot’s motion in the x-y plane, in another mode the joystick controls the robot’s z-yaw motion, and so on. But this mapping misses out on the task you are trying to perform! Ideally, one joystick axis should control how the robot stabs the tofu and the other axis should control different cutting motions. Our insight is that we can achieve intuitive, user-friendly control of assistive robots by embedding the robot’s high-dimensional actions into low-dimensional and human-controllable latent actions. We divide this process into three parts. First, we explore models for learning latent actions from offline task demonstrations, and formalize the properties that latent actions should satisfy. Next, we combine learned latent actions with autonomous robot assistance to help the user reach and maintain their high-level goals. Finally, we learn a personalized alignment model between joystick inputs and latent actions. We evaluate our resulting approach in four user studies where non-disabled participants reach marshmallows, cook apple pie, cut tofu, and assemble dessert. We then test our approach with two disabled adults who leverage assistive devices on a daily basis.
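A hedged sketch of the latent-action idea at control time: a low-dimensional joystick input is decoded, conditioned on the robot's current state, into a high-dimensional joint-velocity command. The decoder weights here are random stand-ins for the learned model, and the dimensions simply mirror the 7-DoF arm / 2-DoF joystick example from the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, LATENT_DIM, ACTION_DIM = 14, 2, 7     # robot state, joystick axes, joint velocities

# Illustrative stand-in for a trained conditional decoder g(z, s) -> action.
W_z = rng.normal(scale=0.3, size=(ACTION_DIM, LATENT_DIM))
W_s = rng.normal(scale=0.05, size=(ACTION_DIM, STATE_DIM))

def decode_latent_action(z, state):
    """Map a 2-D joystick input to a 7-D joint-velocity command, given the state."""
    return np.tanh(W_z @ z + W_s @ state)        # tanh keeps velocities bounded

state = rng.normal(size=STATE_DIM)
joystick = np.array([0.7, -0.2])                 # the user pushes the stick
print(decode_latent_action(joystick, state))     # 7 joint velocities
```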
Making assistive teleoperation intuitive, easy to operate, and precise! The journal version of our work on the framework of learned latent actions + shared autonomy + personalization is out. We also have new studies with users with disability.
— Dorsa Sadigh (@DorsaSadigh) July 8, 2021
Paper: https://t.co/KXomKMMhvV pic.twitter.com/knWAxPOx4Z