1. From Show to Tell: A Survey on Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara
Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, in the last few years, a large research effort has been devoted to image captioning, i.e. the task of describing images with syntactically and semantically meaningful sentences. Starting from 2015, the task has generally been addressed with pipelines composed of a visual encoding step and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, and relationships and the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. Despite the impressive results obtained, however, research in image captioning has not yet reached a conclusive answer. This work aims at providing a comprehensive overview and categorization of image captioning approaches, from visual encoding and text generation to training strategies, used datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in image captioning architectures and training strategies. Moreover, many variants of the problem and its open challenges are analyzed and discussed. The final goal of this work is to serve as a tool for understanding the existing state-of-the-art and highlighting the future directions for an area of research where Computer Vision and Natural Language Processing can find an optimal synergy.
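As a concrete illustration of the two-stage pipeline the survey covers (a visual encoding step followed by a language model), here is a minimal, hypothetical PyTorch sketch. It is not taken from any of the surveyed models; the backbone, dimensions, and module names are all illustrative assumptions.

```python
# Minimal image-captioning sketch: visual encoding + autoregressive language model.
# Hyperparameters and module names are illustrative, not taken from the survey.
import torch
import torch.nn as nn
import torchvision.models as models

class SimpleCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        backbone = models.resnet50(weights=None)              # visual encoder (CNN grid features)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, d_model)                  # project features to decoder width
        self.embed = nn.Embedding(vocab_size, d_model)        # token embeddings
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)         # language-model output

    def forward(self, images, tokens):
        feats = self.encoder(images)                          # (B, 2048, H, W)
        feats = feats.flatten(2).transpose(1, 2)              # (B, H*W, 2048) grid features
        memory = self.proj(feats)
        tgt = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)        # cross-attention over visual features
        return self.lm_head(out)                              # per-token vocabulary logits

model = SimpleCaptioner(vocab_size=10000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```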
A Survey on Image Captioning
— elvis (@omarsar0) July 16, 2021
If you are interested to know the role vision and language play in generative machine learning models, you will enjoy this new comprehensive survey paper.
Highly recommended for ML and NLP students. https://t.co/zVJV6Lj2JL pic.twitter.com/bj9f3meFbZ
2. The Benchmark Lottery
Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals
The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of a "benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a method being perceived as superior. On multiple benchmark setups that are prevalent in the ML community, we show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks, highlighting the fragility of the current paradigms and the potentially fallacious interpretations derived from benchmarking ML methods. Given that every benchmark makes a statement about what it perceives to be important, we argue that this might lead to biased progress in the community. We discuss the implications of the observed phenomena and provide recommendations on mitigating them using multiple machine learning domains and communities as use cases, including natural language processing, computer vision, information retrieval, recommender systems, and reinforcement learning.
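The core claim is easy to reproduce in miniature. The toy example below (mine, not from the paper) shows how the same per-task scores can crown different "winners" depending on which tasks are chosen as the benchmark; the scores are invented.

```python
# Toy illustration of a "benchmark lottery": identical per-task scores,
# different winners depending on which tasks make up the benchmark.
scores = {
    "method_A": {"task1": 0.82, "task2": 0.71, "task3": 0.64},
    "method_B": {"task1": 0.78, "task2": 0.75, "task3": 0.70},
}

def rank(methods, tasks):
    # Average score over the chosen tasks, highest first.
    avg = {m: sum(s[t] for t in tasks) / len(tasks) for m, s in methods.items()}
    return sorted(avg, key=avg.get, reverse=True)

print(rank(scores, ["task1"]))              # ['method_A', 'method_B']
print(rank(scores, ["task2", "task3"]))     # ['method_B', 'method_A']
```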
Sharing "The Benchmark Lottery" from @GoogleAI & @DeepMind.
— Yi Tay (@ytay017) July 16, 2021
In this meta-paper (https://t.co/ZNZDQHdZ2k), we examine the challenges of ML benchmarking (e.g., model comparisons) and how it affects long-term progress. 1/ pic.twitter.com/2V777yETaJ
The Benchmark Lottery
— AK (@ak92501) July 16, 2021
pdf: https://t.co/iZEyaKTitc
abs: https://t.co/atYqI7rXLq
proposes the notion of a benchmark lottery that describes the overall fragility of the ML benchmarking process pic.twitter.com/jx3kaPrT6y
1. Benchmarks are fundamental to track progress in empirical machine learning. In our new paper, we study how benchmarking may affect the long term research direction and pace of progress in ML and put forward the notion of a "benchmark lottery": https://t.co/br9RYNJPbN
— Mostafa Dehghani (@m__dehghani) July 16, 2021
"The Benchmark Lottery" -- on the challenges of the machine learning method comparison process. This manuscript comes with a handy checklist in the appendix, which could be a nice complement to the reproducibility checklist https://t.co/ImABAYACHk pic.twitter.com/2oLBwWk8lX
— Sebastian Raschka (@rasbt) July 16, 2021
3. Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner
Making controlled perturbations is essential for various tasks (e.g., data augmentation), but building task-specific generators can be expensive. We introduce Tailor, a task-agnostic generation system that perturbs text in a semantically-controlled way. With unlikelihood training, we design Tailor's generator to follow a series of control codes derived from semantic roles. Through modifications of these control codes, Tailor can produce fine-grained perturbations. We implement a set of operations on control codes that can be composed into complex perturbation strategies, and demonstrate their effectiveness in three distinct applications: First, Tailor facilitates the construction of high-quality contrast sets that are lexically diverse, and less biased than original task test data. Second, paired with automated labeling heuristics, Tailor helps improve model generalization through data augmentation: We obtain an average gain of 1.73 on an NLI challenge set by perturbing just 5% of training data. Third, without any finetuning overhead, Tailor's perturbations effectively improve compositionality in fine-grained style transfer, outperforming fine-tuned baselines on 6 transfers.
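To make the control-code idea concrete, here is a purely hypothetical sketch of composing one operation (a voice change) over semantic-role control codes and serializing them into a generator prompt. The tag format, role names, and operation are assumptions for illustration; Tailor's actual control-code format differs.

```python
# Hypothetical sketch of composing perturbation operations on semantic-role
# control codes; the actual format used by Tailor differs in detail.
header = [
    {"role": "VERB",    "content": "comforted",   "spec": "active"},
    {"role": "AGENT",   "content": "the athlete", "spec": "complete"},
    {"role": "PATIENT", "content": "the baby",    "spec": "complete"},
]

def change_voice(codes):
    # One example operation: flip the verb specification between active and passive.
    out = [dict(c) for c in codes]
    for c in out:
        if c["role"] == "VERB":
            c["spec"] = "passive" if c["spec"] == "active" else "active"
    return out

def to_prompt(codes, context):
    # Serialize control codes into a prompt for a conditioned generator.
    tags = " ".join(f"[{c['role']}:{c['spec']}|{c['content']}]" for c in codes)
    return f"{tags} {context}"

perturbed = change_voice(header)
print(to_prompt(perturbed, "The athlete comforted the baby."))
```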
New preprint alert!
— Sherry Tongshuang Wu (@tongshuangwu) July 16, 2021
*Tailor: Generating and Perturbing Text with Semantic Controls*
Title says it all: we perturb sentences in semantically controlled ways like how a tailor changes clothes.
w/ @alexisjross, @haopeng01, @mattthemathman, @nlpmattg https://t.co/elD9WVrq0H
1/n pic.twitter.com/GledzeXrwh
Tailor: Generating and Perturbing Text with Semantic Controls
— AK (@ak92501) July 16, 2021
pdf: https://t.co/cIupTzHdyL
abs: https://t.co/udXVBAEqgs
a task-agnostic generation system that perturbs text
in a semantically-controlled way pic.twitter.com/JEcPQtsAUn
4. Passive attention in artificial neural networks predicts human visual selectivity
Thomas A. Langlois, H. Charles Zhao, Erin Grant, Ishita Dasgupta, Thomas L. Griffiths, Nori Jacoby
Developments in machine learning interpretability techniques over the past decade have provided new tools to observe the image regions that are most informative for classification and localization in artificial neural networks (ANNs). Are the same regions similarly informative to human observers? Using data from 78 new experiments and 6,610 participants, we show that passive attention techniques reveal a significant overlap with human visual selectivity estimates derived from 6 distinct behavioral tasks including visual discrimination, spatial localization, recognizability, free-viewing, cued-object search, and saliency search fixations. We find that input visualizations derived from relatively simple ANN architectures probed using guided backpropagation methods are the best predictors of a shared component in the joint variability of the human measures. We validate these correlational results with causal manipulations using recognition experiments. We show that images masked with ANN attention maps were easier for humans to classify than control masks in a speeded recognition experiment. Similarly, we find that recognition performance in the same ANN models was likewise influenced by masking input images using human visual selectivity maps. This work contributes a new approach to evaluating the biological and psychological validity of leading ANNs as models of human vision: by examining their similarities and differences in terms of their visual selectivity to the information contained in images.
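For readers unfamiliar with the passive-attention techniques involved, the sketch below shows a common way to implement guided backpropagation in PyTorch. The paper evaluates several such methods; the model choice and preprocessing here are assumptions, not the study's exact setup.

```python
# Minimal guided-backpropagation sketch (one of the "passive attention"
# techniques); model choice and input are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models

model = models.vgg16(weights=None).eval()

def guided_relu_hook(module, grad_in, grad_out):
    # Guided backprop: pass back only positive gradients through ReLUs.
    return (torch.clamp(grad_in[0], min=0.0),)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.inplace = False                       # avoid in-place ops with backward hooks
        m.register_full_backward_hook(guided_relu_hook)

image = torch.randn(1, 3, 224, 224, requires_grad=True)
logits = model(image)
logits[0, logits.argmax()].backward()           # backprop from the top-scoring class
saliency = image.grad.abs().max(dim=1)[0]       # (1, 224, 224) attention map
print(saliency.shape)
```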
Passive attention in artificial neural networks predicts human visual selectivity
— AK (@ak92501) July 16, 2021
pdf: https://t.co/px3yAu6rEM
78 new experiments and 6,610 participants, show that passive attention techniques reveal a significant overlap with human visual selectivity estimates pic.twitter.com/WKhgIOrFFo
5. Level generation and style enhancement - deep learning for game development overview
Piotr Migdał, Bartłomiej Olechno, Błażej Podgórski
We present practical approaches of using deep learning to create and enhance level maps and textures for video games - desktop, mobile, and web. We aim to present new possibilities for game developers and level artists. The task of designing levels and filling them with details is challenging. It is both time-consuming and takes effort to make levels rich, complex, and with a feeling of being natural. Fortunately, recent progress in deep learning provides new tools to accompany level designers and visual artists. Moreover, they offer a way to generate infinite worlds for game replayability and adjust educational games to players' needs. We present seven approaches to create level maps, each using statistical methods, machine learning, or deep learning. In particular, we include:
- Generative Adversarial Networks for creating new images from existing examples (e.g. ProGAN).
- Super-resolution techniques for upscaling images while preserving crisp detail (e.g. ESRGAN).
- Neural style transfer for changing visual themes.
- Image translation - turning semantic maps into images (e.g. GauGAN).
- Semantic segmentation for turning images into semantic masks (e.g. U-Net).
- Unsupervised semantic segmentation for extracting semantic features (e.g. Tile2Vec).
- Texture synthesis - creating large patterns based on a smaller sample (e.g. InGAN).
How to use deep learning in game development?
— Piotr Migdal (@pmigdal) July 16, 2021
Glad you ask - I've just posted on arXiv:
"Level generation and style enhancement â deep learning for game development overview" https://t.co/bwm40PSVO2
Enjoy!
Also: it has pictures. #deeplearning #ai #gamedev pic.twitter.com/klFxIJLOTb
6. HTLM: Hyper-Text Pre-Training and Prompting of Language Models
Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer
We introduce HTLM, a hyper-text language model trained on a large-scale web crawl. Modeling hyper-text has a number of advantages: (1) it is easily gathered at scale, (2) it provides rich document-level and end-task-adjacent supervision (e.g. class and id attributes often encode document category information), and (3) it allows for new structured prompting that follows the established semantics of HTML (e.g. to do zero-shot summarization by infilling title tags for a webpage that contains the input text). We show that pretraining with a BART-style denoising loss directly on simplified HTML provides highly effective transfer for a wide range of end tasks and supervision levels. HTLM matches or exceeds the performance of comparably sized text-only LMs for zero-shot prompting and fine-tuning for classification benchmarks, while also setting new state-of-the-art performance levels for zero-shot summarization. We also find that hyper-text prompts provide more value to HTLM, in terms of data efficiency, than plain text prompts do for existing LMs, and that HTLM is highly effective at auto-prompting itself, by simply generating the most likely hyper-text formatting for any available training data. We will release all code and models to support future HTLM research.
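The structured-prompting idea can be illustrated with a plain string: wrap the document in simplified HTML and mask the <title> element, so that a denoising model trained on hyper-text produces the title (i.e., a summary) when infilling. The <mask> token and markup below are illustrative assumptions, not HTLM's exact interface.

```python
# Sketch of a hyper-text prompt for zero-shot summarization: wrap the input
# document in simplified HTML and ask the model to infill the <title> tag.
# The <mask> token and markup are illustrative; HTLM's exact format may differ.
article = "Retail investors coordinated on Reddit to target short selling ..."

prompt = (
    "<html>\n"
    "  <head><title><mask></title></head>\n"
    f"  <body><p>{article}</p></body>\n"
    "</html>"
)
print(prompt)
# A BART-style denoising model trained on such markup would be asked to
# reconstruct the masked span; the generated title serves as the summary.
```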
I'm excited to announce our new pre-training paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models (https://t.co/35bdRQDxC8) where we unlock new ways of priming and automatically generating prompts by pre-training on simplified HTML.
— Armen Aghajanyan (@ArmenAgha) July 16, 2021
HTLM: Hyper-Text Pre-Training and Prompting of Language Models
— AK (@ak92501) July 16, 2021
pdf: https://t.co/BuTQh8RGfq
abs: https://t.co/wR55ECm4nd
a hyper-text language model trained on simplified HTML documents from a large-scale web crawl pic.twitter.com/WlAC6y6FMi
7. Wordcraft: a Human-AI Collaborative Editor for Story Writing
Andy Coenen, Luke Davis, Daphne Ippolito, Emily Reif, Ann Yuan
As neural language models grow in effectiveness, they are increasingly being applied in real-world settings. However, these applications tend to be limited in the modes of interaction they support. In this extended abstract, we propose Wordcraft, an AI-assisted editor for story writing in which a writer and a dialog system collaborate to write a story. Our novel interface uses few-shot learning and the natural affordances of conversation to support a variety of interactions. Our editor provides a sandbox for writers to probe the boundaries of transformer-based language models and paves the way for future human-in-the-loop training pipelines and novel evaluation methods.
Wordcraft: a Human-AI Collaborative Editor for Story Writing
— AK (@ak92501) July 16, 2021
pdf: https://t.co/88LqNoG55T
abs: https://t.co/ZBRjCe6GkM
an AI-assisted editor for story writing in which a writer and a dialog system collaborate to write a story pic.twitter.com/lEegXN4zHt
8. StyleFusion: A Generative Model for Disentangling Spatial Segments
Omer Kafri, Or Patashnik, Yuval Alaluf, Daniel Cohen-Or
We present StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code. Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes. Effectively, StyleFusion yields a disentangled representation of the image, providing fine-grained control over each region of the generated image. Moreover, to help facilitate global control over the generated image, a special input latent code is incorporated into the fused representation. StyleFusion operates in a hierarchical manner, where each level is tasked with learning to disentangle a pair of image regions (e.g., the car body and wheels). The resulting learned disentanglement allows one to modify both local, fine-grained semantics (e.g., facial features) as well as more global features (e.g., pose and background), providing improved flexibility in the synthesis process. As a natural extension, StyleFusion enables one to perform semantically-aware cross-image mixing of regions that are not necessarily aligned. Finally, we demonstrate how StyleFusion can be paired with existing editing techniques to more faithfully constrain the edit to the user's region of interest.
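A minimal sketch of the fusion idea, under the assumption that each fusion block is a small MLP merging a pair of latent codes into one, applied hierarchically (the actual StyleFusion architecture is more involved):

```python
# Hedged sketch of hierarchical latent fusion: each block merges two latent
# codes into one style code; sizes and architecture are illustrative only.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.LeakyReLU(0.2), nn.Linear(dim, dim))

    def forward(self, w_a, w_b):
        return self.mlp(torch.cat([w_a, w_b], dim=-1))   # fused style code

# Hierarchy: fuse (wheels, car body) first, then fuse the result with a background code.
fuse_parts, fuse_scene = FusionBlock(), FusionBlock()
w_wheels, w_body, w_background = (torch.randn(1, 512) for _ in range(3))
w_car = fuse_parts(w_wheels, w_body)
w_style = fuse_scene(w_car, w_background)
print(w_style.shape)   # torch.Size([1, 512]); would be fed to a pre-trained StyleGAN generator
```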
StyleFusion: A Generative Model for Disentangling Spatial Segments
— AK (@ak92501) July 16, 2021
pdf: https://t.co/wjKhgMdejQ
abs: https://t.co/qlm7Sl1lfR
a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code pic.twitter.com/c78VxeZ5LT
9. Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
Ori Yoran, Alon Talmor, Jonathan Berant
Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs, where answering the question requires reasoning over multiple facts in the paragraph. We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills such as number comparison, conjunction, and fact composition. To improve data efficiency, we propose sampling strategies that focus training on reasoning skills the model is currently lacking. We evaluate our approach on three reading comprehension datasets that are focused on reasoning, and show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model. Moreover, sampling examples based on current model errors leads to faster training and higher overall performance.
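The data-generation step can be illustrated with a toy template. The example below is a hypothetical sketch of producing a number-comparison question-paragraph pair from a table; the table contents, template, and field names are invented for illustration.

```python
# Sketch of turning a semi-structured table into a synthetic reasoning example
# (number comparison). Table contents and template are illustrative.
table = {
    "title": "Olympic medal count",
    "rows": [
        {"country": "Norway", "gold": 16},
        {"country": "Germany", "gold": 12},
    ],
}

def make_comparison_example(table):
    a, b = table["rows"][0], table["rows"][1]
    paragraph = (
        f"In the {table['title']}, {a['country']} won {a['gold']} gold medals "
        f"and {b['country']} won {b['gold']} gold medals."
    )
    question = f"Who won more gold medals, {a['country']} or {b['country']}?"
    answer = a["country"] if a["gold"] > b["gold"] else b["country"]
    return {"paragraph": paragraph, "question": question, "answer": answer}

print(make_comparison_example(table))
# Large numbers of such pairs can be generated automatically and used as an
# extra pre-training step before fine-tuning on reading-comprehension datasets.
```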
How can we make pre-trained LMs do better reasoning?
— Ori Yoran (@OriYoran) July 16, 2021
In a new work with @AlonTalmor and @JonathanBerant we show that we can automatically generate examples from semi-structured tables and drastically improve performance on RC tasks that involve reasoning. https://t.co/HuM0Y5pAQw pic.twitter.com/1JVYJNpCfd
10. Algorithmic Concept-based Explainable Reasoning
Dobrik Georgiev, Pietro Barbiero, Dmitry Kazhdan, Petar Veličković, Pietro Liò
Recent research on graph neural network (GNN) models successfully applied GNNs to classical graph algorithms and combinatorial optimisation problems. This has numerous benefits, such as allowing applications of algorithms when preconditions are not satisfied, or reusing learned models when sufficient training data is not available or can't be generated. Unfortunately, a key hindrance of these approaches is their lack of explainability, since GNNs are black-box models that cannot be interpreted directly. In this work, we address this limitation by applying existing work on concept-based explanations to GNN models. We introduce concept-bottleneck GNNs, which rely on a modification to the GNN readout mechanism. Using three case studies we demonstrate that: (i) our proposed model is capable of accurately learning concepts and extracting propositional formulas based on the learned concepts for each target class; (ii) our concept-based GNN models achieve comparable performance with state-of-the-art models; (iii) we can derive global graph concepts, without explicitly providing any supervision on graph-level concepts.
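A rough sketch of what a concept-bottleneck readout might look like, assuming a simple mean-pooling readout and sigmoid concept units (sizes and details are illustrative, not the paper's exact architecture):

```python
# Sketch of a concept-bottleneck readout: the graph-level prediction must pass
# through a small set of interpretable concept units. Sizes are illustrative.
import torch
import torch.nn as nn

class ConceptBottleneckReadout(nn.Module):
    def __init__(self, node_dim=64, n_concepts=8, n_classes=3):
        super().__init__()
        self.to_concepts = nn.Linear(node_dim, n_concepts)   # concept logits
        self.classifier = nn.Linear(n_concepts, n_classes)   # prediction from concepts only

    def forward(self, node_embeddings):
        pooled = node_embeddings.mean(dim=0, keepdim=True)   # simple mean readout over nodes
        concepts = torch.sigmoid(self.to_concepts(pooled))   # soft concept activations
        return self.classifier(concepts), concepts

readout = ConceptBottleneckReadout()
node_embeddings = torch.randn(10, 64)                        # output of message-passing layers
logits, concepts = readout(node_embeddings)
print(logits.shape, concepts.shape)                          # (1, 3) (1, 8)
# Thresholding the concept activations yields binary concepts from which
# propositional (logic) explanations per class can be extracted.
```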
Interesting things happen when explainable AI (XAI) meets algorithmic reasoning.
— Petar Veličković (@PetarV_93) July 16, 2021
Our recent work (w/ @DobrikG, @pietro_barbiero, @DmitryKazhdan and @pl219_Cambridge) investigates.
We successfully extract useful FOL explanations from learnt reasoners! https://t.co/4ZFKPPwRSP https://t.co/Kra1YW21ba pic.twitter.com/Whysv5K04z
11. From Reddit to Wall Street: The role of committed minorities in financial collective action
Lorenzo Lucchini, Luca Maria Aiello, Laura Alessandretti, Gianmarco De Francisci Morales, Michele Starnini, Andrea Baronchelli
- retweets: 170, favorites: 68 (07/17/2021 08:02:21)
- links: abs | pdf
- physics.soc-ph | cs.CY
In January 2021, retail investors coordinated on Reddit to target short selling activity by hedge funds on GameStop shares, causing a surge in the share price and triggering significant losses for the funds involved. Such an effective collective action was unprecedented in finance, and its dynamics remain unclear. Here, we analyse Reddit and financial data and rationalise the events based on recent findings describing how a small fraction of committed individuals may trigger behavioural cascades. First, we operationalise the concept of individual commitment in financial discussions. Second, we show that the increase of commitment within Reddit predated the initial surge in price. Third, we reveal that initial committed users occupied a central position in the network of Reddit conversations. Finally, we show that the social identity of the broader Reddit community grew as the collective action unfolded. These findings shed light on financial collective action, as several observers anticipate it will grow in importance.
Very happy to share this last work with a bunch of friends!
— Michele Starnini (@m_starnini) July 16, 2021
We all know @Reddit users coordinated on @wallstreetbets to trigger a short squeeze of @GameStop shares.
But how such unprecedented collective action took place? 1/4 https://t.co/TpCztGaVSH
In "From Reddit to Wall Street: The role of committed minorities in financial collective action" we clarify what happened when organised retail investors shook Wall Street in Jan 2021, alarming regulators from all over the world.https://t.co/HPr6Kw5wy6
— Andrea Baronchelli (@a_baronca) July 16, 2021
See thread. https://t.co/LfN9S4oHdH
12. StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
Gereon Fox, Ayush Tewari, Mohamed Elgharib, Christian Theobalt
Generative adversarial models (GANs) continue to produce advances in terms of the visual quality of still images, as well as the learning of temporal correlations. However, few works manage to combine these two interesting capabilities for the synthesis of video content: Most methods require an extensive training dataset in order to learn temporal correlations, while being rather limited in the resolution and visual quality of their output frames. In this paper, we present a novel approach to the video synthesis problem that helps to greatly improve visual quality and drastically reduce the amount of training data and resources necessary for generating video content. Our formulation separates the spatial domain, in which individual frames are synthesized, from the temporal domain, in which motion is generated. For the spatial domain we make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for. The expressive power of this model allows us to embed our training videos in the StyleGAN latent space. Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes. The advantageous properties of the StyleGAN space simplify the discovery of temporal correlations. We demonstrate that it suffices to train our temporal architecture on only 10 minutes of footage of 1 subject for about 6 hours. After training, our model can not only generate new portrait videos for the training subject, but also for any random subject which can be embedded in the StyleGAN space.
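A minimal sketch of the spatial/temporal separation: the temporal model consumes sequences of StyleGAN latent codes rather than RGB frames. The GRU, next-latent objective, and shapes below are illustrative assumptions, not the paper's architecture.

```python
# Sketch of the separation the paper describes: a temporal model is trained on
# sequences of StyleGAN latent codes, not on frames. Shapes are illustrative.
import torch
import torch.nn as nn

latent_dim = 512
temporal_model = nn.GRU(input_size=latent_dim, hidden_size=latent_dim, batch_first=True)
head = nn.Linear(latent_dim, latent_dim)

# Stand-in for latent codes obtained by embedding training-video frames into the StyleGAN space.
latent_sequence = torch.randn(4, 30, latent_dim)             # (batch, frames, latent)

# Next-latent prediction: given codes up to t, predict the code at t+1.
hidden_states, _ = temporal_model(latent_sequence[:, :-1])
predicted_next = head(hidden_states)
loss = nn.functional.mse_loss(predicted_next, latent_sequence[:, 1:])
loss.backward()
print(loss.item())
# At generation time, predicted latent codes are decoded to frames by a frozen,
# pre-trained StyleGAN generator.
```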
StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
— AK (@ak92501) July 16, 2021
pdf: https://t.co/0qmMTydUN5
abs: https://t.co/G5AW58eIqx
a temporal GAN for the unconditional generation of high-quality videos pic.twitter.com/30YnWSSdMj
13. Clustering of heterogeneous populations of networks
Jean-Gabriel Young, Alec Kirkley, M. E. J. Newman
- retweets: 60, favorites: 51 (07/17/2021 08:02:21)
- links: abs | pdf
- cs.SI | physics.soc-ph | stat.AP
Statistical methods for reconstructing networks from repeated measurements typically assume that all measurements are generated from the same underlying network structure. This need not be the case, however. People's social networks might be different on weekdays and weekends, for instance. Brain networks may differ between healthy patients and those with dementia or other conditions. Here we describe a Bayesian analysis framework for such data that allows for the fact that network measurements may be reflective of multiple possible structures. We define a finite mixture model of the measurement process and derive a fast Gibbs sampling procedure that samples exactly from the full posterior distribution of model parameters. The end result is a clustering of the measured networks into groups with similar structure. We demonstrate the method on both real and synthetic network populations.
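A heavily simplified sketch of the idea, assuming a plain Bernoulli mixture over adjacency matrices with uniform mixing proportions and a blocked Gibbs sweep (the paper's model additionally accounts for measurement error and is derived more carefully):

```python
# Simplified sketch of clustering measured networks with a Bernoulli mixture
# and Gibbs sampling; mixing proportions are taken as uniform for simplicity.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_networks, K = 8, 20, 2

# Synthetic population: two groups of networks with different edge densities.
true_p = np.array([0.1, 0.6])
networks = np.stack([
    rng.random((n_nodes, n_nodes)) < true_p[i % 2] for i in range(n_networks)
]).astype(float)

z = rng.integers(K, size=n_networks)            # initial cluster labels
for sweep in range(50):
    # Sample each cluster's edge probability from its Beta posterior.
    theta = np.array([
        rng.beta(1 + networks[z == k].sum(), 1 + (1 - networks[z == k]).sum())
        for k in range(K)
    ])
    # Sample labels from their conditional distribution given theta.
    for m in range(n_networks):
        ones = networks[m].sum()
        zeros = networks[m].size - ones
        log_p = ones * np.log(theta) + zeros * np.log(1 - theta)
        p = np.exp(log_p - log_p.max())
        z[m] = rng.choice(K, p=p / p.sum())

print(z)   # measured networks grouped by similar structure
```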
Here's our preprint on an interpretable mixture model to separate imperfect observations of multiple networks and simultaneously find errors.
— Jean-Gabriel Young (@_jgyou) July 16, 2021
Co-led by @captainkirk1041, with MEJ Newman. https://t.co/2GIdeGZR9W pic.twitter.com/PYC8zZrXUH
14. Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin, David Reitter, Gaurav Singh Tomar, Dipanjan Das
Knowledge-grounded dialogue systems are intended to convey information that is based on evidence provided in a given source text. We discuss the challenges of training a generative neural dialogue model for such systems that is controlled to stay faithful to the evidence. Existing datasets contain a mix of conversational responses that are faithful to selected evidence as well as more subjective or chit-chat style responses. We propose different evaluation measures to disentangle these different styles of responses by quantifying the informativeness and objectivity. At training time, additional inputs based on these evaluation measures are given to the dialogue model. At generation time, these additional inputs act as stylistic controls that encourage the model to generate responses that are faithful to the provided evidence. We also investigate the usage of additional controls at decoding time using resampling techniques. In addition to automatic metrics, we perform a human evaluation study where raters judge the output of these controlled generation models to be generally more objective and faithful to the evidence compared to baseline dialogue systems.
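A toy sketch of the control-feature mechanism: heuristic measures computed on training responses become control tokens prepended to the model input, and the desired tokens are fixed at generation time. The token names and heuristics below are invented for illustration, not the paper's.

```python
# Sketch of control-feature tagging for knowledge-grounded dialogue: heuristic
# measures are turned into control tokens prepended to the input. Token names
# and heuristics are illustrative, not the paper's exact ones.
def control_tokens(response, evidence):
    # Toy heuristics for "objectivity" and "groundedness" of a training response.
    first_person = any(w in response.lower().split() for w in ("i", "my", "me"))
    overlap = len(set(response.lower().split()) & set(evidence.lower().split()))
    tokens = []
    tokens.append("<first-person>" if first_person else "<no-first-person>")
    tokens.append("<high-overlap>" if overlap >= 5 else "<low-overlap>")
    return tokens

evidence = "The Eiffel Tower is 330 metres tall and was completed in 1889."
response = "It is 330 metres tall and was completed in 1889."
tagged_input = " ".join(control_tokens(response, evidence)) + " " + evidence
print(tagged_input)
# At generation time the desired controls (e.g. <no-first-person> <high-overlap>)
# are fixed in the input to steer the model toward faithful, objective responses.
```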
End-to-end dialogue models often generate responses about the world that are not "faithful" to evidence in grounding corpora. We present new work on controlling these responses to be attributable to such evidence.
— Dipanjan Das (@dipanjand) July 16, 2021
Paper: https://t.co/d0IaPHRS88
Abs: https://t.co/SU8sQ7mFTU
1/
15. MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
- retweets: 30, favorites: 40 (07/17/2021 08:02:21)
- links: abs | pdf
- cs.LG | cs.AI | cs.CL | cs.CV | cs.MM
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
— AK (@ak92501) July 16, 2021
pdf: https://t.co/Ghy58zmFro
a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas pic.twitter.com/UZOfm7dILI
16. One Thousand and One Stories: A Large-Scale Survey of Software Refactoring
Yaroslav Golubev, Zarina Kurbatova, Eman Abdullah AlOmar, Timofey Bryksin, Mohamed Wiem Mkaouer
Despite the availability of refactoring as a feature in popular IDEs, recent studies revealed that developers are reluctant to use them, and still prefer the manual refactoring of their code. At JetBrains, our goal is to fully support refactoring features in IntelliJ-based IDEs and improve their adoption in practice. Therefore, we start by raising the following main questions. How exactly do people refactor code? What refactorings are the most popular? Why do some developers tend not to use convenient IDE refactoring tools? In this paper, we investigate the raised questions through the design and implementation of a survey targeting 1,183 users of IntelliJ-based IDEs. Our quantitative and qualitative analysis of the survey results shows that almost two-thirds of developers spend more than one hour in a single session refactoring their code; that refactoring types vary greatly in popularity; and that a lot of developers would like to know more about IDE refactoring features but lack the means to do so. These results serve us internally to support the next generation of refactoring features, as well as can help our research community to establish new directions in the refactoring usability research.
Happy to share that our paper "One Thousand and One Stories: A Large-Scale Survey of Software Refactoring" has been accepted to ESEC/FSE 2021!
— Yaroslav Golubev (@areyde) July 16, 2021
Congratulations and a big thank you to @zkurbatova, @ECS_Abdullah, @mwmkaouer, and @timofeybryksin!
Pre-print: https://t.co/yypvpWro6y
17. An End-to-End Differentiable Framework for Contact-Aware Robot Design
Jie Xu, Tao Chen, Lara Zlokapa, Michael Foshey, Wojciech Matusik, Shinjiro Sueda, Pulkit Agrawal
The current dominant paradigm for robotic manipulation involves two separate stages: manipulator design and control. Because the robot's morphology and how it can be controlled are intimately linked, joint optimization of design and control can significantly improve performance. Existing methods for co-optimization are limited and fail to explore a rich space of designs. The primary reason is the trade-off between the complexity of designs that is necessary for contact-rich tasks against the practical constraints of manufacturing, optimization, contact handling, etc. We overcome several of these challenges by building an end-to-end differentiable framework for contact-aware robot design. The two key components of this framework are: a novel deformation-based parameterization that allows for the design of articulated rigid robots with arbitrary, complex geometry, and a differentiable rigid body simulator that can handle contact-rich scenarios and computes analytical gradients for a full spectrum of kinematic and dynamic parameters. On multiple manipulation tasks, our framework outperforms existing methods that either only optimize for control or for design using alternate representations or co-optimize using gradient-free methods.
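The co-optimization loop itself is easy to sketch: design and control parameters are updated jointly by backpropagating through a differentiable simulator. In the toy example below the simulator is replaced by a stand-in analytic objective, so it only illustrates the optimization pattern, not the paper's simulator.

```python
# Toy sketch of joint design/control optimization through a differentiable
# "simulator". The simulator here is a stand-in analytic objective; the paper
# uses a differentiable rigid-body simulator with contact handling.
import torch

design = torch.randn(16, requires_grad=True)     # deformation-based shape parameters
control = torch.randn(8, requires_grad=True)     # controller parameters
optimizer = torch.optim.Adam([design, control], lr=1e-2)

def simulate(design, control):
    # Stand-in for a differentiable simulation rollout returning a task loss.
    interaction = (design.mean() - control.mean()) ** 2
    regularizer = 1e-3 * (design.norm() + control.norm())
    return interaction + regularizer

for step in range(200):
    optimizer.zero_grad()
    loss = simulate(design, control)
    loss.backward()          # analytical gradients w.r.t. both design and control
    optimizer.step()

print(float(simulate(design, control)))
```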
An End-to-End Differentiable Framework for Contact-Aware Robot Design
— AK (@ak92501) July 16, 2021
pdf: https://t.co/oV3mjiMLNH
abs: https://t.co/UZfjbSUkNY
project page: https://t.co/wEmZrnZgNY pic.twitter.com/0P5phh8Aqh
18. Recurrent Parameter Generators
Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun
We present a generic method for recurrently using the same parameters for many different convolution layers to build a deep network. Specifically, for a network, we create a recurrent parameter generator (RPG), from which the parameters of each convolution layer are generated. Though using recurrent models to build a deep convolutional neural network (CNN) is not entirely new, our method achieves significant performance gain compared to the existing works. We demonstrate how to build a one-layer neural network to achieve similar performance compared to other traditional CNN models on various applications and datasets. Such a method allows us to build an arbitrarily complex neural network with any amount of parameters. For example, we build a ResNet34 with model parameters reduced by more than times, which still achieves ImageNet top-1 accuracy. Furthermore, we demonstrate the RPG can be applied at different scales, such as layers, blocks, or even sub-networks. Specifically, we use the RPG to build a ResNet18 network with the number of weights equivalent to one convolutional layer of a conventional ResNet and show this model can achieve ImageNet top-1 accuracy. The proposed method can be viewed as an inverse approach to model compression. Rather than removing the unused parameters from a large model, it aims to squeeze more information into a small number of parameters. Extensive experiment results are provided to demonstrate the power of the proposed recurrent parameter generator.
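A small sketch of the idea, assuming every convolution draws its weights from one shared parameter bank by wrap-around slicing (the paper's generator is more elaborate); it shows how depth can be decoupled from parameter count.

```python
# Sketch of a recurrent parameter generator: all convolutions draw their
# weights from one shared parameter bank (here via simple wrap-around slicing;
# the paper uses a more elaborate generation scheme). Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPGConvNet(nn.Module):
    def __init__(self, bank_size=50000, channels=32, n_layers=6, n_classes=10):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.05)   # shared parameters
        self.shape = (channels, channels, 3, 3)
        self.n_layers = n_layers
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.head = nn.Linear(channels, n_classes)

    def layer_weight(self, i):
        numel = int(torch.tensor(self.shape).prod())
        idx = (torch.arange(numel) + i * numel) % self.bank.numel()  # wrap-around slice
        return self.bank[idx].view(self.shape)                      # layer i's conv kernel

    def forward(self, x):
        x = F.relu(self.stem(x))
        for i in range(self.n_layers):
            x = F.relu(F.conv2d(x, self.layer_weight(i), padding=1))
        return self.head(x.mean(dim=(2, 3)))

model = RPGConvNet()
print(model(torch.randn(2, 3, 32, 32)).shape)        # torch.Size([2, 10])
print(sum(p.numel() for p in model.parameters()))    # dominated by the shared bank
```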
Recurrent Parameter Generators
— AK (@ak92501) July 16, 2021
pdf: https://t.co/90MIwn23P5
abs: https://t.co/5lwOjM59Nu
demonstrate how to build a one-layer neural network to achieve similar performance compared to other traditional CNN models on various applications and datasets pic.twitter.com/bMxmu6xbj3
19. FLEX: Unifying Evaluation for Few-Shot NLP
Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy
Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which techniques perform best or even if they outperform simple baselines. We formulate desiderata for an ideal few-shot NLP benchmark and present FLEX, the first benchmark, public leaderboard, and framework that provides unified, comprehensive measurement for few-shot NLP techniques. FLEX incorporates and introduces new best practices for few-shot evaluation, including measurement of four transfer settings, textual labels for zero-shot evaluation, and a principled approach to benchmark design that optimizes statistical accuracy while keeping evaluation costs accessible to researchers without large compute resources. In addition, we present UniFew, a simple yet strong prompt-based model for few-shot learning which unifies the pretraining and finetuning prompt formats, eschewing complex machinery of recent prompt-based approaches in adapting downstream task formats to language model pretraining objectives. We demonstrate that despite simplicity UniFew achieves results competitive with both popular meta-learning and prompt-based approaches.
FLEX: Unifying Evaluation for Few-Shot NLP
— AK (@ak92501) July 16, 2021
pdf: https://t.co/s4CTpGExQ9
abs: https://t.co/9rxkyOca4Y
benchmark, public leaderboard, and framework that provides unified, comprehensive measurement for few-shot NLP techniques pic.twitter.com/y3F8fhyuxF