1. From Show to Tell: A Survey on Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara
Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, in the last few years, a large research effort has been devoted to image captioning, i.e. the task of describing images with syntactically and semantically meaningful sentences. Starting from 2015, the task has generally been addressed with pipelines composed of a visual encoding step and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, and relationships and the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. Despite the impressive results obtained, however, research in image captioning has not yet reached a conclusive answer. This work aims at providing a comprehensive overview and categorization of image captioning approaches, from visual encoding and text generation to training strategies, used datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in image captioning architectures and training strategies. Moreover, many variants of the problem and its open challenges are analyzed and discussed. The final goal of this work is to serve as a tool for understanding the existing state-of-the-art and highlighting the future directions for an area of research where Computer Vision and Natural Language Processing can find an optimal synergy.
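As a concrete illustration of the two-stage pipeline the survey covers (a visual encoding step followed by a language model), here is a minimal, hypothetical PyTorch sketch. It is not taken from any of the surveyed models; the backbone, dimensions, and module names are all illustrative assumptions.

```python
# Minimal image-captioning sketch: visual encoding + autoregressive language model.
# Hyperparameters and module names are illustrative, not taken from the survey.
import torch
import torch.nn as nn
import torchvision.models as models

class SimpleCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        backbone = models.resnet50(weights=None)              # visual encoder (CNN grid features)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, d_model)                  # project features to decoder width
        self.embed = nn.Embedding(vocab_size, d_model)        # token embeddings
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)         # language-model output

    def forward(self, images, tokens):
        feats = self.encoder(images)                          # (B, 2048, H, W)
        feats = feats.flatten(2).transpose(1, 2)              # (B, H*W, 2048) grid features
        memory = self.proj(feats)
        tgt = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)        # cross-attention over visual features
        return self.lm_head(out)                              # per-token vocabulary logits

model = SimpleCaptioner(vocab_size=10000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```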
A Survey on Image Captioning
— elvis (@omarsar0) July 16, 2021
If you are interested to know the role vision and language play in generative machine learning models, you will enjoy this new comprehensive survey paper.
Highly recommended for ML and NLP students. https://t.co/zVJV6Lj2JL pic.twitter.com/bj9f3meFbZ
2. The Benchmark Lottery
Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals
The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of a "benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a method being perceived as superior. On multiple benchmark setups that are prevalent in the ML community, we show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks, highlighting the fragility of the current paradigms and the potentially fallacious interpretations derived from benchmarking ML methods. Given that every benchmark makes a statement about what it perceives to be important, we argue that this might lead to biased progress in the community. We discuss the implications of the observed phenomena and provide recommendations on mitigating them using multiple machine learning domains and communities as use cases, including natural language processing, computer vision, information retrieval, recommender systems, and reinforcement learning.
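The core claim is easy to reproduce in miniature. The toy example below (mine, not from the paper) shows how the same per-task scores can crown different "winners" depending on which tasks are chosen as the benchmark; the scores are invented.

```python
# Toy illustration of a "benchmark lottery": identical per-task scores,
# different winners depending on which tasks make up the benchmark.
scores = {
    "method_A": {"task1": 0.82, "task2": 0.71, "task3": 0.64},
    "method_B": {"task1": 0.78, "task2": 0.75, "task3": 0.70},
}

def rank(methods, tasks):
    # Average score over the chosen tasks, highest first.
    avg = {m: sum(s[t] for t in tasks) / len(tasks) for m, s in methods.items()}
    return sorted(avg, key=avg.get, reverse=True)

print(rank(scores, ["task1"]))              # ['method_A', 'method_B']
print(rank(scores, ["task2", "task3"]))     # ['method_B', 'method_A']
```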
Sharing "The Benchmark Lottery" from @GoogleAI & @DeepMind.
— Yi Tay (@ytay017) July 16, 2021
In this meta-paper (https://t.co/ZNZDQHdZ2k), we examine the challenges of ML benchmarking (e.g., model comparisons) and how it affects long-term progress. 1/ pic.twitter.com/2V777yETaJ
The Benchmark Lottery
— AK (@ak92501) July 16, 2021
pdf: https://t.co/iZEyaKTitc
abs: https://t.co/atYqI7rXLq
proposes the notion of a benchmark lottery that describes the overall fragility of the ML benchmarking process pic.twitter.com/jx3kaPrT6y
1. Benchmarks are fundamental to track progress in empirical machine learning. In our new paper, we study how benchmarking may affect the long term research direction and pace of progress in ML and put forward the notion of a "benchmark lottery": https://t.co/br9RYNJPbN
— Mostafa Dehghani (@m__dehghani) July 16, 2021
"The Benchmark Lottery" -- on the challenges of the machine learning method comparison process. This manuscript comes with a handy checklist in the appendix, which could be a nice complement to the reproducibility checklist https://t.co/ImABAYACHk pic.twitter.com/2oLBwWk8lX
— Sebastian Raschka (@rasbt) July 16, 2021
3. Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner
Making controlled perturbations is essential for various tasks (e.g., data augmentation), but building task-specific generators can be expensive. We introduce Tailor, a task-agnostic generation system that perturbs text in a semantically-controlled way. With unlikelihood training, we design Tailor's generator to follow a series of control codes derived from semantic roles. Through modifications of these control codes, Tailor can produce fine-grained perturbations. We implement a set of operations on control codes that can be composed into complex perturbation strategies, and demonstrate their effectiveness in three distinct applications: First, Tailor facilitates the construction of high-quality contrast sets that are lexically diverse, and less biased than original task test data. Second, paired with automated labeling heuristics, Tailor helps improve model generalization through data augmentation: We obtain an average gain of 1.73 on an NLI challenge set by perturbing just 5% of training data. Third, without any finetuning overhead, Tailor's perturbations effectively improve compositionality in fine-grained style transfer, outperforming fine-tuned baselines on 6 transfers.
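To make the control-code idea concrete, here is a purely hypothetical sketch of composing one operation (a voice change) over semantic-role control codes and serializing them into a generator prompt. The tag format, role names, and operation are assumptions for illustration; Tailor's actual control-code format differs.

```python
# Hypothetical sketch of composing perturbation operations on semantic-role
# control codes; the actual format used by Tailor differs in detail.
header = [
    {"role": "VERB",    "content": "comforted",   "spec": "active"},
    {"role": "AGENT",   "content": "the athlete", "spec": "complete"},
    {"role": "PATIENT", "content": "the baby",    "spec": "complete"},
]

def change_voice(codes):
    # One example operation: flip the verb specification between active and passive.
    out = [dict(c) for c in codes]
    for c in out:
        if c["role"] == "VERB":
            c["spec"] = "passive" if c["spec"] == "active" else "active"
    return out

def to_prompt(codes, context):
    # Serialize control codes into a prompt for a conditioned generator.
    tags = " ".join(f"[{c['role']}:{c['spec']}|{c['content']}]" for c in codes)
    return f"{tags} {context}"

perturbed = change_voice(header)
print(to_prompt(perturbed, "The athlete comforted the baby."))
```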
New preprint alert!
— Sherry Tongshuang Wu (@tongshuangwu) July 16, 2021
*Tailor: Generating and Perturbing Text with Semantic Controls*
Title says it all: we perturb sentences in semantically controlled ways like how a tailor changes clothes.
w/ @alexisjross, @haopeng01, @mattthemathman, @nlpmattg https://t.co/elD9WVrq0H
1/n pic.twitter.com/GledzeXrwh
Tailor: Generating and Perturbing Text with Semantic Controls
— AK (@ak92501) July 16, 2021
pdf: https://t.co/cIupTzHdyL
abs: https://t.co/udXVBAEqgs
a task-agnostic generation system that perturbs text
in a semantically-controlled way pic.twitter.com/JEcPQtsAUn
4. Passive attention in artificial neural networks predicts human visual selectivity
Thomas A. Langlois, H. Charles Zhao, Erin Grant, Ishita Dasgupta, Thomas L. Griffiths, Nori Jacoby
Developments in machine learning interpretability techniques over the past decade have provided new tools to observe the image regions that are most informative for classification and localization in artificial neural networks (ANNs). Are the same regions similarly informative to human observers? Using data from 78 new experiments and 6,610 participants, we show that passive attention techniques reveal a significant overlap with human visual selectivity estimates derived from 6 distinct behavioral tasks including visual discrimination, spatial localization, recognizability, free-viewing, cued-object search, and saliency search fixations. We find that input visualizations derived from relatively simple ANN architectures probed using guided backpropagation methods are the best predictors of a shared component in the joint variability of the human measures. We validate these correlational results with causal manipulations using recognition experiments. We show that images masked with ANN attention maps were easier for humans to classify than control masks in a speeded recognition experiment. Similarly, we find that recognition performance in the same ANN models was likewise influenced by masking input images using human visual selectivity maps. This work contributes a new approach to evaluating the biological and psychological validity of leading ANNs as models of human vision: by examining their similarities and differences in terms of their visual selectivity to the information contained in images.
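For readers unfamiliar with the passive-attention techniques involved, the sketch below shows a common way to implement guided backpropagation in PyTorch. The paper evaluates several such methods; the model choice and preprocessing here are assumptions, not the study's exact setup.

```python
# Minimal guided-backpropagation sketch (one of the "passive attention"
# techniques); model choice and input are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models

model = models.vgg16(weights=None).eval()

def guided_relu_hook(module, grad_in, grad_out):
    # Guided backprop: pass back only positive gradients through ReLUs.
    return (torch.clamp(grad_in[0], min=0.0),)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.inplace = False                       # avoid in-place ops with backward hooks
        m.register_full_backward_hook(guided_relu_hook)

image = torch.randn(1, 3, 224, 224, requires_grad=True)
logits = model(image)
logits[0, logits.argmax()].backward()           # backprop from the top-scoring class
saliency = image.grad.abs().max(dim=1)[0]       # (1, 224, 224) attention map
print(saliency.shape)
```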
Passive attention in artificial neural networks predicts human visual selectivity
— AK (@ak92501) July 16, 2021
pdf: https://t.co/px3yAu6rEM
78 new experiments and 6,610 participants, show that passive attention techniques reveal a significant overlap with human visual selectivity estimates pic.twitter.com/WKhgIOrFFo
5. Level generation and style enhancement - deep learning for game development overview
Piotr Migdał, Bartłomiej Olechno, Błażej Podgórski
We present practical approaches of using deep learning to create and enhance level maps and textures for video games - desktop, mobile, and web. We aim to present new possibilities for game developers and level artists. The task of designing levels and filling them with details is challenging. It is both time-consuming and takes effort to make levels rich, complex, and with a feeling of being natural. Fortunately, recent progress in deep learning provides new tools to accompany level designers and visual artists. Moreover, they offer a way to generate infinite worlds for game replayability and adjust educational games to players' needs. We present seven approaches to create level maps, each using statistical methods, machine learning, or deep learning. In particular, we include:
- Generative Adversarial Networks for creating new images from existing examples (e.g. ProGAN).
- Super-resolution techniques for upscaling images while preserving crisp detail (e.g. ESRGAN).
- Neural style transfer for changing visual themes.
- Image translation - turning semantic maps into images (e.g. GauGAN).
- Semantic segmentation for turning images into semantic masks (e.g. U-Net).
- Unsupervised semantic segmentation for extracting semantic features (e.g. Tile2Vec).
- Texture synthesis - creating large patterns based on a smaller sample (e.g. InGAN).
How to use deep learning in game development?
— Piotr Migdal (@pmigdal) July 16, 2021
Glad you ask - I've just posted on arXiv:
"Level generation and style enhancement â deep learning for game development overview" https://t.co/bwm40PSVO2
Enjoy!
Also: it has pictures. #deeplearning #ai #gamedev pic.twitter.com/klFxIJLOTb
6. HTLM: Hyper-Text Pre-Training and Prompting of Language Models
Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer
We introduce HTLM, a hyper-text language model trained on a large-scale web crawl. Modeling hyper-text has a number of advantages: (1) it is easily gathered at scale, (2) it provides rich document-level and end-task-adjacent supervision (e.g. class and id attributes often encode document category information), and (3) it allows for new structured prompting that follows the established semantics of HTML (e.g. to do zero-shot summarization by infilling title tags for a webpage that contains the input text). We show that pretraining with a BART-style denoising loss directly on simplified HTML provides highly effective transfer for a wide range of end tasks and supervision levels. HTLM matches or exceeds the performance of comparably sized text-only LMs for zero-shot prompting and fine-tuning for classification benchmarks, while also setting new state-of-the-art performance levels for zero-shot summarization. We also find that hyper-text prompts provide more value to HTLM, in terms of data efficiency, than plain text prompts do for existing LMs, and that HTLM is highly effective at auto-prompting itself, by simply generating the most likely hyper-text formatting for any available training data. We will release all code and models to support future HTLM research.
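The structured-prompting idea can be illustrated with a plain string: wrap the document in simplified HTML and mask the <title> element, so that a denoising model trained on hyper-text produces the title (i.e., a summary) when infilling. The <mask> token and markup below are illustrative assumptions, not HTLM's exact interface.

```python
# Sketch of a hyper-text prompt for zero-shot summarization: wrap the input
# document in simplified HTML and ask the model to infill the <title> tag.
# The <mask> token and markup are illustrative; HTLM's exact format may differ.
article = "Retail investors coordinated on Reddit to target short selling ..."

prompt = (
    "<html>\n"
    "  <head><title><mask></title></head>\n"
    f"  <body><p>{article}</p></body>\n"
    "</html>"
)
print(prompt)
# A BART-style denoising model trained on such markup would be asked to
# reconstruct the masked span; the generated title serves as the summary.
```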
I'm excited to announce our new pre-training paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models (https://t.co/35bdRQDxC8) where we unlock new ways of priming and automatically generating prompts by pre-training on simplified HTML.
— Armen Aghajanyan (@ArmenAgha) July 16, 2021
HTLM: Hyper-Text Pre-Training and Prompting of Language Models
— AK (@ak92501) July 16, 2021
pdf: https://t.co/BuTQh8RGfq
abs: https://t.co/wR55ECm4nd
a hyper-text language model trained on simplified HTML documents from a large-scale web crawl pic.twitter.com/WlAC6y6FMi
7. Wordcraft: a Human-AI Collaborative Editor for Story Writing
Andy Coenen, Luke Davis, Daphne Ippolito, Emily Reif, Ann Yuan
As neural language models grow in effectiveness, they are increasingly being applied in real-world settings. However, these applications tend to be limited in the modes of interaction they support. In this extended abstract, we propose Wordcraft, an AI-assisted editor for story writing in which a writer and a dialog system collaborate to write a story. Our novel interface uses few-shot learning and the natural affordances of conversation to support a variety of interactions. Our editor provides a sandbox for writers to probe the boundaries of transformer-based language models and paves the way for future human-in-the-loop training pipelines and novel evaluation methods.
Wordcraft: a Human-AI Collaborative Editor for Story Writing
— AK (@ak92501) July 16, 2021
pdf: https://t.co/88LqNoG55T
abs: https://t.co/ZBRjCe6GkM
an AI-assisted editor for story writing in which a writer and a dialog system collaborate to write a story pic.twitter.com/lEegXN4zHt
8. StyleFusion: A Generative Model for Disentangling Spatial Segments
Omer Kafri, Or Patashnik, Yuval Alaluf, Daniel Cohen-Or
We present StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code. Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes. Effectively, StyleFusion yields a disentangled representation of the image, providing fine-grained control over each region of the generated image. Moreover, to help facilitate global control over the generated image, a special input latent code is incorporated into the fused representation. StyleFusion operates in a hierarchical manner, where each level is tasked with learning to disentangle a pair of image regions (e.g., the car body and wheels). The resulting learned disentanglement allows one to modify both local, fine-grained semantics (e.g., facial features) as well as more global features (e.g., pose and background), providing improved flexibility in the synthesis process. As a natural extension, StyleFusion enables one to perform semantically-aware cross-image mixing of regions that are not necessarily aligned. Finally, we demonstrate how StyleFusion can be paired with existing editing techniques to more faithfully constrain the edit to the user's region of interest.
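A minimal sketch of the fusion idea, under the assumption that each fusion block is a small MLP merging a pair of latent codes into one, applied hierarchically (the actual StyleFusion architecture is more involved):

```python
# Hedged sketch of hierarchical latent fusion: each block merges two latent
# codes into one style code; sizes and architecture are illustrative only.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.LeakyReLU(0.2), nn.Linear(dim, dim))

    def forward(self, w_a, w_b):
        return self.mlp(torch.cat([w_a, w_b], dim=-1))   # fused style code

# Hierarchy: fuse (wheels, car body) first, then fuse the result with a background code.
fuse_parts, fuse_scene = FusionBlock(), FusionBlock()
w_wheels, w_body, w_background = (torch.randn(1, 512) for _ in range(3))
w_car = fuse_parts(w_wheels, w_body)
w_style = fuse_scene(w_car, w_background)
print(w_style.shape)   # torch.Size([1, 512]); would be fed to a pre-trained StyleGAN generator
```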
StyleFusion: A Generative Model for Disentangling Spatial Segments
— AK (@ak92501) July 16, 2021
pdf: https://t.co/wjKhgMdejQ
abs: https://t.co/qlm7Sl1lfR
a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code pic.twitter.com/c78VxeZ5LT
9. Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
Ori Yoran, Alon Talmor, Jonathan Berant
Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs, where answering the question requires reasoning over multiple facts in the paragraph. We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills such as number comparison, conjunction, and fact composition. To improve data efficiency, we propose sampling strategies that focus training on reasoning skills the model is currently lacking. We evaluate our approach on three reading comprehension datasets that are focused on reasoning, and show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model. Moreover, sampling examples based on current model errors leads to faster training and higher overall performance.
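The data-generation step can be illustrated with a toy template. The example below is a hypothetical sketch of producing a number-comparison question-paragraph pair from a table; the table contents, template, and field names are invented for illustration.

```python
# Sketch of turning a semi-structured table into a synthetic reasoning example
# (number comparison). Table contents and template are illustrative.
table = {
    "title": "Olympic medal count",
    "rows": [
        {"country": "Norway", "gold": 16},
        {"country": "Germany", "gold": 12},
    ],
}

def make_comparison_example(table):
    a, b = table["rows"][0], table["rows"][1]
    paragraph = (
        f"In the {table['title']}, {a['country']} won {a['gold']} gold medals "
        f"and {b['country']} won {b['gold']} gold medals."
    )
    question = f"Who won more gold medals, {a['country']} or {b['country']}?"
    answer = a["country"] if a["gold"] > b["gold"] else b["country"]
    return {"paragraph": paragraph, "question": question, "answer": answer}

print(make_comparison_example(table))
# Large numbers of such pairs can be generated automatically and used as an
# extra pre-training step before fine-tuning on reading-comprehension datasets.
```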
How can we make pre-trained LMs do better reasoning?
— Ori Yoran (@OriYoran) July 16, 2021
In a new work with @AlonTalmor and @JonathanBerant we show that we can automatically generate examples from semi-structured tables and drastically improve performance on RC tasks that involve reasoning. https://t.co/HuM0Y5pAQw pic.twitter.com/1JVYJNpCfd
10. Algorithmic Concept-based Explainable Reasoning
Dobrik Georgiev, Pietro Barbiero, Dmitry Kazhdan, Petar Veličković, Pietro Liò
Recent research on graph neural network (GNN) models successfully applied GNNs to classical graph algorithms and combinatorial optimisation problems. This has numerous benefits, such as allowing applications of algorithms when preconditions are not satisfied, or reusing learned models when sufficient training data is not available or can't be generated. Unfortunately, a key hindrance of these approaches is their lack of explainability, since GNNs are black-box models that cannot be interpreted directly. In this work, we address this limitation by applying existing work on concept-based explanations to GNN models. We introduce concept-bottleneck GNNs, which rely on a modification to the GNN readout mechanism. Using three case studies we demonstrate that: (i) our proposed model is capable of accurately learning concepts and extracting propositional formulas based on the learned concepts for each target class; (ii) our concept-based GNN models achieve comparable performance with state-of-the-art models; (iii) we can derive global graph concepts, without explicitly providing any supervision on graph-level concepts.
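A rough sketch of what a concept-bottleneck readout might look like, assuming a simple mean-pooling readout and sigmoid concept units (sizes and details are illustrative, not the paper's exact architecture):

```python
# Sketch of a concept-bottleneck readout: the graph-level prediction must pass
# through a small set of interpretable concept units. Sizes are illustrative.
import torch
import torch.nn as nn

class ConceptBottleneckReadout(nn.Module):
    def __init__(self, node_dim=64, n_concepts=8, n_classes=3):
        super().__init__()
        self.to_concepts = nn.Linear(node_dim, n_concepts)   # concept logits
        self.classifier = nn.Linear(n_concepts, n_classes)   # prediction from concepts only

    def forward(self, node_embeddings):
        pooled = node_embeddings.mean(dim=0, keepdim=True)   # simple mean readout over nodes
        concepts = torch.sigmoid(self.to_concepts(pooled))   # soft concept activations
        return self.classifier(concepts), concepts

readout = ConceptBottleneckReadout()
node_embeddings = torch.randn(10, 64)                        # output of message-passing layers
logits, concepts = readout(node_embeddings)
print(logits.shape, concepts.shape)                          # (1, 3) (1, 8)
# Thresholding the concept activations yields binary concepts from which
# propositional (logic) explanations per class can be extracted.
```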
Interesting things happen when explainable AI (XAI) meets algorithmic reasoning.
— Petar Veličković (@PetarV_93) July 16, 2021
Our recent work (w/ @DobrikG, @pietro_barbiero, @DmitryKazhdan and @pl219_Cambridge) investigates.
We successfully extract useful FOL explanations from learnt reasoners! https://t.co/4ZFKPPwRSP https://t.co/Kra1YW21ba pic.twitter.com/Whysv5K04z
11. From Reddit to Wall Street: The role of committed minorities in financial collective action
Lorenzo Lucchini, Luca Maria Aiello, Laura Alessandretti, Gianmarco De Francisci Morales, Michele Starnini, Andrea Baronchelli
- retweets: 170, favorites: 68 (07/17/2021 08:02:21)
- links: abs | pdf
- physics.soc-ph | cs.CY
In January 2021, retail investors coordinated on Reddit to target short selling activity by hedge funds on GameStop shares, causing a surge in the share price and triggering significant losses for the funds involved. Such an effective collective action was unprecedented in finance, and its dynamics remain unclear. Here, we analyse Reddit and financial data and rationalise the events based on recent findings describing how a small fraction of committed individuals may trigger behavioural cascades. First, we operationalise the concept of individual commitment in financial discussions. Second, we show that the increase of commitment within Reddit predated the initial surge in price. Third, we reveal that initial committed users occupied a central position in the network of Reddit conversations. Finally, we show that the social identity of the broader Reddit community grew as the collective action unfolded. These findings shed light on financial collective action, as several observers anticipate it will grow in importance.
Very happy to share this last work with a bunch of friends!
— Michele Starnini (@m_starnini) July 16, 2021
We all know @Reddit users coordinated on @wallstreetbets to trigger a short squeeze of @GameStop shares.
But how such unprecedented collective action took place? 1/4 https://t.co/TpCztGaVSH
In "From Reddit to Wall Street: The role of committed minorities in financial collective action" we clarify what happened when organised retail investors shook Wall Street in Jan 2021, alarming regulators from all over the world.https://t.co/HPr6Kw5wy6
— Andrea Baronchelli (@a_baronca) July 16, 2021
See thread. https://t.co/LfN9S4oHdH
12. StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
Gereon Fox, Ayush Tewari, Mohamed Elgharib, Christian Theobalt
Generative adversarial models (GANs) continue to produce advances in terms of the visual quality of still images, as well as the learning of temporal correlations. However, few works manage to combine these two interesting capabilities for the synthesis of video content: Most methods require an extensive training dataset in order to learn temporal correlations, while being rather limited in the resolution and visual quality of their output frames. In this paper, we present a novel approach to the video synthesis problem that helps to greatly improve visual quality and drastically reduce the amount of training data and resources necessary for generating video content. Our formulation separates the spatial domain, in which individual frames are synthesized, from the temporal domain, in which motion is generated. For the spatial domain we make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for. The expressive power of this model allows us to embed our training videos in the StyleGAN latent space. Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes. The advantageous properties of the StyleGAN space simplify the discovery of temporal correlations. We demonstrate that it suffices to train our temporal architecture on only 10 minutes of footage of 1 subject for about 6 hours. After training, our model can not only generate new portrait videos for the training subject, but also for any random subject which can be embedded in the StyleGAN space.
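A minimal sketch of the spatial/temporal separation: the temporal model consumes sequences of StyleGAN latent codes rather than RGB frames. The GRU, next-latent objective, and shapes below are illustrative assumptions, not the paper's architecture.

```python
# Sketch of the separation the paper describes: a temporal model is trained on
# sequences of StyleGAN latent codes, not on frames. Shapes are illustrative.
import torch
import torch.nn as nn

latent_dim = 512
temporal_model = nn.GRU(input_size=latent_dim, hidden_size=latent_dim, batch_first=True)
head = nn.Linear(latent_dim, latent_dim)

# Stand-in for latent codes obtained by embedding training-video frames into the StyleGAN space.
latent_sequence = torch.randn(4, 30, latent_dim)             # (batch, frames, latent)

# Next-latent prediction: given codes up to t, predict the code at t+1.
hidden_states, _ = temporal_model(latent_sequence[:, :-1])
predicted_next = head(hidden_states)
loss = nn.functional.mse_loss(predicted_next, latent_sequence[:, 1:])
loss.backward()
print(loss.item())
# At generation time, predicted latent codes are decoded to frames by a frozen,
# pre-trained StyleGAN generator.
```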
StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
— AK (@ak92501) July 16, 2021
pdf: https://t.co/0qmMTydUN5
abs: https://t.co/G5AW58eIqx
a temporal GAN for the unconditional generation of high-quality videos pic.twitter.com/30YnWSSdMj
13. Clustering of heterogeneous populations of networks
Jean-Gabriel Young, Alec Kirkley, M. E. J. Newman
- retweets: 60, favorites: 51 (07/17/2021 08:02:21)
- links: abs | pdf
- cs.SI | physics.soc-ph | stat.AP
Statistical methods for reconstructing networks from repeated measurements typically assume that all measurements are generated from the same underlying network structure. This need not be the case, however. People's social networks might be different on weekdays and weekends, for instance. Brain networks may differ between healthy patients and those with dementia or other conditions. Here we describe a Bayesian analysis framework for such data that allows for the fact that network measurements may be reflective of multiple possible structures. We define a finite mixture model of the measurement process and derive a fast Gibbs sampling procedure that samples exactly from the full posterior distribution of model parameters. The end result is a clustering of the measured networks into groups with similar structure. We demonstrate the method on both real and synthetic network populations.
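A heavily simplified sketch of the idea, assuming a plain Bernoulli mixture over adjacency matrices with uniform mixing proportions and a blocked Gibbs sweep (the paper's model additionally accounts for measurement error and is derived more carefully):

```python
# Simplified sketch of clustering measured networks with a Bernoulli mixture
# and Gibbs sampling; mixing proportions are taken as uniform for simplicity.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_networks, K = 8, 20, 2

# Synthetic population: two groups of networks with different edge densities.
true_p = np.array([0.1, 0.6])
networks = np.stack([
    rng.random((n_nodes, n_nodes)) < true_p[i % 2] for i in range(n_networks)
]).astype(float)

z = rng.integers(K, size=n_networks)            # initial cluster labels
for sweep in range(50):
    # Sample each cluster's edge probability from its Beta posterior.
    theta = np.array([
        rng.beta(1 + networks[z == k].sum(), 1 + (1 - networks[z == k]).sum())
        for k in range(K)
    ])
    # Sample labels from their conditional distribution given theta.
    for m in range(n_networks):
        ones = networks[m].sum()
        zeros = networks[m].size - ones
        log_p = ones * np.log(theta) + zeros * np.log(1 - theta)
        p = np.exp(log_p - log_p.max())
        z[m] = rng.choice(K, p=p / p.sum())

print(z)   # measured networks grouped by similar structure
```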
Here's our preprint on an interpretable mixture model to separate imperfect observations of multiple networks and simultaneously find errors.
— Jean-Gabriel Young (@_jgyou) July 16, 2021
Co-led by @captainkirk1041, with MEJ Newman. https://t.co/2GIdeGZR9W pic.twitter.com/PYC8zZrXUH
14. Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin, David Reitter, Gaurav Singh Tomar, Dipanjan Das
Knowledge-grounded dialogue systems are intended to convey information that is based on evidence provided in a given source text. We discuss the challenges of training a generative neural dialogue model for such systems that is controlled to stay faithful to the evidence. Existing datasets contain a mix of conversational responses that are faithful to selected evidence as well as more subjective or chit-chat style responses. We propose different evaluation measures to disentangle these different styles of responses by quantifying the informativeness and objectivity. At training time, additional inputs based on these evaluation measures are given to the dialogue model. At generation time, these additional inputs act as stylistic controls that encourage the model to generate responses that are faithful to the provided evidence. We also investigate the usage of additional controls at decoding time using resampling techniques. In addition to automatic metrics, we perform a human evaluation study where raters judge the output of these controlled generation models to be generally more objective and faithful to the evidence compared to baseline dialogue systems.
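A toy sketch of the control-feature mechanism: heuristic measures computed on training responses become control tokens prepended to the model input, and the desired tokens are fixed at generation time. The token names and heuristics below are invented for illustration, not the paper's.

```python
# Sketch of control-feature tagging for knowledge-grounded dialogue: heuristic
# measures are turned into control tokens prepended to the input. Token names
# and heuristics are illustrative, not the paper's exact ones.
def control_tokens(response, evidence):
    # Toy heuristics for "objectivity" and "groundedness" of a training response.
    first_person = any(w in response.lower().split() for w in ("i", "my", "me"))
    overlap = len(set(response.lower().split()) & set(evidence.lower().split()))
    tokens = []
    tokens.append("<first-person>" if first_person else "<no-first-person>")
    tokens.append("<high-overlap>" if overlap >= 5 else "<low-overlap>")
    return tokens

evidence = "The Eiffel Tower is 330 metres tall and was completed in 1889."
response = "It is 330 metres tall and was completed in 1889."
tagged_input = " ".join(control_tokens(response, evidence)) + " " + evidence
print(tagged_input)
# At generation time the desired controls (e.g. <no-first-person> <high-overlap>)
# are fixed in the input to steer the model toward faithful, objective responses.
```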
End-to-end dialogue models often generate responses about the world that are not "faithful" to evidence in grounding corpora. We present new work on controlling these responses to be attributable to such evidence.
— Dipanjan Das (@dipanjand) July 16, 2021
Paper: https://t.co/d0IaPHRS88
Abs: https://t.co/SU8sQ7mFTU
1/
15. MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
- retweets: 30, favorites: 40 (07/17/2021 08:02:21)
- links: abs | pdf
- cs.LG | cs.AI | cs.CL | cs.CV | cs.MM
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
— AK (@ak92501) July 16, 2021
pdf: https://t.co/Ghy58zmFro
a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas pic.twitter.com/UZOfm7dILI
16. One Thousand and One Stories: A Large-Scale Survey of Software Refactoring
Yaroslav Golubev, Zarina Kurbatova, Eman Abdullah AlOmar, Timofey Bryksin, Mohamed Wiem Mkaouer
Despite the availability of refactoring as a feature in popular IDEs, recent studies revealed that developers are reluctant to use them, and still prefer the manual refactoring of their code. At JetBrains, our goal is to fully support refactoring features in IntelliJ-based IDEs and improve their adoption in practice. Therefore, we start by raising the following main questions. How exactly do people refactor code? What refactorings are the most popular? Why do some developers tend not to use convenient IDE refactoring tools? In this paper, we investigate the raised questions through the design and implementation of a survey targeting 1,183 users of IntelliJ-based IDEs. Our quantitative and qualitative analysis of the survey results shows that almost two-thirds of developers spend more than one hour in a single session refactoring their code; that refactoring types vary greatly in popularity; and that a lot of developers would like to know more about IDE refactoring features but lack the means to do so. These results serve us internally to support the next generation of refactoring features, as well as can help our research community to establish new directions in the refactoring usability research.
Happy to share that our paper "One Thousand and One Stories: A Large-Scale Survey of Software Refactoring" has been accepted to ESEC/FSE 2021!
— Yaroslav Golubev (@areyde) July 16, 2021
Congratulations and a big thank you to @zkurbatova, @ECS_Abdullah, @mwmkaouer, and @timofeybryksin!
Pre-print: https://t.co/yypvpWro6y
17. An End-to-End Differentiable Framework for Contact-Aware Robot Design
Jie Xu, Tao Chen, Lara Zlokapa, Michael Foshey, Wojciech Matusik, Shinjiro Sueda, Pulkit Agrawal
The current dominant paradigm for robotic manipulation involves two separate stages: manipulator design and control. Because the robot's morphology and how it can be controlled are intimately linked, joint optimization of design and control can significantly improve performance. Existing methods for co-optimization are limited and fail to explore a rich space of designs. The primary reason is the trade-off between the complexity of designs that is necessary for contact-rich tasks against the practical constraints of manufacturing, optimization, contact handling, etc. We overcome several of these challenges by building an end-to-end differentiable framework for contact-aware robot design. The two key components of this framework are: a novel deformation-based parameterization that allows for the design of articulated rigid robots with arbitrary, complex geometry, and a differentiable rigid body simulator that can handle contact-rich scenarios and computes analytical gradients for a full spectrum of kinematic and dynamic parameters. On multiple manipulation tasks, our framework outperforms existing methods that either only optimize for control or for design using alternate representations or co-optimize using gradient-free methods.
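The co-optimization loop itself is easy to sketch: design and control parameters are updated jointly by backpropagating through a differentiable simulator. In the toy example below the simulator is replaced by a stand-in analytic objective, so it only illustrates the optimization pattern, not the paper's simulator.

```python
# Toy sketch of joint design/control optimization through a differentiable
# "simulator". The simulator here is a stand-in analytic objective; the paper
# uses a differentiable rigid-body simulator with contact handling.
import torch

design = torch.randn(16, requires_grad=True)     # deformation-based shape parameters
control = torch.randn(8, requires_grad=True)     # controller parameters
optimizer = torch.optim.Adam([design, control], lr=1e-2)

def simulate(design, control):
    # Stand-in for a differentiable simulation rollout returning a task loss.
    interaction = (design.mean() - control.mean()) ** 2
    regularizer = 1e-3 * (design.norm() + control.norm())
    return interaction + regularizer

for step in range(200):
    optimizer.zero_grad()
    loss = simulate(design, control)
    loss.backward()          # analytical gradients w.r.t. both design and control
    optimizer.step()

print(float(simulate(design, control)))
```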
An End-to-End Differentiable Framework for Contact-Aware Robot Design
— AK (@ak92501) July 16, 2021
pdf: https://t.co/oV3mjiMLNH
abs: https://t.co/UZfjbSUkNY
project page: https://t.co/wEmZrnZgNY pic.twitter.com/0P5phh8Aqh
18. Recurrent Parameter Generators
Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun
We present a generic method for recurrently using the same parameters for many different convolution layers to build a deep network. Specifically, for a network, we create a recurrent parameter generator (RPG), from which the parameters of each convolution layer are generated. Though using recurrent models to build a deep convolutional neural network (CNN) is not entirely new, our method achieves significant performance gain compared to the existing works. We demonstrate how to build a one-layer neural network to achieve similar performance compared to other traditional CNN models on various applications and datasets. Such a method allows us to build an arbitrarily complex neural network with any amount of parameters. For example, we build a ResNet34 with model parameters reduced by more than times, which still achieves ImageNet top-1 accuracy. Furthermore, we demonstrate the RPG can be applied at different scales, such as layers, blocks, or even sub-networks. Specifically, we use the RPG to build a ResNet18 network with the number of weights equivalent to one convolutional layer of a conventional ResNet and show this model can achieve ImageNet top-1 accuracy. The proposed method can be viewed as an inverse approach to model compression. Rather than removing the unused parameters from a large model, it aims to squeeze more information into a small number of parameters. Extensive experiment results are provided to demonstrate the power of the proposed recurrent parameter generator.
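A small sketch of the idea, assuming every convolution draws its weights from one shared parameter bank by wrap-around slicing (the paper's generator is more elaborate); it shows how depth can be decoupled from parameter count.

```python
# Sketch of a recurrent parameter generator: all convolutions draw their
# weights from one shared parameter bank (here via simple wrap-around slicing;
# the paper uses a more elaborate generation scheme). Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPGConvNet(nn.Module):
    def __init__(self, bank_size=50000, channels=32, n_layers=6, n_classes=10):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.05)   # shared parameters
        self.shape = (channels, channels, 3, 3)
        self.n_layers = n_layers
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.head = nn.Linear(channels, n_classes)

    def layer_weight(self, i):
        numel = int(torch.tensor(self.shape).prod())
        idx = (torch.arange(numel) + i * numel) % self.bank.numel()  # wrap-around slice
        return self.bank[idx].view(self.shape)                      # layer i's conv kernel

    def forward(self, x):
        x = F.relu(self.stem(x))
        for i in range(self.n_layers):
            x = F.relu(F.conv2d(x, self.layer_weight(i), padding=1))
        return self.head(x.mean(dim=(2, 3)))

model = RPGConvNet()
print(model(torch.randn(2, 3, 32, 32)).shape)        # torch.Size([2, 10])
print(sum(p.numel() for p in model.parameters()))    # dominated by the shared bank
```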
Recurrent Parameter Generators
— AK (@ak92501) July 16, 2021
pdf: https://t.co/90MIwn23P5
abs: https://t.co/5lwOjM59Nu
demonstrate how to build a one-layer neural network to achieve similar performance compared to other traditional CNN models on various applications and datasets pic.twitter.com/bMxmu6xbj3
19. FLEX: Unifying Evaluation for Few-Shot NLP
Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy
Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which techniques perform best or even if they outperform simple baselines. We formulate desiderata for an ideal few-shot NLP benchmark and present FLEX, the first benchmark, public leaderboard, and framework that provides unified, comprehensive measurement for few-shot NLP techniques. FLEX incorporates and introduces new best practices for few-shot evaluation, including measurement of four transfer settings, textual labels for zero-shot evaluation, and a principled approach to benchmark design that optimizes statistical accuracy while keeping evaluation costs accessible to researchers without large compute resources. In addition, we present UniFew, a simple yet strong prompt-based model for few-shot learning which unifies the pretraining and finetuning prompt formats, eschewing complex machinery of recent prompt-based approaches in adapting downstream task formats to language model pretraining objectives. We demonstrate that despite simplicity UniFew achieves results competitive with both popular meta-learning and prompt-based approaches.
FLEX: Unifying Evaluation for Few-Shot NLP
— AK (@ak92501) July 16, 2021
pdf: https://t.co/s4CTpGExQ9
abs: https://t.co/9rxkyOca4Y
benchmark, public leaderboard, and framework that provides unified, comprehensive measurement for few-shot NLP techniques pic.twitter.com/y3F8fhyuxF