All Articles

Hot Papers 2021-05-28

1. CogView: Mastering Text-to-Image Generation via Transformers

Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang

  • retweets: 3826, favorites: 131 (05/29/2021 06:43:32)
  • links: abs | pdf
  • cs.CV | cs.LG

Text-to-Image generation in the general domain has long been an open problem, which requires both generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with VQ-VAE tokenizer to advance this problem. We also demonstrate the finetuning strategies for various downstream tasks, e.g. style learning, super-resolution, text-image ranking and fashion design, and methods to stabilize pretraining, e.g. eliminating NaN losses. CogView (zero-shot) achieves a new state-of-the-art FID on blurred MS COCO, outperforms previous GAN-based models and a recent similar work DALL-E.

2. PyTouch: A Machine Learning Library for Touch Processing

Mike Lambeta, Huazhe Xu, Jingwei Xu, Po-Wei Chou, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making. In this paper, we present PyTouch — the first machine learning library dedicated to the processing of touch sensing signals. PyTouch, is designed to be modular, easy-to-use and provides state-of-the-art touch processing capabilities as a service with the goal of unifying the tactile sensing community by providing a library for building scalable, proven, and performance-validated modules over which applications and research can be built upon. We evaluate PyTouch on real-world data from several tactile sensors on touch processing tasks such as touch detection, slip and object pose estimations. PyTouch is open-sourced at https://github.com/facebookresearch/pytouch .

3. Stylizing 3D Scene via Implicit Representation and HyperNetwork

Pei-Ze Chiang, Meng-Shiun Tsai, Hung-Yu Tseng, Wei-sheng Lai, Wei-Chen Chiu

  • retweets: 519, favorites: 126 (05/29/2021 06:43:32)
  • links: abs | pdf
  • cs.CV

In this work, we aim to address the 3D scene stylization problem - generating stylized images of the scene at arbitrary novel view angles. A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches, which often leads to blurry results or inconsistent appearance. Inspired by the high quality results of the neural radiance fields (NeRF) method, we propose a joint framework to directly render novel views with the desired style. Our framework consists of two components: an implicit representation of the 3D scene with the neural radiance field model, and a hypernetwork to transfer the style information into the scene representation. In particular, our implicit representation model disentangles the scene into the geometry and appearance branches, and the hypernetwork learns to predict the parameters of the appearance branch from the reference style image. To alleviate the training difficulties and memory burden, we propose a two-stage training procedure and a patch sub-sampling approach to optimize the style and content losses with the neural radiance field model. After optimization, our model is able to render consistent novel views at arbitrary view angles with arbitrary style. Both quantitative evaluation and human subject study have demonstrated that the proposed method generates faithful stylization results with consistent appearance across different views.

4. Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

  • retweets: 450, favorites: 161 (05/29/2021 06:43:33)
  • links: abs | pdf
  • cs.LG | cs.CV

In computer vision, it is standard practice to draw a single sample from the data augmentation procedure for each unique image in the mini-batch, however it is not clear whether this choice is optimal for generalization. In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences performance on held out data. Remarkably, we find that drawing multiple samples per image consistently enhances the test accuracy achieved for both small and large batch training, despite reducing the number of unique training examples in each mini-batch. This benefit arises even when different augmentation multiplicities perform the same number of parameter updates and gradient evaluations. Our results suggest that, although the variance in the gradient estimate arising from subsampling the dataset has an implicit regularization benefit, the variance which arises from the data augmentation process harms test accuracy. By applying augmentation multiplicity to the recently proposed NFNet model family, we achieve a new ImageNet state of the art of 86.8%\% top-1 w/o extra data.

5. On the Universality of Graph Neural Networks on Large Random Graphs

Nicolas Keriven, Alberto Bietti, Samuel Vaiter

We study the approximation power of Graph Neural Networks (GNNs) on latent position random graphs. In the large graph limit, GNNs are known to converge to certain “continuous” models known as c-GNNs, which directly enables a study of their approximation power on random graph models. In the absence of input node features however, just as GNNs are limited by the Weisfeiler-Lehman isomorphism test, c-GNNs will be severely limited on simple random graph models. For instance, they will fail to distinguish the communities of a well-separated Stochastic Block Model (SBM) with constant degree function. Thus, we consider recently proposed architectures that augment GNNs with unique node identifiers, sometimes referred to as Graph Wavelets Neural Networks (GWNNs). We study the convergence of GWNNs to their continuous counterpart (c-GWNNs) in the large random graph limit, under new conditions on the node identifiers. We then show that c-GWNNs are strictly more powerful than c-GNNs in the continuous limit, and prove their universality on several random graph models of interest, including most SBMs and a large class of random geometric graphs. Our results cover both permutation-invariant and permutation-equivariant architectures.

6. Uncertainty-Aware Self-Supervised Target-Mass Grasping of Granular Foods

Kuniyuki Takahashi, Wilson Ko, Avinash Ummadisingu, Shin-ichi Maeda

  • retweets: 70, favorites: 28 (05/29/2021 06:43:33)
  • links: abs | pdf
  • cs.RO

Food packing industry workers typically pick a target amount of food by hand from a food tray and place them in containers. Since menus are diverse and change frequently, robots must adapt and learn to handle new foods in a short time-span. Learning to grasp a specific amount of granular food requires a large training dataset, which is challenging to collect reasonably quickly. In this study, we propose ways to reduce the necessary amount of training data by augmenting a deep neural network with models that estimate its uncertainty through self-supervised learning. To further reduce human effort, we devise a data collection system that automatically generates labels. We build on the idea that we can grasp sufficiently well if there is at least one low-uncertainty (high-confidence) grasp point among the various grasp point candidates. We evaluate the methods we propose in this work on a variety of granular foods — coffee beans, rice, oatmeal and peanuts — each of which has a different size, shape and material properties such as volumetric mass density or friction. For these foods, we show significantly improved grasp accuracy of user-specified target masses using smaller datasets by incorporating uncertainty.

7. CoSQA: 20,000+ Web Queries for Code Search and Question Answering

Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, Nan Duan

  • retweets: 42, favorites: 27 (05/29/2021 06:43:33)
  • links: abs | pdf
  • cs.CL | cs.SE

Finding codes given natural language query isb eneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we introduce the CoSQA dataset.It includes 20,604 labels for pairs of natural language queries and codes, each annotated by at least 3 human annotators. We further introduce a contrastive learning method dubbed CoCLR to enhance query-code matching, which works as a data augmenter to bring more artificially generated training instances. We show that evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves the accuracy of code question answering by 5.1%, and incorporating CoCLR brings a further improvement of 10.5%.

8. Self-Supervised Bug Detection and Repair

Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt

  • retweets: 42, favorites: 25 (05/29/2021 06:43:33)
  • links: abs | pdf
  • cs.LG | cs.SE

Machine learning-based program analyses have recently shown the promise of integrating formal and probabilistic reasoning towards aiding software development. However, in the absence of large annotated corpora, training these analyses is challenging. Towards addressing this, we present BugLab, an approach for self-supervised learning of bug detection and repair. BugLab co-trains two models: (1) a detector model that learns to detect and repair bugs in code, (2) a selector model that learns to create buggy code for the detector to use as training data. A Python implementation of BugLab improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs and finds 19 previously unknown bugs in open-source software.

9. Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

Andrew Halterman, Katherine A. Keith, Sheikh Muhammad Sarwar, Brendan O’Connor

  • retweets: 44, favorites: 23 (05/29/2021 06:43:33)
  • links: abs | pdf
  • cs.CL

Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the IndiaPoliceEvents corpus—all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002. Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations. In contrast to other datasets with structured event representations, we gather annotations by posing natural questions, and evaluate off-the-shelf models for three different tasks: sentence classification, document ranking, and temporal aggregation of target events. We present baseline results from zero-shot BERT-based models fine-tuned on natural language inference and passage retrieval tasks. Our novel corpus-level evaluations and annotation approach can guide creation of similar social-science-oriented resources in the future.

10. Beyond Algorithmic Bias: A Socio-Computational Interrogation of the Google Search by Image Algorithm

Orestis Papakyriakopoulos, Arwa Michelle Mboya

  • retweets: 30, favorites: 24 (05/29/2021 06:43:33)
  • links: abs | pdf
  • cs.CY

We perform a socio-computational interrogation of the google search by image algorithm, a main component of the google search engine. We audit the algorithm by presenting it with more than 40 thousands faces of all ages and more than four races and collecting and analyzing the assigned labels with the appropriate statistical tools. We find that the algorithm reproduces white male patriarchal structures, often simplifying, stereotyping and discriminating females and non-white individuals, while providing more diverse and positive descriptions of white men. By drawing from Bourdieu’s theory of cultural reproduction, we link these results to the attitudes of the algorithm’s designers, owners, and the dataset the algorithm was trained on. We further underpin the problematic nature of the algorithm by using the ethnographic practice of studying-up: We show how the algorithm places individuals at the top of the tech industry within the socio-cultural reality that they shaped, many times creating biased representations of them. We claim that the use of social-theoretic frameworks such as the above are able to contribute to improved algorithmic accountability, algorithmic impact assessment and provide additional and more critical depth in algorithmic bias and auditing studies. Based on the analysis, we discuss the scientific and design implications and provide suggestions for alternative ways to design just socioalgorithmic systems.

11. A Computational Model of the Institutional Analysis and Development Framework

Nieves Montes

The Institutional Analysis and Development (IAD) framework is a conceptual toolbox put forward by Elinor Ostrom and colleagues in an effort to identify and delineate the universal common variables that structure the immense variety of human interactions. The framework identifies rules as one of the core concepts to determine the structure of interactions, and acknowledges their potential to steer a community towards more beneficial and socially desirable outcomes. This work presents the first attempt to turn the IAD framework into a computational model to allow communities of agents to formally perform what-if analysis on a given rule configuration. To do so, we define the Action Situation Language — or ASL — whose syntax is hgighly tailored to the components of the IAD framework and that we use to write descriptions of social interactions. ASL is complemented by a game engine that generates its semantics as an extensive-form game. These models, then, can be analyzed with the standard tools of game theory to predict which outcomes are being most incentivized, and evaluated according to their socially relevant properties.