Hot Papers 2021-05-28

1. CogView: Mastering Text-to-Image Generation via Transformers

Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang

retweets: 3826, favorites: 131 (05/29/2021 06:43:32)
links: abs | pdf
cs.CV | cs.LG

Text-to-Image generation in the general domain has long been an open problem, which requires both a generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with VQ-VAE tokenizer to advance this problem. We also demonstrate the finetuning strategies for various downstream tasks, e.g. style learning, super-resolution, text-image ranking and fashion design, and methods to stabilize pretraining, e.g. eliminating NaN losses. CogView (zero-shot) achieves a new state-of-the-art FID on blurred MS COCO, outperforming previous GAN-based models and a recent similar work, DALL-E.

Interesting web demo for this state-of-the-art, text-to-image generative model (https://t.co/kIJUI617x8)

However, when I typed in the phrase "Hong Kong" (香港), the system failed because "the input text has illegal content." 🤔

Their web demo: https://t.co/XBERgMTI73 https://t.co/1jO9BjEUsH pic.twitter.com/Sdvpx9EJwP
— hardmaru (@hardmaru) May 28, 2021

CogView: Mastering Text-to-Image Generation via Transformers
pdf: https://t.co/SOIqFhHleW
abs: https://t.co/nmhinD6kXK
demo: https://t.co/QULIeN628P

a 4-billion-parameter Transformer with VQ-VAE tokenizer, achieves a new SOTA FID on blurred MS COCO pic.twitter.com/hvXheo5cYP
— AK (@ak92501) May 28, 2021

2. PyTouch: A Machine Learning Library for Touch Processing

Mike Lambeta, Huazhe Xu, Jingwei Xu, Po-Wei Chou, Shaoxiong Wang, Trevor Darrell, Roberto Calandra

retweets: 2613, favorites: 307 (05/29/2021 06:43:32)
links: abs | pdf
cs.RO | cs.HC | cs.LG

With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making. In this paper, we present PyTouch, the first machine learning library dedicated to the processing of touch sensing signals. PyTouch is designed to be modular and easy to use, and provides state-of-the-art touch processing capabilities as a service, with the goal of unifying the tactile sensing community by providing a library of scalable, proven, and performance-validated modules upon which applications and research can be built. We evaluate PyTouch on real-world data from several tactile sensors on touch processing tasks such as touch detection, slip and object pose estimations. PyTouch is open-sourced at https://github.com/facebookresearch/pytouch .

PyTouch: A Machine Learning Library for Touch Processing
pdf: https://t.co/La8xopY5Ce
abs: https://t.co/ltdASLt91N
github (to be released): https://t.co/CNzlXGcigR

ML library dedicated to the processing of touch sensing signals pic.twitter.com/Ry5uGhsHlQ
— AK (@ak92501) May 28, 2021

3. Stylizing 3D Scene via Implicit Representation and HyperNetwork

Pei-Ze Chiang, Meng-Shiun Tsai, Hung-Yu Tseng, Wei-sheng Lai, Wei-Chen Chiu

retweets: 519, favorites: 126 (05/29/2021 06:43:32)
links: abs | pdf
cs.CV

In this work, we aim to address the 3D scene stylization problem - generating stylized images of the scene at arbitrary novel view angles. A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches, which often leads to blurry results or inconsistent appearance. Inspired by the high quality results of the neural radiance fields (NeRF) method, we propose a joint framework to directly render novel views with the desired style. Our framework consists of two components: an implicit representation of the 3D scene with the neural radiance field model, and a hypernetwork to transfer the style information into the scene representation. In particular, our implicit representation model disentangles the scene into the geometry and appearance branches, and the hypernetwork learns to predict the parameters of the appearance branch from the reference style image. To alleviate the training difficulties and memory burden, we propose a two-stage training procedure and a patch sub-sampling approach to optimize the style and content losses with the neural radiance field model. After optimization, our model is able to render consistent novel views at arbitrary view angles with arbitrary style. Both quantitative evaluation and human subject study have demonstrated that the proposed method generates faithful stylization results with consistent appearance across different views.

Stylizing 3D Scene via Implicit Representation and HyperNetwork
pdf: https://t.co/lpp503Yfwc
abs: https://t.co/brXkdUViM9
project page: https://t.co/ecYGirmpv4 pic.twitter.com/A6PHgdI5Eb
— AK (@ak92501) May 28, 2021

Stylizing 3D Scene via Implicit Representation and HyperNetwork https://t.co/F3bflWbgfl

Interesting use of hypernetworks and NeRF! @drsrinathsridha check this out #3d #computervision pic.twitter.com/7zOz8JAxC6
— Tomasz Malisiewicz (@quantombone) May 28, 2021

4. Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

retweets: 450, favorites: 161 (05/29/2021 06:43:33)
links: abs | pdf
cs.LG | cs.CV

In computer vision, it is standard practice to draw a single sample from the data augmentation procedure for each unique image in the mini-batch; however, it is not clear whether this choice is optimal for generalization. In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences performance on held out data. Remarkably, we find that drawing multiple samples per image consistently enhances the test accuracy achieved for both small and large batch training, despite reducing the number of unique training examples in each mini-batch. This benefit arises even when different augmentation multiplicities perform the same number of parameter updates and gradient evaluations. Our results suggest that, although the variance in the gradient estimate arising from subsampling the dataset has an implicit regularization benefit, the variance which arises from the data augmentation process harms test accuracy. By applying augmentation multiplicity to the recently proposed NFNet model family, we achieve a new ImageNet state of the art of 86.8% top-1 w/o extra data.
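To make the batch construction concrete, here is a minimal sketch of augmentation multiplicity, assuming a fixed total batch size so that a higher multiplicity means fewer unique images per mini-batch. The names `augment`, `build_batch`, and the toy dataset are hypothetical and not taken from the paper; the flip augmentation is only a stand-in.

```python
import random

def augment(image, rng):
    """Hypothetical stand-in augmentation: random horizontal flip of a list of rows."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

def build_batch(dataset, batch_size, multiplicity, rng):
    """Build a batch of `batch_size` examples from batch_size // multiplicity unique images."""
    if batch_size % multiplicity != 0:
        raise ValueError("batch_size must be divisible by multiplicity")
    n_unique = batch_size // multiplicity
    indices = rng.sample(range(len(dataset)), n_unique)
    batch = []
    for idx in indices:
        image, label = dataset[idx]
        # Draw several independent augmentation samples of the same image.
        for _ in range(multiplicity):
            batch.append((idx, augment(image, rng), label))
    return batch

rng = random.Random(0)
# Toy dataset: 100 tiny 2x2 "images" with integer labels (illustrative only).
dataset = [([[i, i + 1], [i + 2, i + 3]], i % 10) for i in range(100)]
batch = build_batch(dataset, batch_size=8, multiplicity=4, rng=rng)
unique_ids = {idx for idx, _, _ in batch}
```

With `batch_size=8` and `multiplicity=4`, the batch holds eight examples drawn from only two unique images, which is the trade-off the abstract describes.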

Drawing Multiple Augmentation Samples Per Image
During Training Efficiently Decreases Test Error

By applying augmentation multiplicity to the recently proposed NFNet model family, they achieved a new ImageNet SotA of 86.8% top-1 w/o extra data. https://t.co/UoEzw922lt pic.twitter.com/XM7FGCg5yF
— Aran Komatsuzaki (@arankomatsuzaki) May 28, 2021

Drawing Multiple Augmentation Samples Per Image
During Training Efficiently Decreases Test Error
pdf: https://t.co/zwS8DT9Km0
abs: https://t.co/8xxFUi2nay

ImageNet SOTA of 86.8% top-1 accuracy after just 34 epochs of training with an NFNet-F5 using the SAM optimizer pic.twitter.com/AhDVnLSef4
— AK (@ak92501) May 28, 2021

5. On the Universality of Graph Neural Networks on Large Random Graphs

Nicolas Keriven, Alberto Bietti, Samuel Vaiter

retweets: 100, favorites: 36 (05/29/2021 06:43:33)
links: abs | pdf
stat.ML | cs.LG

We study the approximation power of Graph Neural Networks (GNNs) on latent position random graphs. In the large graph limit, GNNs are known to converge to certain “continuous” models known as c-GNNs, which directly enables a study of their approximation power on random graph models. In the absence of input node features however, just as GNNs are limited by the Weisfeiler-Lehman isomorphism test, c-GNNs will be severely limited on simple random graph models. For instance, they will fail to distinguish the communities of a well-separated Stochastic Block Model (SBM) with constant degree function. Thus, we consider recently proposed architectures that augment GNNs with unique node identifiers, sometimes referred to as Graph Wavelets Neural Networks (GWNNs). We study the convergence of GWNNs to their continuous counterpart (c-GWNNs) in the large random graph limit, under new conditions on the node identifiers. We then show that c-GWNNs are strictly more powerful than c-GNNs in the continuous limit, and prove their universality on several random graph models of interest, including most SBMs and a large class of random geometric graphs. Our results cover both permutation-invariant and permutation-equivariant architectures.
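The limitation on SBMs with constant degree can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification, not the paper's architecture: it uses the expected adjacency matrix of a two-block SBM as a stand-in for the large-graph limit, a single mean-aggregation step as the "GNN", and one-hot vectors as hypothetical node identifiers.

```python
def expected_sbm(n_per_block, p_in, p_out):
    """Expected adjacency of a two-block SBM; every node has the same expected degree."""
    n = 2 * n_per_block
    return [[p_in if (i < n_per_block) == (j < n_per_block) else p_out
             for j in range(n)] for i in range(n)]

def propagate(adj, feats):
    """One mean-aggregation step: each node averages features weighted by adj."""
    dim = len(feats[0])
    out = []
    for row in adj:
        deg = sum(row)
        out.append([sum(w * f[d] for w, f in zip(row, feats)) / deg
                    for d in range(dim)])
    return out

adj = expected_sbm(3, 0.9, 0.1)
n = len(adj)

# No node features: every node starts from the same constant input.
const_out = propagate(adj, [[1.0] for _ in range(n)])

# Unique node identifiers (one-hot here, purely for illustration).
id_feats = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
id_out = propagate(adj, id_feats)
```

With constant inputs, all nodes end up with the same value, so the two communities cannot be told apart. With identifiers, nodes in the same block share an output while nodes in different blocks do not.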

New preprint!🤓 "On the Universality of GNNs on Large Random Graphs" w/ @albertobietti @vaiter

What can GNNs compute in the continuous limit? Are recent architectures more powerful than normal message-passing ones? https://t.co/YUBaGU3dMy

(1/6) pic.twitter.com/zbTMzsCncW
— Nicolas Keriven (@n_keriven) May 28, 2021

6. Uncertainty-Aware Self-Supervised Target-Mass Grasping of Granular Foods

Kuniyuki Takahashi, Wilson Ko, Avinash Ummadisingu, Shin-ichi Maeda

retweets: 70, favorites: 28 (05/29/2021 06:43:33)
links: abs | pdf
cs.RO

Food packing industry workers typically pick a target amount of food by hand from a food tray and place it in containers. Since menus are diverse and change frequently, robots must adapt and learn to handle new foods in a short time-span. Learning to grasp a specific amount of granular food requires a large training dataset, which is challenging to collect reasonably quickly. In this study, we propose ways to reduce the necessary amount of training data by augmenting a deep neural network with models that estimate its uncertainty through self-supervised learning. To further reduce human effort, we devise a data collection system that automatically generates labels. We build on the idea that we can grasp sufficiently well if there is at least one low-uncertainty (high-confidence) grasp point among the various grasp point candidates. We evaluate the methods we propose in this work on a variety of granular foods (coffee beans, rice, oatmeal and peanuts), each of which has a different size, shape and material properties such as volumetric mass density or friction. For these foods, we show significantly improved grasp accuracy of user-specified target masses using smaller datasets by incorporating uncertainty.
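The idea of relying on at least one low-uncertainty grasp point can be sketched as a simple selection rule. This is an illustrative assumption rather than the authors' procedure: `GraspCandidate`, `select_grasp`, the threshold `max_uncertainty`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GraspCandidate:
    position: tuple        # hypothetical (x, y) grasp point on the tray
    predicted_mass: float  # mass in grams the model expects to pick up
    uncertainty: float     # model's uncertainty estimate for that prediction

def select_grasp(candidates, target_mass, max_uncertainty):
    """Prefer confident candidates, then pick the one closest to the target mass."""
    confident = [c for c in candidates if c.uncertainty <= max_uncertainty]
    # Fall back to the single most confident candidate if none pass the threshold.
    pool = confident if confident else [min(candidates, key=lambda c: c.uncertainty)]
    return min(pool, key=lambda c: abs(c.predicted_mass - target_mass))

candidates = [
    GraspCandidate((10, 20), 15.2, 4.0),  # close to target but unreliable
    GraspCandidate((30, 40), 14.0, 0.8),
    GraspCandidate((50, 60), 19.5, 0.5),
]
best = select_grasp(candidates, target_mass=15.0, max_uncertainty=1.0)
```

Here the first candidate predicts a mass nearest the 15 g target, but it is filtered out for high uncertainty, so the rule picks the candidate at `(30, 40)`.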

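We have released the paper, video, and explanatory article for our work on grasping target quantities of food, to be presented at ICRA 2021. Deep learning is unstable when only a small dataset can be collected. We propose a method that produces stable results by taking into account uncertainty estimated with self-supervised learning.
Paper: https://t.co/Gf1SUGR9GD
Video: https://t.co/SMic5HqcMk https://t.co/Jrp8p7qtbk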
— Kuniyuki Takahashi (@kuniyuki_taka) May 28, 2021

7. CoSQA: 20,000+ Web Queries for Code Search and Question Answering

Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, Nan Duan

retweets: 42, favorites: 27 (05/29/2021 06:43:33)
links: abs | pdf
cs.CL | cs.SE

Finding codes given a natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we introduce the CoSQA dataset. It includes 20,604 labels for pairs of natural language queries and codes, each annotated by at least 3 human annotators. We further introduce a contrastive learning method dubbed CoCLR to enhance query-code matching, which works as a data augmenter to bring more artificially generated training instances. We show that evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves the accuracy of code question answering by 5.1%, and incorporating CoCLR brings a further improvement of 10.5%.

CoSQA: 20,000+ Web Queries for Code Search and Question Answering
pdf: https://t.co/NiVRH9kxdo
abs: https://t.co/bDb8WWgQ77 pic.twitter.com/8KYfC0panw
— AK (@ak92501) May 28, 2021

8. Self-Supervised Bug Detection and Repair

Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt

retweets: 42, favorites: 25 (05/29/2021 06:43:33)
links: abs | pdf
cs.LG | cs.SE

Machine learning-based program analyses have recently shown the promise of integrating formal and probabilistic reasoning towards aiding software development. However, in the absence of large annotated corpora, training these analyses is challenging. Towards addressing this, we present BugLab, an approach for self-supervised learning of bug detection and repair. BugLab co-trains two models: (1) a detector model that learns to detect and repair bugs in code, (2) a selector model that learns to create buggy code for the detector to use as training data. A Python implementation of BugLab improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs and finds 19 previously unknown bugs in open-source software.
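To give a feel for what "creating buggy code as training data" can look like, here is a minimal sketch that rewrites one comparison operator in a Python function. It is only a stand-in: BugLab's selector is a learned model, whereas this example picks a rewrite at random, and the names `SWAPS`, `ComparisonBugInjector`, and `inject_bug` are hypothetical. It assumes Python 3.9+ for `ast.unparse`.

```python
import ast
import random

# Hypothetical rewrite table: each comparison maps to a plausible buggy variant.
SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt,
         ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}

class ComparisonBugInjector(ast.NodeTransformer):
    """Swap the first operator of the comparison numbered `target_index`."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        if self.seen == self.target_index:
            op_type = type(node.ops[0])
            if op_type in SWAPS:
                node.ops[0] = SWAPS[op_type]()
        self.seen += 1
        return node

def inject_bug(source, rng):
    """Return a variant of `source` with one randomly chosen comparison rewritten."""
    tree = ast.parse(source)
    n_compares = sum(isinstance(n, ast.Compare) for n in ast.walk(tree))
    if n_compares == 0:
        return source
    target = rng.randrange(n_compares)
    return ast.unparse(ComparisonBugInjector(target).visit(tree))

source = (
    "def clamp(x, lo, hi):\n"
    "    if x < lo:\n"
    "        return lo\n"
    "    if x > hi:\n"
    "        return hi\n"
    "    return x\n"
)
buggy_source = inject_bug(source, random.Random(0))
```

Pairs of `(source, buggy_source)` produced this way could serve as labeled examples for a detector, since the location and nature of the injected change are known by construction.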

Self-Supervised Bug Detection and Repair
pdf: https://t.co/CWTMTauPQ6
abs: https://t.co/IwQ0aULKp0

improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs and finds 19 previously unknown bugs in open-source software pic.twitter.com/TPCaCrFoJu
— AK (@ak92501) May 28, 2021

9. Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

Andrew Halterman, Katherine A. Keith, Sheikh Muhammad Sarwar, Brendan O’Connor

retweets: 44, favorites: 23 (05/29/2021 06:43:33)
links: abs | pdf
cs.CL

Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the IndiaPoliceEvents corpus—all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002. Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations. In contrast to other datasets with structured event representations, we gather annotations by posing natural questions, and evaluate off-the-shelf models for three different tasks: sentence classification, document ranking, and temporal aggregation of target events. We present baseline results from zero-shot BERT-based models fine-tuned on natural language inference and passage retrieval tasks. Our novel corpus-level evaluations and annotation approach can guide creation of similar social-science-oriented resources in the future.

Excited to announce @ahalterman @zzz2aaa @brendan642's and my paper "Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence" is to be published in Findings of ACL 2021! https://t.co/aAR3KEJ7tE 1/6 pic.twitter.com/evKw2Qqiyf
— Katie Keith (@katakeith) May 28, 2021

10. Beyond Algorithmic Bias: A Socio-Computational Interrogation of the Google Search by Image Algorithm

Orestis Papakyriakopoulos, Arwa Michelle Mboya

retweets: 30, favorites: 24 (05/29/2021 06:43:33)
links: abs | pdf
cs.CY

We perform a socio-computational interrogation of the Google Search by Image algorithm, a main component of the Google search engine. We audit the algorithm by presenting it with more than 40 thousand faces of all ages and more than four races and collecting and analyzing the assigned labels with the appropriate statistical tools. We find that the algorithm reproduces white male patriarchal structures, often simplifying, stereotyping and discriminating against females and non-white individuals, while providing more diverse and positive descriptions of white men. By drawing from Bourdieu’s theory of cultural reproduction, we link these results to the attitudes of the algorithm’s designers, owners, and the dataset the algorithm was trained on. We further underpin the problematic nature of the algorithm by using the ethnographic practice of studying-up: We show how the algorithm places individuals at the top of the tech industry within the socio-cultural reality that they shaped, many times creating biased representations of them. We claim that the use of social-theoretic frameworks such as the above is able to contribute to improved algorithmic accountability and algorithmic impact assessment, and to provide additional and more critical depth in algorithmic bias and auditing studies. Based on the analysis, we discuss the scientific and design implications and provide suggestions for alternative ways to design just socioalgorithmic systems.

🌟proud to share the preprint co-authored with great @RuMboya. We audited a part of the @google images search engine, and illustrate how the engine reproduces the problematic culture of white patriarchy. https://t.co/2FDCBfOlif #bias #aiethics
— Orestis Papakyriakopoulos (@SciOrestis) May 28, 2021

11. A Computational Model of the Institutional Analysis and Development Framework

Nieves Montes

retweets: 36, favorites: 16 (05/29/2021 06:43:34)
links: abs | pdf
cs.AI | econ.TH

The Institutional Analysis and Development (IAD) framework is a conceptual toolbox put forward by Elinor Ostrom and colleagues in an effort to identify and delineate the universal common variables that structure the immense variety of human interactions. The framework identifies rules as one of the core concepts to determine the structure of interactions, and acknowledges their potential to steer a community towards more beneficial and socially desirable outcomes. This work presents the first attempt to turn the IAD framework into a computational model to allow communities of agents to formally perform what-if analysis on a given rule configuration. To do so, we define the Action Situation Language (ASL), whose syntax is highly tailored to the components of the IAD framework and that we use to write descriptions of social interactions. ASL is complemented by a game engine that generates its semantics as an extensive-form game. These models, then, can be analyzed with the standard tools of game theory to predict which outcomes are being most incentivized, and evaluated according to their socially relevant properties.
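As an illustration of what-if analysis on a rule configuration, the sketch below solves a tiny extensive-form game by backward induction, one of the standard tools of game theory. It does not use ASL or the paper's game engine. The two-agent commons game, its payoffs, and the `penalty` rule are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    player: int = -1     # index of the player to move; -1 at a leaf
    payoffs: tuple = ()  # per-player payoffs at a leaf
    children: dict = field(default_factory=dict)  # action name -> child Node

def backward_induction(node):
    """Return (payoffs, action path) reached when each player best-responds."""
    if not node.children:
        return node.payoffs, []
    best = None
    for action, child in node.children.items():
        payoffs, path = backward_induction(child)
        if best is None or payoffs[node.player] > best[0][node.player]:
            best = (payoffs, [action] + path)
    return best

def commons_game(penalty):
    """Hypothetical sequential commons game; `penalty` is a rule sanctioning overuse."""
    base = {("restrain", "restrain"): (3, 3), ("restrain", "overuse"): (1, 4),
            ("overuse", "restrain"): (4, 1), ("overuse", "overuse"): (2, 2)}

    def leaf(a0, a1):
        p0, p1 = base[(a0, a1)]
        return Node(payoffs=(p0 - penalty * (a0 == "overuse"),
                             p1 - penalty * (a1 == "overuse")))

    actions = ("restrain", "overuse")
    return Node(player=0, children={
        a0: Node(player=1, children={a1: leaf(a0, a1) for a1 in actions})
        for a0 in actions})

no_rule = backward_induction(commons_game(penalty=0))
with_rule = backward_induction(commons_game(penalty=3))
```

Without the rule, both agents overuse and end at payoffs `(2, 2)`; with the sanction in place, both restrain and reach `(3, 3)`. Comparing the two results is the kind of question a rule-configuration model is meant to answer.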

Published 29 May 2021

Tatsuya Shirakawa, ML Lead at Beatrust (https://beatrust.com)