Hot Papers 2021-02-05

1. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models

Alex Tamkin, Miles Brundage, Jack Clark, Deep Ganguli

  • retweets: 5190, favorites: 100 (02/07/2021 12:58:11)
  • links: abs | pdf
  • cs.CL | cs.LG

On October 14th, 2020, researchers from OpenAI, the Stanford Institute for Human-Centered Artificial Intelligence, and other universities convened to discuss open research questions surrounding GPT-3, the largest publicly-disclosed dense language model at the time. The meeting took place under Chatham House Rules. Discussants came from a variety of research backgrounds including computer science, linguistics, philosophy, political science, communications, cyber policy, and more. Broadly, the discussion centered around two main questions: 1) What are the technical capabilities and limitations of large language models? 2) What are the societal effects of widespread use of large language models? Here, we provide a detailed summary of the discussion organized by the two themes above.

2. Unifying Vision-and-Language Tasks via Text Generation

Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal

Existing methods for vision-and-language learning typically require designing task-specific architectures and objectives for each task: for example, a multi-label answer classifier for visual question answering, a region scorer for referring expression comprehension, and a language decoder for image captioning. To alleviate these hassles, in this work, we propose a unified framework that learns different tasks in a single architecture with the same language modeling objective, i.e., multimodal conditional text generation, where our models learn to generate labels in text based on the visual and textual inputs. On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, and visual commonsense reasoning, most of which have previously been modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches comparable performance to recent task-specific state-of-the-art vision-and-language models. Moreover, our generative approach shows better generalization ability on answering questions that have rare answers. In addition, we show that our framework allows multi-task learning in a single architecture with a single set of parameters, which achieves similar performance to separately optimized single-task models. Our code will be publicly available at: https://github.com/j-min/VL-T5
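
The "everything as text generation" recipe can be made concrete with a small sketch. The model below is a generic encoder-decoder transformer that consumes task-prefixed text tokens plus pre-extracted visual region features and is trained to emit the answer as text; it is an illustrative stand-in, not the authors' VL-T5 architecture, and all module names and sizes are invented for the example.

```python
# A minimal sketch of casting different vision-and-language tasks as
# conditional text generation: every task becomes "prefix + inputs -> text".
# The encoder/decoder below are placeholders, not the VL-T5 architecture.
import torch
import torch.nn as nn

class TinyMultimodalSeq2Seq(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.visual_proj = nn.Linear(2048, d_model)   # project region features
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_ids, region_feats, target_ids):
        # Encoder input = [task-prefixed text tokens ; projected visual regions]
        src = torch.cat([self.token_emb(text_ids),
                         self.visual_proj(region_feats)], dim=1)
        tgt = self.token_emb(target_ids)
        hidden = self.transformer(src, tgt)
        return self.lm_head(hidden)                   # logits over text vocab

# The same model handles VQA ("vqa: what is the man holding?"), referring
# expressions, captioning, etc., simply by changing the textual prefix and
# the target text.
model = TinyMultimodalSeq2Seq()
text = torch.randint(0, 1000, (2, 12))    # tokenized "vqa: <question>"
regions = torch.randn(2, 8, 2048)         # pre-extracted region features
answer = torch.randint(0, 1000, (2, 4))   # tokenized answer text
logits = model(text, regions, answer)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), answer.reshape(-1))
```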

3. Regenerating Soft Robots through Neural Cellular Automata

Kazuya Horibe, Kathryn Walker, Sebastian Risi

Morphological regeneration is an important feature that highlights the environmental adaptive capacity of biological systems. Lack of this regenerative capacity significantly limits the resilience of machines and the environments they can operate in. To aid in addressing this gap, we develop an approach for simulated soft robots to regrow parts of their morphology when damaged. Although numerical simulations using soft robots have played an important role in their design, evolving soft robots with regenerative capabilities has so far received comparatively little attention. Here we propose a model for soft robots that regenerate through a neural cellular automaton. Importantly, this approach relies only on local cell information to regrow damaged components, opening interesting possibilities for physical regenerating soft robots in the future. Our approach allows simulated soft robots that are damaged to partially regenerate their original morphology through local cell interactions alone and regain some of their ability to locomote. These results take a step towards equipping artificial systems with regenerative capacities and could potentially allow for more robust operations in a variety of situations and environments. The code for the experiments in this paper is available at: https://github.com/KazuyaHoribe/RegeneratingSoftRobots
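
The local update rule at the heart of a neural cellular automaton can be sketched in a few lines: every cell sees only itself and its immediate neighbours (a 3x3 convolution) and applies a shared learned update. The code below is an untrained toy, assuming a 2D grid of 16-channel cell states, purely to illustrate the locality that makes regrowth from local information possible.

```python
# Minimal neural cellular automaton sketch: each cell updates its state from
# its own state and its immediate neighbours only (a 3x3 convolution), which
# is the locality property exploited for regeneration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralCA(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.perceive = nn.Conv2d(channels, 48, kernel_size=3, padding=1)
        self.update = nn.Conv2d(48, channels, kernel_size=1)

    def forward(self, state, steps=20):
        for _ in range(steps):
            dx = self.update(F.relu(self.perceive(state)))
            state = state + dx            # residual, purely local state update
        return state

ca = NeuralCA()
grid = torch.randn(1, 16, 32, 32)         # cell states on a 32x32 lattice
grid[:, :, 16:, :] = 0.0                  # "damage": wipe half of the body
regrown = ca(grid)                        # states regrow from local interactions
```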

4. Sovereign Smartphone: To Enjoy Freedom We Have to Control Our Phones

Friederike Groschupp, Moritz Schneider, Ivan Puddu, Shweta Shinde, Srdjan Capkun

  • retweets: 841, favorites: 79 (02/07/2021 12:58:12)
  • links: abs | pdf
  • cs.CR

The majority of smartphones run either the iOS or the Android operating system. This has created two distinct ecosystems largely controlled by Apple and Google: they dictate which applications can run, how they run, and what kind of phone resources they can access. Barring some exceptions in Android, where different phone manufacturers may have influence, users, developers, and governments are left with little to no choice. Specifically, users need to entrust their security and privacy to OS vendors and accept the functionality constraints they impose. Given the wide use of Android and iOS, immediately leaving these ecosystems is not practical, except in niche application areas. In this work, we draw attention to the magnitude of this problem and why it is an undesirable situation. As an alternative, we advocate the development of a new smartphone architecture that securely transfers control back to the users while maintaining compatibility with the rich existing smartphone ecosystems. We propose and analyze one such design based on advances in trusted execution environments for ARM and RISC-V.

5. Im2Vec: Synthesizing Vector Graphics without Vector Supervision

Pradyumna Reddy, Michael Gharbi, Michal Lukac, Niloy J. Mitra

  • retweets: 703, favorites: 135 (02/07/2021 12:58:12)
  • links: abs | pdf
  • cs.CV | cs.GR

Vector graphics are widely used to represent fonts, logos, digital artworks, and graphic designs. But while a vast body of work has focused on generative algorithms for raster images, only a handful of options exist for vector graphics. One can always rasterize the input graphic and resort to image-based generative approaches, but this negates the advantages of the vector representation. The current alternative is to use specialized models that require explicit supervision on the vector graphics representation at training time. This is not ideal because large-scale, high-quality vector-graphics datasets are difficult to obtain. Furthermore, the vector representation for a given design is not unique, so models that supervise on the vector representation are unnecessarily constrained. Instead, we propose a new neural network that can generate complex vector graphics with varying topologies, and only requires indirect supervision from readily available raster training images (i.e., with no vector counterparts). To enable this, we use a differentiable rasterization pipeline that renders the generated vector shapes and composites them together onto a raster canvas. We demonstrate our method on a range of datasets, and provide comparisons with the state-of-the-art SVG-VAE and DeepSVG, both of which require explicit vector graphics supervision. Finally, we also demonstrate our approach on the MNIST dataset, for which no ground-truth vector representation is available. Source code, datasets, and more results are available at http://geometry.cs.ucl.ac.uk/projects/2020/Im2Vec/
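
The training signal in this setting comes entirely from raster images: generated shape parameters are rendered differentiably and compared with a target raster. The sketch below simplifies the shapes to soft discs with a sigmoid occupancy so it stays self-contained; Im2Vec itself generates Bezier paths and uses a proper differentiable vector rasterizer, so treat `soft_rasterize` and its sharpness parameter as illustrative assumptions only.

```python
# Simplified illustration of "vector parameters -> differentiable raster ->
# image loss" training; real systems generate Bezier paths and use a proper
# differentiable vector rasterizer, not the soft discs below.
import torch

def soft_rasterize(centers, radii, size=64, sharpness=40.0):
    # centers: (N, 2) in [0, 1], radii: (N,). Returns a (size, size) canvas.
    ys, xs = torch.meshgrid(torch.linspace(0, 1, size),
                            torch.linspace(0, 1, size), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                             # (H, W, 2)
    dist = torch.norm(grid[None] - centers[:, None, None], dim=-1)   # (N, H, W)
    occupancy = torch.sigmoid(sharpness * (radii[:, None, None] - dist))
    return occupancy.amax(dim=0)            # composite shapes onto one canvas

# "Vector" parameters are optimized from a raster target alone.
centers = torch.rand(4, 2, requires_grad=True)
radii = torch.full((4,), 0.1, requires_grad=True)
target = torch.zeros(64, 64)
target[16:48, 16:48] = 1.0                  # raster-only supervision
opt = torch.optim.Adam([centers, radii], lr=0.05)
for _ in range(100):
    loss = ((soft_rasterize(centers, radii) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```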

6. Designing an Encoder for StyleGAN Image Manipulation

Omer Tov, Yuval Alaluf, Yotam Nitzan, Or Patashnik, Daniel Cohen-Or

  • retweets: 256, favorites: 76 (02/07/2021 12:58:13)
  • links: abs | pdf
  • cs.CV

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.
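
The distortion-editability balance can be written down as a training objective: reconstruct the input while keeping the predicted latent close to the regions the generator was trained on (e.g., near its mean latent). The modules, the `w_avg` stand-in, and the regularization weight below are placeholders for illustration, not the authors' actual encoder or losses.

```python
# Sketch of an inversion-encoder objective that trades off reconstruction
# ("distortion") against staying close to the generator's native latent
# distribution ("editability"). Modules and weights are placeholders.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))     # image -> w
generator = nn.Sequential(nn.Linear(512, 3 * 64 * 64),
                          nn.Unflatten(1, (3, 64, 64)))                # w -> image
w_avg = torch.zeros(512)            # stand-in for the generator's mean latent

def inversion_loss(x, lambda_reg=0.1):
    w = encoder(x)
    x_rec = generator(w)
    distortion = ((x_rec - x) ** 2).mean()       # pixel reconstruction term
    editability = ((w - w_avg) ** 2).mean()      # stay near well-trained regions
    return distortion + lambda_reg * editability

x = torch.randn(4, 3, 64, 64)
loss = inversion_loss(x)
loss.backward()
```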

7. Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC

Priyank Jaini, Didrik Nielsen, Max Welling

  • retweets: 130, favorites: 46 (02/07/2021 12:58:13)
  • links: abs | pdf
  • cs.LG

Hybrid Monte Carlo (HMC) is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions. However, a major limitation of HMC is its inability to be applied to discrete domains due to the lack of gradient signal. In this work, we introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions using a combination of neural transport methods like normalizing flows and variational dequantization, and the Metropolis-Hastings rule. Our method first learns a continuous embedding of the discrete space using a surjective map and subsequently learns a bijective transformation from the continuous space to an approximately Gaussian distributed latent variable. Sampling proceeds by simulating MCMC chains in the latent space and mapping these samples to the target discrete space via the learned transformations. We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
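
Once the discrete space has been embedded into a roughly Gaussian continuous latent space, a gradient-free sampler such as random-walk Metropolis-Hastings can run there and its samples can be mapped back to the discrete domain. In the sketch below, `latent_log_prob` and `to_discrete` are placeholders for what the trained SurVAE flow would provide; only the Metropolis-Hastings loop itself is meant literally.

```python
# Sketch: run random-walk Metropolis-Hastings in the continuous latent space
# and map accepted states back to the discrete target space.
import numpy as np

def latent_log_prob(z):
    # Placeholder unnormalized log-density induced on the latent space.
    return -0.5 * np.sum(z ** 2)

def to_discrete(z):
    # Placeholder inverse map from latent space to the discrete target space.
    return (z > 0).astype(int)

def metropolis_hastings(dim=10, n_steps=5000, step_size=0.5, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    samples = []
    for _ in range(n_steps):
        proposal = z + step_size * rng.standard_normal(dim)
        log_accept = latent_log_prob(proposal) - latent_log_prob(z)
        if np.log(rng.random()) < log_accept:
            z = proposal
        samples.append(to_discrete(z))
    return np.array(samples)

draws = metropolis_hastings()   # discrete samples obtained via the latent chain
```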

8. RoI Tanh-polar Transformer Network for Face Parsing in the Wild

Yiming Lin, Jie Shen, Yujiang Wang, Maja Pantic

  • retweets: 72, favorites: 34 (02/07/2021 12:58:13)
  • links: abs | pdf
  • cs.CV

Face parsing aims to predict pixel-wise labels for the facial components of a target face in an image. Existing approaches usually crop the target face from the input image with respect to a bounding box calculated during pre-processing, and thus can only parse inner facial Regions of Interest (RoIs). Peripheral regions like hair are ignored, and nearby faces that are partially included in the bounding box can cause distractions. Moreover, these methods are only trained and evaluated on near-frontal portrait images, so their performance for in-the-wild cases remains unexplored. To address these issues, this paper makes three contributions. First, we introduce the iBugMask dataset for face parsing in the wild, containing 1,000 manually annotated images with large variations in size, pose, expression, and background, and Helen-LP, a large-pose training set containing 21,866 images generated using head pose augmentation. Second, we propose the RoI Tanh-polar transform, which warps the whole image to a Tanh-polar representation with a fixed ratio between the face area and the context, guided by the target bounding box. The new representation contains all information in the original image and allows for rotation equivariance in convolutional neural networks (CNNs). Third, we propose a hybrid residual representation learning block, coined HybridBlock, that contains convolutional layers in both the Tanh-polar space and the Tanh-Cartesian space, allowing for receptive fields of different shapes in CNNs. Through extensive experiments, we show that the proposed method significantly improves the state of the art for face parsing in the wild.
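
A simplified version of the polar warp helps to see what the representation does: output rows index a tanh-compressed radius from the target box centre and output columns index the angle, so the face fills most of the canvas while the rest of the image is squeezed into the outer rows. The sketch below omits the fixed face/context area ratio and the elliptical RoI handling of the actual RoI Tanh-polar transform, and the `scale` parameter is an assumption.

```python
# Simplified Tanh-polar style warp: sample the source image along rays from
# the box centre, compressing radius with artanh so that the whole image
# (face + context) fits into a fixed-size representation.
import math
import torch
import torch.nn.functional as F

def tanh_polar_warp(image, center, scale, out_h=128, out_w=128):
    # image: (1, C, H, W); center: (cx, cy) in pixels; scale: pixel radius at
    # which the tanh compression reaches tanh(1) ~ 0.76.
    _, _, H, W = image.shape
    u = (torch.arange(out_h).float() + 0.5) / out_h           # normalised radius
    theta = 2 * math.pi * torch.arange(out_w).float() / out_w
    r = scale * torch.atanh(u.clamp(max=0.999))               # invert tanh compression
    xs = center[0] + r[:, None] * torch.cos(theta)[None, :]   # (out_h, out_w)
    ys = center[1] + r[:, None] * torch.sin(theta)[None, :]
    grid = torch.stack([2 * xs / (W - 1) - 1, 2 * ys / (H - 1) - 1], dim=-1)
    return F.grid_sample(image, grid[None], align_corners=True)

img = torch.randn(1, 3, 256, 256)
warped = tanh_polar_warp(img, center=(128.0, 128.0), scale=60.0)   # (1, 3, 128, 128)
```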

9. Only a Matter of Style: Age Transformation Using a Style-Based Regression Model

Yuval Alaluf, Or Patashnik, Daniel Cohen-Or

  • retweets: 63, favorites: 32 (02/07/2021 12:58:13)
  • links: abs | pdf
  • cs.CV

The task of age transformation illustrates the change of an individual’s appearance over time. Accurately modeling this complex transformation over an input facial image is extremely challenging as it requires making convincing and possibly large changes to facial features and head shape, while still preserving the input identity. In this work, we present an image-to-image translation method that learns to directly encode real facial images into the latent space of a pre-trained unconditional GAN (e.g., StyleGAN) subject to a given aging shift. We employ a pre-trained age regression network to explicitly guide the encoder in generating the latent codes corresponding to the desired age. In this formulation, our method approaches the continuous aging process as a regression task between the input age and the desired target age, providing fine-grained control over the generated image. Moreover, unlike other approaches that operate solely in the latent space using a prior on the path controlling age, our method learns a more disentangled, non-linear path. Finally, we demonstrate that the end-to-end nature of our approach, coupled with the rich semantic latent space of StyleGAN, allows for further editing of the generated images. Qualitative and quantitative evaluations show the advantages of our method compared to state-of-the-art approaches.
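
The regression formulation boils down to a loss that compares the age predicted on the generated image against the requested target age, alongside a term that keeps the output close to the input identity. The encoder, generator, and age regressor below are untrained placeholders, so this is only a sketch of the objective, not the authors' training setup.

```python
# Sketch of the age-guided objective: an encoder maps (image, target age) to a
# latent code, a frozen pre-trained age regressor scores the generated image,
# and the difference to the requested age drives training.
import torch
import torch.nn as nn

encoder = nn.Linear(3 * 64 * 64 + 1, 512)                     # (image, age) -> latent
generator = nn.Sequential(nn.Linear(512, 3 * 64 * 64),
                          nn.Unflatten(1, (3, 64, 64)))       # latent -> image
age_regressor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
for p in age_regressor.parameters():
    p.requires_grad_(False)                                   # pre-trained and frozen

def aging_loss(x, target_age):
    inp = torch.cat([x.flatten(1), target_age[:, None]], dim=1)
    x_aged = generator(encoder(inp))
    predicted_age = age_regressor(x_aged).squeeze(1)
    age_term = ((predicted_age - target_age) ** 2).mean()     # hit the requested age
    identity_term = ((x_aged - x) ** 2).mean()                # crude identity preservation
    return age_term + identity_term

x = torch.randn(4, 3, 64, 64)
target_age = torch.tensor([25.0, 40.0, 60.0, 75.0])
aging_loss(x, target_age).backward()
```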

10. Adaptive Semiparametric Language Models

Dani Yogatama, Cyprien de Masson d’Autume, Lingpeng Kong

  • retweets: 66, favorites: 27 (02/07/2021 12:58:13)
  • links: abs | pdf
  • cs.CL

We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model uses extended short-term context by caching local hidden states — similar to transformer-XL — and global long-term memory by retrieving a set of nearest neighbor tokens at each timestep. We design a gating function to adaptively combine multiple information sources to make a prediction. This mechanism allows the model to use either local context, short-term memory, or long-term memory (or any combination of them) on an ad hoc basis depending on the context. Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method compared to strong baselines.
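
The gating mechanism can be sketched as a learned softmax over information sources that mixes three next-token distributions: the local parametric prediction, the short-term cache, and the long-term nearest-neighbour memory. The distributions below are random placeholders; only the gating computation is the point of the example.

```python
# Sketch of the adaptive gating idea: a gate computed from the current hidden
# state mixes next-token distributions from the local parametric model, a
# short-term cache, and a long-term nearest-neighbour memory.
import torch
import torch.nn as nn

vocab, d_model = 1000, 64
gate = nn.Linear(d_model, 3)                 # one weight per information source

def predict(hidden, p_local, p_cache, p_knn):
    # hidden: (B, d_model); each p_*: (B, vocab) probability distribution.
    weights = torch.softmax(gate(hidden), dim=-1)               # (B, 3)
    sources = torch.stack([p_local, p_cache, p_knn], dim=1)     # (B, 3, vocab)
    return (weights.unsqueeze(-1) * sources).sum(dim=1)         # mixed distribution

hidden = torch.randn(2, d_model)
p_local, p_cache, p_knn = (torch.softmax(torch.randn(2, vocab), -1) for _ in range(3))
p_next = predict(hidden, p_local, p_cache, p_knn)   # (2, vocab), rows sum to 1
```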

11. MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records

Zhen Xu, David R. So, Andrew M. Dai

One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure. EHR usually contains a mixture of structured (codes) and unstructured (free-text) data with sparse and irregular longitudinal features — all of which doctors utilize when making decisions. In the deep learning regime, determining how different modality representations should be fused together is a difficult problem, which is often addressed by handcrafted modeling and intuition. In this work, we extend state-of-the-art neural architecture search (NAS) methods and propose MUltimodal Fusion Architecture SeArch (MUFASA) to simultaneously search across multimodal fusion strategies and modality-specific architectures for the first time. We demonstrate empirically that our MUFASA method outperforms established unimodal NAS on public EHR data with comparable computation costs. In addition, MUFASA produces architectures that outperform Transformer and Evolved Transformer. Compared with these baselines on CCS diagnosis code prediction, our discovered models improve top-5 recall from 0.88 to 0.91 and demonstrate the ability to generalize to other EHR tasks. Studying our top architecture in depth, we provide empirical evidence that MUFASA’s improvements are derived from its ability to both customize modeling for each data modality and find effective fusion strategies.
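
To make the fusion search space concrete, the sketch below hand-writes two candidate ways of fusing structured codes with free text: early fusion (concatenate, then share layers) and late fusion (separate towers combined at the head). MUFASA searches over such fusion points jointly with the per-modality blocks via NAS; the model here is only a hand-picked illustration with invented dimensions.

```python
# Illustration of the kind of choice a multimodal fusion search makes: fuse
# structured-code features and free-text features early or late. A NAS would
# score many such candidates automatically.
import torch
import torch.nn as nn

d_code, d_text, d_hidden, n_labels = 32, 64, 128, 10

class FusionModel(nn.Module):
    def __init__(self, fusion="late"):
        super().__init__()
        self.fusion = fusion
        if fusion == "early":
            self.shared = nn.Sequential(nn.Linear(d_code + d_text, d_hidden),
                                        nn.ReLU(), nn.Linear(d_hidden, n_labels))
        else:   # late fusion: one tower per modality, combined at the end
            self.code_tower = nn.Sequential(nn.Linear(d_code, d_hidden), nn.ReLU())
            self.text_tower = nn.Sequential(nn.Linear(d_text, d_hidden), nn.ReLU())
            self.head = nn.Linear(2 * d_hidden, n_labels)

    def forward(self, codes, text):
        if self.fusion == "early":
            return self.shared(torch.cat([codes, text], dim=-1))
        return self.head(torch.cat([self.code_tower(codes),
                                    self.text_tower(text)], dim=-1))

codes, text = torch.randn(8, d_code), torch.randn(8, d_text)
for fusion in ("early", "late"):
    logits = FusionModel(fusion)(codes, text)    # each candidate would be scored
```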

12. A formalization of Dedekind domains and class groups of global fields

Anne Baanen, Sander R. Dahmen, Ashvni Narayanan, Filippo A. E. Nuccio Mortarino Majno di Capriglio

Dedekind domains and their class groups are notions in commutative algebra that are essential in algebraic number theory. We formalized these structures and several fundamental properties, including number theoretic finiteness results for class groups, in the Lean prover as part of the mathlib mathematical library. This paper describes the formalization process, noting the idioms we found useful in our development and mathlib’s decentralized collaboration processes involved in this project.
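
For reference, the class group whose finiteness is formalized is the quotient of the group of nonzero fractional ideals of a Dedekind domain R (with fraction field K) by the subgroup of principal fractional ideals:

```latex
% Class group of a Dedekind domain R with fraction field K:
% nonzero fractional ideals modulo principal fractional ideals.
\[
  \operatorname{Cl}(R) \;=\; \mathcal{I}(R) \,/\, \mathcal{P}(R),
  \qquad
  \mathcal{P}(R) \;=\; \{\, xR : x \in K^{\times} \,\}.
\]
```

For the ring of integers of a global field, this group is finite, which is the number theoretic finiteness result mentioned above.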

13. MeInGame: Create a Game Character Face from a Single Portrait

Jiangke Lin, Yi Yuan, Zhengxia Zou

  • retweets: 14, favorites: 37 (02/07/2021 12:58:14)
  • links: abs | pdf
  • cs.CV | cs.AI

Many deep learning based 3D face reconstruction methods have been proposed recently; however, few of them have applications in games. Current game character customization systems either require players to manually adjust a considerable number of face attributes to obtain the desired face, or have limited freedom of facial shape and texture. In this paper, we propose an automatic character face creation method that predicts both facial shape and texture from a single portrait, and it can be integrated into most existing 3D games. Although 3D Morphable Face Model (3DMM) based methods can restore accurate 3D faces from single images, the topology of the 3DMM mesh is different from the meshes used in most games. To acquire high-fidelity textures, existing methods require a large amount of face texture data for training, while building such datasets is time-consuming and laborious. Besides, a dataset collected under laboratory conditions may not generalize well to in-the-wild situations. To tackle these problems, we propose 1) a low-cost facial texture acquisition method, 2) a shape transfer algorithm that can transform the shape of a 3DMM mesh to games, and 3) a new pipeline for training 3D game face reconstruction networks. The proposed method can not only produce detailed and vivid game characters similar to the input portrait, but can also eliminate the influence of lighting and occlusions. Experiments show that our method outperforms state-of-the-art methods used in games.
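
The flavour of the shape transfer step can be shown with a toy example: take the displacement of each fitted 3DMM vertex from the mean shape and copy it onto the nearest vertex of the game mesh, which has a different topology. The paper's shape transfer algorithm is more careful than this nearest-vertex sketch, and all array names below are invented for illustration.

```python
# Toy illustration of transferring a reconstructed 3DMM shape onto a game mesh
# with different topology: copy each game vertex's displacement from its
# nearest neighbour on the 3DMM surface.
import numpy as np

def transfer_shape(game_verts, tdmm_mean_verts, tdmm_fitted_verts):
    # game_verts: (M, 3); tdmm_*_verts: (N, 3), mean vs. fitted-to-portrait.
    displacements = tdmm_fitted_verts - tdmm_mean_verts               # (N, 3)
    # Nearest 3DMM vertex for every game vertex (brute force for clarity).
    d2 = ((game_verts[:, None, :] - tdmm_mean_verts[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)                                       # (M,)
    return game_verts + displacements[nearest]

game_mesh = np.random.rand(500, 3)
tdmm_mean = np.random.rand(300, 3)
tdmm_fit = tdmm_mean + 0.02 * np.random.randn(300, 3)
customized = transfer_shape(game_mesh, tdmm_mean, tdmm_fit)           # (500, 3)
```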