1. MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis
Sergei Belousov
In recent years, Generative Adversarial Networks (GANs) have become very popular in generative image modeling. While style-based GAN architectures yield state-of-the-art results in high-fidelity image synthesis, they are computationally very complex. In our work, we focus on the performance optimization of style-based generative models. We analyze the most computationally demanding parts of StyleGAN2 and propose changes to the generator network that make it possible to deploy style-based generative networks on edge devices. We introduce the MobileStyleGAN architecture, which has 3.5x fewer parameters and is 9.5x less computationally complex than StyleGAN2, while providing comparable quality.
MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis
— AK (@ak92501) April 13, 2021
pdf: https://t.co/SILNLZkZqx
abs: https://t.co/eh0sNbBq2G pic.twitter.com/C7M7LgTujo
2. Voting-based probabilistic consensuses and their applications in distributed ledgers
Serguei Popov, Sebastian Müller
We review probabilistic models known as majority dynamics (also known as threshold voter models) and discuss their possible applications for achieving consensus in cryptocurrency systems. In particular, we show that using this approach straightforwardly for practical consensus in a Byzantine setting can be problematic and requires extensive further research. We then discuss the FPC consensus protocol, which circumvents the problems mentioned above by using external randomness.
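As a concrete picture of the majority-dynamics models reviewed here, the toy Python simulation below has every node repeatedly query a small random sample of peers and adopt the majority opinion among the replies. The sample size, number of rounds, and initial bias are illustrative choices, not the parameters of FPC or of any IOTA protocol.

```python
import random

def majority_dynamics(n=1000, k=10, rounds=20, p_init=0.6, seed=0):
    """Toy threshold-voter simulation: each round, every node queries k
    random peers and adopts the majority opinion among their replies."""
    rng = random.Random(seed)
    opinions = [1 if rng.random() < p_init else 0 for _ in range(n)]
    for _ in range(rounds):
        new = []
        for _ in range(n):
            sample = [opinions[rng.randrange(n)] for _ in range(k)]
            new.append(1 if sum(sample) * 2 > k else 0)
        opinions = new
        if sum(opinions) in (0, n):  # full agreement reached
            break
    return sum(opinions) / n

print(majority_dynamics())  # fraction of nodes holding opinion 1
```

With an initial majority of 60%, such dynamics typically amplify the lead toward full agreement; the paper's point is that this simple picture becomes much more delicate once Byzantine participants are present.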
In our latest #IOTA paper, we review voting-based #consensus protocols, explain the #magic of #FPC, and why one should always be skeptical. By @mthcl_crypto and @NaitsabesMue. Check it here: https://t.co/9TSdTtuBnh
— IOTA (@iota) April 13, 2021
3. Representation Learning for Networks in Biology and Medicine: Advancements, Challenges, and Opportunities
Michelle M. Li, Kexin Huang, Marinka Zitnik
- retweets: 5588, favorites: 347 (04/14/2021 11:53:42)
- cs.LG | cs.SI | q-bio.BM | q-bio.GN | q-bio.MN
With the remarkable success of representation learning in providing powerful predictions and data insights, we have witnessed a rapid expansion of representation learning techniques into modeling, analysis, and learning with networks. Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. In this review, we put forward an observation that long-standing principles of network biology and medicine — while often unspoken in machine learning research — can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. We synthesize a spectrum of algorithmic approaches that, at their core, leverage topological features to embed networks into compact vector spaces. We also provide a taxonomy of biomedical areas that are likely to benefit most from algorithmic innovation. Representation learning techniques are becoming essential for identifying causal variants underlying complex traits, disentangling behaviors of single cells and their impact on health, and diagnosing and treating diseases with safe and effective medicines.
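For readers who want a concrete sense of what "leveraging topological features to embed networks into compact vector spaces" can mean, here is a minimal NumPy sketch of one such approach, a single GCN-style propagation step over a toy interaction graph. The graph, features, and dimensions are made up for illustration and are not taken from the review.

```python
import numpy as np

def gcn_embed(adj, features, dim=16, seed=0):
    """One round of GCN-style propagation: symmetrically normalize the
    adjacency (with self-loops), aggregate neighbor features, and apply
    a random linear map + ReLU to get node embeddings."""
    rng = np.random.default_rng(seed)
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    w = rng.normal(scale=0.1, size=(features.shape[1], dim))
    return np.maximum(a_norm @ features @ w, 0.0)

# Toy protein-interaction graph: a triangle of proteins plus one pendant node.
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
x = np.eye(4)                                     # one-hot node features
print(gcn_embed(adj, x).shape)                    # (4, 16)
```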
Survey on Representation Learning for Networks in Biology and Medicine https://t.co/1Aby9wNMcV
— Marinka Zitnik (@marinkazitnik) April 13, 2021
Long-standing principles of biomed nets (often unspoken in ML) provide grounding for representation learning, explain successes & limitations @_michellemli @KexinHuang5 #netbio #GNN #ML pic.twitter.com/Y71r3WK84l
4. Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
Adam Paszke, Daniel Johnson, David Duvenaud, Dimitrios Vytiniotis, Alexey Radul, Matthew Johnson, Jonathan Ragan-Kelley, Dougal Maclaurin
We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operations, we argue for an explicit nested indexing style that mirrors application of functions to arguments. We also introduce a fine-grained typed effects system which affords concise and automatically-parallelized in-place updates. Specifically, an associative accumulation effect allows reverse-mode automatic differentiation of in-place updates in a way that preserves parallelism. Empirically, we benchmark against the Futhark array programming language, and demonstrate that aggressive inlining and type-driven compilation allows array programs to be written in an expressive, “pointful” style with little performance penalty.
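To make the "arrays as eagerly-memoized functions on typed index sets" idea concrete without Dex syntax, here is a hedged Python analogy: an "array" is built by eagerly evaluating a function over a finite index set, and indexing by the leading axis behaves like currying. This is only an analogy for the programming model, not the paper's language or its type system.

```python
def table(index_set, f):
    """Treat an array as an eagerly memoized function over a finite index
    set: evaluate f once per index and return the lookup function."""
    values = {i: f(i) for i in index_set}   # eager memoization over the index set
    return values.__getitem__

rows, cols = range(3), range(4)
# Indexing by the leading axis behaves like currying: m(i) is itself an
# "array" over the remaining index set.
m = table(rows, lambda i: table(cols, lambda j: i * 10 + j))
print(m(1)(2))   # element at row 1, column 2 -> 12
```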
Curious what we think the future of array computing can look like? Check out our new Dex preprint to see how we’re designing a language for safe, expressive parallelism (incl. no shape errors!) and efficient AD. Available at https://t.co/ae4cOQjoZ8
— Adam Paszke (@apaszke) April 13, 2021
5. Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, Humphrey Shi
With the rise of Transformers as the standard for language processing, and their advancements in computer vision, along with their unprecedented size and amounts of training data, many have come to believe that they are not suitable for small sets of data. This trend leads to great concerns, including but not limited to: limited availability of data in certain scientific domains and the exclusion of those with limited resources from research in the field. In this paper, we dispel the myth that transformers are “data hungry” and therefore can only be applied to large sets of data. We show for the first time that with the right size and tokenization, transformers can perform head-to-head with state-of-the-art CNNs on small datasets. Our model eliminates the requirement for a class token and positional embeddings through a novel sequence pooling strategy and the use of convolutions. We show that compared to CNNs, our compact transformers have fewer parameters and MACs, while obtaining similar accuracies. Our method is flexible in terms of model size, and can have as few as 0.28M parameters while achieving reasonable results. It can reach an accuracy of 94.72% when training from scratch on CIFAR-10, which is comparable with modern CNN-based approaches, and a significant improvement over previous Transformer-based models. Our simple and compact design democratizes transformers by making them accessible to those equipped with basic computing resources and/or dealing with important small datasets. Our code and pre-trained models will be made publicly available at https://github.com/SHI-Labs/Compact-Transformers.
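The sequence-pooling idea is easy to sketch: learn a scalar attention score per token, softmax over the sequence, and take the weighted average of the tokens in place of a class token. The PyTorch snippet below is a minimal sketch under that reading of the abstract; the layer sizes are arbitrary and this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SeqPool(nn.Module):
    """Sequence pooling: learn attention weights over the token sequence and
    return the weighted average, replacing the usual class token."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, 1)

    def forward(self, tokens):                         # tokens: (batch, n, dim)
        w = torch.softmax(self.attn(tokens), dim=1)    # (batch, n, 1)
        return (w * tokens).sum(dim=1)                 # (batch, dim)

pooled = SeqPool(128)(torch.randn(2, 64, 128))
print(pooled.shape)   # torch.Size([2, 128])
```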
Escaping the Big Data Paradigm with Compact Transformers
— AK (@ak92501) April 13, 2021
pdf: https://t.co/izACyF4fL3
abs: https://t.co/DxMmNbDwgg
github: https://t.co/0NakpcqZ6G
"that with the right size and tokenization, transformers can perform head-to-head with state-of-the-art CNNs on small datasets" pic.twitter.com/Ync7fo9DjL
Let's make Transformers accessible and bring them into the hands of everyone -- esp for those dealing with limited computing resources and small datasets! Train Compact Transformers on CIFAR-10 in 30 minutes or less with a single GPU! #democratizeAI https://t.co/kEmgA2W7UX pic.twitter.com/060AgwWGHt
— Humphrey Shi (@humphrey_shi) April 13, 2021
6. Understanding Overparameterization in Generative Adversarial Networks
Yogesh Balaji, Mohammadmahdi Sajedi, Neha Mukund Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi
A broad class of unsupervised deep learning methods such as Generative Adversarial Networks (GANs) involve training of overparameterized models where the number of parameters of the model exceeds a certain threshold. A large body of work in supervised learning has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting and GANs in particular involve non-convex concave mini-max optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in non-convex concave problems are far less understood. In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs, both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a one-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem. To the best of our knowledge, this is the first result for global convergence of GDA in such settings. Our theory is based on a more general result that holds for a broader class of nonlinear generators and discriminators that obey certain assumptions (including deeper generators and random feature discriminators). We also empirically study the role of model overparameterization in GANs using several large-scale experiments on the CIFAR-10 and Celeb-A datasets. Our experiments show that overparameterization improves the quality of generated samples across various model architectures and datasets. Remarkably, we observe that overparameterization leads to faster and more stable convergence behavior of GDA across the board.
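For readers unfamiliar with Gradient Descent/Ascent, the snippet below runs simultaneous GDA on a toy function with a saddle point at the origin. It only illustrates the update rule whose convergence the paper analyzes; the function is not the paper's GAN objective.

```python
import torch

# Simultaneous gradient descent/ascent (GDA) on a toy min-max problem
# f(x, y) = x*y - 0.5*y**2, which has a saddle point at (0, 0).
x = torch.tensor(1.0, requires_grad=True)   # "generator" parameter (min player)
y = torch.tensor(1.0, requires_grad=True)   # "discriminator" parameter (max player)
lr = 0.05
for _ in range(500):
    f = x * y - 0.5 * y ** 2
    gx, gy = torch.autograd.grad(f, (x, y))
    with torch.no_grad():
        x -= lr * gx        # descent step for the min player
        y += lr * gy        # ascent step for the max player
print(float(x), float(y))   # both spiral in toward the saddle point at (0, 0)
```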
Deep learning often requires solving non-convex concave min-max problems.
— Soheil Feizi (@FeiziSoheil) April 13, 2021
In our #ICLR2021 work, we prove (for the first time) the "global" convergence of gradient descent/ascent for a GAN-based min-max problem in an overparameterized regime.
Paper: https://t.co/perPU5Fb3g
👇 pic.twitter.com/c6yBLaa9T7
7. LocalViT: Bringing Locality to Vision Transformers
Yawei Li, Kai Zhang, Jiezhang Cao, Radu Timofte, Luc Van Gool
We study how to introduce locality mechanisms into vision transformers. The transformer network originates from machine translation and is particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between the token embeddings can be modelled well by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local region. Yet, locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects. We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network. This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks. The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) are available for incorporating locality mechanisms, and all proper choices lead to a performance gain over the baseline; and 2) the same locality mechanism is successfully applied to 4 vision transformers, which shows the generalization of the locality concept. In particular, for ImageNet2012 classification, the locality-enhanced transformers outperform the baselines DeiT-T and PVT-T by 2.6% and 3.1% with a negligible increase in the number of parameters and computational effort. Code is available at https://github.com/ofsoundof/LocalViT.
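The proposed change is compact enough to sketch directly: reshape the token sequence back into its 2D grid, run a depth-wise 3x3 convolution inside the feed-forward network, and flatten back to tokens. The PyTorch snippet below is a hedged sketch of that idea; the class name, expansion ratio, and activation are assumptions and it is not the released LocalViT code.

```python
import torch
import torch.nn as nn

class LocalFFN(nn.Module):
    """Transformer feed-forward block with a depth-wise convolution between
    the two pointwise projections, i.e. a locality mechanism of the kind
    described in the abstract."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, tokens, h, w):               # tokens: (batch, h*w, dim)
        x = self.act(self.fc1(tokens))
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # tokens -> image grid
        x = self.act(self.dw(x))                   # local information exchange
        x = x.flatten(2).transpose(1, 2)           # image grid -> tokens
        return self.fc2(x)

out = LocalFFN(64)(torch.randn(2, 14 * 14, 64), 14, 14)
print(out.shape)   # torch.Size([2, 196, 64])
```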
LocalViT: Bringing Locality to Vision Transformers
— Aran Komatsuzaki (@arankomatsuzaki) April 13, 2021
Outperforms DeiT-T and PVT-T by 2.6% and 3.1% w/ comparable params and computes by adding locality to vision transformers by introducing depth-wise convs into FFN.
abs: https://t.co/gOFr6QXfPf
code: https://t.co/1kvGZy7pEC pic.twitter.com/BpcxsWrEwT
LocalViT: Bringing Locality to Vision Transformers
— AK (@ak92501) April 13, 2021
pdf: https://t.co/UkIpEiaRme
abs: https://t.co/HfSdnR1LNS
"We add locality to vision transformers by introducing
depth-wise convolution into the feed-forward network" pic.twitter.com/rMcw62bOjR
8. Neural RGB-D Surface Reconstruction
Dejan Azinović, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, Justus Thies
In this work, we explore how to leverage the success of implicit novel view synthesis methods for surface reconstruction. Methods which learn a neural radiance field have shown amazing image synthesis results, but the underlying geometry representation is only a coarse approximation of the real geometry. We demonstrate how depth measurements can be incorporated into the radiance field formulation to produce more detailed and complete reconstruction results than using methods based on either color or depth data alone. In contrast to a density field as the underlying geometry representation, we propose to learn a deep neural network which stores a truncated signed distance field. Using this representation, we show that one can still leverage differentiable volume rendering to estimate color values of the observed images during training to compute a reconstruction loss. This is beneficial for learning the signed distance field in regions with missing depth measurements. Furthermore, we correct misalignment errors of the camera, improving the overall reconstruction quality. In several experiments, we showcase our method and compare to existing works on classical RGB-D fusion and learned representations.
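A small sketch may help with the "truncated signed distance field" part: given a depth measurement along a camera ray, the signed distance of a sample is positive in front of the observed surface and negative behind it, clipped to a truncation band. The truncation value below is illustrative only; the paper additionally learns this field with a neural network and renders it differentiably.

```python
import numpy as np

def truncated_sdf(sample_depths, surface_depth, trunc=0.05):
    """Truncated signed distance along a camera ray: positive in front of the
    observed surface, negative behind it, clipped to a truncation band."""
    sdf = surface_depth - sample_depths              # > 0 before the surface
    return np.clip(sdf, -trunc, trunc)

samples = np.linspace(0.5, 2.5, 9)                   # depths sampled along one ray
print(truncated_sdf(samples, surface_depth=1.5))     # crosses zero at the surface
```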
Neural RGB-D Surface Reconstruction
— AK (@ak92501) April 13, 2021
pdf: https://t.co/LDn6dOSZWB
abs: https://t.co/TR6x4yc2w5
project page: https://t.co/BEES8OVKFE pic.twitter.com/W7g12MvFVY
9. Action-Conditioned 3D Human Motion Synthesis with Transformer VAE
Mathis Petrovich, Michael J. Black, Gül Varol
We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences. In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence. Here we learn an action-aware latent representation for human motions by training a generative variational autoencoder (VAE). By sampling from this latent space and querying a certain duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on a categorical action. Specifically, we design a Transformer-based architecture, ACTOR, for encoding and decoding a sequence of parametric SMPL human body models estimated from action recognition datasets. We evaluate our approach on the NTU RGB+D, HumanAct12 and UESTC datasets and show improvements over the state of the art. Furthermore, we present two use cases: improving action recognition through adding our synthesized data to training, and motion denoising. Our code and models will be made available.
Action-Conditioned 3D Human Motion Synthesis with Transformer VAE
— AK (@ak92501) April 13, 2021
pdf: https://t.co/4bG5PHP1wx
abs: https://t.co/2AQdYNrw4s
project page: https://t.co/COHaXkGf4l pic.twitter.com/Ro2cLvhGhn
10. Machine Translation Decoding beyond Beam Search
Rémi Leblond, Jean-Baptiste Alayrac, Laurent Sifre, Miruna Pislar, Jean-Baptiste Lespiau, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals
Beam search is the go-to method for decoding auto-regressive machine translation models. While it yields consistent improvements in terms of BLEU, it is only concerned with finding outputs with high model likelihood, and is thus agnostic to whatever end metric or score practitioners care about. Our aim is to establish whether beam search can be replaced by a more powerful metric-driven search technique. To this end, we explore numerous decoding algorithms, including some which rely on a value function parameterised by a neural network, and report results on a variety of metrics. Notably, we introduce a Monte-Carlo Tree Search (MCTS) based method and showcase its competitiveness. We provide a blueprint for how to use MCTS fruitfully in language applications, which opens promising future directions. We find that which algorithm is best heavily depends on the characteristics of the goal metric; we believe that our extensive experiments and analysis will inform further research in this area.
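As a hedged illustration of "metric-driven" decoding in its simplest form, the sketch below samples candidate outputs from a model and keeps the one a value or metric function scores highest. It stands in for the family of value- and metric-guided alternatives the paper explores, not for the MCTS method itself; `sample_fn` and `metric_fn` are placeholders for a real NMT model and a real evaluation metric.

```python
import random

def rerank_decode(sample_fn, metric_fn, num_candidates=16, seed=0):
    """Draw candidate outputs from the model and return the one that the
    goal metric (or a learned value function) scores highest."""
    rng = random.Random(seed)
    candidates = [sample_fn(rng) for _ in range(num_candidates)]
    return max(candidates, key=metric_fn)

# Toy stand-ins: "translations" are numbers, the "metric" prefers values near 0.5.
best = rerank_decode(lambda rng: rng.random(), lambda x: -abs(x - 0.5))
print(best)
```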
Machine Translation Decoding beyond Beam Search
— Aran Komatsuzaki (@arankomatsuzaki) April 13, 2021
Proposes a MCTS-based decoding as an alternative to Beam Search. It performs competitively. https://t.co/QyCyQ398Q2 pic.twitter.com/w8NcL42tDk
11. Pixel Codec Avatars
Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De La Torre, Yaser Sheikh
Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances. In this work, we present Pixel Codec Avatars (PiCA): a deep generative model of 3D human faces that achieves state-of-the-art reconstruction performance while being computationally efficient and adaptive to the rendering conditions during execution. Our model combines two core ideas: (1) a fully convolutional architecture for decoding spatially varying features, and (2) a rendering-adaptive per-pixel decoder. Both techniques are integrated via a dense surface representation that is learned in a weakly-supervised manner from low-topology mesh tracking over training images. We demonstrate that PiCA improves reconstruction over existing techniques across testing expressions and views on persons of different gender and skin tone. Importantly, we show that the PiCA model is much smaller than the state-of-the-art baseline model, and makes multi-person telecommunication possible: on a single Oculus Quest 2 mobile VR headset, 5 avatars are rendered in real time in the same scene.
Pixel Codec Avatars
— AK (@ak92501) April 13, 2021
pdf: https://t.co/esLrBbT8dr
abs: https://t.co/Q5e1F0jei9 pic.twitter.com/7fpqgXNXfH
12. High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models
Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, Yinbin Ma, Junjie Yang, Ellie Wen, Hong Li, Lin Yang, Chonglin Sun, Whitney Zhao, Krishna Dhulipala, KR Kishore, Tyler Graf, Assaf Eisenman, Kiran Kumar Matam, Adi Gangidi, Pallab Bhattacharya, Guoqiang Jerry Chen, Manoj Krishnan, Krishnakumar Nair, Petr Lapukhov, Maxim Naumov, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, Vijay Rao
Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of the Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 trillion parameters and show that we can attain a 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with a dedicated scale-out network, provisioned with high bandwidth, optimal topology and efficient transport; (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism; (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along row and column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates; and (v) leveraging reduced-precision communications, a multi-level memory hierarchy (HBM+DDR+SSD) and pipelining. Furthermore, we develop and briefly comment on the distributed data ingestion and other supporting services that are required for robust and efficient end-to-end training in production environments.
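To make the embedding-table partitioning concrete, here is a toy row-wise sharding of a single table across workers. Real DLRM sharding additionally balances memory and load across devices and supports column-wise and hierarchical splits, so treat this purely as an illustration of the lookup routing involved.

```python
import numpy as np

class RowShardedEmbedding:
    """Toy row-wise sharding of one embedding table: each "worker" owns a
    contiguous slice of rows, and a lookup routes each id to its owner."""
    def __init__(self, num_rows, dim, num_workers, seed=0):
        rng = np.random.default_rng(seed)
        self.rows_per_worker = -(-num_rows // num_workers)   # ceiling division
        self.shards = [
            rng.normal(size=(self.rows_per_worker, dim))
            for _ in range(num_workers)
        ]

    def lookup(self, row_ids):
        out = []
        for r in row_ids:
            worker = r // self.rows_per_worker                # owning worker
            out.append(self.shards[worker][r % self.rows_per_worker])
        return np.stack(out)

table = RowShardedEmbedding(num_rows=1000, dim=8, num_workers=4)
print(table.lookup([3, 512, 997]).shape)   # (3, 8)
```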
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models
— AK (@ak92501) April 13, 2021
pdf: https://t.co/f0miQlgq2s
abs: https://t.co/R3qCT62l17
"We demonstrate the capability to train very large DLRMs with up to 12 Trillion parameters and show that we can attain 40× speedup" pic.twitter.com/PAnLvQ1Fxn
13. Rethinking and Improving the Robustness of Image Style Transfer
Pei Wang, Yijun Li, Nuno Vasconcelos
Extensive research in neural style transfer methods has shown that the correlation between features extracted by a pre-trained VGG network has a remarkable ability to capture the visual style of an image. Surprisingly, however, this stylization quality is not robust and often degrades significantly when applied to features from more advanced and lightweight networks, such as those in the ResNet family. By performing extensive experiments with different network architectures, we find that residual connections, which represent the main architectural difference between VGG and ResNet, produce feature maps of small entropy, which are not suitable for style transfer. To improve the robustness of the ResNet architecture, we then propose a simple yet effective solution based on a softmax transformation of the feature activations that enhances their entropy. Experimental results demonstrate that this small magic can greatly improve the quality of stylization results, even for networks with random weights. This suggests that the architecture used for feature extraction is more important than the use of learned weights for the task of style transfer.
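A sketch of the flavor of fix the abstract describes: apply a softmax to the feature activations (raising their entropy) before computing a Gram-matrix style statistic. The axis and temperature of the softmax below are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def smoothed_gram(features, temperature=1.0):
    """Gram-matrix style statistic computed on softmax-smoothed activations,
    a sketch of an entropy-enhancing transformation of ResNet features."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    f = torch.softmax(f / temperature, dim=-1)    # smooth across spatial positions
    return f @ f.transpose(1, 2) / (h * w)        # (b, c, c) style statistic

print(smoothed_gram(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 64])
```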
Rethinking and Improving the Robustness of Image Style Transfer (CVPR2021)https://t.co/wlaJPurZ7O
— Naoto Inoue (@naoto_inoue_) April 13, 2021
Interesting work: it takes the familiar observation that style transfer works well with VGG-family networks (reasonably well even with random initialization) but fails with ResNet-family networks, analyzes it seriously, and identifies the cause.
Rethinking and Improving the Robustness of Image Style Transfer
— AK (@ak92501) April 13, 2021
pdf: https://t.co/JvZCMAfWSe
abs: https://t.co/TQztSVesnw pic.twitter.com/LiQ8nHjkz0
14. Meta-tuning Language Models to Answer Prompts Better
Ruiqi Zhong, Kristy Lee, Zheng Zhang, Dan Klein
Large pretrained language models like GPT-3 have acquired a surprising ability to perform zero-shot classification (ZSC). For example, to classify review sentiments, we can “prompt” the language model with the review and the question “Is the review positive?” as the context, and ask it to predict whether the next word is “Yes” or “No”. However, these models are not specialized for answering these prompts. To address this weakness, we propose meta-tuning, which trains the model to specialize in answering prompts but still generalize to unseen tasks. To create the training data, we aggregated 43 existing datasets, annotated 441 label descriptions in total, and unified them into the above question answering (QA) format. After meta-tuning, our model outperforms a same-sized QA model for most labels on unseen tasks, and we forecast that the performance would improve for even larger models. Therefore, measuring ZSC performance on non-specialized language models might underestimate their true capability, and community-wide efforts on aggregating datasets and unifying their formats can help build models that understand prompts better.
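The unified QA format is simple to illustrate: each classification example becomes a passage, a yes/no question built from a label description, and a "Yes"/"No" answer. The prompt wording below is an illustrative guess, not the paper's exact template.

```python
def to_yes_no_example(text, label_description, is_positive):
    """Convert a labeled classification example into the yes/no QA format
    used for meta-tuning (prompt wording is illustrative)."""
    prompt = f"{text}\nQuestion: {label_description} Yes or No?\nAnswer:"
    return prompt, "Yes" if is_positive else "No"

prompt, answer = to_yes_no_example(
    "The movie was a delight from start to finish.",
    "Is the review positive?",
    is_positive=True,
)
print(prompt, answer)
```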
Meta-tuning Language Models to Answer Prompts Better
— AK (@ak92501) April 13, 2021
pdf: https://t.co/zyugMr2Vkr
abs: https://t.co/lh0pM4PmoB
After metatuning, our model outperforms a same-sized QA model for most labels on unseen tasks, and we forecast that the performance would improve for even larger models pic.twitter.com/Gy26aUmvDj
15. Adversarial Open Domain Adaption for Sketch-to-Photo Synthesis
Xiaoyu Xiang, Ding Liu, Xiao Yang, Yiheng Zhu, Xiaohui Shen, Jan P. Allebach
In this paper, we explore the open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data. It is challenging due to the lack of training supervision and the large geometry distortion between the freehand sketch and photo domains. To synthesize the absent freehand sketches from photos, we propose a framework that jointly learns sketch-to-photo and photo-to-sketch generation. However, the generator trained from fake sketches might lead to unsatisfying results when dealing with sketches of missing classes, due to the domain gap between synthesized sketches and real ones. To alleviate this issue, we further propose a simple yet effective open-domain sampling and optimization strategy to “fool” the generator into treating fake sketches as real ones. Our method takes advantage of the learned sketch-to-photo and photo-to-sketch mapping of in-domain data and generalizes them to the open-domain classes. We validate our method on the Scribble and SketchyCOCO datasets. Compared with the recent competing methods, our approach shows impressive results in synthesizing realistic color, texture, and maintaining the geometric composition for various categories of open-domain sketches.
Adversarial Open Domain Adaption for Sketch-to-Photo Synthesis
— AK (@ak92501) April 13, 2021
pdf: https://t.co/1i35S8a9wF
abs: https://t.co/mY5C7GhSba pic.twitter.com/dtLkdASbN9
16. Hausdorff approximations and volume of tubes of singular algebraic sets
Saugata Basu, Antonio Lerario
- retweets: 54, favorites: 34 (04/14/2021 11:53:44)
- math.AG | math.NA | math.OC | math.PR
We prove bounds for the volume of neighborhoods of algebraic sets, in Euclidean space or the sphere, in terms of the degree of the defining polynomials, the number of variables and the dimension of the algebraic set, without any smoothness assumption. This generalizes previous work of Lotz on smooth complete intersections in Euclidean space and of Bürgisser, Cucker and Lotz on hypersurfaces in the sphere, and gives a complete solution to Problem 17 in the book titled “Condition” by Bürgisser and Cucker.
I like it when the non-standard world (non-archimedean extensions, infinitesimals) meets the standard (integrals, Gauss maps, condition numbers). New preprint on "volume of tubes" with Antonio Lerario. Comments most welcome.https://t.co/obkiAIQBdS pic.twitter.com/I8yqDkdh3t
— Saugata Basu (@SaugataBasu4) April 13, 2021
17. StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision
Yang Hong, Juyong Zhang, Boyi Jiang, Yudong Guo, Ligang Liu, Hujun Bao
In this paper, we propose StereoPIFu, which integrates the geometric constraints of stereo vision with implicit function representation of PIFu, to recover the 3D shape of the clothed human from a pair of low-cost rectified images. First, we introduce the effective voxel-aligned features from a stereo vision-based network to enable depth-aware reconstruction. Moreover, the novel relative z-offset is employed to associate predicted high-fidelity human depth and occupancy inference, which helps restore fine-level surface details. Second, a network structure that fully utilizes the geometry information from the stereo images is designed to improve the human body reconstruction quality. Consequently, our StereoPIFu can naturally infer the human body’s spatial location in camera space and maintain the correct relative position of different parts of the human body, which enables our method to capture human performance. Compared with previous works, our StereoPIFu significantly improves the robustness, completeness, and accuracy of the clothed human reconstruction, which is demonstrated by extensive experimental results.
StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision
— AK (@ak92501) April 13, 2021
pdf: https://t.co/0oR10kCnOt
abs: https://t.co/i4n1pQtHaP pic.twitter.com/zepRzs981j
18. Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
Tianwei Lin, Zhuoqi Ma, Fu Li, Dongliang He, Xin Li, Errui Ding, Nannan Wang, Jie Li, Xinbo Gao
Artistic style transfer aims at migrating the style from an example image to a content image. Currently, optimization-based methods have achieved great stylization quality, but expensive time cost restricts their practical applications. Meanwhile, feed-forward methods still fail to synthesize complex style, especially when holistic global and local patterns exist. Inspired by the common painting process of drawing a draft and revising the details, we introduce a novel feed-forward method named Laplacian Pyramid Network (LapStyle). LapStyle first transfers global style patterns in low-resolution via a Drafting Network. It then revises the local details in high-resolution via a Revision Network, which hallucinates a residual image according to the draft and the image textures extracted by Laplacian filtering. Higher resolution details can be easily generated by stacking Revision Networks with multiple Laplacian pyramid levels. The final stylized image is obtained by aggregating outputs of all pyramid levels. We also introduce a patch discriminator to better learn local patterns adversarially. Experiments demonstrate that our method can synthesize high-quality stylized images in real time, where holistic style patterns are properly transferred.
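The pyramid machinery itself is standard and easy to sketch: downsample to get the low-resolution base (which the Drafting Network would stylize), and keep the upsampling residual as the high-frequency detail (which the Revision Network refines). The snippet below shows a two-level split and merge with bilinear resampling; the paper's exact filtering and network stages may differ.

```python
import torch
import torch.nn.functional as F

def laplacian_split(img):
    """Two-level Laplacian decomposition: a low-resolution base plus the
    high-frequency residual lost by downsampling."""
    low = F.interpolate(img, scale_factor=0.5, mode="bilinear", align_corners=False)
    up = F.interpolate(low, size=img.shape[-2:], mode="bilinear", align_corners=False)
    return low, img - up                        # (base, high-frequency residual)

def laplacian_merge(low, residual):
    """Upsample the base and add back the residual to reconstruct the image."""
    up = F.interpolate(low, size=residual.shape[-2:], mode="bilinear", align_corners=False)
    return up + residual

img = torch.randn(1, 3, 256, 256)
low, res = laplacian_split(img)
print(torch.allclose(laplacian_merge(low, res), img))   # True: exact reconstruction
```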
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
— AK (@ak92501) April 13, 2021
pdf: https://t.co/MPiiRYbCaf
abs: https://t.co/XoJn3Nnoyh pic.twitter.com/BN60raRklp
19. Joint Universal Syntactic and Semantic Parsing
Elias Stengel-Eskin, Kenton Murray, Sheng Zhang, Aaron Steven White, Benjamin Van Durme
While numerous attempts have been made to jointly parse syntax and semantics, high performance in one domain typically comes at the price of performance in the other. This trade-off contradicts the large body of research focusing on the rich interactions at the syntax-semantics interface. We explore multiple model architectures which allow us to exploit the rich syntactic and semantic annotations contained in the Universal Decompositional Semantics (UDS) dataset, jointly parsing Universal Dependencies and UDS to obtain state-of-the-art results in both formalisms. We analyze the behaviour of a joint model of syntax and semantics, finding patterns supported by linguistic theory at the syntax-semantics interface. We then investigate to what degree joint modeling generalizes to a multilingual setting, where we find similar trends across 8 languages.
Actual link to the paper: https://t.co/J2S9idikZf
— Kenton Murray (@kentonmurray) April 13, 2021
I’m really excited to finally share this work where our joint model improves both syntactic and semantic parsing! Ever since I saw the preliminary results last year, I’ve been eagerly awaiting being able to share them. https://t.co/FV2jWLRPOW
20. Fool Me Twice: Entailment from Wikipedia Gamification
Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, Jordan Boyd-Graber
We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game. Gamification encourages adversarial examples, drastically lowering the number of examples that can be solved using “shortcuts” compared to other popular entailment datasets. Players are presented with two tasks. The first task asks the player to write a plausible claim based on the evidence from a Wikipedia page. The second one shows two plausible claims written by other players, one of which is false, and the goal is to identify it before the time runs out. Players “pay” to see clues retrieved from the evidence pool: the more evidence the player needs, the harder the claim. Game-play between motivated players leads to diverse strategies for crafting claims, such as temporal inference and diverting to unrelated evidence, and results in higher quality data for the entailment and evidence retrieval tasks. We open source the dataset and the game code.
21. Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality
Amin Jourabloo, Fernando De la Torre, Jason Saragih, Shih-En Wei, Te-Li Wang, Stephen Lombardi, Danielle Belko, Autumn Trimble, Hernan Badino
Social presence, the feeling of being there with a real person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models. However, these PS models are time-consuming to build and are typically trained with limited data variability, which results in poor generalization and robustness. Major sources of variability that affects the accuracy of facial expression transfer algorithms include using different VR headsets (e.g., camera configuration, slop of the headset), facial appearance changes over time (e.g., beard, make-up), and environmental factors (e.g., lighting, backgrounds). This is a major drawback for the scalability of these models in VR. This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture (MIA) trained with specialized augmentation strategies. MIA drives the shape component of the avatar from three cameras in the VR headset (two eyes, one mouth), in untrained subjects, using minimal personalized information (i.e., neutral 3D mesh shape). Similarly, if the PS texture decoder is available, MIA is able to drive the full avatar (shape+texture) robustly outperforming PS models in challenging scenarios. Our key contribution to improve robustness and generalization, is that our method implicitly decouples, in an unsupervised manner, the facial expression from nuisance factors (e.g., headset, environment, facial appearance). We demonstrate the superior performance and robustness of the proposed method versus state-of-the-art PS approaches in a variety of experiments.
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality
— AK (@ak92501) April 13, 2021
pdf: https://t.co/FGBR45Ynwb
abs: https://t.co/mpDwM5T5ak pic.twitter.com/yCD6L2bOyI