1. GAN Prior Embedded Network for Blind Face Restoration in the Wild
Tao Yang, Peiran Ren, Xuansong Xie, Lei Zhang
Blind face restoration (BFR) from severely degraded face images in the wild is a very challenging problem. Due to the severe ill-posedness of the problem and the complex unknown degradation, directly training a deep neural network (DNN) usually cannot lead to acceptable results. Existing generative adversarial network (GAN) based methods can produce better results but tend to generate over-smoothed restorations. In this work, we propose a new method by first learning a GAN for high-quality face image generation and embedding it into a U-shaped DNN as a prior decoder, then fine-tuning the GAN prior embedded DNN with a set of synthesized low-quality face images. The GAN blocks are designed to ensure that the latent code and noise input to the GAN can be respectively generated from the deep and shallow features of the DNN, controlling the global face structure, local face details and background of the reconstructed image. The proposed GAN prior embedded network (GPEN) is easy to implement, and it can generate visually photo-realistic results. Our experiments demonstrate that the proposed GPEN achieves significantly superior results to state-of-the-art BFR methods both quantitatively and qualitatively, especially for the restoration of severely degraded face images in the wild. The source code and models can be found at https://github.com/yangxy/GPEN.
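To make the architecture described above concrete, here is a minimal PyTorch sketch of the GAN-prior-embedding idea as we read the abstract: a U-shaped encoder whose deepest features yield the latent code and whose shallow features yield noise inputs for a pre-trained face GAN acting as the decoder. All module names, sizes, and the GAN stub are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )
    def forward(self, x):
        return self.body(x)

class GANPriorDecoderStub(nn.Module):
    """Stand-in for a pre-trained StyleGAN-like generator that accepts a
    latent code plus per-resolution noise maps (reduced here to a toy op)."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.map = nn.Linear(latent_dim, 3 * 64 * 64)
    def forward(self, latent, noises):
        img = self.map(latent).view(-1, 3, 64, 64)
        # shallow-feature "noise" would modulate local detail at each scale
        return img + 0.1 * nn.functional.interpolate(noises[0], size=64)

class GPENSketch(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.enc1 = EncoderBlock(3, 64)     # shallow: local detail / background
        self.enc2 = EncoderBlock(64, 256)
        self.enc3 = EncoderBlock(256, 512)  # deep: global face structure
        self.to_latent = nn.Linear(512, latent_dim)
        self.gan_prior = GANPriorDecoderStub(latent_dim)  # pre-trained, then fine-tuned
    def forward(self, lq_face):              # lq_face: (N, 3, 64, 64)
        f1 = self.enc1(lq_face)               # 32x32 shallow features
        f2 = self.enc2(f1)                    # 16x16
        f3 = self.enc3(f2)                    # 8x8 deep features
        latent = self.to_latent(f3.mean(dim=(2, 3)))
        noise_from_shallow = f1.mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)
        return self.gan_prior(latent, [noise_from_shallow])

if __name__ == "__main__":
    out = GPENSketch()(torch.randn(2, 3, 64, 64))
    print(out.shape)  # torch.Size([2, 3, 64, 64])
```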
GAN Prior Embedded Network for Blind Face Restoration in the Wild
— AK (@ak92501) May 14, 2021
pdf: https://t.co/zOamBeN85A
abs: https://t.co/JW5XPISg8r pic.twitter.com/LVDkF0Yb7u
2. Monetizing Propaganda: How Far-right Extremists Earn Money by Video Streaming
Megan Squire
Video streaming platforms such as YouTube, Twitch, and DLive allow users to live-stream video content for viewers who can optionally express their appreciation through monetary donations. DLive is one of the smaller and lesser-known streaming platforms, and historically has had fewer content moderation practices. It has thus become a popular place for violent extremists and other clandestine groups to earn money and propagandize. What is the financial structure of the DLive streaming ecosystem and how much money is changing hands? In the past it has been difficult to understand how far-right extremists fundraise via podcasts and video streams because of the secretive nature of the activity and because of the difficulty of getting data from social media platforms. This paper describes a novel experiment to collect and analyze data from DLive’s publicly available ledgers of transactions in order to understand the financial structure of the clandestine, extreme far-right video streaming community. The main findings of this paper are, first, that the majority of donors are using micropayments at varying frequencies, but a small handful of donors spend large amounts of money to finance their favorite streamers. Next, the timing of donations to high-profile far-right streamers follows a fairly predictable pattern that is closely tied to a broadcast schedule. Finally, the far-right video streaming financial landscape is divided into separate cliques which exhibit very little crossover in terms of sizable donations. This work will be important to technology companies, policymakers, and researchers who are trying to understand how niche social media services, including video platforms, are being exploited by extremists to propagandize and fundraise.
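Purely as a reader's illustration of the kind of ledger analysis the findings imply (donation concentration and broadcast-tied timing), here is a tiny pandas sketch. The column names and the toy rows are a hypothetical schema, not DLive's actual data format or API.

```python
import pandas as pd

# hypothetical transaction ledger: donor, streamer, amount, timestamp
ledger = pd.DataFrame({
    "donor":    ["a", "a", "b", "c", "c", "c", "d"],
    "streamer": ["s1", "s1", "s1", "s2", "s2", "s2", "s1"],
    "amount_usd": [0.50, 0.50, 250.00, 1.00, 0.50, 300.00, 0.50],
    "timestamp": pd.to_datetime([
        "2020-06-01 20:05", "2020-06-08 20:07", "2020-06-08 20:30",
        "2020-06-02 19:00", "2020-06-09 19:02", "2020-06-09 19:45",
        "2020-06-15 20:10",
    ]),
})

# concentration of funding: a few donors account for most of the money
per_donor = ledger.groupby("donor")["amount_usd"].sum().sort_values(ascending=False)
top_share = per_donor.iloc[:2].sum() / per_donor.sum()
print(per_donor)
print(f"Top 2 donors provide {top_share:.0%} of all donations")

# timing: donation counts cluster around a weekly broadcast schedule
by_weekday_hour = ledger.groupby(
    [ledger.timestamp.dt.day_name(), ledger.timestamp.dt.hour]
)["amount_usd"].count()
print(by_weekday_hour)
```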
Finally - here's my accepted paper on the DLive video streaming service and how it's being used by far-right propagandists to earn money (Apr 2020-Jan 2021). Lots of data! Here are some of the largest cash-outs including some post-insurrection refunds https://t.co/FzgR0UqEZq pic.twitter.com/MsE6qH39dN
— Megan Squire (@MeganSquire0) May 14, 2021
3. High-Resolution Complex Scene Synthesis with Transformers
Manuel Jahn, Robin Rombach, Björn Ommer
The use of coarse-grained layouts for controllable synthesis of complex scene images via deep generative models has recently gained popularity. However, results of current approaches still fall short of their promise of high-resolution synthesis. We hypothesize that this is mostly due to the highly engineered nature of these approaches which often rely on auxiliary losses and intermediate steps such as mask generators. In this note, we present an orthogonal approach to this task, where the generative model is based on pure likelihood training without additional objectives. To do so, we first optimize a powerful compression model with adversarial training which learns to reconstruct its inputs via a discrete latent bottleneck and thereby effectively strips the latent representation of high-frequency details such as texture. Subsequently, we train an autoregressive transformer model to learn the distribution of the discrete image representations conditioned on a tokenized version of the layouts. Our experiments show that the resulting system is able to synthesize high-quality images consistent with the given layouts. In particular, we improve the state-of-the-art FID score on COCO-Stuff and on Visual Genome by up to 19% and 53%, respectively, and demonstrate the synthesis of images up to 512×512 px on COCO and Open Images.
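A minimal sketch of the two-stage recipe the abstract describes: discrete image tokens (assumed to come from a pre-trained VQ-style compression model) modeled by an autoregressive transformer trained with plain cross-entropy and conditioned on a tokenized layout. Vocabulary sizes, sequence lengths, and the layout tokenization are illustrative assumptions.

```python
import torch
import torch.nn as nn

IMG_VOCAB, LAYOUT_VOCAB = 1024, 256      # codebook size / layout token bins
IMG_LEN, LAYOUT_LEN = 16 * 16, 5 * 8     # 16x16 latent grid, 8 objects x 5 tokens each

class LayoutToImageTransformer(nn.Module):
    def __init__(self, d_model=512, n_layers=8, n_heads=8):
        super().__init__()
        self.tok_emb = nn.Embedding(IMG_VOCAB + LAYOUT_VOCAB, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, LAYOUT_LEN + IMG_LEN, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, IMG_VOCAB)

    def forward(self, layout_tokens, image_tokens):
        # layout tokens share the embedding table via an index offset
        seq = torch.cat([layout_tokens + IMG_VOCAB, image_tokens], dim=1)
        x = self.tok_emb(seq) + self.pos_emb[:, : seq.size(1)]
        # causal mask so every token only attends to earlier positions
        causal = torch.triu(
            torch.full((seq.size(1), seq.size(1)), float("-inf")), diagonal=1)
        h = self.blocks(x, mask=causal)
        # predict each image token from the layout and all previous image tokens
        return self.head(h[:, LAYOUT_LEN - 1 : -1])   # (B, IMG_LEN, IMG_VOCAB)

layout = torch.randint(0, LAYOUT_VOCAB, (2, LAYOUT_LEN))
image = torch.randint(0, IMG_VOCAB, (2, IMG_LEN))
logits = LayoutToImageTransformer()(layout, image)
loss = nn.functional.cross_entropy(logits.reshape(-1, IMG_VOCAB), image.reshape(-1))
```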
High-Resolution Complex Scene Synthesis with Transformers
— AK (@ak92501) May 14, 2021
pdf: https://t.co/ERfWD3zaS4
abs: https://t.co/y5gF3PaK7C
state-of-the-art FID score on COCO-Stuff and on Visual Genome by up to 19% and 53% and demonstrate the synthesis of images up to 512×512 px on COCO and Open Images pic.twitter.com/HyhifO7hOm
High-Resolution Complex Scene Synthesis with Transformers
— Aran Komatsuzaki (@arankomatsuzaki) May 14, 2021
Improves the SotA FID score on COCO-Stuff and on Visual Genome by up to 19% and 53% and demonstrates the synthesis of images up to 512×512 px on COCO and Open Images.https://t.co/OsMQDhgtzy pic.twitter.com/IugiKBxEud
4. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov
Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions, while stochastic calculus has provided a unified point of view on these techniques, allowing for flexible inference schemes. In this paper we introduce Grad-TTS, a novel text-to-speech model with a score-based decoder that produces mel-spectrograms by gradually transforming noise predicted by the encoder and aligned with the text input by means of Monotonic Alignment Search. The framework of stochastic differential equations helps us generalize conventional diffusion probabilistic models to the case of reconstructing data from noise with different parameters, and allows us to make this reconstruction flexible by explicitly controlling the trade-off between sound quality and inference speed. Subjective human evaluation shows that Grad-TTS is competitive with state-of-the-art text-to-speech approaches in terms of Mean Opinion Score. We will make the code publicly available shortly.
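As a schematic reading of the decoder described above (not the authors' code), here is a generic Euler sampler for the probability-flow ODE of an Ornstein-Uhlenbeck-style diffusion dX = 1/2·β_t(μ − X)dt + sqrt(β_t)dW, where the encoder output μ is the terminal mean and the number of reverse steps trades inference speed for quality. The score network stand-in, the linear β schedule, and all shapes are assumptions.

```python
import torch

@torch.no_grad()
def sample_mel(score_net, mu, n_steps=50, beta_min=0.05, beta_max=20.0):
    """mu: encoder output (B, n_mels, T) used as the terminal mean of the diffusion."""
    x = mu + torch.randn_like(mu)              # X_1 ~ N(mu, I)
    h = 1.0 / n_steps
    for i in range(n_steps):                   # integrate t from 1 down to 0
        t = 1.0 - i * h
        beta_t = beta_min + (beta_max - beta_min) * t
        score = score_net(x, mu, torch.full((mu.size(0),), t))  # ~ grad log p_t(x)
        # backward Euler step of dX/dt = 1/2 * beta * ((mu - X) - score)
        x = x + 0.5 * beta_t * (score - (mu - x)) * h
    return x                                   # fewer steps -> faster, rougher mels

# smoke test with a placeholder score network (a real one is a U-Net-like DNN)
score_net = lambda x, mu, t: (mu - x)
mel = sample_mel(score_net, mu=torch.zeros(1, 80, 100), n_steps=10)
print(mel.shape)  # torch.Size([1, 80, 100])
```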
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
— Aran Komatsuzaki (@arankomatsuzaki) May 14, 2021
Proposes Grad-TTS, a TTS model with score-based decoder producing mel-spectrograms, which performs competitively with SotA TTS approaches in terms of MOS.
abs: https://t.co/bs0ZWEQnm7
project: https://t.co/C6xQme546l pic.twitter.com/wBN0FgolhT
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
— AK (@ak92501) May 14, 2021
pdf: https://t.co/wtCmJsIT4H
abs: https://t.co/NkpqPvNW2R
project page: https://t.co/wKM8wtmuUI
acoustic feature generator utilizing the concept of diffusion probabilistic modelling pic.twitter.com/iSsUugYzUv
1/n. Happy to announce that my team presents our new paper “Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech” which has been accepted to ICML 2021! Check it out: https://t.co/jNN5nNknuQ. DEMO: https://t.co/V6I8L9vLKI. The code will also be released shortly.
— Ivan Vovk (@Kartexxx) May 14, 2021
5. Editing Conditional Radiance Fields
Steven Liu, Xiuming Zhang, Zhoutong Zhang, Richard Zhang, Jun-Yan Zhu, Bryan Russell
A neural radiance field (NeRF) is a scene model supporting high-quality view synthesis, optimized per scene. In this paper, we explore enabling user editing of a category-level NeRF - also known as a conditional radiance field - trained on a shape category. Specifically, we introduce a method for propagating coarse 2D user scribbles to the 3D space, to modify the color or shape of a local region. First, we propose a conditional radiance field that incorporates new modular network components, including a shape branch that is shared across object instances. Observing multiple instances of the same category, our model learns underlying part semantics without any supervision, thereby allowing the propagation of coarse 2D user scribbles to the entire 3D region (e.g., chair seat). Next, we propose a hybrid network update strategy that targets specific network components, which balances efficiency and accuracy. During user interaction, we formulate an optimization problem that both satisfies the user’s constraints and preserves the original object structure. We demonstrate our approach on various editing tasks over three shape datasets and show that it outperforms prior neural editing approaches. Finally, we edit the appearance and shape of a real photograph and show that the edit propagates to extrapolated novel views.
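Here is a rough sketch of a conditional radiance field as the abstract describes it: one network shared across a shape category, conditioned on per-instance shape and color codes, with a shape (density) branch separate from the color branch, and editing framed as optimizing a small subset of parameters. Layer sizes, positional encoding (omitted), and the editing setup are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ConditionalNeRF(nn.Module):
    def __init__(self, code_dim=64, hidden=256):
        super().__init__()
        # shape branch: shared across instances, conditioned on a shape code
        self.shape_branch = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        # color branch: additionally conditioned on a color code and view direction
        self.color_branch = nn.Sequential(
            nn.Linear(hidden + code_dim + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir, shape_code, color_code):
        h = self.shape_branch(torch.cat([xyz, shape_code], dim=-1))
        sigma = self.density_head(h)
        rgb = self.color_branch(torch.cat([h, color_code, view_dir], dim=-1))
        return rgb, sigma

# Editing, schematically: freeze most weights and optimize only the instance codes
# and a targeted subset of layers against the user's scribble constraints, plus a
# term keeping unedited regions close to the original rendering.
model = ConditionalNeRF()
shape_code = torch.zeros(1024, 64, requires_grad=True)
color_code = torch.zeros(1024, 64, requires_grad=True)
opt = torch.optim.Adam(
    [shape_code, color_code] + list(model.color_branch.parameters()), lr=1e-3)
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 3), shape_code, color_code)
```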
Editing Conditional Radiance Fields
— AK (@ak92501) May 14, 2021
pdf: https://t.co/rZVPqahBGi
abs: https://t.co/LcHRTGig4A
project page: https://t.co/hs1QAjN9yi
github: https://t.co/JZN7SwvXdE
colab: https://t.co/d1HhP9WRdM pic.twitter.com/bg7ROR655i
6. Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng, Licheng Yu, Ning Zhang, Tamara Berg, Babak Damavandi, Vikas Singh, Amy Bearman
We introduce a unified framework to jointly model images, text, and human attention traces. Our work is built on top of the recent Localized Narratives annotation framework [30], where each word of a given caption is paired with a mouse trace segment. We propose two novel tasks: (1) predict a trace given an image and caption (i.e., visual grounding), and (2) predict a caption and a trace given only an image. Learning the grounding of each word is challenging, due to noise in the human-provided traces and the presence of words that cannot be meaningfully visually grounded. We present a novel model architecture that is jointly trained on dual tasks (controlled trace generation and controlled caption generation). To evaluate the quality of the generated traces, we propose a local bipartite matching (LBM) distance metric which allows the comparison of two traces of different lengths. Extensive experiments show our model is robust to the imperfect training data and outperforms the baselines by a clear margin. Moreover, we demonstrate that our model pre-trained on the proposed tasks can also be beneficial to the downstream task of COCO’s guided image captioning. Our code and project page are publicly available.
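To illustrate the idea of comparing two traces of different lengths via bipartite matching over local segments, here is a toy Python sketch. The paper's exact LBM definition may differ; the segment size and the cost function are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def segment(trace, seg_len=5):
    """Split an (N, 2) trace into local segments, each summarized by its mean point."""
    n_segs = max(1, len(trace) // seg_len)
    return np.array([trace[i * seg_len:(i + 1) * seg_len].mean(axis=0)
                     for i in range(n_segs)])

def lbm_like_distance(trace_a, trace_b, seg_len=5):
    a, b = segment(trace_a, seg_len), segment(trace_b, seg_len)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise segment costs
    rows, cols = linear_sum_assignment(cost)       # optimal bipartite matching
    return cost[rows, cols].mean()                 # average matched-segment distance

t1 = np.cumsum(np.random.randn(40, 2) * 0.01, axis=0)   # trace with 40 points
t2 = np.cumsum(np.random.randn(55, 2) * 0.01, axis=0)   # trace with 55 points
print(lbm_like_distance(t1, t2))
```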
Connecting What to Say With Where to Look by Modeling Human Attention Traces
— AK (@ak92501) May 14, 2021
pdf: https://t.co/nmP9yMtcbK
abs: https://t.co/IExz1ptmxw
github: https://t.co/grRS0Wcqb2
unified framework for modeling vision, language, and human attention traces pic.twitter.com/PU3j7XjXMX
7. 2021 Roadmap on Neuromorphic Computing and Engineering
Dennis V. Christensen, Regina Dittmann, Bernabé Linares-Barranco, Abu Sebastian, Manuel Le Gallo, Andrea Redaelli, Stefan Slesazeck, Thomas Mikolajick, Sabina Spiga, Stephan Menzel, Ilia Valov, Gianluca Milano, Carlo Ricciardi, Shi-Jun Liang, Feng Miao, Mario Lanza, Tyler J. Quill, Scott T. Keene, Alberto Salleo, Julie Grollier, Danijela Marković, Alice Mizrahi, Peng Yao, J. Joshua Yang, Giacomo Indiveri, John Paul Strachan, Suman Datta, Elisa Vianello, Alexandre Valentian, Johannes Feldmann, Xuan Li, Wolfram H.P. Pernice, Harish Bhaskaran, Emre Neftci, Srikanth Ramaswamy, Jonathan Tapson, Franz Scherr, Wolfgang Maass, Priyadarshini Panda, Youngeun Kim, Gouhei Tanaka, Simon Thorpe, Chiara Bartolozzi, Thomas A. Cleland, Christoph Posch, Shih-Chii Liu, Arnab Neelim Mazumder, Morteza Hosseini
Modern computation based on the von Neumann architecture is today a mature cutting-edge science. In this architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation of computer technology is expected to solve problems at the exascale. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex and unstructured data as our brain does. Neuromorphic computing systems are aimed at addressing these needs. The human brain performs about 10^15 calculations per second using 20 W and a 1.2 L volume. By taking inspiration from biology, new generation computers could have much lower power consumption than conventional processors, could exploit integrated non-volatile memory and logic, and could be explicitly designed to support dynamic learning in the context of complex and unstructured data. Among their potential future applications, business, health care, social security, and control of disease and virus spreading might be the most impactful at the societal level. This roadmap envisages the potential applications of neuromorphic materials in cutting-edge technologies and focuses on the design and fabrication of artificial neural systems. The contents of this roadmap will highlight the interdisciplinary nature of this activity, which takes inspiration from biology, physics, mathematics, computer science and engineering. This will provide a roadmap to explore and consolidate new technology behind both present and future applications in many technologically relevant areas.
Too many top authors in the 2021 Roadmap on Neuromorphic Computing and Engineering to tag! 2/2 @AliceMizrahi @giacomoi, @virtualmind, @srikipedia, @jontapson, @franz_scherr, @priyapanda12, @nyalki, @ElDonati, @slytolu, thanks for contributing: https://t.co/sueEJOH9Nu pic.twitter.com/YOWRYooDEj
— Neuromorphic Computing and Engineering (@IOPneuromorphic) May 14, 2021
8. Dynamic View Synthesis from Dynamic Monocular Video
Chen Gao, Ayush Saraf, Johannes Kopf, Jia-Bin Huang
We present an algorithm for generating novel views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene. Our work builds upon recent advances in neural implicit representation and uses continuous and differentiable functions for modeling the time-varying structure and the appearance of the scene. We jointly train a time-invariant static NeRF and a time-varying dynamic NeRF, and learn how to blend the results in an unsupervised manner. However, learning this implicit function from a single video is highly ill-posed (with infinitely many solutions that match the input video). To resolve the ambiguity, we introduce regularization losses to encourage a more physically plausible solution. We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
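A schematic of the blending idea described above: a time-invariant static NeRF and a time-varying dynamic NeRF are queried at the same point, and their outputs are combined with a predicted per-point blending weight learned without blending supervision. Network sizes and the exact blending rule are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TinyField(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 4 + 1))
    def forward(self, x):
        out = self.net(x)
        rgb, sigma, blend_logit = out[..., :3], out[..., 3:4], out[..., 4:5]
        return torch.sigmoid(rgb), torch.relu(sigma), blend_logit

static_nerf = TinyField(in_dim=3)        # input: (x, y, z)
dynamic_nerf = TinyField(in_dim=4)       # input: (x, y, z, t)

xyz = torch.rand(2048, 3)
t = torch.rand(2048, 1)
rgb_s, sigma_s, _ = static_nerf(xyz)
rgb_d, sigma_d, blend_logit = dynamic_nerf(torch.cat([xyz, t], dim=-1))

w = torch.sigmoid(blend_logit)            # per-point blending weight
rgb = w * rgb_d + (1 - w) * rgb_s
sigma = w * sigma_d + (1 - w) * sigma_s   # then composited along rays as usual
```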
Dynamic View Synthesis from Dynamic Monocular Video
— AK (@ak92501) May 14, 2021
pdf: https://t.co/ScrGSD3jU7
abs: https://t.co/QPNL8mWZVn
project page: https://t.co/w5QCp3ighh
generating novel views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene pic.twitter.com/GIZ4a9cKZc
9. 3D Spatial Recognition without Spatially Labeled 3D
Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar
We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection, coupling their predictions through self and cross-task consistency losses. We show that in conjunction with standard multiple-instance learning objectives, WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time. We demonstrate its efficacy using the ScanNet and S3DIS datasets, outperforming prior state of the art on weakly-supervised segmentation by more than 6% mIoU. In addition, we set up the first benchmark for weakly-supervised 3D object detection on both datasets, where WyPR outperforms standard approaches and establishes strong baselines for future work.
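To make the weak-supervision signals named in the abstract concrete, here is an illustrative PyTorch sketch: a multiple-instance learning loss that pools per-point segmentation logits into a scene-level prediction supervised by scene tags, plus a consistency term tying segmentation to the class scores of a 3D proposal. The pooling choice and the consistency form are assumptions, not the authors' exact losses.

```python
import torch
import torch.nn.functional as F

def mil_loss(point_logits, scene_tags):
    """point_logits: (N_points, C); scene_tags: (C,) multi-hot scene-level labels."""
    scene_logits = point_logits.max(dim=0).values      # max-pool over points
    return F.binary_cross_entropy_with_logits(scene_logits, scene_tags)

def cross_task_consistency(point_logits, box_point_mask, box_class_logits):
    """Encourage points inside a proposal to agree with the proposal's class scores.
    box_point_mask: (N_points,) bool mask of points inside one 3D proposal."""
    seg_inside = point_logits[box_point_mask].mean(dim=0)
    return F.mse_loss(seg_inside.softmax(-1), box_class_logits.softmax(-1))

point_logits = torch.randn(5000, 20, requires_grad=True)   # 20 classes
scene_tags = torch.zeros(20); scene_tags[[3, 7]] = 1.0     # scene contains classes 3 and 7
mask = torch.zeros(5000, dtype=torch.bool); mask[:200] = True
box_logits = torch.randn(20)
loss = mil_loss(point_logits, scene_tags) + cross_task_consistency(point_logits, mask, box_logits)
```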
3D Spatial Recognition without Spatially Labeled 3D
— AK (@ak92501) May 14, 2021
pdf: https://t.co/uhSbqJhXwY
abs: https://t.co/X8P0u6oStH
project page: https://t.co/6mN8AXvc4F
a novel framework for joint 3D semantic segmentation and object detection, trained using only scene-level class tags as supervision pic.twitter.com/KtuFlKlmoo
10. SyntheticFur dataset for neural rendering
Trung Le, Ryan Poplin, Fred Bertsch, Andeep Singh Toor, Margaret L. Oh
We introduce a new dataset called SyntheticFur built specifically for machine learning training. The dataset consists of ray traced synthetic fur renders with corresponding rasterized input buffers and simulation data files. We procedurally generated approximately 140,000 images and 15 simulations with Houdini. The images show fur groomed on different skin primitives and moving with various motions in a predefined set of lighting environments. We also demonstrated how the dataset could be used with neural rendering to significantly improve fur graphics using inexpensive input buffers by training a conditional generative adversarial network with perceptual loss. We hope the availability of such high fidelity fur renders will encourage new advances with neural rendering for a variety of applications.
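A sketch of the training signal mentioned above: a conditional GAN that maps cheap rasterized input buffers to ray-traced fur renders, trained with an adversarial term plus a VGG-based perceptual loss. The generator and discriminator stubs, the choice of VGG layer, and the loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

# frozen VGG feature extractor for the perceptual loss (weights name assumes torchvision >= 0.13)
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(fake, real):
    """Compare mid-level VGG features of generated and ground-truth renders."""
    return nn.functional.l1_loss(vgg(fake), vgg(real))

# toy generator: input buffers (e.g. normals + depth, 4 channels) -> RGB render
generator = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())
discriminator = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                              nn.Conv2d(64, 1, 4, stride=2, padding=1))

buffers = torch.rand(2, 4, 128, 128)      # rasterized input buffers
target = torch.rand(2, 3, 128, 128)       # ray-traced ground-truth render
fake = generator(buffers)
d_out = discriminator(fake)
g_loss = perceptual_loss(fake, target) + \
         nn.functional.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
```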
SyntheticFur dataset for neural rendering
— AK (@ak92501) May 14, 2021
pdf: https://t.co/ugiJKHmwvz
abs: https://t.co/ylqc7EGVqY
github: https://t.co/XvBIt5sFte
dataset consists of ray traced synthetic fur renders with corresponding rasterized input buffers and simulation data files pic.twitter.com/vY9zi3xEfc
11. Orienting, Framing, Bridging, Magic, and Counseling: How Data Scientists Navigate the Outer Loop of Client Collaborations in Industry and Academia
Sean Kross, Philip J. Guo
Data scientists often collaborate with clients to analyze data to meet a client’s needs. What does the end-to-end workflow of a data scientist’s collaboration with clients look like throughout the lifetime of a project? To investigate this question, we interviewed ten data scientists (5 female, 4 male, 1 non-binary) in diverse roles across industry and academia. We discovered that they work with clients in a six-stage outer-loop workflow, which involves 1) laying groundwork by building trust before a project begins, 2) orienting to the constraints of the client’s environment, 3) collaboratively framing the problem, 4) bridging the gap between data science and domain expertise, 5) the inner loop of technical data analysis work, and 6) counseling to help clients emotionally cope with analysis results. This novel outer-loop workflow contributes to CSCW by expanding the notion of what collaboration means in data science beyond the widely-known inner-loop technical workflow stages of acquiring, cleaning, analyzing, modeling, and visualizing data. We conclude by discussing the implications of our findings for data science education, parallels to design work, and unmet needs for tool development.
I am so excited to announce that my new paper “Orienting, Framing, Bridging, Magic, and Counseling: How Data Scientists Navigate the Outer Loop of Client Collaborations in Industry and Academia” has been accepted to #CSCW2021! You can read it here: https://t.co/BqNKcn7Kxf pic.twitter.com/qzirhQzf3a
— Sean Kross (@seankross) May 14, 2021
12. The Power of the Weisfeiler-Leman Algorithm for Machine Learning with Graphs
Christopher Morris, Matthias Fey, Nils M. Kriege
In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, emerged as a powerful tool for (supervised) machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm’s use in a machine learning setting. We discuss the theoretical background, show how to use it for supervised graph- and node classification, discuss recent extensions, and outline its connection to neural architectures. Moreover, we give an overview of current applications and future directions to stimulate research.
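For readers unfamiliar with the heuristic the survey builds on, here is a minimal Python sketch of 1-WL color refinement: iteratively relabel each node by hashing its color together with the multiset of its neighbors' colors; the resulting color histogram is a graph invariant usable as a feature, as in WL kernels and, in spirit, message-passing GNNs. The graph representation below is a simple illustration.

```python
from collections import Counter

def wl_colors(adj, n_iters=3):
    """1-WL color refinement. adj: dict mapping node -> list of neighbor nodes."""
    colors = {v: 0 for v in adj}                       # uniform initial coloring
    for _ in range(n_iters):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        # relabel: identical signatures receive identical new colors
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

# two small graphs: a 4-cycle vs. a path on 4 nodes
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(Counter(wl_colors(cycle).values()))   # one color class: all degree-2 nodes alike
print(Counter(wl_colors(path).values()))    # endpoints get a different color than interior nodes
```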
Want to get a high-level overview of the Weisfeiler-Leman algorithm's use in ML and its connection to GNNs? Check out our IJCAI survey track paper: https://t.co/7WCYtXoE7b.
— Christopher Morris (@chrsmrrs) May 14, 2021
Joint work with @rusty1s (@sfb876) and Nils M. Kriege (@univienna).
13. Neural Trajectory Fields for Dynamic Novel View Synthesis
Chaoyang Wang, Ben Eckart, Simon Lucey, Orazio Gallo
Recent approaches to render photorealistic views from a limited set of photographs have pushed the boundaries of our interactions with pictures of static scenes. The ability to recreate moments, that is, time-varying sequences, is perhaps an even more interesting scenario, but it remains largely unsolved. We introduce DCT-NeRF, a coordinate-based neural representation for dynamic scenes. DCT-NeRF learns smooth and stable trajectories over the input sequence for each point in space. This allows us to enforce consistency between any two frames in the sequence, which results in high quality reconstruction, particularly in dynamic regions.
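The abstract says each point in space is assigned a smooth trajectory over the input sequence; the model's name suggests a truncated DCT parameterization, which the following NumPy sketch illustrates (an assumption beyond the abstract). A few low-frequency DCT coefficients per point yield a smooth, temporally consistent motion path by construction.

```python
import numpy as np

def dct_trajectory(coeffs, n_frames):
    """coeffs: (K, 3) DCT coefficients for one point; returns (n_frames, 3) positions."""
    t = np.arange(n_frames)
    K = coeffs.shape[0]
    # DCT-II basis functions evaluated at every frame index
    basis = np.stack([np.cos(np.pi * (2 * t + 1) * k / (2 * n_frames)) for k in range(K)])
    return basis.T @ coeffs        # small K -> smooth trajectory by construction

coeffs = np.array([[0.0, 0.0, 2.0],    # constant offset in z
                   [0.5, 0.0, 0.0],    # slow sweep in x
                   [0.0, 0.2, 0.0]])   # gentler oscillation in y
traj = dct_trajectory(coeffs, n_frames=30)   # (30, 3) positions, one per input frame
# consistency between any two frames follows from evaluating the same trajectory
print(traj[0], traj[15])
```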
Neural Trajectory Fields for Dynamic Novel View Synthesis
— AK (@ak92501) May 14, 2021
pdf: https://t.co/3VAaIzMhEX
abs: https://t.co/LAiOgvEG15
coordinate-based neural representation that can render photorealistic novel views of dynamic scenes pic.twitter.com/BJeRZ8uP35