All Articles

Hot Papers 2021-08-09

1. Mitigating dataset harms requires stewardship: Lessons from 1000 papers

Kenny Peng, Arunesh Mathur, Arvind Narayanan

  • retweets: 14264, favorites: 5 (08/10/2021 10:15:07)
  • links: abs | pdf
  • cs.LG | cs.CY

Concerns about privacy, bias, and harmful applications have shone a light on the ethics of machine learning datasets, even leading to the retraction of prominent datasets including DukeMTMC, MS-Celeb-1M, TinyImages, and VGGFace2. In response, the machine learning community has called for higher ethical standards, transparency efforts, and technical fixes in the dataset creation process. The premise of our work is that these efforts can be more effective if informed by an understanding of how datasets are used in practice in the research community. We study three influential face and person recognition datasets - DukeMTMC, MS-Celeb-1M, and Labeled Faces in the Wild (LFW) - by analyzing nearly 1000 papers that cite them. We found that the creation of derivative datasets and models, broader technological and social change, the lack of clarity of licenses, and dataset management practices can introduce a wide range of ethical concerns. We conclude by suggesting a distributed approach that can mitigate these harms, making recommendations to dataset creators, conference program committees, dataset users, and the broader research community.

2. Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Amit Gupte, Alexey Romanov, Sahitya Mantravadi, Dalitso Banda, Jianjie Liu, Raza Khan, Lakshmanan Ramu Meenal, Benjamin Han, Soundar Srinivasan

  • retweets: 806, favorites: 132 (08/10/2021 10:15:09)
  • links: abs | pdf
  • cs.CL | cs.LG

Document digitization is essential for the digital transformation of our societies, yet a crucial step in the process, Optical Character Recognition (OCR), is still not perfect. Even commercial OCR systems can produce questionable output depending on the fidelity of the scanned documents. In this paper, we demonstrate an effective framework for mitigating OCR errors for any downstream NLP task, using Named Entity Recognition (NER) as an example. We first address the data scarcity problem for model training by constructing a document synthesis pipeline, generating realistic but degraded data with NER labels. We measure the NER accuracy drop at various degradation levels and show that a text restoration model, trained on the degraded data, significantly closes the NER accuracy gaps caused by OCR errors, including on an out-of-domain dataset. For the benefit of the community, we have made the document synthesis pipeline available as an open-source project.
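To make the data-scarcity idea concrete, here is a minimal sketch (ours, not the paper's open-source pipeline) of OCR-style text degradation that keeps NER span labels valid by using only length-preserving character confusions; the confusion table is illustrative:

```python
import random

# Illustrative OCR confusion pairs; all 1:1 so entity spans stay aligned.
CONFUSIONS = {"l": "1", "O": "0", "e": "c", "S": "5"}

def degrade(text, rate=0.15, seed=0):
    """Corrupt characters with OCR-style confusions at the given rate,
    preserving string length so NER offsets remain valid."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in CONFUSIONS and rng.random() < rate:
            out.append(CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Training a restoration model then amounts to mapping `degrade(text)` back to `text` at varying `rate` values.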

3. Detailed Avatar Recovery from Single Image

Hao Zhu, Xinxin Zuo, Haotian Yang, Sen Wang, Xun Cao, Ruigang Yang

  • retweets: 399, favorites: 111 (08/10/2021 10:15:10)
  • links: abs | pdf
  • cs.CV

This paper presents a novel framework to recover a detailed avatar from a single image. This is a challenging task due to factors such as variations in human shape, body pose, texture, and viewpoint. Prior methods typically attempt to recover the human body shape using a parametric template that lacks surface details; as a result, the recovered body shape appears to be without clothing. In this paper, we propose a novel learning-based framework that combines the robustness of the parametric model with the flexibility of free-form 3D deformation. We use deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation (HMD) framework, utilizing constraints from body joints, silhouettes, and per-pixel shading information. Our method can restore detailed human body shapes with complete textures beyond skinned models. Experiments demonstrate that our method outperforms previous state-of-the-art approaches, achieving better accuracy in terms of both 2D IoU and 3D metric distance.

4. Disentangled Lifespan Face Synthesis

Sen He, Wentong Liao, Michael Ying Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang

  • retweets: 360, favorites: 93 (08/10/2021 10:15:10)
  • links: abs | pdf
  • cs.CV

A lifespan face synthesis (LFS) model aims to generate a set of photo-realistic face images of a person's whole life, given only one snapshot as reference. The generated face image for a target age code is expected to be age-sensitive, reflected by bio-plausible transformations of shape and texture, while preserving identity. This is extremely challenging because the shape and texture characteristics of a face undergo separate and highly nonlinear transformations with age. Most recent LFS models are based on generative adversarial networks (GANs), whereby age-code-conditional transformations are applied to a latent face representation, and they benefit greatly from recent advances in GANs. However, without explicitly disentangling their latent representations into texture, shape, and identity factors, they are fundamentally limited in modeling the nonlinear age-related transformations of texture and shape whilst preserving identity. In this work, a novel LFS model is proposed to disentangle the key face characteristics, including shape, texture, and identity, so that the unique shape and texture age transformations can be modeled effectively. This is achieved by extracting shape, texture, and identity features separately from an encoder. Critically, two transformation modules, one based on conditional convolution and the other on channel attention, are designed to model the nonlinear shape and texture feature transformations respectively. This accommodates their rather distinct aging processes and ensures that our synthesized images are both age-sensitive and identity-preserving. Extensive experiments show that our LFS model is clearly superior to the state-of-the-art alternatives. Code and a demo are available on our project website: https://senhe.github.io/projects/iccv_2021_lifespan_face.
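As an illustration of the channel-attention idea, here is a minimal squeeze-and-excitation-style sketch in NumPy; conditioning the channel gates on the age code is our assumption about how such a module could be wired, not the authors' exact design:

```python
import numpy as np

def channel_attention(feat, age_code, w1, w2):
    """Squeeze-and-excitation-style channel attention conditioned on an age code.

    feat:     (C, H, W) feature map
    age_code: (A,) embedded age condition
    w1:       (C + A, C // 2) first projection
    w2:       (C // 2, C) second projection
    """
    squeeze = feat.mean(axis=(1, 2))              # (C,) global average pool
    z = np.concatenate([squeeze, age_code])       # condition on the age code
    h = np.maximum(z @ w1, 0.0)                   # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(h @ w2)))        # sigmoid gates in (0, 1)
    return feat * gate[:, None, None]             # reweight channels
```

Because the gates only rescale channels (rather than moving pixels), such a module is a natural fit for texture-like changes, which is consistent with the paper's motivation for treating texture and shape separately.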

5. StrucTexT: Structured Text Understanding with Multi-Modal Transformers

Yulin Li, Yuxi Qian, Yuchen Yu, Xiameng Qin, Chengquan Zhang, Yan Liu, Kun Yao, Junyu Han, Jingtuo Liu, Errui Ding

  • retweets: 323, favorites: 84 (08/10/2021 10:15:10)
  • links: abs | pdf
  • cs.CV | cs.CL

Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence. Due to the complexity of content and layout in VRDs, structured text understanding has been a challenging task. Most existing studies decouple this problem into two sub-tasks: entity labeling and entity linking, which require a complete understanding of the context of documents at both the token and segment levels. However, little work has addressed solutions that efficiently extract structured data from these different levels. This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks. Specifically, based on the transformer, we introduce a segment-token aligned encoder to deal with the entity labeling and entity linking tasks at different levels of granularity. Moreover, we design a novel pre-training strategy with three self-supervised tasks to learn a richer representation. StrucTexT uses the existing Masked Visual Language Modeling task and the new Sentence Length Prediction and Paired Boxes Direction tasks to incorporate the multi-modal information across text, image, and layout. We evaluate our method for structured text understanding at the segment and token levels, and show that it significantly outperforms state-of-the-art counterparts on the FUNSD, SROIE, and EPHOIE datasets.
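As an illustration of how a Paired Boxes Direction label could be generated, here is a small sketch; the 8-way quantization of the angle between box centers is our assumption, since the paper's exact label definition is not given here:

```python
import math

def box_direction(box_a, box_b, bins=8):
    """Quantize the direction from box_a's center to box_b's center into
    one of `bins` classes. Boxes are (x_min, y_min, x_max, y_max); this
    is an illustrative label for a Paired-Boxes-Direction-style task."""
    ax = (box_a[0] + box_a[2]) / 2
    ay = (box_a[1] + box_a[3]) / 2
    bx = (box_b[0] + box_b[2]) / 2
    by = (box_b[1] + box_b[3]) / 2
    angle = math.atan2(by - ay, bx - ax) % (2 * math.pi)
    return int(angle // (2 * math.pi / bins)) % bins
```

A pre-training objective would then ask the model to predict this class for sampled box pairs, pushing layout information into the representation.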

6. Is it Fake? News Disinformation Detection on South African News Websites

Harm de Wet, Vukosi Marivate

Disinformation through fake news is an ongoing problem in our society and has become easily spread through social media. The most cost- and time-effective way to filter these large amounts of data is to use a combination of human and technical interventions to identify it. From a technical perspective, Natural Language Processing (NLP) is widely used in detecting fake news. Social media companies use NLP techniques to identify fake news and warn their users, but fake news may still slip through undetected. This is especially a problem in more localised contexts (outside the United States of America). How do we adjust fake news detection systems to work better for local contexts such as South Africa? In this work we investigate fake news detection on South African websites. We curate a dataset of South African fake news and then train detection models. We contrast this with using widely available fake news datasets (mostly from USA websites). We also explore making the datasets more diverse by combining them, and observe differences in writing style between the nations' fake news using interpretable machine learning.
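As an illustration of the kind of features such detection models typically start from, here is a minimal from-scratch TF-IDF sketch (ours; the paper's feature pipeline is not specified in the abstract):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute sparse TF-IDF vectors (dicts) for a list of tokenized
    documents; terms appearing in every document get weight zero."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vecs.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vecs
```

Vectors like these, fed to a linear classifier, also make the model's decisions inspectable: the highest-weighted terms per class show which words distinguish one nation's fake news from another's.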

7. ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models

Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, Sungroh Yoon

  • retweets: 169, favorites: 49 (08/10/2021 10:15:10)
  • links: abs | pdf
  • cs.CV

Denoising diffusion probabilistic models (DDPM) have shown remarkable performance in unconditional image generation. However, due to the stochasticity of the generative process in DDPM, it is challenging to generate images with the desired semantics. In this work, we propose Iterative Latent Variable Refinement (ILVR), a method to guide the generative process in DDPM to generate high-quality images based on a given reference image. Here, the refinement of the generative process in DDPM enables a single DDPM to sample images from various sets directed by the reference image. The proposed ILVR method generates high-quality images while controlling the generation. The controllability of our method allows adaptation of a single DDPM without any additional learning in various image generation tasks, such as generation from various downsampling factors, multi-domain image translation, paint-to-image, and editing with scribbles.
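The refinement above can be sketched in a few lines: after the unconditional DDPM proposes a denoised sample, ILVR swaps its low-frequency content for that of the (equally noised) reference. The toy low-pass filter below stands in for the paper's downsample-then-upsample operation; the DDPM sampler itself is omitted:

```python
import numpy as np

def lowpass(img, n):
    """phi_N: average-pool a (H, W) image by factor n, then
    nearest-neighbor upsample back to the original size."""
    h, w = img.shape
    small = img.reshape(h // n, n, w // n, n).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, n, axis=0), n, axis=1)

def ilvr_step(x_prev_proposal, y_noised, n=4):
    """One ILVR refinement: keep the proposal's high frequencies,
    replace its low frequencies with those of the noised reference."""
    return x_prev_proposal + lowpass(y_noised, n) - lowpass(x_prev_proposal, n)
```

Because `lowpass` is linear and idempotent, the refined sample matches the reference exactly in the low-frequency band while the DDPM remains free in the high-frequency residual; larger `n` constrains less and allows more variation.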

8. Quantum Topological Data Analysis with Linear Depth and Exponential Speedup

Shashanka Ubaru, Ismail Yunus Akhalwaya, Mark S. Squillante, Kenneth L. Clarkson, Lior Horesh

Quantum computing offers the potential of exponential speedups for certain classical computations. Over the last decade, many quantum machine learning (QML) algorithms have been proposed as candidates for such exponential improvements. However, two issues unravel the hope of exponential speedup for some of these QML algorithms: the data-loading problem and, more recently, the stunning dequantization results of Tang et al. A third issue, namely the fault-tolerance requirements of most QML algorithms, has further hindered their practical realization. The quantum topological data analysis (QTDA) algorithm of Lloyd, Garnerone and Zanardi was one of the first QML algorithms that convincingly offered an expected exponential speedup. From the outset, it did not suffer from the data-loading problem. A recent result has also shown that the generalized problem solved by this algorithm is likely classically intractable, and would therefore be immune to any dequantization efforts. However, the QTDA algorithm of Lloyd et al. has a time complexity of O(n^4/(ε^2 δ)) (where n is the number of data points, ε is the error tolerance, and δ is the smallest nonzero eigenvalue of the restricted Laplacian) and requires fault-tolerant quantum computing, which has not yet been achieved. In this paper, we completely overhaul the QTDA algorithm to achieve an improved exponential speedup and a depth complexity of O(n log(1/(δε))). Our approach includes three key innovations: (a) an efficient realization of the combinatorial Laplacian as a sum of Pauli operators; (b) a quantum rejection sampling approach to restrict the superposition to the simplices in the complex; and (c) a stochastic rank estimation method to estimate the Betti numbers. We present a theoretical error analysis, and the circuit and computational time and depth complexities for Betti number estimation.
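For context on what is being estimated, here is a toy classical computation of Betti numbers from the ranks of boundary matrices (beta_k = dim C_k - rank d_k - rank d_{k+1}); this is an illustrative baseline, not the paper's quantum method:

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Real boundary matrix from k-simplices to (k-1)-simplices;
    simplices are sorted tuples of vertex indices."""
    index = {s: i for i, s in enumerate(km1_simplices)}
    d = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]          # drop the i-th vertex
            d[index[face], j] = (-1) ** i     # alternating signs
    return d

def betti(simplices_by_dim):
    """Betti numbers beta_k = dim C_k - rank d_k - rank d_{k+1}."""
    ranks = [0]                               # d_0 is the zero map
    for k in range(1, len(simplices_by_dim)):
        ranks.append(np.linalg.matrix_rank(
            boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])))
    ranks.append(0)                           # no simplices above top dim
    return [int(len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1])
            for k in range(len(simplices_by_dim))]
```

The rank computations are what blow up classically as the complex grows, which is where the paper's stochastic rank estimation and quantum representation of the combinatorial Laplacian come in.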

9. Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications

Sandhini Agarwal, Gretchen Krueger, Jack Clark, Alec Radford, Jong Wook Kim, Miles Brundage

Recently, there have been breakthroughs in computer vision (CV) with the advent of more generalizable models such as CLIP and ALIGN. In this paper, we analyze CLIP and highlight some of the challenges such models pose. CLIP reduces the need for task-specific training data, potentially opening up many niche tasks to automation. CLIP also allows its users to flexibly specify image classification classes in natural language, which we find can shift how biases manifest. Additionally, through some preliminary probes we find that CLIP can inherit biases found in prior computer vision systems. Given the wide and unpredictable domain of uses for such models, this raises questions regarding what sufficiently safe behaviour for such systems may look like. These results add evidence to the growing body of work calling for a change in the notion of a 'better' model: moving beyond simply looking at higher accuracy in task-oriented capability evaluations, and towards a broader 'better' that takes into account deployment-critical features, such as different use contexts and the people who interact with the model, when thinking about model deployment.

10. Higher-order motif analysis in hypergraphs

Quintino Francesco Lotito, Federico Musciotto, Alberto Montresor, Federico Battiston

A deluge of new data on social, technological and biological networked systems suggests that a large number of interactions among system units are not limited to pairs, but rather involve a higher number of nodes. To properly encode such higher-order interactions, richer mathematical frameworks such as hypergraphs are needed, where hyperlinks describe connections among an arbitrary number of nodes. Here we introduce the concept of higher-order motifs, small connected subgraphs where vertices may be linked by interactions of any order. We provide lower and upper bounds on the number of higher-order motifs as a function of the motif size, and propose an efficient algorithm to extract complete higher-order motif profiles from empirical data. We identify different families of hypergraphs, characterized by distinct higher-order connectivity patterns at the local scale. We also capture evidence of structural reinforcement, a mechanism that associates higher strengths of higher-order interactions with nodes that interact more at the pairwise level. Our work highlights the informative power of higher-order motifs, providing a first way to extract higher-order fingerprints in hypergraphs at the network microscale.
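A brute-force version of higher-order motif extraction can be sketched directly: enumerate node groups of a given size, keep those whose internal hyperedges form a connected sub-hypergraph, and bin them by isomorphism class. This simplified sketch (ours, not the paper's efficient algorithm) illustrates the idea for motifs of size 3:

```python
from itertools import combinations, permutations
from collections import Counter

def canonical(edges_inside, group):
    """Isomorphism-invariant signature: minimum over node relabelings."""
    best = None
    for perm in permutations(range(len(group))):
        relabel = {v: perm[i] for i, v in enumerate(group)}
        sig = tuple(sorted(tuple(sorted(relabel[v] for v in e))
                           for e in edges_inside))
        if best is None or sig < best:
            best = sig
    return best

def connected(edges_inside, group):
    """Union-find connectivity check over the node group."""
    parent = {v: v for v in group}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for e in edges_inside:
        e = list(e)
        for v in e[1:]:
            parent[find(v)] = find(e[0])
    return len({find(v) for v in group}) == 1

def motif_profile(hyperedges, size=3):
    """Count isomorphism classes of connected sub-hypergraphs induced by
    hyperedges fully contained in each `size`-node group."""
    edgesets = [frozenset(e) for e in hyperedges]
    nodes = sorted({v for e in edgesets for v in e})
    profile = Counter()
    for group in combinations(nodes, size):
        g = set(group)
        inside = [e for e in edgesets if e <= g and len(e) > 1]
        if inside and connected(inside, group):
            profile[canonical(inside, group)] += 1
    return profile
```

The motif profile (counts per isomorphism class) is the local fingerprint the paper compares across hypergraph families; the real algorithm avoids enumerating all node groups.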