1. Closed-Form Factorization of Latent Semantics in GANs
Yujun Shen, Bolei Zhou
A rich set of semantic attributes has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for synthesizing images. In order to identify such latent semantics for image manipulation, previous methods annotate a collection of synthesized samples and then train supervised classifiers in the latent space. However, they require a clear definition of the target attribute as well as the corresponding manual annotations, severely limiting their applications in practice. In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner. By studying the essential role of the fully-connected layer that takes the latent code into the generator of GANs, we propose a general closed-form factorization method for latent semantic discovery. The properties of the identified semantics are further analyzed both theoretically and empirically. With its fast and efficient implementation, our approach is capable of not only finding latent semantics as accurately as state-of-the-art supervised methods, but also discovering far more versatile semantic classes across multiple GAN models trained on a wide range of datasets.
Closed-Form Factorization of Latent Semantics in GANs
— AK (@ak92501) July 15, 2020
pdf: https://t.co/pYda24esEJ
abs: https://t.co/TTx4gwm5Xl pic.twitter.com/7YuVWnuquW
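The closed-form idea can be sketched in a few lines: if `A` is the weight of the fully-connected layer that maps the latent code into the generator, maximizing the perturbation `‖An‖` over unit directions `n` yields the leading eigenvectors of `AᵀA` as semantic directions, with no training or annotation. The weight below is a random stand-in, not a real GAN's, and `factorize_weight` is a hypothetical name:

```python
import numpy as np

def factorize_weight(A, k=3):
    """Return the top-k unit-norm directions maximizing ||A n||,
    i.e. the leading eigenvectors of A^T A (closed form, no training)."""
    # eigh returns eigenvalues in ascending order for the symmetric A^T A
    eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigenvalues)[::-1]     # sort descending
    return eigenvectors[:, order[:k]].T       # shape (k, latent_dim)

# Random stand-in for the generator's first FC weight (out_dim x latent_dim)
rng = np.random.default_rng(0)
A = rng.standard_normal((1024, 512))

directions = factorize_weight(A, k=3)
print(directions.shape)  # (3, 512)
# Each direction is unit-norm; editing a latent code z is then z + alpha * n
print(np.allclose(np.linalg.norm(directions, axis=1), 1.0))  # True
```

Because the directions come from an eigendecomposition of a single weight matrix, the whole discovery step runs in well under a second even for large latent dimensions.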
2. Multitask Learning Strengthens Adversarial Robustness
Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick
Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network. We present both theoretical and empirical analyses that connect the adversarial robustness of a model to the number of tasks that it is trained on. Experiments on two datasets show that attack difficulty increases as the number of target tasks increases. Moreover, our results suggest that when models are trained on multiple tasks at once, they become more robust to adversarial attacks on individual tasks. While adversarial defense remains an open challenge, our results suggest that deep networks are vulnerable partly because they are trained on too few tasks.
What causes adversarial examples? Latest #ECCV2020 paper from @ChengzhiM and Amogh shows that deep networks are vulnerable partly because they are trained on too few tasks. Just by increasing tasks, we strengthen robustness for each task individually. https://t.co/zQzpS9pWtE pic.twitter.com/B5LoWkrHgg
— Carl Vondrick (@cvondrick) July 15, 2020
3. Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter
Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro
Conventional CNNs for texture synthesis consist of a sequence of (de)-convolution and up/down-sampling layers, where each layer operates locally and lacks the ability to capture the long-term structural dependency required by texture synthesis. Thus, they often simply enlarge the input texture rather than perform reasonable synthesis. As a compromise, many recent methods sacrifice generalizability by training and testing on the same single (or fixed set of) texture image(s), incurring huge re-training costs for unseen images. In this work, based on the discovery that the assembling/stitching operation in traditional texture synthesis is analogous to a transposed convolution operation, we propose a novel way of using the transposed convolution operation. Specifically, we directly treat the whole encoded feature map of the input texture as the transposed convolution filters, and the features’ self-similarity map, which captures the auto-correlation information, as the input to the transposed convolution. Such a design allows our framework, once trained, to generalize to unseen textures, performing synthesis in a single forward pass in nearly real time. Our method achieves state-of-the-art texture synthesis quality on various metrics. While self-similarity helps preserve the input textures’ regular structural patterns, our framework can also take random noise maps instead of self-similarity maps as the transposed convolution inputs for irregular input textures. This yields more diverse results and enables generating arbitrarily large texture outputs by directly sampling large noise maps in a single pass.
📢 Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter 📢
— Guilin Liu (@GuilinL) July 15, 2020
Video: https://t.co/cKjbJyeH6f
Paper: https://t.co/rzwoVWBwB5
We propose a generalizable framework that can perform texture synthesis for unseen texture images in nearly real-time.
Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter
— AK (@ak92501) July 15, 2020
pdf: https://t.co/mqfh1CbqPI
abs: https://t.co/J1ZW0wBSpA pic.twitter.com/I0MQB1Zhkj
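The stitching-as-transposed-convolution analogy can be illustrated with a toy single-channel sketch: each value of the self-similarity map pastes (and sums) a scaled copy of the texture's feature map into the output, exactly what a transposed convolution does. The real framework operates on deep multi-channel features; the arrays and the `conv_transpose2d` helper below are illustrative stand-ins, not the authors' code:

```python
import numpy as np

def conv_transpose2d(inp, kernel, stride=1):
    """Minimal single-channel transposed convolution: each input value
    scales a copy of the kernel pasted (and summed) into the output."""
    H, W = inp.shape
    kH, kW = kernel.shape
    out = np.zeros((stride * (H - 1) + kH, stride * (W - 1) + kW))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + kH,
                j * stride:j * stride + kW] += inp[i, j] * kernel
    return out

# Toy "encoded feature map" of an input texture, used directly as the filter
texture_feature = np.array([[1., 2.], [3., 4.]])
# Toy self-similarity map (auto-correlation information), used as the input
sim = np.array([[1., 0.5], [0.5, 1.]])

synthesized = conv_transpose2d(sim, texture_feature, stride=2)
print(synthesized.shape)  # (4, 4)
```

Swapping `sim` for a larger random noise map would, as the abstract notes for irregular textures, produce an arbitrarily large output in the same single pass.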
4. Alleviating Over-segmentation Errors by Detecting Action Boundaries
Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, Hirokatsu Kataoka
We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptive field. The ASB classifies video frames with action classes, while the BRB regresses the action boundary probabilities. The action boundaries predicted by the BRB refine the output from the ASB, which results in a significant performance improvement. Our contributions are three-fold: (i) We propose a framework for temporal action segmentation, the ASRF, which divides temporal action segmentation into frame-wise action classification and action boundary regression. Our framework refines frame-level hypotheses of action classes using predicted action boundaries. (ii) We propose a loss function for smoothing the transition of action probabilities, and analyze combinations of various loss functions for temporal action segmentation. (iii) Our framework outperforms state-of-the-art methods on three challenging datasets, offering an improvement of up to 13.7% in terms of segmental edit distance and up to 16.1% in terms of segmental F1 score. Our code will be publicly available soon.
We have published a paper on temporal action segmentation on arXiv! To address over-segmentation, where a person's actions along the time axis are split up excessively, we introduce a network that recognizes the boundaries between actions, achieving the current state-of-the-art accuracy. The F1 score improves by up to +16.1 points over previous methods. https://t.co/6GMr1yZrTu pic.twitter.com/6NUDjiVm0W
— cvpaper.challenge (@CVpaperChalleng) July 15, 2020
This was my first arXiv submission.
— yuchi (@yciskw_) July 15, 2020
I will do my best to produce even better research next time. https://t.co/gdyeK5g03h
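The refinement step — using the BRB's boundary probabilities to clean up the ASB's frame-wise hypotheses — can be sketched as follows. The exact relabeling rule is an assumption (here: every frame in a boundary-delimited segment takes the class with the highest average probability over that segment), and the toy numbers are invented:

```python
import numpy as np

def refine_with_boundaries(frame_probs, boundary_probs, threshold=0.5):
    """Relabel every frame in each boundary-delimited segment with the
    class whose average probability over that segment is highest."""
    T = frame_probs.shape[0]
    # Frames where the boundary probability crosses the threshold start a segment
    starts = [0] + [t for t in range(1, T) if boundary_probs[t] >= threshold]
    starts.append(T)
    labels = np.empty(T, dtype=int)
    for s, e in zip(starts[:-1], starts[1:]):
        labels[s:e] = frame_probs[s:e].mean(axis=0).argmax()
    return labels

# Toy example: 6 frames, 2 classes; frame 1 flickers to the wrong class,
# and frame 3 looks like an action boundary
frame_probs = np.array([[.9, .1], [.4, .6], [.6, .4],
                        [.2, .8], [.45, .55], [.3, .7]])
boundary_probs = np.array([0., .1, .2, .9, .1, 0.])

labels = refine_with_boundaries(frame_probs, boundary_probs)
print(labels)  # [0 0 0 1 1 1]
```

Note that a plain frame-wise argmax would label frame 1 as class 1, producing a spurious one-frame segment; pooling within the predicted boundaries removes exactly that kind of over-segmentation error.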
5. Transformer-XL Based Music Generation with Multiple Sequences of Time-valued Notes
Xianchao Wu, Chengyuan Wang, Qinying Lei
Current state-of-the-art AI-based classical music creation algorithms such as Music Transformer are trained on a single sequence of notes with time-shifts. The major drawback of expressing absolute time intervals is the difficulty of computing the similarity of notes that share the same note value yet have different tempos, within or across MIDI files. In addition, the use of a single sequence prevents the model from separately and effectively learning musical information such as harmony and rhythm. In this paper, we propose a framework with two novel methods to respectively tackle these two shortcomings: one is the construction of time-valued note sequences that liberate note values from tempos, and the other is the separate use of four sequences, namely, former note-on to current note-on, note-on to note-off, pitch, and velocity, for jointly training four Transformer-XL networks. Trained on a 23-hour piano MIDI dataset, our framework generates significantly better and hours-long music compared with three state-of-the-art baselines, namely Music Transformer, DeepJ, and single-sequence-based Transformer-XL, evaluated both automatically and manually.
Transformer-XL Based Music Generation with Multiple Sequences of Time-valued Notes
— AK (@ak92501) July 15, 2020
pdf: https://t.co/xTrQBOTspz
abs: https://t.co/GiCuFyyVOc pic.twitter.com/k8fVWqGmku
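The first idea — note values liberated from tempos — can be illustrated with a small sketch that converts absolute-time note events into the four sequences named in the abstract. The exact encoding and the `to_time_valued_sequences` helper are assumptions for illustration, not the paper's implementation:

```python
def to_time_valued_sequences(notes, seconds_per_beat):
    """Convert (onset_sec, offset_sec, pitch, velocity) events into four
    tempo-independent sequences: former-note-on-to-current-note-on gap and
    note-on-to-note-off duration (both in beats), plus pitch and velocity."""
    on_to_on, on_to_off, pitches, velocities = [], [], [], []
    prev_onset = 0.0
    for onset, offset, pitch, velocity in notes:
        on_to_on.append((onset - prev_onset) / seconds_per_beat)
        on_to_off.append((offset - onset) / seconds_per_beat)
        pitches.append(pitch)
        velocities.append(velocity)
        prev_onset = onset
    return on_to_on, on_to_off, pitches, velocities

# The same quarter-note melody at two tempos yields identical note values,
# which is exactly the similarity that absolute time intervals obscure
slow = [(0.0, 1.0, 60, 80), (1.0, 2.0, 62, 80)]  # 60 BPM: 1.0 s per beat
fast = [(0.0, 0.5, 60, 80), (0.5, 1.0, 62, 80)]  # 120 BPM: 0.5 s per beat
print(to_time_valued_sequences(slow, 1.0) == to_time_valued_sequences(fast, 0.5))  # True
```

Each of the four returned sequences would then feed its own Transformer-XL network in the joint training setup the abstract describes.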
6. Causal Inference using Gaussian Processes with Structured Latent Confounders
Sam Witty, Kenta Takatsu, David Jensen, Vikash Mansinghka
Latent confounders---unobserved variables that influence both treatment and outcome---can bias estimates of causal effects. In some cases, these confounders are shared across observations, e.g. all students taking a course are influenced by the course’s difficulty in addition to any educational interventions they receive individually. This paper shows how to semiparametrically model latent confounders that have this structure and thereby improve estimates of causal effects. The key innovations are a hierarchical Bayesian model, Gaussian processes with structured latent confounders (GP-SLC), and a Monte Carlo inference algorithm for this model based on elliptical slice sampling. GP-SLC provides principled Bayesian uncertainty estimates of individual treatment effect with minimal assumptions about the functional forms relating confounders, covariates, treatment, and outcome. Finally, this paper shows GP-SLC is competitive with or more accurate than widely used causal inference techniques on three benchmark datasets, including the Infant Health and Development Program and a dataset showing the effect of changing temperatures on state-wide energy consumption across New England.
Our paper “Causal Inference using Gaussian Processes with Structured Latent Confounders” has been accepted at #ICML2020! Joint work with @kenta_takatsu, David Jensen, and @vmansinghka.
— Sam Witty (@ICML2020) (@swittbit) July 15, 2020
Paper: https://t.co/yufXNEloo8
Poster: https://t.co/87229HOFRo pic.twitter.com/eI2fLwTzAi
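The inference algorithm the abstract builds on, elliptical slice sampling, is a generic Monte Carlo update for latents with a Gaussian prior. A minimal sketch of one update (the standard Murray et al. algorithm, not the GP-SLC-specific sampler; the toy likelihood is invented) looks like:

```python
import numpy as np

def elliptical_slice_sample(f, log_lik, prior_sample, rng):
    """One elliptical slice sampling update for a latent f with a
    zero-mean Gaussian prior and likelihood log_lik (Murray et al., 2010)."""
    nu = prior_sample                              # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.uniform())     # slice threshold
    theta = rng.uniform(0, 2 * np.pi)
    lo, hi = theta - 2 * np.pi, theta
    while True:
        # Proposals move along an ellipse through f and nu; the prior is
        # invariant under this move, so only the likelihood is evaluated
        proposal = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(proposal) > log_y:
            return proposal
        # Shrink the bracket toward the current state and retry
        if theta < 0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

rng = np.random.default_rng(1)
# Toy model: prior f ~ N(0, I); observation 1.5 per dimension with unit noise
log_lik = lambda f: -0.5 * np.sum((f - 1.5) ** 2)
f = np.zeros(2)
for _ in range(500):
    f = elliptical_slice_sample(f, log_lik, rng.standard_normal(2), rng)
print(f.shape)  # (2,)
```

The appeal for GP models is that the update needs no tuning parameters and no gradient of the likelihood, only the ability to draw from the Gaussian prior.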
7. A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic
Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Ulrike Meier Yang
In recent years, hardware vendors have started designing low precision special function units in response to the Machine Learning community’s demand for high compute power in low precision formats. Server-line products are also increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL’s Summit supercomputer, which provide more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between compute power on the one hand and memory bandwidth on the other keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. To start the multiprecision focus effort, we survey the numerical linear algebra community and summarize all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered “mature technology,” but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves and instead focus on how mixed- and multiprecision technology can help improve the performance of these methods, presenting highlights of applications that significantly outperform the traditional fixed precision methods.
The Exascale Computing Project team has presented a comprehensive survey on numerical methods utilizing mixed precision arithmetic, highlighting some applications which significantly outperform the traditional fixed precision methods. #HPC https://t.co/I0XOmsWu3g pic.twitter.com/eMOAbVViTC
— Underfox (@Underfox3) July 15, 2020
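A classic instance of the mixed-precision techniques such surveys cover is iterative refinement: factor and solve cheaply in single precision, then recover double-precision accuracy through residual corrections computed in double precision. A minimal sketch of the generic textbook scheme (not code from any of the surveyed packages):

```python
import numpy as np

def mixed_precision_refine(A, b, iters=5):
    """Iterative refinement: solve in float32 (the cheap, fast format),
    compute residuals and apply corrections in float64."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                   # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))  # correction in float32
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
x_true = rng.standard_normal(100)
b = A @ x_true

x32 = np.linalg.solve(A.astype(np.float32), b.astype(np.float32))
x_refined = mixed_precision_refine(A, b)
# The refined solution is far closer to x_true than the pure float32 solve
print(np.linalg.norm(x_refined - x_true) < np.linalg.norm(x32 - x_true))  # True
```

For well-conditioned systems a few such corrections reach double-precision accuracy while the expensive O(n³) work stays in the low-precision format, which is exactly the trade-off tensor-core-style hardware rewards.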
8. Power, Preferment, and Patronage: Catholic Bishops, Social Networks, and the Affair(s) of Ex-Cardinal McCarrick
Stephen Bullivant, Giovanni Radhitio Putra Sadewo
Social Network Analysis (SNA) has shed powerful light on cultures where the influence of patronage, preferment, and reciprocal obligations are traditionally important. Accordingly, we argue here that episcopal appointments, culture, and governance within the Catholic Church are ideal topics for SNA interrogation. We analyse original network data for the Catholic Bishops’ Conference of England and Wales, and the United States Conference of Catholic Bishops. Significantly, we show how a network-informed approach may help with the urgent task of understanding the ecclesiastical cultures in which sexual abuse occurs, and/or is enabled, ignored, and covered up. Particular reference is made to Theodore McCarrick, the former DC Archbishop “dismissed from the clerical state” for sexual offences. Commentators naturally use terms like “protege”, “clique”, “network”, and “kingmaker” when discussing both the McCarrick affair and church politics more generally: precisely the kind of folk-descriptions of social and political life that SNA is designed to quantify and explain.
9. Deep Retrieval: An End-to-End Learnable Structure Model for Large-Scale Recommendations
Weihao Gao, Xiangjun Fan, Jiankai Sun, Kai Jia, Wenzhi Xiao, Chong Wang, Xiaobing Liu
One of the core problems in large-scale recommendations is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model and then use maximum inner product search (MIPS) algorithms to find the top candidates, leading to potential loss of retrieval accuracy. In this paper, we present Deep Retrieval (DR), an end-to-end learnable structure model for large-scale recommendations. DR encodes all candidates into a discrete latent space. The latent codes for the candidates are model parameters, learnt together with the other neural network parameters to maximize the same objective function. Once the model is learnt, a beam search over the latent codes is performed to retrieve the top candidates. Empirically, we show that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline.
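The retrieval step — beam search over the learnt discrete codes — can be sketched generically. The layered code structure, the toy scoring function, and all names below are illustrative assumptions, not DR's actual model:

```python
def beam_search(step_log_probs, depth, width, beam_size):
    """Beam search over a depth-layer discrete latent structure: at each
    layer, extend every kept prefix by all `width` codes and keep the
    top `beam_size` prefixes by accumulated log-probability."""
    beams = [((), 0.0)]  # (prefix of codes, log-probability)
    for _ in range(depth):
        candidates = [
            (prefix + (k,), score + step_log_probs(prefix, k))
            for prefix, score in beams
            for k in range(width)
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

# Toy stand-in for the learnt model: at layer d, code (d mod 3) is preferred
def step_log_probs(prefix, k):
    return 0.0 if k == len(prefix) % 3 else -1.0

paths = beam_search(step_log_probs, depth=3, width=3, beam_size=2)
print(paths[0])  # ((0, 1, 2), 0.0) -- the highest-scoring code path
```

This is what makes retrieval sub-linear: the beam evaluates `depth * beam_size * width` scores per query, independent of the total number of candidates that map to the codes.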