All Articles

Hot Papers 2021-01-22

1. Characterizing signal propagation to close the performance gap in unnormalized ResNets

Andrew Brock, Soham De, Samuel L. Smith

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with the state-of-the-art EfficientNets on ImageNet.
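
For a concrete picture of the adapted Weight Standardization the authors build on, here is a minimal PyTorch sketch (not the paper's code) of a convolution whose weights are standardized per output channel and rescaled by an activation-dependent gain; the class name, gain value and epsilon are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d whose weights are standardized per output channel before each forward pass."""

    def __init__(self, *args, gain: float = math.sqrt(2.0), **kwargs):
        super().__init__(*args, **kwargs)
        self.gain = gain  # activation-dependent constant (value here is an assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        fan_in = w[0].numel()
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = ((w - mean) ** 2).mean(dim=(1, 2, 3), keepdim=True)
        # standardize, then scale so signal variance stays controlled with depth
        w_hat = self.gain * (w - mean) / torch.sqrt(var * fan_in + 1e-8)
        return F.conv2d(x, w_hat, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# drop-in replacement for nn.Conv2d inside an unnormalized residual block
conv = WSConv2d(64, 64, kernel_size=3, padding=1)
out = conv(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

As the abstract notes, the point of the standardization is to keep per-channel activation means from growing with depth, which is what lets the residual network train without activation normalization layers.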

2. Learn to Dance with AIST++: Music Conditioned 3D Dance Generation

Ruilong Li, Shan Yang, David A. Ross, Angjoo Kanazawa

In this paper, we present a transformer-based learning framework for 3D dance generation conditioned on music. We carefully design our network architecture and empirically study the keys to obtaining qualitatively pleasing results. The critical components include a deep cross-modal transformer, which learns the correlation between the music and the dance motion well, and a full-attention mechanism with future-N supervision, which is essential for producing long-range, non-freezing motion. In addition, we propose a new dataset of paired 3D motion and music called AIST++, which we reconstruct from the AIST multi-view dance videos. This dataset contains 1.1M frames of 3D dance motion in 1408 sequences, covering 10 genres of dance choreography, accompanied by multi-view camera parameters. To our knowledge, it is the largest dataset of its kind. Extensive experiments on AIST++ demonstrate that our method produces much better results than state-of-the-art methods, both qualitatively and quantitatively.
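
To make the two highlighted components more tangible, here is a hypothetical PyTorch sketch of a cross-modal attention block in which motion queries attend to music features, plus a future-N supervision loss over the N future poses predicted at every time step; all dimensions, names and the value of N are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, motion: torch.Tensor, music: torch.Tensor) -> torch.Tensor:
        # motion: (batch, T_motion, dim) queries; music: (batch, T_music, dim) keys/values
        h, _ = self.attn(motion, music, music)
        motion = self.norm1(motion + h)
        return self.norm2(motion + self.ff(motion))

def future_n_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred, target: (batch, T, N, pose_dim) -- at each of the T steps the model
    # outputs the next N poses and is supervised on all of them, which the
    # abstract credits with preventing long-range "frozen" motion.
    return ((pred - target) ** 2).mean()

block = CrossModalBlock()
motion = torch.randn(2, 120, 256)   # 120 motion frames
music = torch.randn(2, 240, 256)    # music features at a higher frame rate
fused = block(motion, music)
print(fused.shape)                  # torch.Size([2, 120, 256])
```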

3. Noisy intermediate-scale quantum (NISQ) algorithms

Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, Matthias Degroote, Hermanni Heimonen, Jakob S. Kottmann, Tim Menke, Wai-Keong Mok, Sukin Sim, Leong-Chuan Kwek, Alán Aspuru-Guzik

A universal fault-tolerant quantum computer that can efficiently solve problems such as integer factorization and unstructured database search requires millions of qubits with low error rates and long coherence times. While the experimental advancement towards realizing such devices will potentially take decades of research, noisy intermediate-scale quantum (NISQ) computers already exist. These computers are composed of hundreds of noisy qubits, i.e. qubits that are not error-corrected and therefore perform imperfect operations within a limited coherence time. In the search for quantum advantage with these devices, algorithms have been proposed for applications in various disciplines spanning physics, machine learning, quantum chemistry and combinatorial optimization. The goal of such algorithms is to leverage the limited available resources to perform classically challenging tasks. In this review, we provide a thorough summary of NISQ computational paradigms and algorithms. We discuss the key structure of these algorithms, their limitations and their advantages. We additionally provide a comprehensive overview of various benchmarking and software tools useful for programming and testing NISQ devices.
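
Most of the algorithms surveyed share the same hybrid quantum-classical loop: a parameterized circuit is evaluated on the device while a classical optimizer updates its parameters. The toy Python sketch below simulates that loop exactly (one qubit, NumPy state vector, a made-up Hamiltonian) purely to show the structure; on real NISQ hardware the energy would be estimated from noisy measurements.

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = Z + 0.5 * X                       # toy Hamiltonian whose ground energy we seek

def ansatz(theta: float) -> np.ndarray:
    # Ry(theta) applied to |0> -- stands in for a hardware-efficient ansatz
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def energy(params: np.ndarray) -> float:
    psi = ansatz(params[0])
    # on a device this expectation value would come from repeated measurements
    return float(psi.conj() @ H @ psi)

result = minimize(energy, x0=np.array([0.1]), method="COBYLA")
print("variational energy:", result.fun)
print("exact ground energy:", np.linalg.eigvalsh(H)[0])
```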

4. Can a Fruit Fly Learn Word Embeddings?

Yuchen Liang, Chaitanya K. Ryali, Benjamin Hoover, Leopold Grinberg, Saket Navlakha, Mohammed J. Zaki, Dmitry Krotov

The mushroom body of the fruit fly brain is one of the best-studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, creating a sparse, high-dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embeddings, our algorithm encodes the semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. It is shown that the fruit fly network motif not only achieves performance comparable to existing methods in NLP, but does so with only a fraction of the computational resources (shorter training time and smaller memory footprint).
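
The network motif itself is easy to sketch. Below is a minimal Python illustration with a random sparse projection standing in for the learned synapses: the input (a word-plus-context vector) is expanded into a large Kenyon-cell layer, and winner-take-all inhibition, playing the role of the APL neuron, keeps only the top-k units, yielding a sparse binary hash code. All sizes are arbitrary choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, n_kc, k = 300, 10_000, 50          # input dim, Kenyon cells, active cells

W = rng.random((n_kc, d_in)) < 0.05      # sparse binary projection (assumption)

def fly_hash(x: np.ndarray) -> np.ndarray:
    activations = W @ x                  # Kenyon cell pre-activations
    code = np.zeros(n_kc, dtype=np.uint8)
    code[np.argsort(activations)[-k:]] = 1   # APL-style winner-take-all: top-k stay on
    return code

x = rng.normal(size=d_in)                # stand-in for a word + context vector
print(fly_hash(x).sum())                 # exactly k active bits
```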

5. Unifying Cardiovascular Modelling with Deep Reinforcement Learning for Uncertainty Aware Control of Sepsis Treatment

Thesath Nanayakkara, Gilles Clermont, Christopher James Langmead, David Swigon

Sepsis is the leading cause of mortality in the ICU, responsible for 6% of all hospitalizations and 35% of all in-hospital deaths in the USA. However, there is no universally agreed-upon strategy for vasopressor and fluid administration, and different patients respond differently to treatment, highlighting the need for individualized care. Vasopressors and fluids are administered with specific effects on cardiovascular physiology in mind, and medical research has suggested physiologic, hemodynamically guided approaches to treatment. We therefore propose a novel approach that exploits and unifies complementary strengths of mathematical modelling, deep learning, reinforcement learning and uncertainty quantification to learn individualized, safe, and uncertainty-aware treatment strategies. We first infer patient-specific, dynamic cardiovascular states using a novel physiology-driven recurrent neural network trained in an unsupervised manner. This information, along with a learned low-dimensional representation of the patient's lab history and observable data, is then used to derive value distributions using batch distributional reinforcement learning. Moreover, in a safety-critical domain it is essential to know what our agent does and does not know; to this end, we also quantify the model uncertainty associated with each patient state and action, and propose a general framework for uncertainty-aware, interpretable treatment policies. This framework can easily be tweaked to reflect a clinician's own confidence in the framework, and modified to factor in human expert opinion whenever it is available. Using representative patients and a validation cohort, we show that our method learns physiologically interpretable, generalizable policies.
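
As a purely hypothetical illustration of what an uncertainty-aware treatment policy of this kind could look like at decision time, the sketch below takes a per-action return distribution (quantile estimates from a distributional RL agent) plus a per-action model-uncertainty score and defers to the clinician whenever the preferred action is too uncertain; the rule, threshold and numbers are assumptions, not the paper's algorithm.

```python
import numpy as np

def choose_treatment(quantiles: np.ndarray, uncertainty: np.ndarray,
                     threshold: float = 0.5):
    # quantiles: (n_actions, n_quantiles) return distribution per treatment action
    # uncertainty: (n_actions,) model uncertainty for each state-action pair
    expected = quantiles.mean(axis=1)    # expected return of each action
    best = int(np.argmax(expected))
    if uncertainty[best] > threshold:
        return None                      # too uncertain: defer to the clinician
    return best

q = np.array([[1.0, 2.0, 3.0],           # action 0: one hypothetical dosing option
              [0.5, 2.5, 4.5]])          # action 1: higher mean but wider spread
u = np.array([0.2, 0.7])
print(choose_treatment(q, u))            # prints None: the higher-value action is too uncertain
```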

6. LEAF: A Learnable Frontend for Audio Classification

Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used throughout the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental limitations of handmade representations. In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification. To do so, we introduce a new principled, lightweight, fully learnable architecture that can be used as a drop-in replacement for mel-filterbanks. Our system learns all operations of audio feature extraction, from filtering to pooling, compression and normalization, and can be integrated into any neural network at a negligible parameter cost. We perform multi-task training on eight diverse audio classification tasks, and show consistent improvements of our model over mel-filterbanks and previous learnable alternatives. Moreover, our system outperforms the current state-of-the-art learnable frontend on AudioSet, with orders of magnitude fewer parameters.
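
To illustrate what "learning all operations of feature extraction" can mean in practice, here is a simplified, hypothetical frontend (not the authors' LEAF implementation) with three learnable stages named in the abstract: filtering as a 1-D convolution, pooling as a per-channel lowpass convolution over the filter energies, and compression as a log with a learnable per-channel offset.

```python
import torch
import torch.nn as nn

class TinyLearnableFrontend(nn.Module):
    def __init__(self, n_filters: int = 40, filter_len: int = 401, pool_len: int = 160):
        super().__init__()
        # learnable filterbank applied to the raw waveform
        self.filters = nn.Conv1d(1, n_filters, filter_len, padding=filter_len // 2, bias=False)
        # learnable per-channel lowpass / temporal pooling
        self.pool = nn.Conv1d(n_filters, n_filters, pool_len, stride=pool_len // 2,
                              groups=n_filters, bias=False)
        # learnable compression offset per channel
        self.log_offset = nn.Parameter(torch.full((n_filters, 1), 1e-3))

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, 1, samples) raw waveform
        x = self.filters(wav) ** 2                          # filter, then take energy
        x = self.pool(x)                                    # learnable downsampling
        return torch.log(x.abs() + self.log_offset.abs())   # learnable compression

frontend = TinyLearnableFrontend()
features = frontend(torch.randn(2, 1, 16000))  # ~1 s of 16 kHz audio
print(features.shape)                          # (batch, n_filters, frames)
```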

7. DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition

Edwin Arkel Rios, Wen-Huang Cheng, Bo-Cheng Lai

  • retweets: 650, favorites: 52 (01/24/2021 09:44:33)
  • cs.CV

In this work we tackle the challenging problem of anime character recognition, where anime refers to animation produced within Japan and to work derived from or inspired by it. For this purpose we present DAF:re (DanbooruAnimeFaces:revamped), a large-scale, crowd-sourced, long-tailed dataset with almost 500K images spread across more than 3000 classes. Additionally, we conduct experiments on DAF:re and similar datasets using a variety of classification models, including CNN-based ResNets and the self-attention-based Vision Transformer (ViT). Our results give new insights into the generalization and transfer-learning properties of ViT models on datasets from domains substantially different from those used for upstream pre-training, including the influence of batch and image size on their training. Additionally, we share our dataset, source code, pre-trained checkpoints and results as Animesion, the first end-to-end framework for large-scale anime character recognition: https://github.com/arkel23/animesion
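
A typical experiment in this setting looks like standard transfer learning: take an ImageNet-pretrained ViT, replace the classification head with one sized for the character classes, and fine-tune. The hedged sketch below uses the timm library for the backbone; the model name, hyperparameters and class count are placeholders, not the Animesion configuration.

```python
import timm
import torch

NUM_CLASSES = 3000  # placeholder -- DAF:re has over 3000 character classes

# ImageNet-pretrained ViT with a freshly initialized head for the new classes
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=NUM_CLASSES)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)            # fake batch standing in for DAF:re images
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```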

8. Pre-training without Natural Images

Hirokatsu Kataoka, Kazushige Okayasu, Asato Matsumoto, Eisuke Yamagata, Ryosuke Yamada, Nakamasa Inoue, Akio Nakamura, Yutaka Satoh

  • retweets: 110, favorites: 41 (01/24/2021 09:44:33)
  • cs.CV | cs.LG

Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? The paper proposes a novel concept, Formula-driven Supervised Learning. We automatically generate image patterns and their category labels by assigning fractals, which are based on a natural law present in the real world. In theory, using automatically generated images instead of natural images in the pre-training phase allows us to generate an infinitely large dataset of labeled images. Although models pre-trained with the proposed Fractal DataBase (FractalDB), a database without natural images, do not necessarily outperform models pre-trained on human-annotated datasets in all settings, we are able to partially surpass the accuracy of ImageNet/Places pre-trained models. The image representation learned with the proposed FractalDB captures a unique feature in the visualization of convolutional layers and attentions.
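
The generation side of Formula-driven Supervised Learning can be sketched in a few lines: sample the parameters of an iterated function system (IFS), treat the parameter set as the category label, and render training images with the chaos game. The Python below is an illustrative toy, not the FractalDB generator; parameter ranges, the contraction factor and the image size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ifs(n_maps: int = 3):
    # each map is x -> A x + b; the sampled parameter set defines one "category"
    maps = []
    for _ in range(n_maps):
        A = rng.uniform(-1, 1, size=(2, 2))
        A *= 0.7 / np.linalg.norm(A, 2)      # keep each map contractive
        b = rng.uniform(-1, 1, size=2)
        maps.append((A, b))
    return maps

def render(ifs, n_points: int = 20_000, size: int = 64) -> np.ndarray:
    x = np.zeros(2)
    pts = np.empty((n_points, 2))
    for i in range(n_points):
        A, b = ifs[rng.integers(len(ifs))]   # chaos game: apply a randomly chosen map
        x = A @ x + b
        pts[i] = x
    img, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=size)
    return (img > 0).astype(np.float32)      # binary fractal image

dataset = [(label, render(sample_ifs())) for label in range(5)]  # 5 synthetic classes
print(dataset[0][1].shape)                   # (64, 64)
```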

9. MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks

Joosep Pata, Javier Duarte, Jean-Roch Vlimant, Maurizio Pierini, Maria Spiropulu

In general-purpose particle detectors, the particle flow algorithm may be used to reconstruct a coherent particle-level view of the event by combining information from the calorimeters and the trackers, significantly improving the detector resolution for jets and the missing transverse momentum. In view of the planned high-luminosity upgrade of the CERN Large Hadron Collider, it is necessary to revisit existing reconstruction algorithms and ensure that both the physics and computational performance are sufficient in a high-pileup environment. Recent developments in machine learning may offer a prospect for efficient event reconstruction based on parametric models. We introduce MLPF, an end-to-end trainable machine-learned particle flow algorithm for reconstructing particle flow candidates based on parallelizable, computationally efficient, scalable graph neural networks and a multi-task objective. We report the physics and computational performance of the MLPF algorithm on a synthetic dataset of ttbar events in HL-LHC running conditions, including the simulation of multiple interaction effects, and discuss potential next steps and considerations towards ML-based reconstruction in a general-purpose particle detector.
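
For intuition, here is a hypothetical, heavily simplified sketch of the pattern described above: each detector element (track or calorimeter cluster) is a graph node, a k-nearest-neighbour graph is built per event, one round of message passing updates node features, and a multi-task head predicts a particle class plus regressed momentum per node. Layer sizes, the choice of k and the output targets are assumptions, not the MLPF architecture.

```python
import torch
import torch.nn as nn

class TinyParticleFlowLayer(nn.Module):
    def __init__(self, dim: int = 32, n_classes: int = 6, k: int = 8):
        super().__init__()
        self.k = k
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.cls_head = nn.Linear(dim, n_classes)   # particle type (or "no particle")
        self.reg_head = nn.Linear(dim, 3)           # e.g. momentum components

    def forward(self, x: torch.Tensor):
        # x: (n_elements, dim) features of one event's detector elements
        dist = torch.cdist(x, x)
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]   # drop self-edge
        neigh = x[knn]                                              # (n, k, dim)
        msgs = self.msg(torch.cat([x.unsqueeze(1).expand_as(neigh), neigh], dim=-1))
        h = x + msgs.mean(dim=1)                                    # aggregate messages
        return self.cls_head(h), self.reg_head(h)

layer = TinyParticleFlowLayer()
logits, momenta = layer(torch.randn(200, 32))   # 200 detector elements in one event
print(logits.shape, momenta.shape)
```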

10. Aesthetics, Personalization and Recommendation: A survey on Deep Learning in Fashion

Wei Gong, Laila Khalid

  • retweets: 83, favorites: 20 (01/24/2021 09:44:34)
  • cs.CV | cs.LG

Machine learning is reshaping the fashion industry. Brands large and small use machine-learning techniques to improve revenue, attract customers and stay ahead of trends, while consumers want to know what looks best and how to improve their style. Combining deep learning with computer vision touches topics such as brain-inspired deep networks and neuroaesthetics, training GANs, working with unstructured data, and applying transformer architectures to the fashion domain, all in service of systems that can meet the ever-growing demand for fashion information. Personalization is a major factor in customers' spending choices, and the survey reviews notable approaches to achieving it by examining how visual data can be interpreted and leveraged in different models. Aesthetics also play a vital role in clothing recommendation, since users' decisions depend largely on whether an item matches their aesthetics, something conventional image features cannot capture directly; the survey therefore highlights models such as tensor factorization and conditional random field models that treat aesthetics as an explicit factor in apparel recommendation. These deep models can pinpoint which styles resonate best with customers and anticipate how new designs will be received by the community. With AI and machine learning, businesses can stay ahead of fashion trends.

11. Robust Reinforcement Learning on State Observations with Learned Optimal Adversary

Huan Zhang, Hongge Chen, Duane Boning, Cho-Jui Hsieh

We study the robustness of reinforcement learning (RL) with adversarially perturbed state observations, which aligns with the setting of many adversarial attacks on deep reinforcement learning (DRL) and is also important for deploying real-world RL agents under unpredictable sensing noise. With a fixed agent policy, we demonstrate that an optimal adversary perturbing state observations can be found, which is guaranteed to obtain the worst-case agent reward. For DRL settings, this leads to a novel empirical adversarial attack on RL agents via a learned adversary that is much stronger than previous ones. To enhance the robustness of an agent, we propose a framework of alternating training with learned adversaries (ATLA), which trains an adversary online together with the agent using policy gradients, following the optimal adversarial attack framework. Additionally, inspired by the analysis of state-adversarial Markov decision processes (SA-MDP), we show that past states and actions (history) can be useful for learning a robust agent, and we empirically find that an LSTM-based policy can be more robust under adversaries. Empirical evaluations on several continuous control environments show that ATLA achieves state-of-the-art performance under strong adversaries. Our code is available at https://github.com/huanzhang12/ATLA_robust_RL.
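
The alternating-training idea is easy to convey with a toy example. In the sketch below (not the authors' code), a small adversary network perturbs the agent's observation within a fixed budget and is trained to lower the agent's reward, alternating with updates to the agent on the perturbed observations; the real ATLA framework trains both players with policy-gradient RL on full MDPs, whereas this toy uses direct gradients on a one-step task.

```python
import torch
import torch.nn as nn

agent = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt_agent = torch.optim.Adam(agent.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
eps = 0.2  # perturbation budget on the observation

def reward(action: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
    # toy task: the agent should act as close to the true state as possible
    return -((action - state) ** 2).mean()

for step in range(2000):
    state = torch.randn(64, 1)
    if step % 2 == 0:
        # adversary's turn: pick bounded perturbations that lower the agent's reward
        perturbed = state + eps * torch.tanh(adversary(state))
        loss_adv = reward(agent(perturbed), state)      # adversary minimizes reward
        opt_adv.zero_grad()
        loss_adv.backward()
        opt_adv.step()
    else:
        # agent's turn: learn to act well under the current learned adversary
        with torch.no_grad():
            perturbed = state + eps * torch.tanh(adversary(state))
        loss_agent = -reward(agent(perturbed), state)   # agent maximizes reward
        opt_agent.zero_grad()
        loss_agent.backward()
        opt_agent.step()

with torch.no_grad():
    state = torch.randn(1000, 1)
    perturbed = state + eps * torch.tanh(adversary(state))
    print("reward under the learned adversary:", float(reward(agent(perturbed), state)))
```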

12. Zero-shot Generalization in Dialog State Tracking through Generative Question Answering

Shuyang Li, Jin Cao, Mukund Sridhar, Henghui Zhu, Shang-Wen Li, Wael Hamza, Julian McAuley

  • retweets: 17, favorites: 34 (01/24/2021 09:44:34)
  • cs.CL

Dialog State Tracking (DST), an integral part of modern dialog systems, aims to track user preferences and constraints (slots) in task-oriented dialogs. In real-world settings with constantly changing services, DST systems must generalize to new domains and unseen slot types. Existing methods for DST do not generalize well to new slot names and many require known ontologies of slot types and values for inference. We introduce a novel ontology-free framework that supports natural language queries for unseen constraints and slots in multi-domain task-oriented dialogs. Our approach is based on generative question-answering using a conditional language model pre-trained on substantive English sentences. Our model improves joint goal accuracy in zero-shot domain adaptation settings by up to 9% (absolute) over the previous state-of-the-art on the MultiWOZ 2.1 dataset.
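
As a rough illustration of the question-answering formulation (not the authors' exact model or prompt format), the sketch below poses one natural-language question per slot over the dialog history and lets a small seq2seq language model generate the value as free text; t5-small here is only a stand-in for the conditional language model used in the paper, and the slot questions are invented examples.

```python
from transformers import pipeline

# t5-small stands in for the substantially pre-trained conditional LM in the paper
qa = pipeline("text2text-generation", model="t5-small")

dialog = ("User: I need a cheap hotel in the north of town. "
          "System: Sure, for how many nights? "
          "User: Three nights starting Friday.")

# each (domain, slot) pair becomes a natural-language question -- no ontology needed
questions = {
    "hotel-pricerange": "What price range does the user want for the hotel?",
    "hotel-area": "What area of town does the user want the hotel in?",
}

for slot, question in questions.items():
    prompt = f"question: {question} context: {dialog}"
    answer = qa(prompt, max_length=16)[0]["generated_text"]
    print(f"{slot}: {answer}")
```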