Hot Papers 2020-08-06

1. Hopfield Networks is All You Need

Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

retweets: 435, favorites: 1918 (08/07/2020 09:11:52)
links: abs | pdf
cs.NE | cs.CL | cs.LG | stat.ML

We show that the transformer attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially small retrieval errors. The number of stored patterns is traded off against convergence speed and retrieval error. The new Hopfield network has three types of energy minima (fixed points of the update): (1) a global fixed point averaging over all patterns, (2) metastable states averaging over a subset of patterns, and (3) fixed points which store a single pattern. Transformer and BERT models preferably operate in the global averaging regime in their first layers, while they operate in metastable states in higher layers. The gradient in transformers is maximal for metastable states, is uniformly distributed for global averaging, and vanishes for a fixed point near a stored pattern. Using the Hopfield network interpretation, we analyzed learning of transformer and BERT models. Learning starts with attention heads that average, and then most of them switch to metastable states. However, the majority of heads in the first layers still average and can be replaced by averaging operations, e.g. our proposed Gaussian weighting. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information created in lower layers. These heads seem to be a promising target for improving transformers. Neural networks with Hopfield networks outperform other methods on immune repertoire classification, where the Hopfield net stores several hundreds of thousands of patterns. We provide a new PyTorch layer called “Hopfield”, which allows one to equip deep learning architectures with modern Hopfield networks as a new powerful concept comprising pooling, memory, and attention. GitHub: https://github.com/ml-jku/hopfield-layers
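As a rough illustration of the central claim, the sketch below writes one continuous Hopfield update as ξ_new = X softmax(β Xᵀξ), with the stored patterns as the columns of X, and checks it against plain single-query dot-product attention. This follows the paper's framing as summarized above; the function names, the random ±1 patterns, and the choice β = 1/√d to match attention scaling are our own assumptions, and none of this is the API of the hopfield-layers package.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hopfield_update(X: np.ndarray, xi: np.ndarray, beta: float) -> np.ndarray:
    """One continuous Hopfield update: xi_new = X @ softmax(beta * X.T @ xi).

    X  : (d, N) matrix whose N columns are the stored patterns.
    xi : (d,) current state (the query).
    """
    return X @ softmax(beta * (X.T @ xi))


def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, row by row."""
    weights = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return weights @ V


# Usage: store 10 random +/-1 patterns and retrieve one from a corrupted copy.
rng = np.random.default_rng(0)
d, N = 64, 10
X = rng.choice([-1.0, 1.0], size=(d, N))
query = X[:, 3].copy()
query[:8] *= -1  # flip 8 of the 64 signs
retrieved = hopfield_update(X, query, beta=1.0)

# With the patterns as both keys and values, attention equals the update at beta = 1/sqrt(d).
via_attention = attention(query[None, :], X.T, X.T)[0]
```

In this toy case a single update already lands on the stored pattern, which is consistent with the abstract's remark about convergence after one update; it is an illustration, not a proof.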

Self-attention mechanism can be viewed as the update rule of a Hopfield network with continuous states.

Deep learning models can take advantage of Hopfield networks as a powerful concept comprising pooling, memory, and attention. https://t.co/FL8PimjVo9 https://t.co/HT79M95lkn pic.twitter.com/Ld2eioVsDG
— hardmaru (@hardmaru) August 6, 2020

Hopfield Networks is All You Need

Shows that attention mechanism of transformers is equivalent to the update rule of a modern Hopfield network with continuous states. https://t.co/gfvAgEeZM4
— Aran Komatsuzaki (@arankomatsuzaki) August 6, 2020

Attention mechanism of transformers is equivalent to the update rule of a modern Hopfield network with continuous states!

Proud to announce the latest groundbreaking paper by Sepp Hochreiter team and our #IARAI colleagues!
👉https://t.co/oFxpw7EPkk #deeplearning #ai @LITAILab pic.twitter.com/PwqZnBxv4E
— IARAI (@IARAInews) August 6, 2020

Hopfield Networks is All You Need
pdf: https://t.co/SWFnVFNS8h
abs: https://t.co/erpgXRmPqJ
github: https://t.co/MWrtQlsNNO pic.twitter.com/0VmtHZK9QX
— AK (@ak92501) August 6, 2020

Great paper but the elephant in the room is... Shouldn't it be "Hopfield Networks *are* All You Need"? https://t.co/kqOPs4Yxkr
— Tiago Ramalho (@tmramalho) August 6, 2020

2. NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, Daniel Duckworth

retweets: 102, favorites: 426 (08/07/2020 09:11:53)
links: abs | pdf
cs.CV | cs.GR | cs.LG

We present a learning-based method for synthesizing novel views of complex outdoor scenes using only unstructured collections of in-the-wild photographs. We build on neural radiance fields (NeRF), which uses the weights of a multilayer perceptron to implicitly model the volumetric density and color of a scene. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. In this work, we introduce a series of extensions to NeRF to address these issues, thereby allowing for accurate reconstructions from unstructured image collections taken from the internet. We apply our system, which we dub NeRF-W, to internet photo collections of famous landmarks, thereby producing photorealistic, spatially consistent scene representations despite unknown and confounding factors, resulting in significant improvement over the state of the art.
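NeRF renders a pixel by compositing the densities σ_i and colors c_i that the MLP predicts at samples along the camera ray, Ĉ = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i with T_i = exp(−Σ_{j<i} σ_j δ_j), as in the original NeRF paper. The sketch below shows only that compositing step, with the MLP replaced by given arrays. NeRF-W's extensions for variable illumination and transient occluders are not modeled, and the function name is hypothetical.

```python
import numpy as np


def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray, NeRF-style.

    sigmas : (S,) non-negative densities at the samples
    colors : (S, 3) RGB colors at the samples
    deltas : (S,) distances between consecutive samples
    Returns the pixel color (3,) and the per-sample weights (S,).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    optical_depth = sigmas * deltas
    alphas = 1.0 - np.exp(-optical_depth)  # opacity of each ray segment
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): fraction of light reaching sample i.
    transmittance = np.exp(-np.concatenate(([0.0], np.cumsum(optical_depth)[:-1])))
    weights = transmittance * alphas
    return weights @ colors, weights


# Usage: a nearly opaque red sample hides the green sample behind it.
color, weights = composite_ray(
    sigmas=[50.0, 50.0],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    deltas=[1.0, 1.0],
)
```

The weights never sum to more than 1; whatever remains is light that passes through the whole ray, which NeRF-style renderers typically attribute to the background.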

NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
pdf: https://t.co/KHINkovxPi
abs: https://t.co/XWSUJZeqMA
project page: https://t.co/7S682vbr18 pic.twitter.com/c3uH73XBJQ
— AK (@ak92501) August 6, 2020

This project wouldn’t have been possible without my amazing coauthors: @rmbrualla, Noha Radwan, Mehdi S. M. Sajjadi, @jon_barron, and Alexey Dosovitskiy. Check out our paper: https://t.co/rEWPAGSAxE
— Daniel Duckworth (@duck) August 6, 2020

3. Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts

Ryan J. Gallagher, Morgan R. Frank, Lewis Mitchell, Aaron J. Schwartz, Andrew J. Reagan, Christopher M. Danforth, Peter Sheridan Dodds

retweets: 94, favorites: 304 (08/07/2020 09:11:54)
links: abs | pdf
cs.CL | cs.CY | cs.SI | physics.soc-ph

A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts’ rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences. Through several case studies, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.
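For any measure written as a weighted average Φ = Σ_τ p_τ φ_τ, where p_τ is word τ's relative frequency and φ_τ its score, the difference between two texts splits exactly into per-word terms. The sketch below covers only the simplest case, with fixed word scores shared by both texts: Φ₂ − Φ₁ = Σ_τ (p_τ⁽²⁾ − p_τ⁽¹⁾)(φ_τ − Φ_ref), which holds for any reference value Φ_ref because both frequency distributions sum to 1. The generalized shifts in the paper also handle scores that differ between texts; the toy lexicon and helper names here are hypothetical.

```python
def relative_frequencies(counts, scores):
    """Normalize counts to frequencies, keeping only words that have a score."""
    scored = {w: c for w, c in counts.items() if w in scores}
    total = sum(scored.values())
    return {w: c / total for w, c in scored.items()}


def weighted_average(freqs, scores):
    """Phi = sum_w p_w * phi_w."""
    return sum(p * scores[w] for w, p in freqs.items())


def word_shift(counts1, counts2, scores, reference):
    """Per-word contributions (p2_w - p1_w) * (phi_w - reference) to Phi2 - Phi1."""
    p1 = relative_frequencies(counts1, scores)
    p2 = relative_frequencies(counts2, scores)
    words = set(p1) | set(p2)
    return {w: (p2.get(w, 0.0) - p1.get(w, 0.0)) * (scores[w] - reference) for w in words}


# Usage with a toy sentiment lexicon (hypothetical scores on a 1-9 scale).
scores = {"happy": 8.0, "sad": 2.0, "the": 5.0}
text1 = {"happy": 2, "sad": 1, "the": 5, "zzz": 3}  # "zzz" has no score and is dropped
text2 = {"happy": 1, "sad": 3, "the": 4}
shift = word_shift(text1, text2, scores, reference=5.0)
for word, contribution in sorted(shift.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word:>6} {contribution:+.3f}")
```

Sorting the contributions by absolute value, as in the loop above, gives the ranked bars that a word shift graph draws.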

Tired of word clouds? Want to do better sentiment analysis? Not sure how to look at the words underneath your measures?

Our long overdue paper on generalized word shift graphs is finally here! https://t.co/lIBXvbMJWX https://t.co/vSL1REYT8V

So what are they?

1/n pic.twitter.com/4NM6HoZcGg
— Ryan J. Gallagher (@ryanjgallag) August 6, 2020

4. Word meaning in minds and machines

Brenden M. Lake, Gregory L. Murphy

retweets: 19, favorites: 129 (08/07/2020 09:11:54)
links: abs | pdf
cs.CL | cs.AI | cs.LG

Machines show an increasingly broad set of linguistic competencies, thanks to recent progress in Natural Language Processing (NLP). Many algorithms stem from past computational work in psychology, raising the question of whether they understand words as people do. In this paper, we compare how humans and machines represent the meaning of words. We argue that contemporary NLP systems are promising models of human word similarity, but they fall short in many other respects. Current models are too strongly linked to the text-based patterns in large corpora, and too weakly linked to the desires, goals, and beliefs that people use words in order to express. Word meanings must also be grounded in vision and action, and capable of flexible combinations, in ways that current systems are not. We pose concrete challenges for developing machines with a more human-like, conceptual basis for word meaning. We also discuss implications for cognitive science and NLP.

How can we build machines that understand words as people do? Models must look beyond patterns in text to secure a more grounded, conceptual foundation for word meaning, with links to beliefs and desires while supporting flexible composition. w/@glmurphy39 https://t.co/9MCDl4bMUV
— Brenden Lake (@LakeBrenden) August 6, 2020

5. Differentially Private Accelerated Optimization Algorithms

Nurdan Kuru, Ş. İlker Birbil, Mert Gurbuzbalaban, Sinan Yildirim

retweets: 7, favorites: 97 (08/07/2020 09:11:54)
links: abs | pdf
cs.LG | cs.CR | math.OC | stat.ML

We present two classes of differentially private optimization algorithms derived from the well-known accelerated first-order methods. The first algorithm is inspired by Polyak’s heavy ball method and employs a smoothing approach to decrease the accumulated noise on the gradient steps required for differential privacy. The second class of algorithms is based on Nesterov’s accelerated gradient method and its recent multi-stage variant. We propose a noise dividing mechanism for the iterations of Nesterov’s method in order to improve the error behavior of the algorithm. Convergence rate analyses are provided for both the heavy ball method and Nesterov’s accelerated gradient method with the help of dynamical system analysis techniques. Finally, we conclude with numerical experiments showing that the presented algorithms have advantages over well-known differentially private algorithms.
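For orientation only, the sketch below shows the generic pattern such methods start from: Polyak's heavy-ball iteration x_{k+1} = x_k − α(∇f(x_k) + noise) + β(x_k − x_{k−1}), with Gaussian noise added to every gradient. It does not implement the paper's smoothing or noise-dividing mechanisms, omits gradient clipping, and does not calibrate `noise_std` to any (ε, δ) privacy budget; all names and parameter values are hypothetical.

```python
import numpy as np


def noisy_heavy_ball(grad, x0, step, momentum, noise_std, iters, rng):
    """Heavy-ball iterations with Gaussian noise added to every gradient.

    x_{k+1} = x_k - step * (grad(x_k) + noise) + momentum * (x_k - x_{k-1})

    NOTE: illustrative only. noise_std is NOT calibrated to any (epsilon, delta)
    budget, and there is no clipping, smoothing, or noise-dividing schedule.
    """
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        noisy_grad = grad(x) + noise_std * rng.standard_normal(x.shape)
        # Right-hand side uses the old x and x_prev; then x_prev becomes the old x.
        x, x_prev = x - step * noisy_grad + momentum * (x - x_prev), x
    return x


# Usage: minimize f(x) = 0.5 * ||x - target||^2, whose gradient is x - target.
target = np.array([1.0, -2.0])
rng = np.random.default_rng(0)
exact = noisy_heavy_ball(lambda x: x - target, [0.0, 0.0], 0.1, 0.5, 0.0, 200, rng)
noisy = noisy_heavy_ball(lambda x: x - target, [0.0, 0.0], 0.1, 0.5, 0.5, 200, rng)
```

With `noise_std=0` this is ordinary heavy ball and settles on the minimizer; with noise the iterate keeps jittering around it, which is the accumulated-noise effect the paper's smoothing approach aims to reduce.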

We have also made our paper on privacy-preserving optimization algorithms available as open access. https://t.co/8Tc0i3DMe3
— İlker Birbil (@sibirbil) August 6, 2020

6. MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang

retweets: 15, favorites: 70 (08/07/2020 09:11:54)
links: abs | pdf
eess.AS | cs.AI | cs.LG | cs.SD | stat.ML

Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, and thus introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ .
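MuseGAN's outputs are multi-track piano-rolls. A piano-roll can be sketched as a boolean tensor indexed by (track, time step, pitch), which also shows why chords resist a single chronological ordering: several pitches switch on at the same time step. The time resolution (48 steps per bar), the 84-pitch range starting at MIDI note 24, and the helper names are hypothetical choices for illustration, not necessarily the paper's settings.

```python
import numpy as np

TRACKS = ["bass", "drums", "guitar", "piano", "strings"]
STEPS_PER_BAR = 48  # hypothetical time resolution
LOWEST_PITCH = 24   # hypothetical: MIDI note C1
N_PITCHES = 84      # hypothetical: MIDI notes 24..107


def empty_pianoroll(n_bars: int) -> np.ndarray:
    """Boolean tensor indexed by (track, time step, pitch)."""
    return np.zeros((len(TRACKS), n_bars * STEPS_PER_BAR, N_PITCHES), dtype=bool)


def add_note(roll, track, midi_pitch, start_step, n_steps):
    """Switch one note on for n_steps time steps in the given track."""
    if not LOWEST_PITCH <= midi_pitch < LOWEST_PITCH + N_PITCHES:
        raise ValueError(f"pitch {midi_pitch} outside the modeled range")
    t = TRACKS.index(track)
    roll[t, start_step:start_step + n_steps, midi_pitch - LOWEST_PITCH] = True


# Usage: a four-bar roll with a C major chord (C4, E4, G4) on the piano track.
roll = empty_pianoroll(n_bars=4)
for pitch in (60, 64, 67):
    add_note(roll, "piano", pitch, start_step=0, n_steps=STEPS_PER_BAR)
```

Three pitches are active at the same step of one track, so the roll stores them side by side instead of forcing them into a sequence.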

Some projects that really do produce music with GANs:

SeqGAN: https://t.co/LMDaba0NSA
MuseGAN: https://t.co/U5nU42Cg9U
MidiNet: https://t.co/g9s4b8Jb31

My comment about the video above is only a sarcastic analogy. The robotic arms video doesn't use a GAN as far as I know.
— Reza Zadeh (@Reza_Zadeh) December 24, 2017

7. Learning to Denoise Historical Music

Yunpeng Li, Beat Gfeller, Marco Tagliasacchi, Dominik Roblek

retweets: 12, favorites: 64 (08/07/2020 09:11:54)
links: abs | pdf
eess.AS | cs.LG

We propose an audio-to-audio neural network model that learns to denoise old music recordings. Our model internally converts its input into a time-frequency representation by means of a short-time Fourier transform (STFT), and processes the resulting complex spectrogram using a convolutional neural network. The network is trained with both reconstruction and adversarial objectives on a synthetic noisy music dataset, which is created by mixing clean music with real noise samples extracted from quiet segments of old recordings. We evaluate our method quantitatively on held-out test examples of the synthetic dataset, and qualitatively by human rating on samples of actual historical recordings. Our results show that the proposed method is effective in removing noise, while preserving the quality and details of the original music.
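The abstract builds training pairs by mixing clean music with noise taken from quiet segments of old recordings, and the model operates on a complex STFT of its input. The sketch below shows both preprocessing steps in generic form. The abstract does not say how the noise is scaled, so mixing at a target signal-to-noise ratio is our assumption, as are the frame length, hop size, Hann window, and function names; the network itself is not shown.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean audio, scaled so the mixture has the requested SNR (dB).

    The noise clip is tiled and then trimmed to the length of the clean signal.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose scale so that clean_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def stft(x, frame_len=512, hop=128):
    """Complex spectrogram of shape (n_frames, frame_len // 2 + 1) with a Hann window."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


# Usage: a 1-second 440 Hz tone at 16 kHz mixed with a short noise clip at 10 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(4000)
noisy = mix_at_snr(clean, noise, snr_db=10.0)
spec = stft(noisy)
```

Keeping the spectrogram complex, rather than taking magnitudes, preserves phase, which matches the abstract's description of processing the complex spectrogram.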

Learning to Denoise Historical Music
pdf: https://t.co/xbcZ0MapfH
abs: https://t.co/UJbYfeNYhy pic.twitter.com/77VsaNAgp4
— AK (@ak92501) August 6, 2020

8. Domain-Specific Mappings for Generative Adversarial Style Transfer

Hsin-Yu Chang, Zhixiang Wang, Yung-Yu Chuang

retweets: 11, favorites: 49 (08/07/2020 09:11:54)
links: abs | pdf
cs.CV

Style transfer generates an image whose content comes from one image and whose style comes from another. Image-to-image translation approaches with disentangled representations have been shown effective for style transfer between two image categories. However, previous methods often assume a shared domain-invariant content space, which could compromise the content representation power. To address this issue, this paper leverages domain-specific mappings for remapping latent features in the shared content space to domain-specific content spaces. This way, images can be encoded more properly for style transfer. Experiments show that the proposed method outperforms previous style transfer methods, particularly on challenging scenarios that would require semantic correspondences between images. Code and results are available at https://acht7111020.github.io/DSMAP-demo/.

Domain-Specific Mappings for Generative Adversarial Style Transfer
pdf: https://t.co/aw8iZyJyls
abs: https://t.co/U3rlndmPLe
project page: https://t.co/uOOAZdeAqa pic.twitter.com/iLJkkUMDla
— AK (@ak92501) August 6, 2020

Published 7 Aug 2020

ML Lead at Beatrust (https://beatrust.com). Tatsuya Shirakawa on Twitter