Hot Papers 2021-06-10

1. Knowledge distillation: A good teacher is patient and consistent

Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov

retweets: 5245, favorites: 92 (06/11/2021 09:50:30)
links: abs | pdf
cs.CV | cs.AI | cs.LG

There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper we address this issue and significantly bridge the gap between these two types of models. Throughout our empirical investigation we do not aim to necessarily propose a new method, but strive to identify a robust and effective recipe for making state-of-the-art large scale models affordable in practice. We demonstrate that, when performed correctly, knowledge distillation can be a powerful tool for reducing the size of large models without compromising their performance. In particular, we uncover that there are certain implicit design choices, which may drastically affect the effectiveness of distillation. Our key contribution is the explicit identification of these design choices, which were not previously articulated in the literature. We back up our findings by a comprehensive empirical study, demonstrate compelling results on a wide range of vision datasets and, in particular, obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
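As a rough illustration of the distillation objective the abstract refers to, the sketch below computes the KL divergence between a teacher's and a student's softened class probabilities in plain Python. The function names, the temperature parameter, and the toy logits are assumptions made for illustration and are not taken from the paper's code. The "consistent" part of the title is commonly read as meaning that both sets of logits should come from the same augmented view of the image, which the sketch assumes but does not enforce.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between softened predictions for one example.

    Assumption: both logit vectors were computed on the same input view.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits (hypothetical values, three classes).
teacher = [2.0, 0.5, -1.0]
loss_same = distillation_loss(teacher, teacher)          # student matches teacher
loss_diff = distillation_loss(teacher, [0.0, 0.0, 0.0])  # uninformed student
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.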

Wondering how to distill big vision models?

Check our recipe: a good teacher is patient and consistent!

Thanks to patience and consistency, we obtained the best ever ResNet-50 on ImageNet, of 82.8% accuracy without tricks.

Paper: https://t.co/obaBeuKJNl pic.twitter.com/Ua9lYZu4df
— Xiaohua Zhai (@XiaohuaZhai) June 10, 2021

So you think you know distillation; it's easy, right?

We thought so too with @XiaohuaZhai @__kolesnikov__ @_arohan_ and the amazing @royaleerieme and Larisa Markeeva.

Until we didn't. But now we do again. Hop on for a ride (+the best ever ResNet50?)

🧵👇https://t.co/3SlkXVZcG3 pic.twitter.com/Qp5qiZzV14
— Lucas Beyer (@giffmana) June 10, 2021

2. CoAtNet: Marrying Convolution and Attention for All Data Sizes

Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

retweets: 4858, favorites: 27 (06/11/2021 09:50:30)
links: abs | pdf
cs.CV | cs.LG

Transformers have attracted increasing interests in computer vision, but they still fall behind state-of-the-art convolutional networks. In this work, we show that while Transformers tend to have larger model capacity, their generalization can be worse than convolutional networks due to the lack of the right inductive bias. To effectively combine the strengths from both architectures, we present CoAtNets (pronounced “coat” nets), a family of hybrid models built from two key insights: (1) depthwise Convolution and self-Attention can be naturally unified via simple relative attention; (2) vertically stacking convolution layers and attention layers in a principled way is surprisingly effective in improving generalization, capacity and efficiency. Experiments show that our CoAtNets achieve state-of-the-art performance under different resource constraints across various datasets. For example, CoAtNet achieves 86.0% ImageNet top-1 accuracy without extra data, and 89.77% with extra JFT data, outperforming prior arts of both convolutional networks and Transformers. Notably, when pre-trained with 13M images from ImageNet-21K, our CoAtNet achieves 88.56% top-1 accuracy, matching ViT-huge pre-trained with 300M images from JFT while using 23x less data.
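To make the first insight more concrete, here is a minimal one-dimensional sketch of relative attention: the attention logit between positions i and j is the usual content term (a dot product) plus a learned bias that depends only on the offset i - j, which is the same translation-equivariant structure a depthwise convolution kernel has. The function name, the dictionary representation of the bias, and the toy inputs are assumptions for illustration; whether this matches CoAtNet's exact formulation should be checked against the paper.

```python
import math

def relative_attention(xs, rel_bias):
    """Self-attention over a 1D sequence with an offset-dependent bias.

    xs: list of n feature vectors (lists of floats).
    rel_bias: dict mapping offset (i - j) to a scalar; missing offsets count as 0.
    """
    n, dim = len(xs), len(xs[0])
    outputs = []
    for i in range(n):
        # Content term (dot product) plus the convolution-like positional term.
        scores = [
            sum(a * b for a, b in zip(xs[i], xs[j])) + rel_bias.get(i - j, 0.0)
            for j in range(n)
        ]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(weights[j] * xs[j][d] for j in range(n)) for d in range(dim)]
        )
    return outputs

# Toy input (hypothetical): a large bias at offset 0 makes each position attend to itself.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = relative_attention(tokens, {0: 50.0})
```

With the bias set to zero everywhere this reduces to plain dot-product self-attention, and with the content term removed it reduces to a fixed, input-independent weighting per offset.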

Happy to introduce CoAtNet: combining convolution and self-attention in a principled way to obtain better capacity and better generalization.

88.56% top-1 with ImageNet21K (13M imgs), matching ViT-huge with JFT (300M imgs).

Paper: https://t.co/AQE33LuzSr pic.twitter.com/YEly0cSaTp
— Mingxing Tan (@tanmingxing) June 10, 2021

3. SpeechBrain: A General-Purpose Speech Toolkit

Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

retweets: 2432, favorites: 283 (06/11/2021 09:50:31)
links: abs | pdf
eess.AS | cs.AI | cs.LG | cs.SD

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.

I'm happy to announce that a preprint paper on #SpeechBrain is now available on #arXiv:

Preprint: https://t.co/qibipubtfW
Website: https://t.co/a1wqxLucgw

That's a good read for the weekend after #ICASSP2021! 😄@PyTorch @huggingface #DeepLearning #MachineLearning #AI #Speech pic.twitter.com/YViSKLGYae
— Mirco Ravanelli (@mirco_ravanelli) June 10, 2021

Want to know more about @SpeechBrain1? Just have a look at the new paper 👀👀

pdf: https://t.co/op2i0eXi9r
abs: https://t.co/HLFUQ4vSIi

Tutorials: https://t.co/30eh9ZOdz6
GitHub: https://t.co/xN0veKMFqp
HuggingFace: https://t.co/IElc6nYvzN
Website: https://t.co/LXOB2scbSR pic.twitter.com/iKbGtHdl6Q
— Titouan Parcollet (@ParcolletT) June 10, 2021

SpeechBrain: A General-Purpose Speech Toolkit
pdf: https://t.co/DmBMdLniMF
abs: https://t.co/P47ckJ22md

open-source and all-in-one speech toolkit pic.twitter.com/pwANWxBnFT
— AK (@ak92501) June 10, 2021

4. Pretrained Encoders are All You Need

Mina Khan, P Srivatsa, Advait Rane, Shriram Chenniappa, Rishabh Anand, Sherjil Ozair, Pattie Maes

retweets: 960, favorites: 126 (06/11/2021 09:50:31)
links: abs | pdf
cs.LG

Data-efficiency and generalization are key challenges in deep learning and deep reinforcement learning as many models are trained on large-scale, domain-specific, and expensive-to-label datasets. Self-supervised models trained on large-scale uncurated datasets have shown successful transfer to diverse settings. We investigate using pretrained image representations and spatio-temporal attention for state representation learning in Atari. We also explore fine-tuning pretrained representations with self-supervised techniques, i.e., contrastive predictive coding, spatio-temporal contrastive learning, and augmentations. Our results show that pretrained representations are at par with state-of-the-art self-supervised methods trained on domain-specific data. Pretrained representations, thus, yield data and compute-efficient state representations. https://github.com/PAL-ML/PEARL_v1

Pretrained Encoders are All You Need
pdf: https://t.co/61H9Es76xA
abs: https://t.co/nORLGMoKvr
github: https://t.co/d0nQVCmwk5 pic.twitter.com/MG3gU4pdNR
— AK (@ak92501) June 10, 2021

5. Pretraining Representations for Data-Efficient Reinforcement Learning

Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, Devon Hjelm, Philip Bachman, Aaron Courville

retweets: 667, favorites: 198 (06/11/2021 09:50:31)
links: abs | pdf
cs.LG

Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data — approaching human-level performance and data-efficiency on Atari in our best setting. We provide code associated with this work at https://github.com/mila-iqia/SGI.

Deep RL agents usually start from tabula rasa, and struggle to match the data efficiency of humans who rely on strong priors. Can we even the playing field by starting agents off with strong representations of their environments?

We certainly think so: https://t.co/qttjqn7Yhf pic.twitter.com/EbTjr6vzl0
— Max Schwarzer (@max_a_schwarzer) June 10, 2021

Pretraining Representations for Data-Efficient Reinforcement Learning

Proposes SGI, which significantly surpasses prior work on Atari with the steps limited to 100k with an improved unsupervised goal-conditioned RL.

abs: https://t.co/27XPu6NajO
code: https://t.co/758gWsD2yK pic.twitter.com/jpGYoxYMim
— Aran Komatsuzaki (@arankomatsuzaki) June 10, 2021

Pretraining Representations for Data-Efficient Reinforcement Learning
pdf: https://t.co/inaFoYhQlY
abs: https://t.co/ekXdkFikqF
github: https://t.co/Cj9ml9bbv6

uses a combination of pretraining objectives to encourage the agent to learn multiple aspects of environment dynamics pic.twitter.com/g7dsofWDG9
— AK (@ak92501) June 10, 2021

6. AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin

retweets: 616, favorites: 111 (06/11/2021 09:50:32)
links: abs | pdf
cs.LG | cs.AI | cs.CV

We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one. With the goal of generality, we introduce AdaMatch, a method that unifies the tasks of unsupervised domain adaptation (UDA), semi-supervised learning (SSL), and semi-supervised domain adaptation (SSDA). In an extensive experimental study, we compare its behavior with respective state-of-the-art techniques from SSL, SSDA, and UDA on vision classification tasks. We find AdaMatch either matches or significantly exceeds the state-of-the-art in each case using the same hyper-parameters regardless of the dataset or task. For example, AdaMatch nearly doubles the accuracy compared to that of the prior state-of-the-art on the UDA task for DomainNet and even exceeds the accuracy of the prior state-of-the-art obtained with pre-training by 6.4% when AdaMatch is trained completely from scratch. Furthermore, by providing AdaMatch with just one labeled example per class from the target domain (i.e., the SSDA setting), we increase the target accuracy by an additional 6.1%, and with 5 labeled examples, by 13.6%.

AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation
pdf: https://t.co/9BUJQK3SdZ
abs: https://t.co/G1AOXAopye

a general method designed to boost accuracy on domain shifts when given access to unlabeled data from the new domain pic.twitter.com/n8PCfST3ql
— AK (@ak92501) June 10, 2021

New paper: AdaMatch - Unifying Unsupervised Domain Adaptation (UDA) and Semi-Supervised Learning (SSL) and SSDA. Nearly doubles SotA accuracy for UDA on non-pretrained DomainNet. https://t.co/F0huwzXvhf
1/3
— David Berthelot (@D_Berthelot_ML) June 11, 2021

7. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

Kimin Lee, Laura Smith, Pieter Abbeel

retweets: 358, favorites: 117 (06/11/2021 09:50:32)
links: abs | pdf
cs.LG | cs.AI

Conveying complex objectives to reinforcement learning (RL) agents can often be difficult, involving meticulous design of reward functions that are sufficiently informative yet easy enough to provide. Human-in-the-loop RL methods allow practitioners to instead interactively teach agents through tailored feedback; however, such approaches have been challenging to scale since human feedback is very expensive. In this work, we aim to make this process more sample- and feedback-efficient. We present an off-policy, interactive RL algorithm that capitalizes on the strengths of both feedback and off-policy learning. Specifically, we learn a reward model by actively querying a teacher’s preferences between two clips of behavior and use it to train an agent. To enable off-policy learning, we relabel all the agent’s past experience when its reward model changes. We additionally show that pre-training our agents with unsupervised exploration substantially increases the mileage of its queries. We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods, including a variety of locomotion and robotic manipulation skills. We also show that our method is able to utilize real-time human feedback to effectively prevent reward exploitation and learn new behaviors that are difficult to specify with standard reward functions.
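The reward-learning step described in the abstract (querying a teacher's preference between two clips) is often modelled with a Bradley-Terry style objective: the probability that one clip is preferred is a softmax over the summed predicted rewards of the two clips, and the reward model is trained with cross-entropy against the teacher's label. The sketch below shows that loss for a single query; the function names and the toy reward values are assumptions, and the details may differ from the PEBBLE implementation.

```python
import math

def preference_probability(rewards_a, rewards_b):
    """Probability that clip A is preferred, from per-step predicted rewards."""
    sum_a, sum_b = sum(rewards_a), sum(rewards_b)
    peak = max(sum_a, sum_b)
    exp_a, exp_b = math.exp(sum_a - peak), math.exp(sum_b - peak)
    return exp_a / (exp_a + exp_b)

def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy for one query.

    label: 1.0 if the teacher preferred clip A, 0.0 for clip B, 0.5 for a tie.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))

# Toy query (hypothetical reward predictions for two short clips).
clip_a = [1.0, 2.0]
clip_b = [0.0, 0.5]
p_a = preference_probability(clip_a, clip_b)
loss_agree = preference_loss(clip_a, clip_b, 1.0)
loss_disagree = preference_loss(clip_a, clip_b, 0.0)
```

When the reward model changes after such an update, the abstract's relabeling step amounts to recomputing the rewards stored with past transitions using the new model.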

Can we learn policies using human feedback without pre-defined rewards efficiently?

We find unsupervised RL and off-policy learning can improve the preference-based RL in PEBBLE!

📑Paper: https://t.co/4NqjsWl1SJ
💻Code & video: https://t.co/cdzy6QZayx
w/ Laura Smith, @pabbeel pic.twitter.com/OmXO9U9Z8i
— Kimin (@kimin_le2) June 10, 2021

PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training
pdf: https://t.co/ZSP5uNGCgl
abs: https://t.co/548k5SKXfY
project page: https://t.co/QCDtMv7OeJ
github: https://t.co/oncteZAPgC pic.twitter.com/sheykWyKL6
— AK (@ak92501) June 10, 2021

8. FastSeq: Make Sequence Generation Faster

Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, Ruifei Zhang

retweets: 281, favorites: 107 (06/11/2021 09:50:32)
links: abs | pdf
cs.CL | cs.LG

Transformer-based models have made tremendous impacts in natural language generation. However the inference speed is a bottleneck due to large model size and intensive computing involved in auto-regressive decoding process. We develop FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate 4-9x inference speed gain. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at https://github.com/microsoft/fastseq.
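For context on the "repeated n-grams" step, the sketch below shows the straightforward check that decoders commonly run at every generation step: find which next tokens would complete an n-gram that already occurs in the output. This is the simple linear scan, not FastSeq's optimized algorithm; the function name and the example token ids are assumptions for illustration.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(tokens) < n:
        return set()  # no complete n-gram exists yet
    start = len(tokens) - (n - 1)
    prefix = tuple(tokens[start:])  # the last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram begins with the same n-1 tokens, ban its final token.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# Toy sequence (hypothetical token ids): "1 2 3" already occurred, so after "1 2" ban 3.
generated = [1, 2, 3, 1, 2]
banned = banned_next_tokens(generated, 3)
```

Because this scan is repeated for every hypothesis at every step, its cost grows with sequence length, which is presumably why the abstract singles it out as an optimization target.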

FastSeq: Make Sequence Generation Faster

Demonstrates 4-9x inference speed gain on various Transformer-based models with a series of optimization methods.

abs: https://t.co/U5opo0VxSf
code: https://t.co/oH0GCkadYD pic.twitter.com/USsul0lMAB
— Aran Komatsuzaki (@arankomatsuzaki) June 10, 2021

FastSeq: Make Sequence Generation Faster
pdf: https://t.co/oP1I6IfxGN
abs: https://t.co/6GOJGfQa3I
github: https://t.co/fqczhpNqNa

provides general solutions for speeding up the sequence generation without accuracy loss pic.twitter.com/Nfiw6ou928
— AK (@ak92501) June 10, 2021

9. NeRF in detail: Learning to sample for view synthesis

Relja Arandjelović, Andrew Zisserman

retweets: 191, favorites: 147 (06/11/2021 09:50:33)
links: abs | pdf
cs.CV | cs.GR | cs.LG

Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis performance. The core approach is to render individual rays by querying a neural network at points sampled along the ray to obtain the density and colour of the sampled points, and integrating this information using the rendering equation. Since dense sampling is computationally prohibitive, a common solution is to perform coarse-to-fine sampling. In this work we address a clear limitation of the vanilla coarse-to-fine approach: that it is based on a heuristic and not trained end-to-end for the task at hand. We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture. Training the proposal module from scratch can be unstable due to lack of supervision, so an effective pre-training strategy is also put forward. The approach, named ‘NeRF in detail’ (NeRF-ID), achieves superior view synthesis quality over NeRF and the state-of-the-art on the synthetic Blender benchmark and on par or better performance on the real LLFF-NeRF scenes. Furthermore, by leveraging the predicted sample importance, a 25% saving in computation can be achieved without significantly sacrificing the rendering quality.
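The per-ray integration mentioned in the abstract is usually implemented as a discrete quadrature: each sample contributes its colour weighted by its own opacity and by how much light survives the samples in front of it. The sketch below shows that standard NeRF-style accumulation for one ray; the function name and the toy sample values are assumptions, and this illustrates the baseline rendering step, not the learned proposal module of NeRF-ID.

```python
import math

def render_ray(densities, colours, deltas):
    """Accumulate colour along one ray.

    densities: density at each sample; colours: RGB triple at each sample;
    deltas: distance between consecutive samples along the ray.
    Returns the pixel colour and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, colour, delta in zip(densities, colours, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, colour)]
        transmittance *= 1.0 - alpha
    return pixel, weights

# Toy ray (hypothetical): empty space, then an opaque red surface, then something behind it.
pixel, weights = render_ray(
    densities=[0.0, 1e9, 5.0],
    colours=[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    deltas=[0.1, 0.1, 0.1],
)
```

In vanilla coarse-to-fine sampling, these weights from the coarse pass are treated as a distribution from which the fine samples are drawn, which is the heuristic the paper proposes to replace.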

In our new paper "NeRF in detail: Learning to sample for view synthesis" (aka yet another NeRF paper on your to-read list) we replace the heuristic coarse-to-fine strategy of NeRF via a learnt one. Improvements in rendering quality and speed. https://t.co/434YoDKrFZ pic.twitter.com/BLclVqYbK5
— Relja Arandjelović (@relja_work) June 10, 2021

NeRF in detail: Learning to sample for view synthesis
pdf: https://t.co/h4qxLHFthk
abs: https://t.co/Vf7BvqS6sK

a ‘proposer’ module that learns the hierarchical coarse-to-fine sampling, thus enabling NeRF to be trained end-to-end for the view synthesis task pic.twitter.com/0pp7R4qsZr
— AK (@ak92501) June 10, 2021

10. Generative Models as a Data Source for Multiview Representation Learning

Ali Jahanian, Xavier Puig, Yonglong Tian, Phillip Isola

retweets: 225, favorites: 95 (06/11/2021 09:50:33)
links: abs | pdf
cs.CV

Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: if we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data. Given an off-the-shelf image generator without any access to its training data, we train representations from the samples output by this generator. We compare several representation learning methods that can be applied to this setting, using the latent space of the generator to generate multiple “views” of the same semantic content. We show that for contrastive methods, this multiview data can naturally be used to identify positive pairs (nearby in latent space) and negative pairs (far apart in latent space). We find that the resulting representations rival those learned directly from real data, but that good performance requires care in the sampling strategy applied and the training method. Generative models can be viewed as a compressed and organized copy of a dataset, and we envision a future where more and more “model zoos” proliferate while datasets become increasingly unwieldy, missing, or private. This paper suggests several techniques for dealing with visual representation learning in such a future. Code is released on our project page: https://ali-design.github.io/GenRep/
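The latent-space view construction described above can be sketched in a few lines: draw an anchor latent vector, perturb it slightly to get a positive, and treat independently drawn latents as negatives. The function names, the Gaussian perturbation, and the chosen scale are assumptions for illustration; the generator itself is left out, since any off-the-shelf model mapping latents to images could fill that role.

```python
import random

def sample_positive_pair(dim, sigma, rng):
    """Return an anchor latent and a nearby latent obtained by a small Gaussian shift."""
    anchor = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    neighbour = [z + rng.gauss(0.0, sigma) for z in anchor]
    return anchor, neighbour

def squared_distance(a, b):
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

rng = random.Random(0)
anchor, neighbour = sample_positive_pair(dim=128, sigma=0.1, rng=rng)
negative = [rng.gauss(0.0, 1.0) for _ in range(128)]  # independent draw, far away

positive_gap = squared_distance(anchor, neighbour)
negative_gap = squared_distance(anchor, negative)
```

Feeding the anchor and its neighbour through the generator would yield two images of roughly the same content, which a contrastive loss can then pull together while pushing the negative away.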

Generative Models as a Data Source for Multiview Representation Learning
pdf: https://t.co/VGQRmL0BS1
abs: https://t.co/50gFZE92QQ
project page: https://t.co/FfqqQKTWCY pic.twitter.com/rRo5HrZD3i
— AK (@ak92501) June 10, 2021

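Learn a generative model from a dataset, then do representation learning using only the generated data. Better representations are obtained when neighbors in latent space are treated as positives and augmentation is additionally applied in image space. Performance comes close to, but does not exceed, training on the original data; accuracy improves with more samples but eventually saturates. https://t.co/x7urGJcndz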
— Daisuke Okanohara (@hillbig) June 10, 2021

11. Point Cloud Upsampling via Disentangled Refinement

Ruihui Li, Xianzhi Li, Pheng-Ann Heng, Chi-Wing Fu

retweets: 196, favorites: 59 (06/11/2021 09:50:33)
links: abs | pdf
cs.CV

Point clouds produced by 3D scanning are often sparse, non-uniform, and noisy. Recent upsampling approaches aim to generate a dense point set, while achieving both distribution uniformity and proximity-to-surface, and possibly amending small holes, all in a single network. After revisiting the task, we propose to disentangle the task based on its multi-objective nature and formulate two cascaded sub-networks, a dense generator and a spatial refiner. The dense generator infers a coarse but dense output that roughly describes the underlying surface, while the spatial refiner further fine-tunes the coarse output by adjusting the location of each point. Specifically, we design a pair of local and global refinement units in the spatial refiner to evolve a coarse feature map. Also, in the spatial refiner, we regress a per-point offset vector to further adjust the coarse outputs in fine-scale. Extensive qualitative and quantitative results on both synthetic and real-scanned datasets demonstrate the superiority of our method over the state-of-the-arts.

Point Cloud Upsampling via Disentangled Refinement
pdf: https://t.co/BdfjRbBQfZ
abs: https://t.co/iIjoyyd2nL

disentangle the task based on its multi-objective nature and formulate two cascaded sub-networks, a dense generator and a spatial refiner pic.twitter.com/qi5mkwNOm0
— AK (@ak92501) June 10, 2021

12. Vector Quantized Models for Planning

Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals

retweets: 110, favorites: 83 (06/11/2021 09:50:33)
links: abs | pdf
cs.LG | cs.AI | stat.ML

Recent developments in the field of model-based RL have proven successful in a range of environments, especially ones where planning is essential. However, such successes have been limited to deterministic fully-observed environments. We present a new approach that handles stochastic and partially-observable environments. Our key insight is to use discrete autoencoders to capture the multiple possible effects of an action in a stochastic environment. We use a stochastic variant of Monte Carlo tree search to plan over both the agent’s actions and the discrete latent variables representing the environment’s response. Our approach significantly outperforms an offline version of MuZero on a stochastic interpretation of chess where the opponent is considered part of the environment. We also show that our approach scales to DeepMind Lab, a first-person 3D environment with large visual observations and partial observability.
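The discrete autoencoders mentioned in the abstract rely on vector quantization: a continuous encoding is replaced by the index of its nearest entry in a learned codebook, so the environment's possible responses become a finite set that tree search can branch over. The sketch below shows only that nearest-neighbour lookup; the function name and the toy codebook are assumptions, and codebook training is omitted.

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    best_index, best_distance = 0, float("inf")
    for index, code in enumerate(codebook):
        distance = sum((v - c) ** 2 for v, c in zip(vector, code))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index

# Toy codebook (hypothetical): three 2D codes standing in for learned latent codes.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
code_index = quantize([0.9, 1.2], codebook)
```

A planner can then enumerate or sample the small set of code indices at each chance node instead of reasoning over a continuous latent.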

Vector Quantized Models for Planning
pdf: https://t.co/dWHtizegNQ
abs: https://t.co/MD4oilCux2
project page: https://t.co/HCxGZ10BUN

outperforms modelfree baselines and performs competitively against offline MuZero and Stockfish Level 15 while being a more general algorithm pic.twitter.com/HLMjrMDcra
— AK (@ak92501) June 10, 2021

13. Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields

Wang Yifan, Lukas Rahmann, Olga Sorkine-Hornung

retweets: 132, favorites: 60 (06/11/2021 09:50:33)
links: abs | pdf
cs.CV | cs.GR | cs.LG

We present implicit displacement fields, a novel representation for detailed 3D geometry. Inspired by a classic surface deformation technique, displacement mapping, our method represents a complex surface as a smooth base surface plus a displacement along the base’s normal directions, resulting in a frequency-based shape decomposition, where the high frequency signal is constrained geometrically by the low frequency signal. Importantly, this disentanglement is unsupervised thanks to a tailored architectural design that has an innate frequency hierarchy by construction. We explore implicit displacement field surface reconstruction and detail transfer and demonstrate superior representational power, training stability and generalizability.
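The classic displacement mapping idea that the abstract builds on can be shown on a 2D curve: take a smooth base shape, then move each point along the base normal by a small, higher-frequency offset. The sketch below does this explicitly for a unit circle; the function names and the ripple function are assumptions for illustration, and the paper itself works with implicit (signed-distance style) fields, not explicit curves.

```python
import math

def displaced_point(theta, displacement):
    """Point on a unit circle, moved along its outward normal by displacement(theta)."""
    base = (math.cos(theta), math.sin(theta))
    normal = base  # on a unit circle the outward normal equals the position vector
    offset = displacement(theta)
    return (base[0] + offset * normal[0], base[1] + offset * normal[1])

def ripple(theta):
    """High-frequency detail (hypothetical): eight small bumps around the circle."""
    return 0.05 * math.sin(8.0 * theta)

point = displaced_point(math.pi / 16.0, ripple)
radius = math.hypot(point[0], point[1])
```

Because the detail can only move points along the base normal, the high-frequency signal stays geometrically tied to the low-frequency shape, which is the constraint the abstract describes.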

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields
pdf: https://t.co/SqfRmlgILy
abs: https://t.co/zRApiIKl3o @toomanyyifans
novel parameterization of neural implicit shape representation based on displacement mapping for detailed geometry pic.twitter.com/rFS5YviVCI
— AK (@ak92501) June 10, 2021

14. Symmetric Spaces for Graph Embeddings: A Finsler-Riemannian Approach

Federico López, Beatrice Pozzetti, Steve Trettel, Michael Strube, Anna Wienhard

retweets: 56, favorites: 46 (06/11/2021 09:50:34)
links: abs | pdf
cs.LG | cs.CG

Learning faithful graph representations as sets of vertex embeddings has become a fundamental intermediary step in a wide range of machine learning applications. We propose the systematic use of symmetric spaces in representation learning, a class encompassing many of the previously used embedding targets. This enables us to introduce a new method, the use of Finsler metrics integrated in a Riemannian optimization scheme, that better adapts to dissimilar structures in the graph. We develop a tool to analyze the embeddings and infer structural properties of the data sets. For implementation, we choose Siegel spaces, a versatile family of symmetric spaces. Our approach outperforms competitive baselines for graph reconstruction tasks on various synthetic and real-world datasets. We further demonstrate its applicability on two downstream tasks, recommender systems and node classification.

Would you like to embed graphs in a space that simultaneously contains Euclidean and hyperbolic subspaces, products thereof, and SPD submanifolds? 🤯

Happy to share our work on Symmetric Spaces for Graph Embeddings, to be presented at @icmlconf: https://t.co/Nip7GnTaar (1/5) pic.twitter.com/yMu3HgN2cI
— Federico López (@fedelopez77) June 10, 2021

15. Recovering AES Keys with a Deep Cold Boot Attack

Itamar Zimerman, Eliya Nachmani, Lior Wolf

retweets: 30, favorites: 33 (06/11/2021 09:50:34)
links: abs | pdf
cs.CR | cs.IT | cs.LG

Cold boot attacks inspect the corrupted random access memory soon after the power has been shut down. While most of the bits have been corrupted, many bits, at random locations, have not. Since the keys in many encryption schemes are being expanded in memory into longer keys with fixed redundancies, the keys can often be restored. In this work, we combine a novel cryptographic variant of a deep error correcting code technique with a modified SAT solver scheme to apply the attack on AES keys. Even though AES consists of Rijndael S-box elements, that are specifically designed to be resistant to linear and differential cryptanalysis, our method provides a novel formalization of the AES key scheduling as a computational graph, which is implemented by a neural message passing network. Our results show that our methods outperform the state of the art attack methods by a very large margin.
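The "fixed redundancies" mentioned in the abstract come from the AES key schedule: a 16-byte AES-128 key is expanded into 44 four-byte words, and every expanded word is a deterministic function of two earlier words. The sketch below implements the standard AES-128 key expansion in plain Python (deriving the S-box from its definition to keep the listing short); it illustrates the structure an attacker can exploit and is not the paper's neural method. The function names are my own.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the AES reduction polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def sbox_entry(byte):
    """AES S-box: multiplicative inverse followed by the affine transform."""
    inverse = 0
    if byte:
        inverse = 1
        for _ in range(254):  # byte^254 is the inverse in GF(2^8)
            inverse = gf_mul(inverse, byte)
    out = inverse
    for shift in range(1, 5):
        out ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
    return out ^ 0x63

SBOX = [sbox_entry(b) for b in range(256)]

def expand_key_128(key):
    """Expand a 16-byte key into 44 words (lists of 4 bytes each)."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]       # RotWord
            temp = [SBOX[b] for b in temp]   # SubWord
            temp[0] ^= rcon                  # round constant
            rcon = gf_mul(rcon, 2)
        # Each new word is the XOR of the word four positions back and `temp`.
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words

# Example key from the FIPS-197 key expansion appendix.
schedule = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```

Since 176 bytes of schedule are derived from only 16 bytes of key, a partially decayed memory image still carries many overlapping constraints on the original key.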

Happy to share our DeepCrypto paper "Recovering AES Keys with a Deep Cold Boot Attack". Accepted to ICML 2021. w/ @ItamarZimerman & Lior Wolf. TL;DR AES attack with neural S-box. https://t.co/gmdFLGQ1fb
@TelAvivUni @facebookai @icmlconf pic.twitter.com/bq2k9o6ZHM
— Eliya Nachmani (@NachmaniEliya) June 10, 2021

16. Neural Extractive Search

Shauli Ravfogel, Hillel Taub-Tabib, Yoav Goldberg

retweets: 30, favorites: 27 (06/11/2021 09:50:34)
links: abs | pdf
cs.CL | cs.IR

Domain experts often need to extract structured information from large corpora. We advocate for a search paradigm called “extractive search”, in which a search query is enriched with capture-slots, to allow for such rapid extraction. Such an extractive search system can be built around syntactic structures, resulting in high-precision, low-recall results. We show how the recall can be improved using neural retrieval and alignment. The goals of this paper are to concisely introduce the extractive-search paradigm; and to demonstrate a prototype neural retrieval system for extractive search and its benefits and potential. Our prototype is available at https://spike.neural-sim.apps.allenai.org/ and a video demonstration is available at https://vimeo.com/559586687.

Neural Extractive Search
pdf: https://t.co/LH8qulaqfN
abs: https://t.co/KS8MX3wKo5
prototype: https://t.co/FUYUOujoh5 pic.twitter.com/tDaHJvxeke
— AK (@ak92501) June 10, 2021

Published 11 Jun 2021

ML Lead at Beatrust (https://beatrust.com). Tatsuya Shirakawa on Twitter