Hot Papers 2020-09-07

1. KILT: a Benchmark for Knowledge Intensive Language Tasks

Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vassilis Plachouras, Tim Rocktäschel, Sebastian Riedel

retweets: 91, favorites: 267 (09/08/2020 08:08:17)
links: abs | pdf
cs.CL | cs.AI | cs.IR | cs.LG

Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research on models that condition on specific information in large textual resources, we present a benchmark for knowledge-intensive language tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia, reducing engineering turnaround through the re-use of components, as well as accelerating research into task-agnostic memory architectures. We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide provenance. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and yielding competitive results on entity linking and slot filling, by generating disambiguated text. KILT data and code are available at https://github.com/facebookresearch/KILT.

I'm excited to present KILT, a benchmark for Knowledge Intensive Language Tasks. 11 datasets, 5 task families, a unified knowledge source and interface -> https://t.co/cL1VFJoH8Y. Collaboration b/w @facebookai @ucl_nlp @labo_Loria @cambridgenlp @huggingface @AmsterdamNLP 🚀 [1/6] pic.twitter.com/VEjUDf7now
— Fabio Petroni (@Fabio_Petroni) September 7, 2020

2. Qibo: a framework for quantum simulation with hardware acceleration

Stavros Efthymiou, Sergi Ramos-Calderer, Carlos Bravo-Prieto, Adrián Pérez-Salinas, Diego García-Martín, Artur Garcia-Saez, José Ignacio Latorre, Stefano Carrazza

retweets: 18, favorites: 84 (09/08/2020 08:08:18)
links: abs | pdf
quant-ph | cs.DC | cs.LG

We present Qibo, a new open-source software for fast evaluation of quantum circuits and adiabatic evolution which takes full advantage of hardware accelerators. The growing interest in quantum computing and the recent developments of quantum hardware devices motivate the development of new advanced computational tools focused on performance and usage simplicity. In this work we introduce a new quantum simulation framework that enables developers to delegate all complicated aspects of hardware or platform implementation to the library so they can focus on the problem and quantum algorithms at hand. This software is designed from scratch with simulation performance, code simplicity and a user-friendly interface as target goals. It takes advantage of hardware acceleration such as multi-threading CPU, single GPU and multi-GPU devices.

QIBO released: https://t.co/2HkKlNy3I0, https://t.co/gr72ZFhOJN. An opensource quantum language with support for fast performance on GPU&multiGPU. Equipped with VQE, QAOA, Adiabatic Evolution (optimization of scheduling), examples for 3SAT oracle, hash, condensed matter, finance.
— José Ignacio Latorre (@j_i_latorre) September 7, 2020

We are thrilled to announce the Qibo release! Qibo is a new open-source software for the simulation of quantum circuits and adiabatic evolution. Check out its impressive performance compared to other frameworks, especially for a large number of qubits. https://t.co/OnEBcQNzJt pic.twitter.com/7LUx22oCzw
— Carlos Bravo-Prieto (@charl_bp) September 7, 2020

3. TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator

Doyeon Kim, Donggyu Joo, Junmo Kim

retweets: 16, favorites: 68 (09/08/2020 08:08:18)
links: abs | pdf
cs.CV

Advances in technology have led to the development of methods that can create desired visual multimedia. In particular, image generation using deep learning has been extensively studied across diverse fields. In comparison, video generation, especially on conditional inputs, remains a challenging and less explored area. To narrow this gap, we aim to train our model to produce a video corresponding to a given text description. We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial Network (TiVGAN), which evolves frame-by-frame and finally produces a full-length video. In the first phase, we focus on creating a high-quality single video frame while learning the relationship between the text and an image. As the steps proceed, our model is trained gradually on an increasing number of consecutive frames. This step-by-step learning process helps stabilize the training and enables the creation of high-resolution video based on conditional text descriptions. Qualitative and quantitative experimental results on various datasets demonstrate the effectiveness of the proposed method.

TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator
pdf: https://t.co/jS8ZQDsk0x
abs: https://t.co/5BYGvtGJOj pic.twitter.com/KdjCmHtauE
— AK (@ak92501) September 7, 2020

4. A Haystack Full of Needles: Scalable Detection of IoT Devices in the Wild

Said Jawad Saidi, Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, Daniel J. Dubois, David Choffnes, Georgios Smaragdakis, Anja Feldmann

retweets: 18, favorites: 60 (09/08/2020 08:08:18)
links: abs | pdf
cs.NI

Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers, all with sampled network data. In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in-the-wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP as well as an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.

Household IoT devices are gaining popularity, so how easy is it to detect them at scale? In our upcoming IMC'20 paper we show that it is easy to infer hundreds of thousands of IoT devices in the wild, only using highly sampled network flows, without DPI. https://t.co/zhR5K2Abc8 pic.twitter.com/hmgsZLs1ze
— Georgios Smaragdakis (@GSmaragdakis) September 7, 2020

5. SketchPatch: Sketch Stylization via Seamless Patch-level Synthesis

Noa Fish, Lilach Perry, Amit Bermano, Daniel Cohen-Or

retweets: 16, favorites: 53 (09/08/2020 08:08:18)
links: abs | pdf
cs.GR | cs.CV | cs.LG

The paradigm of image-to-image translation is leveraged for the benefit of sketch stylization via transfer of geometric textural details. Lacking the necessary volumes of data for standard training of translation systems, we advocate for operation at the patch level, where a handful of stylized sketches provide ample mining potential for patches featuring basic geometric primitives. Operating at the patch level necessitates special consideration of full sketch translation, as individual translation of patches with no regard to neighbors is likely to produce visible seams and artifacts at patch borders. Aligned pairs of styled and plain primitives are combined to form input hybrids containing styled elements around the border and plain elements within, and given as input to a seamless translation (ST) generator, whose output patches are expected to reconstruct the fully styled patch. An adversarial addition promotes generalization and robustness to diverse geometries at inference time, forming a simple and effective system for arbitrary sketch stylization, as demonstrated upon a variety of styles and sketches.

SketchPatch: Sketch Stylization via Seamless Patch-level Synthesis
pdf: https://t.co/bl6VmwGyIW
abs: https://t.co/B0uvNKcNb8 pic.twitter.com/nlM5sj07ti
— AK (@ak92501) September 7, 2020

6. A Comprehensive Analysis of Information Leakage in Deep Transfer Learning

Cen Chen, Bingzhe Wu, Minghui Qiu, Li Wang, Jun Zhou

retweets: 15, favorites: 41 (09/08/2020 08:08:18)
links: abs | pdf
cs.CL | cs.LG

Transfer learning is widely used for transferring knowledge from a source domain to the target domain where the labeled data is scarce. Recently, deep transfer learning has achieved remarkable progress in various applications. However, because the source and target datasets usually belong to two different organizations in many real-world scenarios, deep transfer learning poses potential privacy issues. In this study, to thoroughly analyze the potential privacy leakage in deep transfer learning, we first divide previous methods into three categories. Based on that, we demonstrate specific threats that lead to unintentional privacy leakage in each category. Additionally, we provide some solutions to prevent these threats. To the best of our knowledge, our study is the first to provide a thorough analysis of the information leakage issues in deep transfer learning methods and provide potential solutions to the issue. Extensive experiments on two public datasets and an industry dataset are conducted to show the privacy leakage under different deep transfer learning settings and defense solution effectiveness.

A comprehensive analysis of the potential privacy leakage in deep transfer learning.

If you are deploying deep learning models in production, this paper provides a summary of common threats to be aware of and how to potentially prevent them. https://t.co/7mkVG1xNLP pic.twitter.com/WjPzWDL0Ef
— elvis (@omarsar0) September 7, 2020

Published 8 Sep 2020

Tatsuya Shirakawa, ML Lead at Beatrust (https://beatrust.com)