1. Knowledge Efficient Deep Learning for Natural Language Processing
Hai Wang
Deep learning has become the workhorse for a wide range of natural language processing applications. But much of the success of deep learning relies on annotated examples, and annotation is time-consuming and expensive to produce at scale. Here we are interested in methods for reducing the required quantity of annotated data, making the learning methods more knowledge efficient so that they become more applicable in low-annotation (low-resource) settings. There are various classical approaches to making models more knowledge efficient, such as multi-task learning, transfer learning, weakly supervised learning, and unsupervised learning. This thesis focuses on adapting such classical methods to modern deep learning models and algorithms, and describes four works aimed at making machine learning models more knowledge efficient. First, we propose knowledge-rich deep learning (KRDL) as a unifying framework for incorporating prior knowledge into deep models; in particular, we apply KRDL built on Markov logic networks to denoise weak supervision. Second, we apply a KRDL model to help machine reading models find the correct evidence sentences that support their decisions. Third, we investigate knowledge transfer techniques in a multilingual setting, proposing a method that improves pre-trained multilingual BERT using a bilingual dictionary. Fourth, we present an episodic memory network for language modelling, in which we encode large external knowledge for a pre-trained GPT.
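To make the third contribution concrete, the sketch below shows one plausible way a bilingual dictionary can be turned into training signal for a multilingual encoder: code-switching a fraction of source-language tokens into their dictionary translations before fine-tuning. This is only an illustration of the general idea; the toy dictionary, the replacement probability, and the augmentation scheme are assumptions, not the exact method proposed in the thesis.

```python
import random

# Toy English->German dictionary; a real bilingual dictionary would be far larger.
# Purely illustrative -- not the thesis's actual resource or method.
EN_DE = {"house": "Haus", "water": "Wasser", "book": "Buch", "friend": "Freund"}

def code_switch(tokens, dictionary, p=0.3, seed=None):
    """Randomly replace tokens with their dictionary translations.

    Sentences augmented this way can be mixed into the fine-tuning data of a
    multilingual encoder (e.g. multilingual BERT) to encourage its
    representations to align across languages.
    """
    rng = random.Random(seed)
    return [
        dictionary[tok.lower()]
        if tok.lower() in dictionary and rng.random() < p
        else tok
        for tok in tokens
    ]

if __name__ == "__main__":
    sentence = "my friend read a book about water".split()
    print(" ".join(code_switch(sentence, EN_DE, p=0.5, seed=0)))
```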
Ph.D. thesis discussing ways on how to develop knowledge efficient deep learning-based methods for NLP and making them more applicable in low resource settings.
by Hai Wang https://t.co/AYBw4V9DPm pic.twitter.com/g06pPvADxV
— elvis (@omarsar0) September 1, 2020
Knowledge Efficient Deep Learning for Natural Language Processing https://t.co/eV1nNNOA2D
— arXiv CS-CL (@arxiv_cscl) September 1, 2020
2. Off-Path TCP Exploits of the Mixed IPID Assignment
Xuewei Feng, Chuanpu Fu, Qi Li, Kun Sun, Ke Xu
In this paper, we uncover a new off-path TCP hijacking attack that can be used to terminate victim TCP connections or inject forged data into them by manipulating the new mixed IPID assignment method, which is widely used in Linux kernel version 4.18 and beyond to help defend against TCP hijacking attacks. The attack has three steps. First, an off-path attacker downgrades the IPID assignment for TCP packets from the more secure per-socket-based policy to the less secure hash-based policy, building a shared IPID counter that forms a side channel on the victim. Second, the attacker detects the presence of TCP connections by observing the shared IPID counter on the victim. Third, the attacker infers the sequence number and the acknowledgment number of the detected connection by observing the side channel of the shared IPID counter. Consequently, the attacker can completely hijack the connection, i.e., reset the connection or poison the data stream. We evaluate the impact of this off-path TCP attack in the real world. Our case studies of SSH DoS, manipulating web traffic, and poisoning BGP routing tables show its threat to a wide range of applications. Our experimental results show that the attack can be constructed within 215 seconds with a success rate of over 88%. Finally, we analyze the root cause of the exploit and develop a new IPID assignment method to defeat this attack. We prototype our defense in Linux 4.18 and confirm its effectiveness through extensive evaluation over real applications on the Internet.
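The sketch below illustrates the kind of measurement behind the second step: repeatedly eliciting responses from a host and watching how its IPID counter advances between probes. It is a minimal illustration for hosts you control, written with scapy; the probe type, placeholder target address, and sampling schedule are assumptions, not the exact packets or inference procedure from the paper.

```python
import time
from scapy.all import IP, TCP, sr1  # requires scapy and raw-socket privileges

TARGET = "192.0.2.10"  # placeholder address (TEST-NET-1); use only hosts you control

def probe_ipid(dst, dport=80):
    """Send a single SYN probe and return the IPID of the reply, if any."""
    resp = sr1(IP(dst=dst) / TCP(dport=dport, flags="S"), timeout=1, verbose=0)
    return resp[IP].id if resp is not None else None

def observe_counter(dst, n=5, delay=0.5):
    """Sample the responder's IPID over time. If the host uses a shared,
    hash-based counter, jumps larger than our own probe rate hint that other
    traffic is being emitted from the same counter."""
    samples = []
    for _ in range(n):
        ipid = probe_ipid(dst)
        if ipid is not None:
            samples.append(ipid)
        time.sleep(delay)
    return samples

if __name__ == "__main__":
    print(observe_counter(TARGET))
```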
"a new off-path TCP hijacking attack that can be used to terminate victim TCP connections or inject forged data into victim TCP connections " The paper is really interesting and they implemented a new IPID assignment proposal for Linux. https://t.co/W5aYYbE1hp pic.twitter.com/Go36x2He5H
— Alexandre Dulaunoy (@adulau) September 1, 2020
3. Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
Policy gradient methods are a popular and effective choice for training reinforcement learning agents in complex environments. The variance of the stochastic policy gradient is often seen as a key quantity determining the effectiveness of the algorithm. Baselines are a common addition used to reduce the variance of the gradient, but previous works have hardly ever considered the other effects baselines may have on the optimization process. Using simple examples, we find that baselines modify the optimization dynamics even when the variance is the same. In certain cases, a baseline with lower variance may even be worse than another with higher variance. Furthermore, we find that the choice of baseline can affect the convergence of natural policy gradient, where certain baselines may lead to convergence to a suboptimal policy for any step size. Such behaviour emerges when sampling is constrained to be done using the current policy, and we show how decoupling the sampling policy from the current policy guarantees convergence for a much wider range of baselines. More broadly, this work suggests that a more careful treatment of stochasticity in the updates, beyond the immediate variance, is necessary to understand the optimization process of policy gradient algorithms.
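As a minimal illustration of the quantity under discussion, the sketch below runs REINFORCE with a softmax policy on a two-armed bandit, where the update direction scales with (reward - baseline); changing the baseline changes the optimization path even though the estimator stays unbiased. The bandit, step size, and constant baseline are assumptions made for the sketch, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.0])   # deterministic 2-armed bandit; arm 0 is optimal
theta = np.zeros(2)              # softmax policy parameters
baseline = 0.5                   # try 0.0, 0.5, or a running average of rewards

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = rewards[a]
    # Score function for a softmax policy: grad log pi(a) = onehot(a) - pi
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    # REINFORCE update; the baseline shifts every update, not just its variance
    theta += 0.1 * (r - baseline) * grad_log_pi

print("final policy:", softmax(theta))
```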
[1/6] Our new preprint is now available on arXiv. We revisit baselines in policy gradient methods and show that they have a much bigger role than simply variance reduction! With Wesley Chung, Valentin Thomas, and @le_roux_nicolas. https://t.co/4lvyyHXSyB pic.twitter.com/pFPrFySgAy
— Marlos C. Machado (@MarlosCMachado) September 1, 2020
4. Efficient Computation of Expectations under Spanning Tree Distributions
Ran Zmigrod, Tim Vieira, Ryan Cotterell
We give a general framework for inference in spanning tree models. We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models. Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms. These algorithms are easy to implement, given the prevalence of automatic differentiation software. We motivate the development of our framework with several cautionary tales of previous research, which has developed numerous less-than-optimal algorithms for computing expectations and their gradients. We demonstrate how our framework efficiently computes several quantities with known algorithms, including the expected attachment score, entropy, and generalized expectation criteria. As a bonus, we give algorithms for quantities that are missing in the literature, including the KL divergence. In all cases, our approach matches the efficiency of existing algorithms and, in several cases, reduces the runtime complexity by a factor (or two) of the sentence length. We validate the implementation of our framework through runtime experiments, finding that our algorithms are faster than previous algorithms for computing the Shannon entropy and the gradient of the generalized expectation objective.
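The gradient-expectation connection the authors exploit can be illustrated with a small PyTorch sketch: the Matrix-Tree Theorem gives the log partition function of an edge-factored arborescence distribution, and differentiating it with respect to the log edge weights recovers the edge marginals (a first-order expectation). This is a simplified illustration of the idea, not the paper's algorithms (which also cover second-order expectations); the graph size and random scores are placeholders.

```python
import torch

n = 5                                              # toy sentence length
log_w = torch.randn(n, n, requires_grad=True)      # log edge scores, w[i, j] = edge i -> j
w = log_w.exp() * (1 - torch.eye(n))               # exponentiate, drop self-loops

# Matrix-Tree Theorem: the minor of the Laplacian with the root's row/column
# removed equals the partition function over arborescences rooted at node 0.
laplacian = torch.diag(w.sum(dim=0)) - w           # diag of in-degrees minus weights
log_z = torch.logdet(laplacian[1:, 1:])

# Backprop through log Z: d log Z / d log w[i, j] = P(edge i -> j is in the tree)
log_z.backward()
edge_marginals = log_w.grad

print(edge_marginals)
print("marginals sum to n - 1 edges:", edge_marginals.sum().item())
```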
Check out our new TACL paper: Efficient Computation of Expectations under Spanning Tree Distributions [https://t.co/QWizOyQxpZ]:
🌲We save a factor of O(n) over existing algorithms
✨Explain the backprop ⇔ expectation connection
🔥Open source PyTorch implementation coming soon
— Ran Zmigrod (@RanZmigrod) September 1, 2020
5. Dual Attention GANs for Semantic Image Synthesis
Hao Tang, Song Bai, Nicu Sebe
In this paper, we focus on the semantic image synthesis task, which aims at translating semantic label maps into photo-realistic images. Existing methods lack effective semantic constraints to preserve the semantic information and ignore the structural correlations in both spatial and channel dimensions, leading to blurry and artifact-prone results. To address these limitations, we propose a novel Dual Attention GAN (DAGAN) that synthesizes photo-realistic and semantically consistent images with fine details from input layouts, without imposing extra training overhead or modifying the network architectures of existing methods. We also propose two novel modules, a position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention Module (CAM), to capture semantic structure attention in the spatial and channel dimensions, respectively. Specifically, SAM selectively correlates the pixels at each position via a spatial attention map, so that pixels with the same semantic label are related to each other regardless of their spatial distances. Meanwhile, CAM selectively emphasizes the scale-wise features at each channel via a channel attention map, which integrates associated features among all channel maps regardless of their scales. We finally sum the outputs of SAM and CAM to further improve the feature representation. Extensive experiments on four challenging datasets show that DAGAN achieves remarkably better results than state-of-the-art methods while using fewer model parameters. The source code and trained models are available at https://github.com/Ha0Tang/DAGAN.
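The sketch below gives a generic PyTorch rendering of the two attention branches described in the abstract: a position-wise module in which every pixel attends to every other pixel, and a channel-wise module in which channels are reweighted by their pairwise affinities, with the two outputs summed. It is modelled on standard position/channel attention designs and is not the authors' implementation; layer sizes and the query/key reduction factor are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Position-wise attention sketch: every pixel attends to every other pixel."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # B x HW x C'
        k = self.key(x).flatten(2)                            # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)                   # B x HW x HW
        v = self.value(x).flatten(2)                          # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Channel-wise attention sketch: channels reweighted by their pairwise affinity."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                      # B x C x HW
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)   # B x C x C
        out = (attn @ f).view(b, c, h, w)
        return self.gamma * out + x

# The abstract states that the two branch outputs are summed.
x = torch.randn(2, 64, 32, 32)
fused = SpatialAttention(64)(x) + ChannelAttention()(x)
print(fused.shape)
```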
Dual Attention GANs for Semantic Image Synthesis
pdf: https://t.co/bB7Z9I5iEc
abs: https://t.co/u7VtCXV9wI
github: https://t.co/Wao9W0tc1b pic.twitter.com/pwpsIO7yQQ
— AK (@ak92501) September 1, 2020
6. DeepFacePencil: Creating Face Images from Freehand Sketches
Yuhang Li, Xuejin Chen, Binxin Yang, Zihan Chen, Zhihua Cheng, Zheng-Jun Zha
In this paper, we explore the task of generating photo-realistic face images from hand-drawn sketches. Existing image-to-image translation methods require a large-scale dataset of paired sketches and images for supervision, and they typically use synthesized edge maps of face images as training data. However, these synthesized edge maps strictly align with the edges of the corresponding face images, which limits their generalization to real hand-drawn sketches with vast stroke diversity. To address this problem, we propose DeepFacePencil, an effective tool that generates photo-realistic face images from hand-drawn sketches, based on a novel dual-generator image translation network during training. A novel spatial attention pooling (SAP) module is designed to adaptively handle spatially varying stroke distortions, supporting various stroke styles and different levels of detail. We conduct extensive experiments, and the results demonstrate the superiority of our model over existing methods in both image quality and model generalization to hand-drawn sketches.
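One plausible reading of a spatial attention pooling module is sketched below: several parallel branches with different receptive fields are blended by a per-pixel softmax, so regions with heavier stroke distortion can lean on coarser context. This is an assumption-laden illustration, not the DeepFacePencil implementation; the dilation rates and the single-layer attention head are invented for the sketch.

```python
import torch
import torch.nn as nn

class SpatialAttentionPooling(nn.Module):
    """Blend branches with different receptive fields using a per-pixel softmax,
    so each location can choose how much coarse context it needs."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.attn = nn.Conv2d(channels, len(dilations), 1)  # per-pixel branch weights

    def forward(self, x):
        weights = torch.softmax(self.attn(x), dim=1)                          # B x K x H x W
        feats = torch.stack([branch(x) for branch in self.branches], dim=1)   # B x K x C x H x W
        return (weights.unsqueeze(2) * feats).sum(dim=1)                      # B x C x H x W

x = torch.randn(1, 32, 64, 64)
print(SpatialAttentionPooling(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```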
DeepFacePencil: Creating Face Images from Freehand Sketches
pdf: https://t.co/yk1NTCFoHT
abs: https://t.co/9esHnFMZ0l pic.twitter.com/pcChZIEC99
— AK (@ak92501) September 1, 2020