1. A bounded-noise mechanism for differential privacy
Yuval Dagan, Gil Kur
Answering multiple counting queries is one of the best-studied problems in differential privacy. Its goal is to output an approximation of the average $\frac{1}{n}\sum_{i=1}^n \vec{x}^{(i)}$ of vectors $\vec{x}^{(i)} \in [0,1]^k$, while preserving the privacy with respect to any $\vec{x}^{(i)}$. We present an $(\epsilon,\delta)$-private mechanism with optimal $\ell_\infty$ error for most values of $\delta$. This result settles the conjecture of Steinke and Ullman [2020] for these values of $\delta$. Our algorithm adds independent noise of bounded magnitude to each of the $k$ coordinates, while prior solutions relied on unbounded noise such as the Laplace and Gaussian mechanisms.
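To make the contrast concrete, here is a minimal Python sketch of the two noise regimes: the classical Gaussian mechanism, whose noise is unbounded, versus a generic bounded-noise mechanism whose worst-case $\ell_\infty$ error is capped by construction. The uniform noise below is only a placeholder; the paper's actual distribution is carefully shaped to satisfy $(\epsilon,\delta)$-DP.

```python
import numpy as np

def gaussian_mechanism(avg, n, eps, delta):
    """Classical unbounded-noise baseline: Gaussian noise calibrated to the
    L2 sensitivity of the average of n vectors in [0, 1]^k."""
    k = len(avg)
    sensitivity = np.sqrt(k) / n
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return avg + np.random.normal(0.0, sigma, size=k)

def bounded_noise_mechanism(avg, noise_bound, rng=None):
    """Bounded-noise regime: each coordinate is perturbed by independent
    noise supported on [-noise_bound, noise_bound], so the L-infinity
    error never exceeds noise_bound. NOTE: uniform noise is illustrative
    only; Dagan & Kur's distribution is specifically shaped for privacy."""
    rng = rng or np.random.default_rng()
    return avg + rng.uniform(-noise_bound, noise_bound, size=len(avg))
```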
OH MY GOD!
— Thomas Steinke (@shortstein) December 8, 2020
It has been solved.
🎉🎉🎉🎉🎉🎉🎉 https://t.co/HJmA70ykxF pic.twitter.com/cd9xkydd3V
Amazing! Preprint by Yuval Dagan (@YuvalDagan3) & Gil Kur (@GilKur1) solves an open problem of Steinke (@shortstein) & Ullman (@thejonullman)! They shaved the last sqrt(log log log k) factor for answering k queries, winning themselves a sushi dinner! 🍣🍣🍣https://t.co/9edkfWrbJb pic.twitter.com/euzFGlMTae
— Gautam Kamath ✈️ NeurIPS 2020 (@thegautamkamath) December 8, 2020
2. Selective Inference for Hierarchical Clustering
Lucy L. Gao, Jacob Bien, Daniela Witten
Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Classical tests control the Type I error rate when the groups are defined a priori. However, when the groups are instead defined via a clustering algorithm, then applying a classical test for a difference in means between the groups yields an extremely inflated Type I error rate. Notably, this problem persists even if two separate and independent data sets are used to define the groups and to test for a difference in their means. To address this problem, in this paper, we propose a selective inference approach to test for a difference in means between two clusters obtained from any clustering method. Our procedure controls the selective Type I error rate by accounting for the fact that the null hypothesis was generated from the data. We describe how to efficiently compute exact p-values for clusters obtained using agglomerative hierarchical clustering with many commonly used linkages. We apply our method to simulated data and to single-cell RNA-seq data.
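The failure mode the paper fixes is easy to reproduce. In the sketch below (illustrative only; it does not implement the authors' selective inference correction), data with no true group structure are clustered, and a classical t-test on the resulting clusters rejects the null far more often than the nominal 5% level:

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
rejections, n_sims = 0, 500
for _ in range(n_sims):
    # Global null: all 100 observations come from the same N(0, I).
    x = rng.normal(size=(100, 2))
    # Define two "groups" by cutting an average-linkage dendrogram.
    labels = fcluster(linkage(x, method="average"), t=2, criterion="maxclust")
    a, b = x[labels == 1, 0], x[labels == 2, 0]
    if min(len(a), len(b)) >= 2:
        _, p = stats.ttest_ind(a, b)
        rejections += p < 0.05
# Far above the nominal 5%: the clustering manufactured the difference.
print(f"empirical Type I error: {rejections / n_sims:.2f}")
```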
Our preprint on fixing double-dipping in the clustering setting is now on arxiv! https://t.co/Vrnpii0X13 Joint work with Jacob Bien (USC) and @daniela_witten 1/2 https://t.co/7jGow89RDu
— Lucy L. Gao (@lucylgao) December 8, 2020
3. MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs
Fangda Han, Guoyao Hao, Ricardo Guerrero, Vladimir Pavlovic
Multilabel conditional image generation is a challenging problem in computer vision. In this work we propose Multi-ingredient Pizza Generator (MPG), a conditional Generative Adversarial Network (GAN) framework for synthesizing multilabel images. We design MPG based on a state-of-the-art GAN structure called StyleGAN2, in which we develop a new conditioning technique by enforcing intermediate feature maps to learn scalewise label information. Because of the complex nature of the multilabel image generation problem, we also regularize the synthetic images by predicting the corresponding ingredients, and encourage the discriminator to distinguish between matched and mismatched images. To verify the efficacy of MPG, we test it on Pizza10, a carefully annotated multi-ingredient pizza image dataset. MPG can successfully generate photo-realistic pizza images with desired ingredients. The framework can be easily extended to other multilabel image generation scenarios.
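As a rough sketch of what multilabel conditioning can look like, the hypothetical module below embeds a multi-hot ingredient vector and mixes it with style noise before a StyleGAN-like mapping network. Note that MPG's actual technique injects label information into intermediate feature maps scale-wise; the names and the simpler input-concatenation scheme here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LabelConditioner(nn.Module):
    """Hypothetical multilabel conditioning sketch (not MPG's real module):
    embed a multi-hot label vector and fuse it with style noise z."""
    def __init__(self, n_labels=10, z_dim=512):
        super().__init__()
        self.embed = nn.Linear(n_labels, z_dim)
        self.mapping = nn.Sequential(
            nn.Linear(2 * z_dim, z_dim), nn.LeakyReLU(0.2),
            nn.Linear(z_dim, z_dim),
        )

    def forward(self, multi_hot, z):
        # Concatenate label embedding with style noise, then map to w-space.
        return self.mapping(torch.cat([self.embed(multi_hot), z], dim=1))
```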
Love Figure 8. of https://t.co/sJc3fcqHQe
— hardmaru (@hardmaru) December 8, 2020
“Images generated from different combinations of ingredient list and style noise. Images in the same row are generated with identical style noise.” https://t.co/Svx46eNmas pic.twitter.com/AnOjtuD3Sx
MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs. https://t.co/KbxsKVDxjJ pic.twitter.com/ilvONRS3MR
— arxiv (@arxiv_org) December 8, 2020
MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs
— AK (@ak92501) December 8, 2020
pdf: https://t.co/2Hs0Xth3ug
abs: https://t.co/vkuUsASiOx pic.twitter.com/wwKtnleHn2
4. MFST: A Python OpenFST Wrapper With Support for Custom Semirings and Jupyter Notebooks
Matthew Francis-Landau
This paper introduces mFST, a new Python library for working with Finite-State Machines based on OpenFST. mFST is a thin wrapper for OpenFST and exposes all of OpenFST’s methods for manipulating FSTs. Additionally, mFST is the only Python wrapper for OpenFST that exposes OpenFST’s ability to define custom semirings. This makes mFST ideal for developing models that involve learning the weights on an FST or creating neuralized FSTs. mFST has been designed to be easy to get started with and has previously been used in homework assignments for an NLP class as well as in projects for integrating FSTs and neural networks. In this paper, we exhibit the mFST API and show how to use mFST to build a simple neuralized FST with PyTorch.
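To see why custom semirings matter for FSTs, here is a minimal, self-contained illustration of the semiring abstraction using the tropical semiring, where "plus" is min and "times" is +, so path weights compute shortest paths. This is a conceptual sketch, not mFST's actual class interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tropical:
    """Tropical semiring: plus = min, times = +. A sketch of the semiring
    abstraction, not mFST's real API."""
    value: float

    def plus(self, other):          # combines alternative paths
        return Tropical(min(self.value, other.value))

    def times(self, other):         # combines arcs along one path
        return Tropical(self.value + other.value)

# Two alternative arcs (weights 3 and 5) followed by a shared arc (weight 2):
w = Tropical(3.0).plus(Tropical(5.0)).times(Tropical(2.0))
print(w.value)  # 5.0 = min(3, 5) + 2
```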
New paper on arXiv about mFST, a Python library for working with Finite-State Machines with Custom Semirings
— Matthew FL (@matthewfl) December 8, 2020
Paper: https://t.co/aBbHZvyx5R
Code: https://t.co/XBuKV5OBwZ
In the paper, I demonstrate how to quickly get started with FSTs and how one could mix PyTorch+FSTs pic.twitter.com/qIUx7diGlp
5. Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
Guy Gafni, Justus Thies, Michael Zollhöfer, Matthias Nießner
We present dynamic neural radiance fields for modeling the appearance and dynamics of a human face. Digitally modeling and reconstructing a talking human is a key building block for a variety of applications. Especially for telepresence applications in AR or VR, a faithful reproduction of the appearance, including novel viewpoints or head poses, is required. In contrast to state-of-the-art approaches that model the geometry and material properties explicitly, or are purely image-based, we introduce an implicit representation of the head based on scene representation networks. To handle the dynamics of the face, we combine our scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions. We use volumetric rendering to generate images from this hybrid representation and demonstrate that such a dynamic neural scene representation can be learned from monocular input data only, without the need for a specialized capture setup. In our experiments, we show that this learned volumetric representation allows for photo-realistic image generation that surpasses the quality of state-of-the-art video-based reenactment methods.
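Conceptually, the hybrid representation is an implicit field queried per 3D point but also conditioned on a low-dimensional morphable-model code. A heavily simplified sketch follows; the layer sizes, the 76-dimensional expression code, and the omission of positional encoding and view direction are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class DynamicRadianceField(nn.Module):
    """Sketch of a dynamic radiance field: an MLP queried at a 3D point,
    conditioned on a morphable-model expression/pose code."""
    def __init__(self, pos_dim=3, expr_dim=76, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + expr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB color + volume density
        )

    def forward(self, xyz, expression_code):
        out = self.mlp(torch.cat([xyz, expression_code], dim=-1))
        rgb = torch.sigmoid(out[..., :3])
        density = torch.relu(out[..., 3:])
        return rgb, density
```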
Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
— AK (@ak92501) December 8, 2020
pdf: https://t.co/o70GUFyOWf
abs: https://t.co/lMIqim3DNA
project page: https://t.co/SkKhrqWW7E pic.twitter.com/4e97uOr0kj
Check out our Dynamic Neural Radiance Fields approach for 4D Facial Avatars
— Justus Thies (@JustusThies) December 8, 2020
Video: https://t.co/oU7VsBzhSE
Paper: https://t.co/gVWak521kR
Project Page: https://t.co/OzP4kOCCR3
Kudos to @GafniGuy #NerFACE pic.twitter.com/lJ9eh7LQDX
6. Machine learning for public policy: Do we need to sacrifice accuracy to make models fair?
Kit T. Rodolfa, Hemank Lamba, Rayid Ghani
Growing applications of machine learning in policy settings have raised concern about fairness implications, especially for racial minorities, but little work has studied the practical trade-offs between fairness and accuracy in real-world settings. This empirical study fills this gap by investigating the accuracy cost of mitigating disparities across several policy settings, focusing on the common context of using machine learning to inform benefit allocation in resource-constrained programs across education, mental health, criminal justice, and housing safety. In each setting, we show that explicitly focusing on achieving equity and using our proposed post-hoc disparity mitigation methods can substantially improve fairness without sacrificing accuracy, challenging the commonly held assumption that reducing disparities requires either accepting an appreciable drop in accuracy or developing novel, complex methods.
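As one concrete flavor of post-hoc disparity mitigation, the sketch below picks a separate score threshold per group so that every group attains the same recall, rather than applying one global cutoff. This is an illustration of the general idea, not the paper's exact procedure:

```python
import numpy as np

def per_group_thresholds(scores, groups, labels, target_recall=0.6):
    """For each group, choose the score threshold at which target_recall of
    that group's known positives would be selected (illustrative sketch)."""
    thresholds = {}
    for g in np.unique(groups):
        pos = np.sort(scores[(groups == g) & (labels == 1)])[::-1]
        if len(pos) == 0:
            continue  # no known positives in this group
        k = max(1, int(np.ceil(target_recall * len(pos))))
        thresholds[g] = pos[k - 1]
    return thresholds
```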
There is not always a tradeoff between "accuracy" and "fairness" in ML/AI. Our recent empirical work investigating the accuracy cost of mitigating disparities across policy problems - we find that we can achieve fairness without sacrificing accuracy https://t.co/K739cB094M
— Rayid Ghani (@rayidghani) December 8, 2020
7. Parallel Training of Deep Networks with Local Updates
Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel
Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count, so have the compute budgets and times required to train them, increasing the need for compute-efficient methods that parallelize training. Two common approaches to parallelize the training of deep networks have been data and model parallelism. While useful, data and model parallelism suffer from diminishing returns in terms of compute efficiency for large batch sizes. In this paper, we investigate how to continue scaling compute efficiently beyond the point of diminishing returns for large batches through local parallelism, a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation. Local parallelism enables fully asynchronous layer-wise parallelism with a low memory footprint, and requires little communication overhead compared with model parallelism. We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
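The core trick is to cut the global backward pass into per-block local updates. A minimal PyTorch sketch with toy dimensions; the paper studies several local-loss variants beyond this greedy auxiliary-classifier one:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Each block gets its own auxiliary head and optimizer; .detach() cuts the
# gradient between blocks, so blocks can train in parallel on different devices.
blocks = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)]
aux_heads = [nn.Linear(32, 10) for _ in blocks]
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, aux_heads)]

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
h = x
for block, head, opt in zip(blocks, aux_heads, opts):
    h = block(h.detach())                 # no gradient flows to earlier blocks
    loss = F.cross_entropy(head(h), y)    # local loss for this block only
    opt.zero_grad(); loss.backward(); opt.step()
```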
Excited to share a paper on local updates as an alternative to global backprop, co-led with @Luke_Metz + @graphcoreai @GoogleAI & @berkeley_ai.
— Michael (Misha) Laskin (@MishaLaskin) December 8, 2020
tl;dr - Local updates can improve the efficiency of training deep nets in the high-compute regime.
👉 https://t.co/viEiwBx6Wf
1/N
I'm very excited to finally share our work on Training Deep Networks with Local Updates
— Mark Saroufim (@marksaroufim) December 8, 2020
Model Parallelism suffers from high communication costs and poor utilization
Data and Pipeline Parallelism introduce a tradeoff between consistency and utilization https://t.co/WurG9vQD6g pic.twitter.com/U5H3Uc7SzD
8. NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis
Pratul P. Srinivasan, Boyang Deng, Xiuming Zhang, Matthew Tancik, Ben Mildenhall, Jonathan T. Barron
We present a method that takes as input a set of images of a scene illuminated by unconstrained known lighting, and produces as output a 3D representation that can be rendered from novel viewpoints under arbitrary lighting conditions. Our method represents the scene as a continuous volumetric function parameterized as MLPs whose inputs are a 3D location and whose outputs are the following scene properties at that input location: volume density, surface normal, material parameters, distance to the first surface intersection in any direction, and visibility of the external environment in any direction. Together, these allow us to render novel views of the object under arbitrary lighting, including indirect illumination effects. The predicted visibility and surface intersection fields are critical to our model’s ability to simulate direct and indirect illumination during training, because the brute-force techniques used by prior work are intractable for lighting conditions outside of controlled setups with a single light. Our method outperforms alternative approaches for recovering relightable 3D scene representations, and performs well in complex lighting settings that have posed a significant challenge to prior work.
NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis
— AK (@ak92501) December 8, 2020
pdf: https://t.co/vAIcTSDXvd
abs: https://t.co/z5xZ0cgZeT
project page: https://t.co/SO4W7P01oX pic.twitter.com/3v7IHNxXgB
9. Deep Learning for Human Mobility: a Survey on Data and Models
Massimiliano Luca, Gianni Barlacchi, Bruno Lepri, Luca Pappalardo
The study of human mobility is crucial due to its impact on several aspects of our society, such as disease spreading, urban planning, well-being, pollution, and more. The proliferation of digital mobility data, such as phone records, GPS traces, and social media posts, combined with the outstanding predictive power of artificial intelligence, triggered the application of deep learning to human mobility. In particular, the literature is focusing on three tasks: next-location prediction, i.e., predicting an individual’s future locations; crowd flow prediction, i.e., forecasting flows on a geographic region; and trajectory generation, i.e., generating realistic individual trajectories. Existing surveys focus on single tasks, data sources, mechanistic or traditional machine learning approaches, while a comprehensive description of deep learning solutions is missing. This survey provides: (i) basic notions on mobility and deep learning; (ii) a review of data sources and public datasets; (iii) a description of deep learning models and (iv) a discussion about relevant open challenges. Our survey is a guide to the leading deep learning solutions to next-location prediction, crowd flow prediction, and trajectory generation. At the same time, it helps deep learning scientists and practitioners understand the fundamental concepts and the open challenges of the study of human mobility.
📢🔥 The latest survey on #DeepLearning approaches to
— Luca Pappalardo (@lucpappalard) December 8, 2020
1⃣ Next-Location prediction 2⃣ Crowd-Flow prediction and 3⃣ Trajectory Generation! (+ list of open datasets)
Arxiv 👉 https://t.co/q4wCmjxytJ
GitHub 👉 https://t.co/NJUf9WuHmT
with @luca_msl @GianniBarlacchi @brulepri pic.twitter.com/Uk31F85d18
10. Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop
Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White
This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the “Robotics: Science and System” conference. Twelve leaders of the field took competing debate positions on the definition, viability, and importance of transferring skills from simulation to the real world in the context of robotics problems. The debaters also joined a large panel discussion, answering audience questions and outlining the future of Sim2Real in robotics. Furthermore, we invited extended abstracts to this workshop which are summarized in this report. Based on the workshop, this report concludes with directions for practitioners exploiting this technology and for researchers further exploring open problems in this area.
Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop https://t.co/myhkkNWLzE pic.twitter.com/DvWGvir7c1
— sim2real (@sim2realAIorg) December 8, 2020
Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop https://t.co/EubRlfLRZ6 #robotics pic.twitter.com/EBDIoPWrym
— Tomasz Malisiewicz (@quantombone) December 8, 2020
11. Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements
Kun Su, Xiulong Liu, Eli Shlizerman
We propose a novel system that takes as input body movements of a musician playing a musical instrument and generates music in an unsupervised setting. Learning to generate multi-instrumental music from videos without labeling the instruments is a challenging problem. To achieve the transformation, we built a pipeline named ‘Multi-instrumentalistNet’ (MI Net). At its base, the pipeline learns a discrete latent representation of various instruments’ music from log-spectrograms using a Vector Quantized Variational Autoencoder (VQ-VAE) with multi-band residual blocks. The pipeline is then trained along with an autoregressive prior conditioned on the musician’s body keypoint movements, encoded by a recurrent neural network. Joint training of the prior with the body movements encoder succeeds in disentangling the music into latent features indicating the musical components and the instrumental features. The latent space results in distributions that are clustered into distinct instruments, from which new music can be generated. Furthermore, the VQ-VAE architecture supports detailed music generation with additional conditioning. We show that a MIDI file can further condition the latent space such that the pipeline will generate the exact content of the music being played by the instrument in the video. We evaluate MI Net on two datasets containing videos of 13 instruments and obtain generated music of reasonable audio quality, easily associated with the corresponding instrument, and consistent with the music audio content.
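At the heart of the pipeline is the VQ-VAE quantization step. The sketch below shows the generic operation (nearest-codebook lookup with a straight-through gradient), not MI Net's exact implementation:

```python
import torch

def vector_quantize(z_e, codebook):
    """Generic VQ-VAE quantization: snap each encoder output vector to its
    nearest codebook entry. z_e: (batch, time, d); codebook: (K, d)."""
    dists = ((z_e.unsqueeze(-2) - codebook) ** 2).sum(dim=-1)  # (batch, time, K)
    idx = dists.argmin(dim=-1)                                 # discrete codes
    z_q = codebook[idx]                                        # (batch, time, d)
    # Straight-through estimator: gradients flow to the encoder via z_e.
    return z_e + (z_q - z_e).detach(), idx
```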
Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements
— AK (@ak92501) December 8, 2020
pdf: https://t.co/EDKPysqtO0
abs: https://t.co/AoE4p4rG70 pic.twitter.com/WjvlNcr681
12. Spatially-Adaptive Pixelwise Networks for Fast Image Translation
Tamar Rott Shaham, Michael Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli
We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation. We design the generator to be an extremely lightweight function of the full-resolution image. In fact, we use pixel-wise networks; that is, each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities. We take three important steps to equip such a seemingly simple function with adequate expressivity. First, the parameters of the pixel-wise networks are spatially varying, so they can represent a broader function class than simple 1x1 convolutions. Second, these parameters are predicted by a fast convolutional network that processes an aggressively low-resolution representation of the input. Third, we augment the input image with a sinusoidal encoding of spatial coordinates, which provides an effective inductive bias for generating realistic novel high-frequency image content. As a result, our model is up to 18x faster than state-of-the-art baselines. We achieve this speedup while generating comparable visual quality across different image resolutions and translation domains.
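The third ingredient, the sinusoidal coordinate encoding, is simple to write down. A sketch under assumed frequencies (the paper's exact encoding may differ):

```python
import numpy as np

def positional_encoding(h, w, n_freqs=6):
    """Sinusoidal encoding of (x, y) pixel coordinates: the kind of input
    augmentation that lets pixel-wise MLPs synthesize high-frequency content."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    coords = np.stack([xs, ys], axis=-1)              # (h, w, 2)
    feats = [coords]
    for i in range(n_freqs):
        feats += [np.sin(2**i * np.pi * coords), np.cos(2**i * np.pi * coords)]
    return np.concatenate(feats, axis=-1)             # (h, w, 2 + 4*n_freqs)
```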
Spatially-Adaptive Pixelwise Networks for Fast Image Translation
— AK (@ak92501) December 8, 2020
pdf: https://t.co/PSU0oKUIph
abs: https://t.co/oJJizjm26y
project page: https://t.co/5AgvTv28n7 pic.twitter.com/d5Cji0JNxj
13. EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao, Shuang Liang, Zhencheng Liu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao
In this work, we address the Text-to-Speech (TTS) task by proposing a non-autoregressive architecture called EfficientTTS. Unlike the dominant non-autoregressive TTS models, which require external aligners for training, EfficientTTS optimizes all its parameters with a stable, end-to-end training procedure, while allowing for synthesizing high-quality speech in a fast and efficient manner. EfficientTTS is motivated by a new monotonic alignment modeling approach (also introduced in this work), which imposes monotonic constraints on the sequence alignment with almost no increase in computation. By combining EfficientTTS with different feed-forward network structures, we develop a family of TTS models, including both text-to-melspectrogram and text-to-waveform networks. We experimentally show that the proposed models significantly outperform counterpart models such as Tacotron 2 and Glow-TTS in terms of speech quality, training efficiency, and synthesis speed, while still producing speech with strong robustness and great diversity. In addition, we demonstrate that the proposed approach can be easily extended to autoregressive models such as Tacotron 2.
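One simple way to express a monotonic alignment constraint as a differentiable penalty: compute the expected text position each output frame attends to, and penalize any decrease. EfficientTTS's actual formulation differs in detail; this sketch only conveys the idea:

```python
import torch

def monotonicity_penalty(attn):
    """attn: (batch, T_mel, T_text) soft alignment (softmax over text axis).
    Penalize frames whose expected attended text position moves backwards."""
    T_text = attn.shape[-1]
    positions = torch.arange(T_text, dtype=attn.dtype)     # 0 .. T_text-1
    expected = (attn * positions).sum(dim=-1)              # (batch, T_mel)
    deltas = expected[:, 1:] - expected[:, :-1]
    return torch.relu(-deltas).mean()                      # punish backtracking
```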
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
— AK (@ak92501) December 8, 2020
pdf: https://t.co/HadDs7Bv8Y
abs: https://t.co/dgKUFLzNvX
project page: https://t.co/59vF5oygIh pic.twitter.com/vUpG3zDr0K
14. Grammar-Aware Question-Answering on Quantum Computers
Konstantinos Meichanetzidis, Alexis Toumi, Giovanni de Felice, Bob Coecke
Natural language processing (NLP) is at the forefront of great advances in contemporary AI, and it is arguably one of the most challenging areas of the field. At the same time, with the steady growth of quantum hardware and notable improvements towards implementations of quantum algorithms, we are approaching an era when quantum computers perform tasks that cannot be done on classical computers with a reasonable amount of resources. This provides a new range of opportunities for AI, and for NLP specifically. Earlier work has already demonstrated a potential quantum advantage for NLP in a number of ways: (i) algorithmic speedups for search-related or classification tasks, which are the most dominant tasks within NLP; (ii) exponentially large quantum state spaces allow for accommodating complex linguistic structures; (iii) novel models of meaning employing density matrices naturally model linguistic phenomena such as hyponymy and linguistic ambiguity, among others. In this work, we perform the first implementation of an NLP task on noisy intermediate-scale quantum (NISQ) hardware. Sentences are instantiated as parameterised quantum circuits. We encode word-meanings in quantum states and we explicitly account for grammatical structure, which even in mainstream NLP is not commonplace, by faithfully hard-wiring it as entangling operations. This makes our approach to quantum natural language processing (QNLP) particularly NISQ-friendly. Our novel QNLP model shows concrete promise for scalability as the quality of the quantum hardware improves in the near future.
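A toy illustration of the two ingredients, word meanings as parameterised quantum states and grammar hard-wired as entangling operations, using nothing but NumPy (a two-word "sentence"; the paper's circuits and ansatz are of course far richer):

```python
import numpy as np

def word_state(theta, phi):
    """A word meaning as a parameterised one-qubit state."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

# Grammar as wiring: a CNOT entangles the two word states.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

psi = CNOT @ np.kron(word_state(0.7, 0.1), word_state(1.9, 0.4))
probs = np.abs(psi) ** 2   # Born-rule probabilities a classifier could read out
print(probs.round(3))
```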
Yay, our latest paper on QNLP experiments is out on the arXiv! We tried to explain Shakespeare's Romeo and Juliet to a small and noisy quantum computer, and it did get some of it: "Romeo who loves Juliet dies" @coecke @konstantinosmei @gio_defel https://t.co/sjW0RH2sQ1 pic.twitter.com/rjfingbnvh
— alexis.toumi (@AlexisToumi) December 8, 2020
15. Sample-efficient proper PAC learning with approximate differential privacy
Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi
In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension $d$ with approximate differential privacy is $\tilde{O}(d^6)$, ignoring privacy and accuracy parameters. This result answers a question of Bun et al. (FOCS 2020) by improving upon their upper bound of $2^{O(2^d)}$ on the sample complexity. Prior to our work, finiteness of the sample complexity for privately learning a class of finite Littlestone dimension was only known for improper private learners, and the fact that our learner is proper answers another question of Bun et al., which was also asked by Bousquet et al. (NeurIPS 2020). Using machinery developed by Bousquet et al., we then show that the sample complexity of sanitizing a binary hypothesis class is at most polynomial in its Littlestone dimension and dual Littlestone dimension. This implies that a class is sanitizable if and only if it has finite Littlestone dimension. An important ingredient of our proofs is a new property of binary hypothesis classes that we call irreducibility, which may be of independent interest.
16. iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes
Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Shyamal Buch, Claudia D’Arpino, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Li Fei-Fei, Silvio Savarese
We present iGibson, a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes. Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects. The scenes are replicas of 3D scanned real-world homes, aligning the distribution of objects and layout to that of the real world. iGibson integrates several key features to facilitate the study of interactive tasks: i) generation of high-quality visual virtual sensor signals (RGB, depth, segmentation, LiDAR, flow, among others), ii) domain randomization to change the materials of the objects (both visual texture and dynamics) and/or their shapes, iii) integrated sampling-based motion planners to generate collision-free trajectories for robot bases and arms, and iv) intuitive human-iGibson interface that enables efficient collection of human demonstrations. Through experiments, we show that the full interactivity of the scenes enables agents to learn useful visual representations that accelerate the training of downstream manipulation tasks. We also show that iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human demonstrated behaviors. iGibson is open-sourced with comprehensive examples and documentation. For more information, visit our project website: http://svl.stanford.edu/igibson/
iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes
— AK (@ak92501) December 8, 2020
pdf: https://t.co/y2dcMTkRym
abs: https://t.co/zHkZDkB2qt
project page: https://t.co/lizGdNZyFp pic.twitter.com/o6UiE2HSDy
17. MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect
Matheus Cavalcante, Samuel Riedel, Antonio Pullini, Luca Benini
A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 cores) configurations is to ensure low-latency and efficient access to the L1 memory. In this work we demonstrate that it is possible to scale up the shared-L1 architecture: We present MemPool, a 32-bit many-core system with 256 fast RV32IMA “Snitch” cores featuring application-tunable execution units, running at 700 MHz in typical conditions (TT/0.80 V/25°C). MemPool is easy to program, with all the cores sharing a global view of a large L1 scratchpad memory pool, accessible within at most 5 cycles. In MemPool’s physical-aware design, we emphasized the exploration, design, and optimization of the low-latency processor-to-L1-memory interconnect. We compare three candidate topologies, analyzing them in terms of latency, throughput, and back-end feasibility. The chosen topology keeps the average latency at fewer than 6 cycles, even for a heavy injected load of 0.33 request/core/cycle. We also propose a lightweight addressing scheme that maps each core’s private data to a memory bank accessible within one cycle, which leads to performance gains of up to 20% in real-world signal processing benchmarks. The addressing scheme is also highly efficient in terms of energy consumption, since requests to local banks consume only half of the energy required to access remote banks. Our design achieves competitive performance with respect to an ideal, non-implementable full-crossbar baseline.
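The addressing idea can be conveyed with a toy model: a core's private data aliases onto its own single-cycle local bank, while shared addresses are word-interleaved across all banks. Every constant below is an illustrative assumption, not MemPool's actual memory map:

```python
NUM_BANKS = 1024           # illustrative; L1 is interleaved across many banks
WORD_BYTES = 4
LOCAL_REGION_WORDS = 256   # hypothetical size of each core's private region

def bank_of(addr, core_id):
    """Toy hybrid addressing scheme: private words go to the core's own
    1-cycle local bank; everything else is word-interleaved (sketch only)."""
    word = addr // WORD_BYTES
    if word < LOCAL_REGION_WORDS:      # private data -> local bank
        return core_id % NUM_BANKS
    return word % NUM_BANKS            # shared data -> interleaved banks
```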
Our latest large-scale @pulp_platform embodiment is out (to appear at DATE21). Mempool is a giant cluster with 256 snitch cores clocked at 700MHz and with max zero-load latency to L1 memory of 5cycles. Quite a lot of new stuff https://t.co/03H8sg45af
— Luca Benini (@LucaBeniniZhFe) December 8, 2020
18. Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation
Ruibo Liu, Guangxuan Xu, Chenyan Jia, Weicheng Ma, Lili Wang, Soroush Vosoughi
Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has comparable quality to the original data with respect to readability and class consistency.
19. Rethinking FUN: Frequency-Domain Utilization Networks
Kfir Goldberg, Stav Shapiro, Elad Richardson, Shai Avidan
The search for efficient neural network architectures has gained much focus in recent years, where modern architectures focus not only on accuracy but also on inference time and model size. Here, we present FUN, a family of novel Frequency-domain Utilization Networks. These networks utilize the inherent efficiency of the frequency domain by working directly in that domain, represented with the Discrete Cosine Transform. Using modern techniques and building blocks such as compound-scaling and inverted-residual layers, we generate a set of such networks allowing one to balance between size, latency, and accuracy while outperforming competing RGB-based models. Extensive evaluations verify that our networks present strong alternatives to previous approaches. Moreover, we show that working in the frequency domain allows for dynamic compression of the input at inference time without any explicit change to the architecture.
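Producing the frequency-domain input is the easy part and fits in a few lines; the sketch below applies a JPEG-style per-block 2D DCT to a grayscale image (the paper's exact input pipeline may differ):

```python
import numpy as np
from scipy.fft import dctn

def to_dct_blocks(gray_image, block=8):
    """Split a grayscale image into block x block tiles and apply a 2D DCT
    to each, yielding a frequency-domain tensor a network can consume."""
    h, w = gray_image.shape
    h, w = h - h % block, w - w % block                # crop to whole blocks
    img = gray_image[:h, :w].reshape(h // block, block, w // block, block)
    img = img.transpose(0, 2, 1, 3)                    # (nh, nw, block, block)
    return dctn(img, axes=(-2, -1), norm="ortho")      # per-block 2D DCT
```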
Rethinking FUN: Frequency-Domain Utilization Networks
— AK (@ak92501) December 8, 2020
pdf: https://t.co/ultpzPoXrn
abs: https://t.co/8YxaTFXshQ
github: https://t.co/Mca7SeZbyL pic.twitter.com/SVffoh8kiI