1. Convolutional Neural Nets: Foundations, Computations, and New Applications
Shengli Jiang, Victor M. Zavala
We review mathematical foundations of convolutional neural nets (CNNs) with the goals of: i) highlighting connections with techniques from statistics, signal processing, linear algebra, differential equations, and optimization, ii) demystifying underlying computations, and iii) identifying new types of applications. CNNs are powerful machine learning models that highlight features from grid data to make predictions (regression and classification). The grid data object can be represented as vectors (in 1D), matrices (in 2D), or tensors (in 3D or higher dimensions) and can incorporate multiple channels (thus providing high flexibility in the input data representation). For example, an image can be represented as a 2D grid data object that contains red, green, and blue (RGB) channels (each channel is a 2D matrix). Similarly, a video can be represented as a 3D grid data object (two spatial dimensions plus time) with RGB channels (each channel is a 3D tensor). CNNs highlight features from the grid data by performing convolution operations with different types of operators. The operators highlight different types of features (e.g., patterns, gradients, geometrical features) and are learned by using optimization techniques. In other words, CNNs seek to identify optimal operators that best map the input data to the output data. A common misconception is that CNNs are only capable of processing image or video data but their application scope is much wider; specifically, datasets encountered in diverse applications can be expressed as grid data. Here, we show how to apply CNNs to new types of applications such as optimal control, flow cytometry, multivariate process monitoring, and molecular simulations.
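The core operation the review describes, sliding an operator over a multi-channel grid and summing products, can be sketched in a few lines of NumPy. This is a minimal illustration (valid-mode cross-correlation, the convention most deep learning libraries actually implement), not code from the paper:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a multi-channel image
    (H, W, C) with an operator (kh, kw, C); products are summed over
    the window and over channels, producing one feature map."""
    H, W, C = image.shape
    kh, kw, kc = kernel.shape
    assert kc == C, "operator must match the image channel count"
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out

# A 5x5 RGB image with a sharp vertical edge (left half dark, right half bright)
img = np.zeros((5, 5, 3))
img[:, 3:, :] = 1.0

# A vertical-edge operator replicated across the three channels
kernel = np.repeat(np.array([[1., 0., -1.],
                             [1., 0., -1.],
                             [1., 0., -1.]])[:, :, None], 3, axis=2)

feat = conv2d(img, kernel)  # responds only where the edge sits
```

A CNN learns the entries of `kernel` by optimization rather than hand-designing them, but the forward computation is exactly this.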
🎓 Convolutional Neural Nets: Foundations, Computations, and New Applications
— elvis (@omarsar0) January 14, 2021
Very tidy and accessible report on CNNs. This is something that could be recommended to students as additional reading in an intro to deep learning course.https://t.co/zc1Cw5lxU9 pic.twitter.com/4tsOvYguL2
2. Asymmetric self-play for automatic goal discovery in robotic manipulation
OpenAI: Matthias Plappert, Raul Sampedro, Tao Xu, Ilge Akkaya, Vineet Kosaraju, Peter Welinder, Ruben D’Sa, Arthur Petron, Henrique Ponde de Oliveira Pinto, Alex Paino, Hyeonwoo Noh, Lilian Weng, Qiming Yuan, Casey Chu, Wojciech Zaremba
We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without any human priors. Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum and Bob can learn from Alice’s trajectory when relabeled as a goal-conditioned demonstration. Finally, our method scales, resulting in a single policy that can generalize to many unseen tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of a learned policy are available at https://robotics-self-play.github.io.
Our paper on training a single goal-conditioned policy 100% with asymmetric self-play to generalize to many unseen objects and tasks: https://t.co/ZJPlvDTees and more cool videos are available at https://t.co/qEjvN9YLfv (The attached video is zero-shot) pic.twitter.com/1yPQc3ZN9x
— Lilian Weng (@lilianweng) January 14, 2021
Asymmetric self-play for automatic goal discovery in robotic manipulation
— AK (@ak92501) January 14, 2021
pdf: https://t.co/ywPWPXk3KF
abs: https://t.co/iZv3kiYEJ4
project page: https://t.co/5aLAjdTrdB pic.twitter.com/1DyDO1Hkej
3. Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré
Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems. Consequently, recent research has focused on testing the robustness of such models, resulting in a diverse set of evaluation methodologies ranging from adversarial attacks to rule-based data transformations. In this work, we identify challenges with evaluating NLP systems and propose a solution in the form of Robustness Gym (RG), a simple and extensible evaluation toolkit that unifies 4 standard evaluation paradigms: subpopulations, transformations, evaluation sets, and adversarial attacks. By providing a common platform for evaluation, Robustness Gym enables practitioners to compare results from all 4 evaluation paradigms with just a few clicks, and to easily develop and share novel evaluation methods using a built-in set of abstractions. To validate Robustness Gym’s utility to practitioners, we conducted a real-world case study with a sentiment-modeling team, revealing performance degradations of 18%+. To verify that Robustness Gym can aid novel research analyses, we perform the first study of state-of-the-art commercial and academic named entity linking (NEL) systems, as well as a fine-grained analysis of state-of-the-art summarization models. For NEL, commercial systems struggle to link rare entities and lag their academic counterparts by 10%+, while state-of-the-art summarization models struggle on examples that require abstraction and distillation, degrading by 9%+. Robustness Gym can be found at https://robustnessgym.com/
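Robustness Gym's actual Python API differs from what follows; this is only a concept sketch of one of the four paradigms the abstract lists (rule-based transformations): apply a perturbation to each input and measure how often the model's prediction survives it. All names here are hypothetical:

```python
def uppercase(text):
    """A rule-based transformation: shout-case the input."""
    return text.upper()

def swap_typo(text):
    """A deterministic 'typo' transformation: swap the first two characters."""
    return text[1] + text[0] + text[2:] if len(text) > 1 else text

def robustness_report(model, texts, transforms):
    """For each transformation, the fraction of inputs whose prediction
    survives it unchanged (higher = more robust)."""
    return {name: sum(model(x) == model(t(x)) for x in texts) / len(texts)
            for name, t in transforms.items()}

# Toy keyword 'sentiment model' -- deliberately case-sensitive
model = lambda s: "pos" if "good" in s else "neg"
texts = ["good movie", "bad movie", "really good"]
report = robustness_report(model, texts,
                           {"uppercase": uppercase, "typo": swap_typo})
```

The toy model's case-sensitivity shows up immediately in the report, which is the kind of brittleness (the 18%+ degradations in the case study) that a unified evaluation platform is meant to surface.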
🚀Excited to release Robustness Gym, a new Python evaluation toolkit for evaluating the robustness of NLP models, as part of a collaboration between Stanford, Salesforce Research and UNC Chapel-Hill.
— Karan Goel (@krandiash) January 14, 2021
Paper: https://t.co/WjcfdE42x1
Code: https://t.co/1V4fAtJvcD
pip install!
4. What Makes a Dark Pattern… Dark? Design Attributes, Normative Considerations, and Measurement Methods
Arunesh Mathur, Jonathan Mayer, Mihir Kshirsagar
There is a rapidly growing literature on dark patterns, user interface designs — typically related to shopping or privacy — that researchers deem problematic. Recent work has been predominantly descriptive, documenting and categorizing objectionable user interfaces. These contributions have been invaluable in highlighting specific designs for researchers and policymakers. But the current literature lacks a conceptual foundation: What makes a user interface a dark pattern? Why are certain designs problematic for users or society? We review recent work on dark patterns and demonstrate that the literature does not reflect a singular concern or consistent definition, but rather, a set of thematically related considerations. Drawing from scholarship in psychology, economics, ethics, philosophy, and law, we articulate a set of normative perspectives for analyzing dark patterns and their effects on individuals and society. We then show how future research on dark patterns can go beyond subjective criticism of user interface designs and apply empirical methods grounded in normative perspectives.
In a new paper, @jonathanmayer, Mihir Kshirsagar, and I investigate a question that has been challenging researchers and policymakers: what makes a dark pattern, well, "dark"? https://t.co/AJmaZ502wm [thread] pic.twitter.com/rDDcGl9fkY
— Arunesh Mathur (@aruneshmathur) January 14, 2021
Dark patterns are here to stay, but there isn't consensus on what makes a dark pattern dark. A forthcoming paper from my Princeton colleagues (CHI '21) takes on the challenge of providing a normative and conceptual grounding for studying dark patterns. https://t.co/i8IKQBrUxk https://t.co/x1lPGkN3cg
— Arvind Narayanan (@random_walker) January 14, 2021
5. Model-Based Machine Learning for Communications
Nir Shlezinger, Nariman Farsad, Yonina C. Eldar, Andrea J. Goldsmith
We present an introduction to model-based machine learning for communication systems. We begin by reviewing existing strategies for combining model-based algorithms and machine learning from a high level perspective, and compare them to the conventional deep learning approach which utilizes established deep neural network (DNN) architectures trained in an end-to-end manner. Then, we focus on symbol detection, which is one of the fundamental tasks of communication receivers. We show how the different strategies of conventional deep architectures, deep unfolding, and DNN-aided hybrid algorithms, can be applied to this problem. The last two approaches constitute a middle ground between purely model-based and solely DNN-based receivers. By focusing on this specific task, we highlight the advantages and drawbacks of each strategy, and present guidelines to facilitate the design of future model-based deep learning systems for communications.
Model-Based Machine Learning for Communications. #DataScience #DeepLearning #BigData #Analytics #Python #RStats #DevCommunity #Serverless #IIoT #Linux #Programming #IoT #javascript #womenwhocode #100DaysOfCode #Robotics #NeuralNetworks #MachineLearning #AIhttps://t.co/ACBdXi9A9G pic.twitter.com/MYt3be3aPi
— Marcus Borba (@marcusborba) January 14, 2021
6. Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Junsuk Choe, Sanghyuk Chun
ImageNet has been arguably the most popular image classification benchmark, but it is also the one with a significant level of label noise. Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark. They have thus proposed to turn ImageNet evaluation into a multi-label task, with exhaustive multi-label annotations per image. However, they have not fixed the training set, presumably because of a formidable annotation cost. We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied. With the single-label annotations, a random crop of an image may contain an entirely different object from the ground truth, introducing noisy or even incorrect supervision during training. We thus re-label the ImageNet training set with multi-labels. We address the annotation cost barrier by letting a strong image classifier, trained on an extra source of data, generate the multi-labels. We utilize the pixel-wise multi-label predictions before the final pooling layer, in order to exploit the additional location-specific supervision signals. Training on the re-labeled samples results in improved model performances across the board. ResNet-50 attains the top-1 classification accuracy of 78.9% on ImageNet with our localized multi-labels, which can be further boosted to 80.2% with the CutMix regularization. We show that the models trained with localized multi-labels also outperform the baselines on transfer learning to object detection and instance segmentation tasks, and various robustness benchmarks. The re-labeled ImageNet training set, pre-trained weights, and the source code are available at {https://github.com/naver-ai/relabel_imagenet}.
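The key idea, that a random crop's ground truth should come from what is actually inside the crop, can be illustrated with a toy label map. ReLabel itself uses the classifier's per-pixel score maps before the final pooling layer; this sketch simplifies to hard per-pixel class ids, and the function name is hypothetical:

```python
import numpy as np

def crop_multilabel(label_map, top, left, size, num_classes):
    """Derive a soft multi-label target for a crop from a pixel-wise
    label map of class ids: the class distribution inside the crop."""
    crop = label_map[top:top + size, left:left + size]
    counts = np.bincount(crop.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Label map: left half "dog" (class 0), right half "cat" (class 1)
label_map = np.zeros((8, 8), dtype=int)
label_map[:, 4:] = 1

t_inside = crop_multilabel(label_map, 0, 0, 4, 2)  # crop fully in the dog region
t_edge = crop_multilabel(label_map, 0, 2, 4, 2)    # crop straddling the boundary
```

A single global label would supervise both crops identically ("dog"), even though the second crop is half cat; the localized target corrects exactly this mismatch.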
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
— AK (@ak92501) January 14, 2021
pdf: https://t.co/eb1qvqaO7I
abs: https://t.co/rw45sireIH
github: https://t.co/C6Po5bcKKt pic.twitter.com/KFKVBEbsfA
7. Big Self-Supervised Models Advance Medical Image Classification
Shekoofeh Azizi, Basil Mustafa, Fiona Ryan, Zachary Beaver, Jan Freyberg, Jonathan Deaton, Aaron Loh, Alan Karthikesalingam, Simon Kornblith, Ting Chen, Vivek Natarajan, Mohammad Norouzi
Self-supervised pretraining followed by supervised fine-tuning has seen success in image recognition, especially when labeled examples are scarce, but has received limited attention in medical image analysis. This paper studies the effectiveness of self-supervised learning as a pretraining strategy for medical image classification. We conduct experiments on two distinct tasks: dermatology skin condition classification from digital camera images and multi-label chest X-ray classification, and demonstrate that self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images significantly improves the accuracy of medical image classifiers. We introduce a novel Multi-Instance Contrastive Learning (MICLe) method that uses multiple images of the underlying pathology per patient case, when available, to construct more informative positive pairs for self-supervised learning. Combining our contributions, we achieve an improvement of 6.7% in top-1 accuracy and an improvement of 1.1% in mean AUC on dermatology and chest X-ray classification respectively, outperforming strong supervised baselines pretrained on ImageNet. In addition, we show that big self-supervised models are robust to distribution shift and can learn efficiently with a small number of labeled medical images.
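MICLe's contribution is in how positive pairs are formed: two distinct images of the same patient, rather than two augmentations of one image. A minimal sketch of that pairing step (the function name and data layout are assumptions for illustration):

```python
import itertools

def micle_positive_pairs(patient_cases):
    """For each patient case with multiple images, every unordered pair
    of that case's images forms a positive pair; cases with a single
    image would fall back to standard augmentation-based pairs."""
    pairs = []
    for images in patient_cases:
        if len(images) >= 2:
            pairs.extend(itertools.combinations(images, 2))
    return pairs

cases = [["p1_img_a", "p1_img_b", "p1_img_c"],  # three views of patient 1
         ["p2_img_a"]]                          # single-image case
pairs = micle_positive_pairs(cases)             # three pairs from patient 1
```

Because the two images in a pair can differ in viewpoint and lighting while sharing the underlying pathology, the resulting positives are more informative than synthetic augmentations alone.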
Big self-supervised models advance medical image classification! Excited to see self-supervised pretraining on unlabeled medical images is much more effective than supervised pretraining on ImageNet.https://t.co/UDZKCM3vGW by @AziziShekoofeh & @skornblith @tingchenai @vivnat .. pic.twitter.com/XbmSVt7CvN
— Mohammad Norouzi (@mo_norouzi) January 14, 2021
Excited to share “Big Self-Supervised Models Advance Medical Image Classification” (https://t.co/FOjj5tWsjS) with @mo_norouzi @tingchenai @skornblith @vivnat @alan_karthi @_basilM @fionakryan @JanFreyberg
— Shek Azizi (@AziziShekoofeh) January 14, 2021
8. Fast convolutional neural networks on FPGAs with hls4ml
Thea Aarrestad, Vladimir Loncar, Maurizio Pierini, Sioni Summers, Jennifer Ngadiuba, Christoffer Petersson, Hampus Linander, Yutaro Iiyama, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Kevin Pedro, Nhan Tran, Mia Liu, Edward Kreinar, Zhenbin Wu, Duc Hoang
- retweets: 102, favorites: 63 (01/15/2021 14:30:55)
- cs.LG | cs.CV | hep-ex | physics.ins-det | stat.ML
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with large convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate how to achieve inference latency of 5 µs using convolutional architectures, while preserving state-of-the-art model performance. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be reduced by over 90% while maintaining the original model accuracy.
Our paper on ultra-fast convolutional networks on FPGAs with #hls4ml is on arXiv: https://t.co/7SFZxUjwkX! A great collaboration between @CERN and @zenseact , achieving 5 microsecond latency on #SVHN using only a fraction of #FPGA resources. @xmpierinix @jmgduarte #mplhep pic.twitter.com/qQxqLylxGG
— Thea Årrestad (@Thea_kaa) January 14, 2021
9. Declarative Demand-Driven Reverse Engineering
Yihao Sun, Jeffrey Ching, Kristopher Micinski
Binary reverse engineering is a challenging task because it often necessitates reasoning using both domain-specific knowledge (e.g., understanding entrypoint idioms common to an ABI) and logical inference (e.g., reconstructing interprocedural control flow). To help perform these tasks, reverse engineers often use toolkits (such as IDA Pro or Ghidra) that allow them to interactively explicate properties of binaries. We argue that deductive databases serve as a natural abstraction for interfacing between visualization-based binary analysis tools and high-performance logical inference engines that compute facts about binaries. In this paper, we present a vision for the future in which reverse engineers use a visualization-based tool to understand binaries while simultaneously querying a logical-inference engine to perform arbitrarily-complex deductive inference tasks. We call our vision declarative demand-driven reverse engineering (D^3RE for short), and sketch a formal semantics whose goal is to mediate interaction between a logical-inference engine (such as Souffle) and a reverse engineering tool. We describe a prototype tool, d3re, which we are using to explore the D^3RE vision. While still a prototype, we have used d3re to reimplement several common querying tasks on binaries. Our evaluation demonstrates that d3re enables both better performance and more succinct implementation of these common RE tasks.
Lately, my students and I have been thinking about how to integrate high-performance logical inference engines (Datalog) harmoniously with the reverse engineering process. Read about our recent progress (a plugin for Ghidra to enable declarative RE) here:https://t.co/b98vbUrYOx
— Kristopher Micinski (@krismicinski) January 14, 2021
10. SEED: Self-supervised Distillation For Visual Representation
Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu
This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNet-v3-Large on the ImageNet-1k dataset.
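The distillation target in SEED is not a label but a distribution: the teacher's softmax-normalized similarity scores over a queue of instance embeddings, which the student is trained to match. A minimal NumPy sketch of that loss (hyperparameters and names are illustrative, not the paper's exact formulation):

```python
import numpy as np

def softmax(scores, tau):
    z = scores / tau
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def seed_loss(student_emb, teacher_emb, queue, tau=0.07):
    """Cross-entropy between the teacher's and the student's similarity
    distributions over a queue of instance embeddings (all L2-normalized)."""
    p_teacher = softmax(queue @ teacher_emb, tau)
    log_p_student = np.log(softmax(queue @ student_emb, tau) + 1e-12)
    return -np.sum(p_teacher * log_p_student)

# Three queue entries; the teacher maps the input near queue[0]
queue = np.eye(3)
teacher = np.array([1.0, 0.0, 0.0])

aligned = seed_loss(teacher, teacher, queue)                   # student agrees
misaligned = seed_loss(np.array([0.0, 1.0, 0.0]), teacher, queue)
```

The loss is minimized when the student ranks the queue the same way the teacher does, which is how representational knowledge transfers without any labels.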
SEED: Self-supervised Distillation For Visual Representation
— AK (@ak92501) January 14, 2021
pdf: https://t.co/uZm2smPJmZ
abs: https://t.co/Pb8exxU9Ms pic.twitter.com/9xy1x8D2RT
11. Cross-Modal Contrastive Learning for Text-to-Image Generation
Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
The output of text-to-image synthesis systems should be coherent, clear, photo-realistic scenes with high semantic fidelity to their conditioned text descriptions. Our Cross-Modal Contrastive Generative Adversarial Network (XMC-GAN) addresses this challenge by maximizing the mutual information between image and text. It does this via multiple contrastive losses which capture inter-modality and intra-modality correspondences. XMC-GAN uses an attentional self-modulation generator, which enforces strong text-image correspondence, and a contrastive discriminator, which acts as a critic as well as a feature encoder for contrastive learning. The quality of XMC-GAN’s output is a major step up from previous models, as we show on three challenging datasets. On MS-COCO, not only does XMC-GAN improve state-of-the-art FID from 24.70 to 9.33, but—more importantly—people prefer XMC-GAN by 77.3% for image quality and 74.1% for image-text alignment, compared to three other recent models. XMC-GAN also generalizes to the challenging Localized Narratives dataset (which has longer, more detailed descriptions), improving state-of-the-art FID from 48.70 to 14.12. Lastly, we train and evaluate XMC-GAN on the challenging Open Images data, establishing a strong benchmark FID score of 26.91.
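The inter-modality losses the abstract mentions are of the batch-contrastive (InfoNCE) family: each matched image-text pair is a positive, and every other pairing in the batch is a negative. A generic NumPy sketch of one such loss (XMC-GAN combines several losses of this kind; this is not its exact objective):

```python
import numpy as np

def infonce(img_emb, txt_emb, tau=0.1):
    """Batch InfoNCE: row i of each matrix is a matched image-text pair;
    all other pairings in the batch serve as negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau                              # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)             # stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))                         # -log p(match)

imgs = np.eye(4)                                    # 4 toy image embeddings
aligned = infonce(imgs, imgs)                       # each text matches its image
shuffled = infonce(imgs, np.roll(imgs, 1, axis=0))  # every pair mismatched
```

Minimizing this loss pulls each image toward its own caption and pushes it away from the batch's other captions, which is one way of lower-bounding the image-text mutual information.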
Cross-Modal Contrastive Learning for Text-to-Image Generation
— AK (@ak92501) January 14, 2021
pdf: https://t.co/In2qaWN7Uk
abs: https://t.co/XrtNX0yjgZ pic.twitter.com/L5qGQprXEo
12. Flow-Loss: Learning Cardinality Estimates That Matter
Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh
Previous approaches to learned cardinality estimation have focused on improving average estimation error, but not all estimates matter equally. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, that explicitly optimizes for better query plans by approximating the optimizer’s cost model and dynamic programming search algorithm with analytical functions. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain plan graph in which paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark, which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the cost of plans (using the PostgreSQL cost model) and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, both models improve performance significantly over PostgreSQL and are close to the optimal performance (using true cardinalities). However, the Q-Error trained model degrades significantly when evaluated on queries that are slightly different (e.g., similar but not identical query templates), while the Flow-Loss trained model generalizes better to such situations. For example, the Flow-Loss model achieves up to 1.5x better runtimes on unseen templates compared to the Q-Error model, despite leveraging the same model architecture and training data.
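The baseline metric the paper contrasts itself against, Q-Error, is easy to state, and its blind spot motivates Flow-Loss. A small sketch (the example cardinalities are made up for illustration):

```python
def q_error(estimate, truth):
    """Symmetric relative accuracy of a cardinality estimate:
    max(est/true, true/est); 1.0 is a perfect estimate."""
    estimate, truth = max(estimate, 1.0), max(truth, 1.0)
    return max(estimate / truth, truth / estimate)

# Two estimates with identical Q-Error can matter very differently
# to the optimizer:
small = q_error(10, 1_000)        # underestimating a small join result
large = q_error(1_000_000, 100_000_000)  # same relative error, huge tables
```

Both calls return a Q-Error of 100, yet only one of the two mistakes may actually flip the optimizer's choice of plan; Flow-Loss instead scores an estimate by its effect on plan cost, which is why a Flow-Loss model can have worse Q-Error and still produce cheaper, faster plans.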
New work on learned cardinality estimation, led by @parimarjan. We show how to focus the learning process on estimates that matter to a query optimizer.
— Andreas Kipf (@andreaskipf) January 14, 2021
We also introduce a new, more realistic Cardinality Estimation Benchmark (CEB).
Paper: https://t.co/BJKpbRx9Qh pic.twitter.com/3MiGlSHQ0j
13. COVID-19 Deterioration Prediction via Self-Supervised Representation Learning and Multi-Image Prediction
Anuroop Sriram, Matthew Muckley, Koustuv Sinha, Farah Shamout, Joelle Pineau, Krzysztof J. Geras, Lea Azour, Yindalon Aphinyanaphongs, Nafissa Yakubova, William Moore
The rapid spread of COVID-19 cases in recent months has strained hospital resources, making rapid and accurate triage of patients presenting to emergency departments a necessity. Machine learning techniques using clinical data such as chest X-rays have been used to predict which patients are most at risk of deterioration. We consider the task of predicting two types of patient deterioration based on chest X-rays: adverse event deterioration (i.e., transfer to the intensive care unit, intubation, or mortality) and increased oxygen requirements beyond 6 L per day. Due to the relative scarcity of COVID-19 patient data, existing solutions leverage supervised pretraining on related non-COVID images, but this is limited by the differences between the pretraining data and the target COVID-19 patient data. In this paper, we use self-supervised learning based on the momentum contrast (MoCo) method in the pretraining phase to learn more general image representations to use for downstream tasks. We present three results. The first is deterioration prediction from a single image, where our model achieves an area under receiver operating characteristic curve (AUC) of 0.742 for predicting an adverse event within 96 hours (compared to 0.703 with supervised pretraining) and an AUC of 0.765 for predicting oxygen requirements greater than 6 L a day at 24 hours (compared to 0.749 with supervised pretraining). We then propose a new transformer-based architecture that can process sequences of multiple images for prediction and show that this model can achieve an improved AUC of 0.786 for predicting an adverse event at 96 hours and an AUC of 0.848 for predicting mortalities at 96 hours. A small pilot clinical study suggested that the prediction accuracy of our model is comparable to that of experienced radiologists analyzing the same information.
Boundary-Aware Segmentation Network for Mobile and Web Applications
— AK (@ak92501) January 14, 2021
pdf: https://t.co/FUGmZED2V2
abs: https://t.co/kIuaRV3eJm pic.twitter.com/TGWCqGbnQO