1. Revisiting Deep Learning Models for Tabular Data
Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko
The necessity of deep learning for tabular data is still an unanswered question addressed by a large number of research efforts. The recent literature on tabular DL proposes several deep architectures reported to be superior to traditional “shallow” models like Gradient Boosted Decision Trees (GBDT). However, since existing works often use different benchmarks and tuning protocols, it is unclear if the proposed models universally outperform GBDT. Moreover, the models are often not compared to each other; therefore, it is challenging to identify the best deep model for practitioners. In this work, we start with a thorough review of the main families of DL models recently developed for tabular data. We carefully tune and evaluate them on a wide range of datasets and reveal two significant findings. First, we show that the choice between GBDT and DL models highly depends on data and there is still no universally superior solution. Second, we demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline, which outperforms most of the sophisticated models from the DL literature. Finally, we design a simple adaptation of the Transformer architecture for tabular data that becomes a new strong DL baseline and reduces the gap between GBDT and DL models on datasets where GBDT dominates.
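For intuition, the ResNet-like baseline highlighted in the paper is essentially an MLP with residual blocks applied to the flat feature vector. Below is a minimal PyTorch sketch of that idea; the dimensions, dropout rate, and block count are illustrative assumptions, not the authors' tuned configuration.

```python
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    """One residual block over a flat feature vector:
    Norm -> Linear -> ReLU -> Dropout -> Linear, plus a skip connection."""
    def __init__(self, d, d_hidden, dropout=0.1):
        super().__init__()
        self.norm = nn.BatchNorm1d(d)
        self.linear1 = nn.Linear(d, d_hidden)
        self.linear2 = nn.Linear(d_hidden, d)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        h = self.norm(x)
        h = self.dropout(torch.relu(self.linear1(h)))
        return x + self.linear2(h)  # residual connection

class TabularResNet(nn.Module):
    """ResNet-style MLP for tabular inputs (numerical features only in this sketch)."""
    def __init__(self, n_features, d=128, n_blocks=3, n_outputs=1):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d)
        self.blocks = nn.Sequential(*[ResNetBlock(d, 2 * d) for _ in range(n_blocks)])
        self.head = nn.Linear(d, n_outputs)

    def forward(self, x):
        return self.head(self.blocks(self.input_proj(x)))

# Example: regression on a batch of 32 rows with 10 numerical features.
model = TabularResNet(n_features=10)
y_hat = model(torch.randn(32, 10))
```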
Revisiting Deep Learning for Tabular Data
— elvis (@omarsar0) June 23, 2021
This recent paper reviews some recent deep learning models developed for tabular data.
The authors propose a Transformer model for tabular data that achieves state-of-the-art performance among DL solutions. https://t.co/aMqD0FAzdH pic.twitter.com/c1ozxNrpRo
Revisiting Deep Learning Models for Tabular Data
— AK (@ak92501) June 23, 2021
pdf: https://t.co/G4J8TRyRBt
abs: https://t.co/tMUhp2IZwW
github: https://t.co/qeue47mnWD
proposed a attention-based architecture that outperforms ResNet on many tasks pic.twitter.com/aBCZ3dD4VV
The rate at which "DL for tabular data papers" are uploaded is quite impressive. "Regularization is all you need" was yesterday. Today, we are already "Revisiting Deep Learning Models for Tabular Data" https://t.co/hqorAVCnD9 https://t.co/eigfJcLQnY
— Sebastian Raschka (@rasbt) June 23, 2021
2. Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker
The quest for determinism in machine learning has disproportionately focused on characterizing the impact of noise introduced by algorithmic design choices. In this work, we address a less well understood and studied question: how does our choice of tooling introduce randomness into deep neural network training? We conduct large-scale experiments across different types of hardware, accelerators, state-of-the-art networks, and open-source datasets to characterize how tooling choices contribute to the level of non-determinism in a system, the impact of said non-determinism, and the cost of eliminating different sources of noise. Our findings are surprising and suggest that the impact of non-determinism is nuanced. While top-line metrics such as top-1 accuracy are not noticeably impacted, model performance on certain parts of the data distribution is far more sensitive to the introduction of randomness. Our results suggest that deterministic tooling is critical for AI safety. However, we also find that the cost of ensuring determinism varies dramatically between neural network architectures and hardware types, with substantial overhead on a spectrum of widely used GPU accelerator architectures relative to non-deterministic training. The source code used in this paper is available at https://github.com/usyd-fsalab/NeuralNetworkRandomness.
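For readers who want to control the tooling-level noise the paper studies, here is a minimal PyTorch sketch of the usual knobs (RNG seeding plus forcing deterministic kernels). It is a generic recipe, not the authors' exact experimental setup, and the resulting overhead depends heavily on the model and GPU.

```python
import os
import random
import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    # Algorithm-level randomness: fix all relevant RNG seeds.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Tooling-level randomness: force deterministic cuDNN/cuBLAS behavior.
    # Required by some cuBLAS ops when deterministic algorithms are enforced;
    # must be set before the first CUDA matmul.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Raise an error if an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)

make_deterministic(seed=42)
```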
How do hardware, software and algorithm contribute to non-determinism in deep neural networks?
— Sara Hooker (@sarahookr) June 23, 2021
What is the downstream impact and the cost of ensuring determinism?
Very excited to share new work led by @Donglin07326431 w Xingyao Zhang, @Leon75421958 https://t.co/INnTbe4OEm pic.twitter.com/IZcyKb6voW
3. Dangers of Bayesian Model Averaging under Covariate Shift
Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.
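As a reminder of what is being averaged: the Bayesian model average predicts by integrating the likelihood over the posterior, which in practice means averaging the predictions of posterior weight samples. A minimal NumPy-style sketch, assuming you already have posterior weight samples (e.g. from HMC) and a `predict` function; both are placeholders, not the authors' code.

```python
import numpy as np

def bayesian_model_average(predict, weight_samples, x):
    """Approximate p(y | x, D) = E_{w ~ p(w | D)}[ p(y | x, w) ]
    by a Monte Carlo average over posterior samples.

    predict(w, x)  -> class-probability array of shape (n_points, n_classes)
    weight_samples -> iterable of posterior weight samples (e.g. from HMC)
    """
    probs = np.stack([predict(w, x) for w in weight_samples])  # (n_samples, n_points, n_classes)
    return probs.mean(axis=0)  # posterior predictive probabilities

# Under covariate shift, x is drawn from a different distribution than the
# training data; the paper shows this average can then degrade badly for BNNs.
```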
Dangers of Bayesian Model Averaging under Covariate Shift https://t.co/7V4Y15xi6P
— Pavel Izmailov (@Pavel_Izmailov) June 23, 2021
We show how Bayesian neural nets can generalize *extremely* poorly under covariate shift, why it happens and how to fix it!
With Patrick Nicholson, @LotfiSanae and @andrewgwils
1/10 pic.twitter.com/kR0I9YZSog
Despite its popularity in the covariate shift setting, Bayesian model averaging can surprisingly hurt OOD generalization! https://t.co/AqbH4f29FG 1/5 https://t.co/OawZs6AV7n pic.twitter.com/SiPHp8mQrm
— Andrew Gordon Wilson (@andrewgwils) June 23, 2021
4. Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement
Huiwen Luo, Koki Nagano, Han-Wei Kung, Mclean Goldwhite, Qingguo Xu, Zejian Wang, Lingyu Wei, Liwen Hu, Hao Li
We introduce a highly robust GAN-based framework for digitizing a normalized 3D avatar of a person from a single unconstrained photo. While the input image can be of a smiling person or taken in extreme lighting conditions, our method can reliably produce a high-quality textured model of the person’s face with a neutral expression and skin textures under diffuse lighting conditions. Cutting-edge 3D face reconstruction methods use non-linear morphable face models combined with GAN-based decoders to capture the likeness and details of a person, but they fail to produce neutral head models with unshaded albedo textures, which are critical for creating relightable and animation-friendly avatars for integration in virtual environments. The key challenge for existing methods is the lack of training and ground-truth data containing normalized 3D faces. We propose a two-stage approach to address this problem. First, we adopt a highly robust normalized 3D face generator by embedding a non-linear morphable face model into a StyleGAN2 network. This allows us to generate detailed but normalized facial assets. This inference is then followed by a perceptual refinement step that uses the generated assets as regularization to cope with the limited available training samples of normalized faces. We further introduce a Normalized Face Dataset, which consists of a combination of photogrammetry scans, carefully selected photographs, and generated fake people with neutral expressions in diffuse lighting conditions. While our prepared dataset contains two orders of magnitude fewer subjects than cutting-edge GAN-based 3D facial reconstruction methods, we show that it is possible to produce high-quality normalized face models for very challenging unconstrained input images, and we demonstrate superior performance to the current state of the art.
Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement
— AK (@ak92501) June 23, 2021
pdf: https://t.co/e87LOKwylg
StyleGAN2-based digitization approach using a non-linear 3DMM, generates high-quality normalized textured 3D face models from challenging unconstrained input photos pic.twitter.com/VgZgL4gAbR
5. Counterexample to cut-elimination in cyclic proof system for first-order logic with inductive definitions
Yukihiro Masuoka, Makoto Tatsuta
A cyclic proof system is a proof system in which a proof figure is a tree with cycles. Cut-elimination is a fundamental property of a proof system. It has been conjectured that cut-elimination does not hold in the cyclic proof system for first-order logic with inductive definitions. This paper shows that the conjecture is correct by giving a sequent that is provable in the cyclic proof system but not provable without the cut rule.
My first pre-print......https://t.co/lWIDaj11Mk
— YukihiroMASUOKA (@Yukihiro0036) June 23, 2021
6. Towards Reducing Labeling Cost in Deep Object Detection
Ismail Elezi, Zhiding Yu, Anima Anandkumar, Laura Leal-Taixe, Jose M. Alvarez
Deep neural networks have reached very high accuracy on object detection, but their success hinges on large amounts of labeled data. To reduce the dependency on labels, various active-learning strategies have been proposed, typically based on the confidence of the detector. However, these methods are biased towards best-performing classes and can lead to acquired datasets that are not good representatives of the data in the testing set. In this work, we propose a unified framework for active learning that considers both the uncertainty and the robustness of the detector, ensuring that the network performs accurately in all classes. Furthermore, our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift while further boosting the performance of the model. Experiments show that our method comprehensively outperforms a wide range of active-learning methods on PASCAL VOC07+12 and MS-COCO, yielding up to a 7.7% relative improvement, or up to an 82% reduction in labeling cost.
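The general recipe behind this kind of detection-oriented active learning fits in a few lines. The sketch below is a simplified, generic loop (entropy-based uncertainty plus pseudo-labeling of very confident boxes); it does not reproduce the paper's exact scoring or robustness terms, and the helper names are illustrative.

```python
import numpy as np

def entropy(p):
    """Entropy of a categorical distribution; higher means more uncertain."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def select_and_pseudolabel(detections_per_image, budget, conf_threshold=0.95):
    """detections_per_image: {image_id: (n_boxes, n_classes) class-probability array}.
    Returns images to send for human labeling plus auto-accepted pseudo-labels."""
    # Score each unlabeled image by its most uncertain detection.
    scores = {img: entropy(probs).max() for img, probs in detections_per_image.items()}
    to_label = sorted(scores, key=scores.get, reverse=True)[:budget]

    # Pseudo-label boxes whose top class probability is very high.
    pseudo = {
        img: probs[probs.max(axis=-1) >= conf_threshold]
        for img, probs in detections_per_image.items()
        if img not in to_label
    }
    return to_label, pseudo
```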
Towards Reducing Labeling Cost in Deep Object Detection
— AK (@ak92501) June 23, 2021
pdf: https://t.co/t9jNnhrw8E
consistently outperforms a wide range of active-learning methods, yielding up to a 7.7% relative improvement in mAP, or up to a 82% reduction in labeling cost pic.twitter.com/JNJauyl8tc
7. Unsupervised Object-Level Representation Learning from Scene Images
Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy
Contrastive self-supervised learning has largely narrowed the gap to supervised pre-training on ImageNet. However, its success highly relies on the object-centric priors of ImageNet, i.e., different augmented views of the same image correspond to the same object. Such a heavily curated constraint becomes immediately infeasible when pre-trained on more complex scene images with many objects. To overcome this limitation, we introduce Object-level Representation Learning (ORL), a new self-supervised learning framework towards scene images. Our key insight is to leverage image-level self-supervised pre-training as the prior to discover object-level semantic correspondence, thus realizing object-level representation learning from scene images. Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks. Furthermore, ORL improves the downstream performance when more unlabeled scene images are available, demonstrating its great potential of harnessing unlabeled data in the wild. We hope our approach can motivate future research on more general-purpose unsupervised representation learning from scene data. Project page: https://www.mmlab-ntu.com/project/orl/.
Unsupervised Object-Level Representation Learning from Scene Images
— AK (@ak92501) June 23, 2021
pdf: https://t.co/jYccf0MAvu
project page: https://t.co/U1pZ5NtIbe
improves the performance of SSL on scene images,
even surpassing supervised ImageNet pre-training on several downstream tasks pic.twitter.com/9xGVd5uD0J
8. BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan, Graham Neubig, Pengfei Liu
A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference output or the source text will achieve higher scores when the generated text is better. We operationalize this idea using BART, an encoder-decoder based pre-trained model, and propose a metric BARTScore with a number of variants that can be flexibly applied in an unsupervised fashion to evaluation of text from different perspectives (e.g. informativeness, fluency, or factuality). BARTScore is conceptually simple and empirically effective. It can outperform existing top-scoring metrics in 16 of 22 test settings, covering evaluation of 16 datasets (e.g., machine translation, text summarization) and 7 different perspectives (e.g., informativeness, factuality). Code to calculate BARTScore is available at https://github.com/neulab/BARTScore, and we have released an interactive leaderboard for meta-evaluation at http://explainaboard.nlpedia.ai/leaderboard/task-meval/ on the ExplainaBoard platform, which allows us to interactively understand the strengths, weaknesses, and complementarity of each metric.
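The core computation behind such a metric is just the length-normalized log-likelihood that a pre-trained seq2seq model assigns to one text given another. A rough sketch with Hugging Face Transformers is below; the official implementation at https://github.com/neulab/BARTScore differs in details (checkpoint choice, token weighting, and which direction is scored for each evaluation perspective).

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).eval()

def bart_score(source: str, hypothesis: str) -> float:
    """Average per-token log-probability of `hypothesis` given `source`.
    Higher is better; scoring different (source, hypothesis) directions
    targets different perspectives (e.g. faithfulness vs. recall)."""
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    targets = tokenizer(hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs, labels=targets["input_ids"])
    # out.loss is the mean token-level cross-entropy on the target side.
    return -out.loss.item()

print(bart_score("The cat sat on the mat.", "A cat is sitting on a mat."))
```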
BARTScore: Evaluating Generated Text as Text Generation
— AK (@ak92501) June 23, 2021
pdf: https://t.co/Zr8IXGKkKC
demo: https://t.co/n0ZuXsaeO0
outperforms existing top-scoring metrics in 16 of 22 test settings, covering evaluation of 16 datasets
and 7 different perspectives pic.twitter.com/CH6rYRtZlI
9. DeepMesh: Differentiable Iso-Surface Extraction
Benoit Guillard, Edoardo Remelli, Artem Lukoianov, Stephan Richter, Timur Bagautdinov, Pierre Baque, Pascal Fua
Geometric Deep Learning has recently made striking progress with the advent of continuous Deep Implicit Fields. They allow for detailed modeling of watertight surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable parameterization that is unlimited in resolution. Unfortunately, these methods are often unsuitable for applications that require an explicit mesh-based surface representation because converting an implicit field to such a representation relies on the Marching Cubes algorithm, which cannot be differentiated with respect to the underlying implicit field. In this work, we remove this limitation and introduce a differentiable way to produce explicit surface mesh representations from Deep Implicit Fields. Our key insight is that by reasoning on how implicit field perturbations impact local surface geometry, one can ultimately differentiate the 3D location of surface samples with respect to the underlying deep implicit field. We exploit this to define DeepMesh, an end-to-end differentiable mesh representation that can vary its topology. We use two different applications to validate our theoretical insight: Single-view 3D Reconstruction via Differentiable Rendering and Physically-Driven Shape Optimization. In both cases, our end-to-end differentiable parameterization gives us an edge over state-of-the-art algorithms.
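One known way to realize this insight, introduced in the authors' earlier MeshSDF work for signed distance fields, is to shift each Marching Cubes vertex along the surface normal by the predicted field value: the forward value barely changes (the field is near zero at the surface), but gradients now flow into the implicit network. A hedged PyTorch sketch under that SDF assumption, with `marching_cubes` and `surface_normals` as hypothetical helper callables:

```python
import torch

def differentiable_surface_vertices(sdf_net, latent, marching_cubes, surface_normals):
    """Make Marching Cubes vertices differentiable w.r.t. an SDF network.

    `marching_cubes(sdf_net, latent)` and `surface_normals(sdf_net, latent, verts)`
    are assumed helpers: the first extracts zero-level-set vertices on a grid,
    the second returns unit normals (e.g. normalized SDF gradients) at them.
    """
    with torch.no_grad():
        verts = marching_cubes(sdf_net, latent)            # (V, 3), non-differentiable step
        normals = surface_normals(sdf_net, latent, verts)  # (V, 3), unit normals

    # At the surface sdf ~= 0, so the forward value is (almost) unchanged, but
    # backprop now gives d(verts)/d(theta) = -normals * d(sdf)/d(theta).
    sdf_vals = sdf_net(latent, verts)                      # (V, 1), differentiable
    return verts - sdf_vals * normals
```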
We can parametrize differentiably 3D meshes whose topology can change. In earlier work, we showed this for meshes represented by SDFs. We have now extended this result to a much broader class of implicit functions. https://t.co/g8vywEYqpi #deeplearning #computervision pic.twitter.com/QCmA97mb1H
— Pascal Fua (@FuaPv) June 23, 2021
10. DocFormer: End-to-End Transformer for Document Understanding
Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem which aims to understand documents in their varied formats (forms, receipts, etc.) and layouts. In addition, DocFormer is pre-trained in an unsupervised fashion using carefully designed tasks which encourage multi-modal interaction. DocFormer uses text, vision, and spatial features and combines them using a novel multi-modal self-attention layer. DocFormer also shares learned spatial embeddings across modalities, which makes it easy for the model to correlate text to visual tokens and vice versa. DocFormer is evaluated on 4 different datasets, each with strong baselines. DocFormer achieves state-of-the-art results on all of them, sometimes beating models 4x its size (in number of parameters).
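To make the "shared spatial embeddings" idea concrete, here is a deliberately simplified PyTorch sketch: text tokens and visual region features each receive the same learned spatial (position) embedding before a vanilla transformer encoder attends over the fused sequence. This is an illustration under the assumption of per-token aligned visual features, not DocFormer's actual multi-modal self-attention layer; all dimensions and names are made up.

```python
import torch
import torch.nn as nn

class SimpleMultiModalEncoder(nn.Module):
    """Toy fusion of text, vision, and spatial features.
    Assumes each text token comes with one visual region feature and one
    discretized bounding-box position id (a simplification for illustration)."""
    def __init__(self, vocab_size=30522, d=256, n_spatial_bins=1024, n_layers=4):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, d)
        self.visual_proj = nn.Linear(2048, d)                # e.g. CNN region features
        self.spatial_emb = nn.Embedding(n_spatial_bins, d)   # shared across modalities
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids, visual_feats, spatial_ids):
        pos = self.spatial_emb(spatial_ids)                  # (B, T, d), shared
        text = self.text_emb(token_ids) + pos
        vision = self.visual_proj(visual_feats) + pos
        return self.encoder(torch.cat([text, vision], dim=1))

# Example with a batch of 2 documents, 16 tokens each.
enc = SimpleMultiModalEncoder()
out = enc(torch.randint(0, 30522, (2, 16)), torch.randn(2, 16, 2048),
          torch.randint(0, 1024, (2, 16)))
```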
DocFormer: End-to-End Transformer for Document Understanding
— AK (@ak92501) June 23, 2021
pdf: https://t.co/GwVkCjUMVl
abs: https://t.co/BnY2nnegdC
a multi-modal end-to-end trainable transformer based model for various Visual Document Understanding tasks pic.twitter.com/ZbOlxOij6i