1. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke, Abdalla Ahmed, Christian Wolf, Parham Aarabi, Graham W. Taylor
In this paper we introduce a Transformer-based approach to video object segmentation (VOS). To address compounding error and scalability issues of prior work, we propose a scalable, end-to-end method for VOS called Sparse Spatiotemporal Transformers (SST). SST extracts per-pixel representations for each object in a video using sparse attention over spatiotemporal features. Our attention-based formulation for VOS allows a model to learn to attend over a history of multiple frames and provides suitable inductive bias for performing correspondence-like computations necessary for solving motion segmentation. We demonstrate the effectiveness of attention-based networks over recurrent networks in the spatiotemporal domain. Our method achieves competitive results on YouTube-VOS and DAVIS 2017 with improved scalability and robustness to occlusions compared with the state of the art.
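To make the attention-based formulation concrete, below is a minimal sketch of attending over a history of frames. It uses plain dense attention, whereas SST's contribution is a sparse attention pattern over the spatiotemporal volume; the function name and tensor shapes are my own assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): dense attention from the current frame's
# features over a short history of past-frame features. SST replaces the dense
# (HW x THW) attention below with a sparse spatiotemporal pattern to stay scalable.
import torch

def spatiotemporal_attention(curr_feat, past_feats):
    """curr_feat: (B, C, H, W) current-frame features.
       past_feats: (B, T, C, H, W) features of a history of T frames."""
    B, C, H, W = curr_feat.shape
    T = past_feats.shape[1]
    q = curr_feat.flatten(2).transpose(1, 2)                         # (B, HW, C) queries
    kv = past_feats.permute(0, 1, 3, 4, 2).reshape(B, T * H * W, C)  # (B, THW, C) keys/values
    attn = torch.softmax(q @ kv.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, HW, THW) weights
    context = attn @ kv                                              # aggregate history per pixel
    return context.transpose(1, 2).view(B, C, H, W)
```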
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
pdf: https://t.co/SRLEc0x7wd
abs: https://t.co/RXNRdKWpvI pic.twitter.com/SwC2k3En0m
— AK (@ak92501) January 25, 2021
2. On Maximum Likelihood Training of Score-Based Generative Models
Conor Durkan, Yang Song
Score-based generative modeling has recently emerged as a promising alternative to traditional likelihood-based or implicit approaches. Learning in score-based models involves first perturbing data with a continuous-time stochastic process, and then matching the time-dependent gradient of the logarithm of the noisy data density - or score function - using a continuous mixture of score matching losses. In this note, we show that such an objective is equivalent to maximum likelihood for certain choices of mixture weighting. This connection provides a principled way to weight the objective function, and justifies its use for comparing different score-based generative models. Taken together with previous work, our result reveals that both maximum likelihood training and test-time log-likelihood evaluation can be achieved through parameterization of the score function alone, without the need to explicitly parameterize a density function.
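The practical takeaway is the choice of weighting: if the per-time score matching losses are weighted by g(t)^2, the squared diffusion coefficient of the forward SDE, the objective corresponds to maximum likelihood. Here is a minimal sketch of that likelihood-weighted denoising score matching loss for a VP-type SDE; the linear beta schedule and the `score_model(x, t)` interface are assumptions on my part, not the note's code.

```python
# Minimal sketch, assuming a VP-SDE dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dW with a
# linear beta(t), so the likelihood weighting is lambda(t) = g(t)^2 = beta(t).
import torch

beta_min, beta_max = 0.1, 20.0

def beta(t):                              # noise schedule beta(t)
    return beta_min + t * (beta_max - beta_min)

def int_beta(t):                          # \int_0^t beta(s) ds for the linear schedule
    return beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2

def likelihood_weighted_dsm(score_model, x0, eps=1e-5):
    """Denoising score matching with the maximum-likelihood weighting lambda(t) = g(t)^2."""
    t = torch.rand(x0.shape[0], device=x0.device) * (1.0 - eps) + eps
    ib = int_beta(t).view(-1, *([1] * (x0.dim() - 1)))
    mean = x0 * torch.exp(-0.5 * ib)                  # mean of p(x_t | x_0)
    var = 1.0 - torch.exp(-ib)                        # variance of p(x_t | x_0)
    z = torch.randn_like(x0)
    xt = mean + var.sqrt() * z
    target = -z / var.sqrt()                          # grad_x log p(x_t | x_0)
    sq_err = ((score_model(xt, t) - target) ** 2).flatten(1).sum(dim=1)
    return 0.5 * (beta(t) * sq_err).mean()            # lambda(t) = beta(t) weighting
```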
"On Maximum Likelihood Training of Score-Based Generative Models" -- a new technical note with @YSongStanford showing how score-based generative models can be fit using maximum likelihood. 1/7https://t.co/KFyCGXptW9
— Conor Durkan (@conormdurkan) January 25, 2021
Your training objective for score-based generative models is secretly the same as maximum likelihood.
Check out our new technical note on this!
Link: https://t.co/lsOYMt17Dz
Thread below👇 https://t.co/aS1ND0e6J4
— Yang Song (@YSongStanford) January 25, 2021
Interesting insight from @conormdurkan & @YSongStanford!
De Bruijn's theorem tells us that Fisher divergence = infinitesimal change in KL when a small amount of Gaussian noise is added.
They show that matching the score of marginals over an SDE = minimizing KL (i.e. MLE) https://t.co/5JTiEB6Koh pic.twitter.com/Vo9MS8t3xS
— Chin-Wei Huang (@chinwei_h) January 25, 2021
3. PyGlove: Symbolic Programming for Automated Machine Learning
Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Hanxiao Liu, Gabriel Bender, Adam Kraft, Chen Liang, Quoc V. Le
Neural networks are sensitive to hyper-parameter and architecture choices. Automated Machine Learning (AutoML) is a promising paradigm for automating these choices. Current ML software libraries, however, are quite limited in handling the dynamic interactions among the components of AutoML. For example, efficient NAS algorithms, such as ENAS and DARTS, typically require an implementation coupling between the search space and search algorithm, the two key components in AutoML. Furthermore, implementing a complex search flow, such as searching architectures within a loop of searching hardware configurations, is difficult. To summarize, changing the search space, search algorithm, or search flow in current ML libraries usually requires a significant change in the program logic. In this paper, we introduce a new way of programming AutoML based on symbolic programming. Under this paradigm, ML programs are mutable and can thus be manipulated easily by another program. As a result, AutoML can be reformulated as an automated process of symbolic manipulation. With this formulation, we decouple the triangle of the search algorithm, the search space, and the child program. This decoupling makes it easy to change the search space and search algorithm (without and with weight sharing), as well as to add search capabilities to existing code and implement complex search flows. We then introduce PyGlove, a new Python library that implements this paradigm. Through case studies on ImageNet and NAS-Bench-101, we show that with PyGlove users can easily convert a static program into a search space, quickly iterate on the search spaces and search algorithms, and craft complex search flows to achieve better results.
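To illustrate the decoupling the abstract describes without reproducing PyGlove's actual API, here is a toy sketch of the symbolic-programming idea: decision points in an otherwise ordinary program become symbolic placeholders, and a search algorithm rewrites them without knowing anything else about the child program. All names here (`OneOf`, `materialize`, `random_search`) are my own, not PyGlove's.

```python
# Toy sketch of symbolic programming for AutoML (not the PyGlove API).
import random

class OneOf:
    """Symbolic placeholder marking a decision point in the child program."""
    def __init__(self, candidates):
        self.candidates = candidates

def materialize(spec, decisions):
    """Rewrite a flat dict spec by substituting concrete values for OneOf placeholders."""
    return {k: decisions[k] if isinstance(v, OneOf) else v for k, v in spec.items()}

def random_search(spec, evaluate, trials=20):
    """Search algorithm: it only sees placeholders, never the child program's internals."""
    best, best_score = None, float("-inf")
    for _ in range(trials):
        decisions = {k: random.choice(v.candidates)
                     for k, v in spec.items() if isinstance(v, OneOf)}
        program = materialize(spec, decisions)
        score = evaluate(program)
        if score > best_score:
            best, best_score = program, score
    return best, best_score

# A static config becomes a search space by swapping literals for placeholders.
space = {"filters": OneOf([32, 64, 128]), "kernel": OneOf([3, 5]), "activation": "relu"}
best, _ = random_search(space, evaluate=lambda cfg: -abs(cfg["filters"] - 64))  # toy objective
```

Swapping `random_search` for, say, an evolutionary strategy would touch neither the space nor the child program, which is the decoupling the paper argues for.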
PyGlove: Symbolic Programming for Automated Machine Learning
pdf: https://t.co/CCNTohb3fD
abs: https://t.co/Hyuy0QdqwP pic.twitter.com/9ApnCZd9gC
— AK (@ak92501) January 25, 2021
4. Progressive Image Super-Resolution via Neural Differential Equation
Seobin Park, Tae Hyun Kim
We propose a new approach for the image super-resolution (SR) task that progressively restores a high-resolution (HR) image from an input low-resolution (LR) image on the basis of a neural ordinary differential equation. In particular, we newly formulate the SR problem as an initial value problem, where the initial value is the input LR image. Unlike conventional progressive SR methods that perform gradual updates using straightforward iterative mechanisms, our SR process is modeled explicitly as an ODE, which gives a clearer picture of how the restoration progresses. Our method can be easily implemented using conventional neural networks for image restoration. Moreover, the proposed method can super-resolve an image with arbitrary scale factors on a continuous domain, and achieves superior SR performance over state-of-the-art SR methods.
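A minimal sketch of the initial-value-problem view is below, assuming torchdiffeq's `odeint` solver and a small toy CNN as the dynamics; the actual network, solver, and training loss in the paper differ.

```python
# Minimal sketch (assumptions: torchdiffeq and a toy CNN; not the authors' architecture).
# SR is posed as an IVP whose initial value is the bicubically upscaled LR image and
# whose dynamics progressively add high-frequency detail.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchdiffeq import odeint  # pip install torchdiffeq

class SRDynamics(nn.Module):
    """dx/dt = f_theta(x, t): predicts the detail to add at 'time' t."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, t, x):
        t_map = t * torch.ones_like(x[:, :1])     # broadcast scalar time as a channel
        return self.net(torch.cat([x, t_map], dim=1))

def super_resolve(lr, scale, dynamics):
    """Arbitrary (continuous) scale factors: only the initial upsampling changes."""
    x0 = F.interpolate(lr, scale_factor=scale, mode="bicubic", align_corners=False)
    t = torch.linspace(0.0, 1.0, 2, device=lr.device)
    return odeint(dynamics, x0, t)[-1]            # x(1) is the HR estimate
```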
Progressive Image Super-Resolution via Neural Differential Equation
pdf: https://t.co/DX9Zlri1wR
abs: https://t.co/OGHKgrwGlY pic.twitter.com/q8UmSA1eXX
— AK (@ak92501) January 25, 2021
Treats super-resolution as the problem of first upscaling the low-resolution image to the target resolution (e.g., with bicubic interpolation) and then gradually restoring the high-frequency components. The restoration process is formulated as an ordinary differential equation whose initial value is the low-resolution image, and training and inference are done with a Neural ODE. The method can handle arbitrary continuous scale factors and achieves SOTA restoration accuracy. https://t.co/9WMwqdWkfY
— Daisuke Okanohara (@hillbig) January 25, 2021
5. Distilling Large Language Models into Tiny and Effective Students using pQRNN
Prabhu Kaliamoorthi, Aditya Siddhant, Edward Li, Melvin Johnson
Large pre-trained multilingual models like mBERT and XLM-R achieve state-of-the-art results on language understanding tasks. However, they are not well suited for latency-critical applications on either servers or edge devices, so it is important to reduce the memory and compute resources these models require. To this end, we propose pQRNN, a projection-based embedding-free neural encoder that is tiny and effective for natural language processing tasks. Without pre-training, pQRNNs significantly outperform LSTM models with pre-trained embeddings despite being 140x smaller. With the same number of parameters, they outperform transformer baselines, showcasing their parameter efficiency. Additionally, we show that pQRNNs are effective student architectures for distilling large pre-trained language models. We perform careful ablations which study the effect of pQRNN parameters, data augmentation, and distillation settings. On MTOP, a challenging multilingual semantic parsing dataset, pQRNN students achieve 95.9% of the performance of an mBERT teacher while being 350x smaller. On mATIS, a popular parsing task, pQRNN students on average reach 97.1% of the teacher's performance while again being 350x smaller. Our strong results suggest that our approach is well suited for latency-sensitive applications while still being able to leverage large mBERT-like models.
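As a reference point for the distillation setup, here is a minimal sketch of a soft-label distillation loss that a pQRNN student could be trained with against an mBERT teacher. The temperature, mixing weight, and loss form are my assumptions, not the paper's exact settings.

```python
# Minimal sketch of a distillation objective (assumed, not the paper's exact loss):
# KL against temperature-scaled teacher logits plus the usual hard-label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale so gradients match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```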
Distilling Large Language Models into Tiny and Effective Students using pQRNN
pdf: https://t.co/z8qcaVN4mV
abs: https://t.co/tXAfckIulA pic.twitter.com/HQQXW0cqCb
— AK (@ak92501) January 25, 2021
6. Differentiable Trust Region Layers for Deep Reinforcement Learning
Fabian Otto, Philipp Becker, Ngo Anh Vien, Hanna Carolin Ziesche, Gerhard Neumann
Trust region methods are a popular tool in reinforcement learning as they yield robust policy updates in continuous and discrete action spaces. However, enforcing such trust regions in deep reinforcement learning is difficult. Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), are based on approximations. Due to those approximations, they violate the constraints or fail to find the optimal solution within the trust region. Moreover, they are difficult to implement, lack sufficient exploration, and have been shown to depend on seemingly unrelated implementation choices. In this work, we propose differentiable neural network layers to enforce trust regions for deep Gaussian policies via closed-form projections. Unlike existing methods, those layers formalize trust regions for each state individually and can complement existing reinforcement learning algorithms. We derive trust region projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices.
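To give a flavour of a closed-form, differentiable projection, here is a simplified sketch of a per-state mean projection under a squared-Euclidean (Frobenius-style) bound: if the new mean strays too far from the old policy's mean, it is pulled back onto the trust-region boundary, and the whole operation is differentiable, so it can sit as a layer on top of the policy network. This is my simplification; the paper's layers also project the covariance and derive KL and Wasserstein variants.

```python
# Simplified sketch (not the authors' full layer): differentiable per-state projection
# of a Gaussian policy mean onto the trust region ||mu - mu_old||^2 <= eps.
import torch

def project_mean(mu, mu_old, eps):
    """mu, mu_old: (B, D) per-state policy means; eps: trust-region bound."""
    diff = mu - mu_old
    dist_sq = (diff ** 2).sum(dim=-1, keepdim=True)              # per-state squared distance
    scale = torch.sqrt(eps / dist_sq.clamp_min(1e-12)).clamp(max=1.0)
    return mu_old + scale * diff                                 # identity inside the region
```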