Hot Papers 2020-07-20

1. Hybrid Discriminative-Generative Training via Contrastive Learning

Hao Liu, Pieter Abbeel

retweets: 32, favorites: 175 (07/21/2020 08:11:16)
links: abs | pdf
cs.LG | cs.CV | stat.ML

Contrastive learning and supervised learning have both seen significant progress and success. However, thus far they have largely been treated as two separate objectives, brought together only by having a shared neural network. In this paper we show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning. Beyond presenting this unified view, we show our specific choice of approximation of the energy-based loss outperforms the existing practice in terms of classification accuracy of WideResNet on CIFAR-10 and CIFAR-100. It also leads to improved performance on robustness, out-of-distribution detection, and calibration.
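A minimal sketch of how one set of logits can serve both objectives, assuming the logit f(x)[y] is read as a negative energy for the pair (x, y). Normalizing over classes gives the usual cross-entropy term; normalizing over the examples in a batch gives a contrastive term. The batch-level normalization, the mixing weight `alpha`, and the function names are illustrative assumptions, not details taken from the abstract or from the authors' code.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hybrid_loss(logits, labels, alpha=0.5):
    """Average hybrid loss over a batch.

    logits[i][k] is the score f(x_i)[k]; labels[i] is the class of x_i.
    alpha (hypothetical name) weights the contrastive term against the
    discriminative term.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        y = labels[i]
        # Discriminative term: log p(y | x_i), normalized over classes.
        log_p_y_given_x = logits[i][y] - log_sum_exp(logits[i])
        # Contrastive term: normalized over the batch at the fixed class y,
        # used here as a stand-in for log p(x_i | y).
        column = [logits[j][y] for j in range(n)]
        log_q_x_given_y = logits[i][y] - log_sum_exp(column)
        total -= alpha * log_q_x_given_y + (1.0 - alpha) * log_p_y_given_x
    return total / n
```

With `alpha=0.0` this reduces to plain softmax cross-entropy, so the supervised baseline is recovered as a special case.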

Excited to share our new work that explores the relationship between contrastive learning, discriminative modeling & generative modeling, through the lens of energy-based models.
🎓 https://t.co/oeVlRLsbGU
💻 https://t.co/F5fw6fC6JF
w/ @pabbeel
summary thread:

[1/N] pic.twitter.com/2zaVqAme2C
— Hao Liu (@lhaoml) July 20, 2020

2. A Unifying Perspective on Neighbor Embeddings along the Attraction-Repulsion Spectrum

Jan Niklas Böhm, Philipp Berens, Dmitry Kobak

retweets: 38, favorites: 153 (07/21/2020 08:11:16)
links: abs | pdf
cs.LG | stat.ML

Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using kNN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we show that changing the balance between the attractive and the repulsive forces in t-SNE yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures. We show that UMAP embeddings correspond to t-SNE with increased attraction; this happens because the negative sampling optimisation strategy employed by UMAP strongly lowers the effective repulsion. Likewise, ForceAtlas2, commonly used for visualizing developmental single-cell transcriptomic data, yields embeddings corresponding to t-SNE with the attraction increased even more. At the extreme of this spectrum lies Laplacian Eigenmaps, corresponding to zero repulsion. Our results demonstrate that many prominent neighbor embedding algorithms can be placed onto this attraction-repulsion spectrum, and highlight the inherent trade-offs between them.
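The balance between the two forces can be made concrete with the standard t-SNE gradient, in which an attraction term built from the input affinities p_ij competes with a repulsion term built from the embedding affinities q_ij. The sketch below scales the attraction by a factor `rho`; that name, the dense O(n^2) loops, and the function name are assumptions chosen for illustration, not the authors' implementation.

```python
def tsne_gradient(P, Y, rho=1.0):
    """Gradient of the t-SNE loss with the attractive term scaled by rho.

    P is a symmetric n x n list of input affinities that sums to 1.
    Y is a list of n embedding points (lists of floats).
    rho > 1 strengthens attraction; rho = 1 is standard t-SNE.
    """
    n, dim = len(Y), len(Y[0])
    # Student-t kernel w_ij = 1 / (1 + ||y_i - y_j||^2), zero on the diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((Y[i][k] - Y[j][k]) ** 2 for k in range(dim))
                W[i][j] = 1.0 / (1.0 + d2)
    Z = sum(sum(row) for row in W)  # normalizer, so q_ij = w_ij / Z
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Attraction (rho * p_ij) minus repulsion (q_ij).
            coeff = 4.0 * (rho * P[i][j] - W[i][j] / Z) * W[i][j]
            for k in range(dim):
                grad[i][k] += coeff * (Y[i][k] - Y[j][k])
    return grad
```

For two points at distance 1 with p_12 = q_12 the forces cancel at `rho=1`, while `rho=2` yields a gradient whose descent step pulls the points together.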

New preprint on attraction-repulsion spectrum in t-SNE => continuity-discreteness trade-off!

We also show that UMAP has higher attraction due to negative sampling, and not due to its loss. 🤯 Plus we demystify FA2.

With @jnboehm and @CellTypist. https://t.co/n6AQT44WbH [1/n] pic.twitter.com/ipe0NQQGPk
— Dmitry Kobak (@hippopedoid) July 20, 2020

3. DVI: Depth Guided Video Inpainting for Autonomous Driving

Miao Liao, Feixiang Lu, Dingfu Zhou, Sibo Zhang, Wei Li, Ruigang Yang

retweets: 18, favorites: 80 (07/21/2020 08:11:16)
links: abs | pdf
cs.CV | eess.IV

To get clear street-view and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that can remove traffic agents from videos and synthesize missing regions with the guidance of depth/point cloud. By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated via this common 3D map. In order to fill a target inpainting area in a frame, it is straightforward to transform pixels from other frames into the current one with correct occlusion. Furthermore, we are able to fuse multiple videos through 3D point cloud registration, making it possible to inpaint a target video with multiple source videos. The motivation is to solve the long-time occlusion problem where an occluded area has never been visible in the entire video. To our knowledge, we are the first to fuse multiple videos for video inpainting. To verify the effectiveness of our approach, we build a large inpainting dataset in the real urban road environment with synchronized images and Lidar data including many challenging scenes, e.g., long-time occlusion. The experimental results show that the proposed approach outperforms the state-of-the-art approaches on all criteria; in particular, the RMSE (Root Mean Squared Error) is reduced by about 13%.
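The idea of transforming pixels from other frames "with correct occlusion" can be illustrated with a pinhole projection and a z-buffer: colored 3D points are projected into the target camera, and when several points land on the same pixel the nearest one wins. This is a generic sketch, not the authors' pipeline; the intrinsics `fx`, `fy`, `cx`, `cy` and the function name are hypothetical, and the points are assumed to be already expressed in the target camera's coordinates.

```python
def project_points(points_cam, colors, fx, fy, cx, cy, width, height):
    """Render colored 3D points into an image using a z-buffer.

    points_cam: iterable of (x, y, z) in the target camera frame, z forward.
    colors: one color value per point.
    Returns (image, depth); pixels that received no point stay None / inf.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]
    for (x, y, z), color in zip(points_cam, colors):
        if z <= 0:
            continue  # point is behind the camera
        # Pinhole projection to the nearest integer pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        inside = 0 <= u < width and 0 <= v < height
        # Keep only the closest point per pixel so occluded points are hidden.
        if inside and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = color
    return image, depth
```

Pixels left as `None` after projecting one source are the ones that would still need another frame or video to fill them.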

DVI: Depth Guided Video Inpainting for Autonomous Driving
pdf: https://t.co/YAz3pGdvVE
abs: https://t.co/7ARnA9DgVK pic.twitter.com/Z9b1cPgpbg
— AK (@ak92501) July 20, 2020

4. Scale Equivariance Improves Siamese Tracking

Ivan Sosnovik, Artem Moskalev, Arnold Smeulders

retweets: 9, favorites: 43 (07/21/2020 08:11:17)
links: abs | pdf
cs.CV

Siamese trackers turn tracking into similarity estimation between a template and the candidate regions in the frame. Mathematically, one of the key ingredients of success of the similarity function is translation equivariance. Non-translation-equivariant architectures induce a positional bias during training, so the location of the target will be hard to recover from the feature space. In real-life scenarios, objects undergo various transformations other than translation, such as rotation or scaling. Unless the model has an internal mechanism to handle them, the similarity may degrade. In this paper, we focus on scaling and we aim to equip the Siamese network with additional built-in scale equivariance to capture the natural variations of the target a priori. We develop the theory for scale-equivariant Siamese trackers, and provide a simple recipe for how to make a wide range of existing trackers scale-equivariant. We present SE-SiamFC, a scale-equivariant variant of SiamFC built according to the recipe. We conduct experiments on OTB and VOT benchmarks and on the synthetically generated T-MNIST and S-MNIST datasets. We demonstrate that a built-in additional scale equivariance is useful for visual object tracking.
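Translation equivariance of the similarity function can be seen in a toy example: when the template is cross-correlated with the search region, shifting the target shifts the peak of the response map by the same amount. The sketch below works on raw 2D arrays rather than learned features, and the helper names are hypothetical; it illustrates only the translation case, not the scale-equivariant construction of the paper.

```python
def cross_correlate(search, template):
    """Valid-mode 2D cross-correlation of a search region with a template."""
    H, W = len(search), len(search[0])
    h, w = len(template), len(template[0])
    return [
        [
            sum(search[i + a][j + b] * template[a][b]
                for a in range(h) for b in range(w))
            for j in range(W - w + 1)
        ]
        for i in range(H - h + 1)
    ]


def argmax2d(response):
    """Return (row, col) of the largest entry of a 2D list."""
    cells = ((i, j) for i in range(len(response))
             for j in range(len(response[0])))
    return max(cells, key=lambda ij: response[ij[0]][ij[1]])


def place(template, top, left, size):
    """Paste the template into an all-zero size x size search region."""
    region = [[0.0] * size for _ in range(size)]
    for a, row in enumerate(template):
        for b, value in enumerate(row):
            region[top + a][left + b] = value
    return region


template = [[1.0, 2.0], [3.0, 4.0]]
peak_a = argmax2d(cross_correlate(place(template, 1, 1, 5), template))
peak_b = argmax2d(cross_correlate(place(template, 2, 3, 5), template))
```

Here `peak_a` and `peak_b` coincide with the positions where the template was pasted, so the response map moves with the target.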

Scale equivariance is not only useful for classification tasks, but also for localization. Excited to introduce our joint work with @isosnovik.

Scale Equivariance Improves Siamese Tracking: https://t.co/y47oTzIMBe pic.twitter.com/7jfd11bwxf
— Artem Moskalev (@artemmoskalev) July 20, 2020

Published 21 Jul 2020

Tatsuya Shirakawa, ML Lead at Beatrust (https://beatrust.com)