Hot Papers 2020-08-19

1. Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

Florian Fuchs, Yunlong Song, Elia Kaufmann, Davide Scaramuzza, Peter Duerr

Autonomous car racing raises fundamental robotics challenges such as planning minimum-time trajectories under uncertain dynamics and controlling the car at its friction limits. In this project, we consider the task of autonomous car racing in the top-selling car racing game Gran Turismo Sport, which is known for its detailed physics simulation of various cars and tracks. Our approach uses maximum-entropy deep reinforcement learning and a new reward design to train a sensorimotor policy that completes a given race track as fast as possible. We evaluate our approach in three time trial settings with different cars and tracks. Our results show that the obtained controllers not only beat the built-in non-player character of Gran Turismo Sport, but also outperform the fastest lap times in a dataset of personal bests from over 50,000 human drivers.
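
The core recipe is a dense, progress-based reward maximized by a maximum-entropy learner (soft actor-critic is the standard instance of that family). Below is a minimal sketch of such a racing reward; the function name, the wall-contact penalty, and the weight `w_wall` are illustrative assumptions, not the paper's exact reward design.

```python
def progress_reward(prev_s, cur_s, wall_contact, w_wall=1.0):
    """Hypothetical shaped reward for time-trial racing (a sketch, not
    the paper's exact design). prev_s and cur_s are arc-length positions
    along the track centerline in meters; maximizing per-step progress
    is a dense proxy for minimizing lap time."""
    progress = cur_s - prev_s                  # track distance gained
    penalty = w_wall * float(wall_contact)     # discourage wall riding
    return progress - penalty
```

A dense reward like this gives the policy a learning signal at every control step, rather than only at the end of a multi-minute lap.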

2. Motion Capture from Internet Videos

Junting Dong, Qing Shuai, Yuanqing Zhang, Xian Liu, Xiaowei Zhou, Hujun Bao

  • retweets: 49, favorites: 204 (08/20/2020 10:24:23)
  • links: abs | pdf
  • cs.CV

Recent advances in image-based human pose estimation make it possible to capture 3D human motion from a single RGB video. However, the inherent depth ambiguity and self-occlusion in a single view prevent the recovery of motion as high-quality as multi-view reconstruction. While multi-view videos are not common, videos of a celebrity performing a specific action are usually abundant on the Internet. Even though these videos are recorded at different times, they encode the same motion characteristics of the person. We therefore propose to capture human motion by jointly analyzing these Internet videos rather than processing each video separately. This new task poses many challenges that existing methods cannot address: the videos are unsynchronized, the camera viewpoints are unknown, the background scenes differ, and the human motions are not exactly the same across videos. To address these challenges, we propose a novel optimization-based framework and experimentally demonstrate that it recovers much more precise and detailed motion from multiple videos than monocular motion capture methods.
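
Schematically, such a joint optimization couples one shared motion with per-video unknowns. The objective below is a hedged sketch of that idea; the symbols and the regularizer are our assumptions, not the paper's exact formulation:

```latex
\min_{\Theta,\ \{P_i\},\ \{\tau_i\}}\;
\sum_{i=1}^{N}\sum_{f}
\bigl\| \Pi_{P_i}\!\bigl( J\bigl(\Theta(\tau_i(f))\bigr) \bigr) - x_{i,f} \bigr\|^2
\;+\; \lambda\, R(\Theta)
```

Here \(\Theta\) is the shared motion sequence, \(\tau_i\) a per-video temporal alignment (handling unsynchronized footage), \(P_i\) the unknown camera of video \(i\), \(J(\cdot)\) the posed 3D joints, \(\Pi_{P_i}\) the projection into video \(i\), \(x_{i,f}\) the detected 2D keypoints in frame \(f\), and \(R\) a temporal-smoothness regularizer.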

3. Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible

Neha S. Wadia, Daniel Duckworth, Samuel S. Schoenholz, Ethan Dyer, Jascha Sohl-Dickstein

Machine learning is predicated on the concept of generalization: a model achieving low error on a sufficiently large training set should also perform well on novel samples from the same distribution. We show that both data whitening and second order optimization can harm or entirely prevent generalization. In general, model training harnesses information contained in the sample-sample second moment matrix of a dataset. We prove that for models with a fully connected first layer, the information contained in this matrix is the only information which can be used to generalize. Models trained using whitened data, or with certain second order optimization schemes, have less access to this information; in the high dimensional regime they have no access at all, producing models that generalize poorly or not at all. We experimentally verify these predictions for several architectures, and further demonstrate that generalization continues to be harmed even when theoretical requirements are relaxed. However, we also show experimentally that regularized second order optimization can provide a practical tradeoff, where training is still accelerated but less information is lost, and generalization can in some circumstances even improve.
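
A quick way to see the mechanism: fully whitening the data makes the sample-sample second moment matrix trivial whenever the feature dimension is at least the number of samples. A minimal numpy sketch (the `eps` regularizer and the final check are illustrative choices):

```python
import numpy as np

def whiten(X, eps=1e-8):
    """Whiten rows of X (n samples x d features) against the uncentered
    feature second moment matrix, so whitened features are decorrelated
    with unit scale. eps regularizes the inversion."""
    M = X.T @ X / len(X)                       # d x d second moment
    vals, vecs = np.linalg.eigh(M)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 512))                # high dimensional: d >= n
Xw = whiten(X)
G = Xw @ Xw.T                                  # sample-sample second moment
# After whitening, G collapses to a multiple of the identity: the matrix
# the paper identifies as the only usable signal for models with a fully
# connected first layer now carries no dataset-specific information.
print(np.allclose(G, len(X) * np.eye(len(X)), atol=1e-3))
```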

4. Drawing Shortest Paths in Geodetic Graphs

Sabine Cornelsen, Maximilian Pfister, Henry Förster, Martin Gronemann, Michael Hoffmann, Stephen Kobourov, Thomas Schneck

  • retweets: 7, favorites: 48 (08/20/2020 10:24:24)
  • links: abs | pdf
  • cs.DM | cs.CG

Motivated by the fact that in a space where shortest paths are unique, no two shortest paths meet twice, we study a question posed by Greg Bodwin: Given a geodetic graph G, i.e., an unweighted graph in which the shortest path between any pair of vertices is unique, is there a philogeodetic drawing of G, i.e., a drawing of G in which the curves of any two shortest paths meet at most once? We answer this question in the negative by showing the existence of geodetic graphs that require some pair of shortest paths to cross at least four times. The bound on the number of crossings is tight for the class of graphs we construct. Furthermore, we exhibit geodetic graphs of diameter two that do not admit a philogeodetic drawing.
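
For readers who want to experiment, the geodetic property is easy to check by counting shortest paths with BFS; the helper and the cycle examples below are our own illustration, not from the paper.

```python
from collections import deque

def is_geodetic(adj):
    """Return True iff the unweighted graph is geodetic, i.e. the
    shortest path between every pair of vertices is unique.
    adj: dict mapping vertex -> iterable of neighbours."""
    for s in adj:
        dist, count = {s: 0}, {s: 1}
        q = deque([s])
        while q:                               # BFS with path counting
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    count[v] = count[u]
                    q.append(v)
                elif dist[v] == dist[u] + 1:   # another shortest path
                    count[v] += count[u]
        if any(c > 1 for c in count.values()):
            return False
    return True

C5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}  # odd cycle
C4 = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}  # even cycle
assert is_geodetic(C5) and not is_geodetic(C4)
```

Odd cycles are geodetic, while even cycles are not: opposite vertices of C4 are joined by two distinct shortest paths.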

5. Inductive logic programming at 30: a new introduction

Andrew Cropper, Sebastijan Dumančić

  • retweets: 8, favorites: 47 (08/20/2020 10:24:24)
  • links: abs | pdf
  • cs.AI | cs.LG

Inductive logic programming (ILP) is a form of machine learning. The goal of ILP is to induce a logic program (a set of logical rules) that generalises training examples. As ILP approaches 30, we provide a new introduction to the field. We introduce the necessary logical notation and the main ILP learning settings. We describe the main building blocks of an ILP system. We compare several ILP systems on several dimensions. We detail four systems (Aleph, TILDE, ASPAL, and Metagol). We contrast ILP with other forms of machine learning. Finally, we summarise the current limitations and outline promising directions for future research.
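
To make the setting concrete, here is a toy generate-and-test sketch in Python (our own illustration, not Aleph, TILDE, ASPAL, or Metagol): induce grandparent/2 from parent/2 facts by testing chain-rule bodies against positive and negative examples. All facts and examples are invented for the demo.

```python
# Background knowledge: parent/2 facts as (parent, child) pairs.
parent = {("ann", "bob"), ("bob", "carl"), ("ann", "beth"), ("beth", "dora")}
relations = {"parent": parent}

pos = {("ann", "carl"), ("ann", "dora")}   # grandparent(X, Y) should hold
neg = {("bob", "ann"), ("carl", "beth")}   # grandparent(X, Y) should fail

def derives(r1, r2, x, y):
    """Does head(X, Y) :- r1(X, Z), r2(Z, Y) derive the pair (x, y)?"""
    return any((x, z) in r1 and (z, y) in r2 for _, z in r1)

# Generate candidate chain rules; keep those covering every positive
# example and no negative example.
for n1, r1 in relations.items():
    for n2, r2 in relations.items():
        if all(derives(r1, r2, *e) for e in pos) and \
           not any(derives(r1, r2, *e) for e in neg):
            print(f"grandparent(X,Y) :- {n1}(X,Z), {n2}(Z,Y).")
```

Real ILP systems replace this brute-force search with principled hypothesis-space ordering and pruning, but the induced artifact is the same kind of object: a logical rule that generalises the examples.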

6. Moment Multicalibration for Uncertainty Estimation

Christopher Jung, Changhwa Lee, Mallesh M. Pai, Aaron Roth, Rakesh Vohra

We show how to achieve the notion of “multicalibration” from Hébert-Johnson et al. [2018] not just for means, but also for variances and other higher moments. Informally, this means we can find regression functions which, given a data point, can make point predictions not just for the expectation of its label but for higher moments of its label distribution as well, and those predictions match the true distribution quantities when averaged not just over the population as a whole, but also over an enormous number of finely defined subgroups. This yields a principled way to estimate the uncertainty of predictions on many different subgroups, and to diagnose potential sources of unfairness in the predictive power of features across subgroups. As an application, we show that our moment estimates can be used to derive marginal prediction intervals that are simultaneously valid as averaged over all of the (sufficiently large) subgroups for which moment multicalibration has been obtained.
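
The flavor of the guarantee is easy to audit empirically. The sketch below (the function name and the variance check are our assumptions, not the paper's algorithm) measures per-subgroup gaps between predicted and realized first and second moments:

```python
import numpy as np

def moment_gaps(pred_mean, pred_var, y, groups):
    """Audit first- and second-moment calibration per subgroup (our own
    diagnostic sketch). groups maps a name to a boolean mask selecting
    that subgroup's points; multicalibration asks every gap to be small
    on every sufficiently large subgroup."""
    gaps = {}
    for name, mask in groups.items():
        mean_gap = abs(pred_mean[mask].mean() - y[mask].mean())
        # predicted variance vs. realized squared residuals
        var_gap = abs(pred_var[mask].mean()
                      - ((y[mask] - pred_mean[mask]) ** 2).mean())
        gaps[name] = (mean_gap, var_gap)
    return gaps

rng = np.random.default_rng(1)
y = rng.normal(size=1000)
groups = {"even": np.arange(1000) % 2 == 0, "odd": np.arange(1000) % 2 == 1}
print(moment_gaps(np.zeros(1000), np.ones(1000), y, groups))
```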

7. Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks

Guangming Yao, Yi Yuan, Tianjia Shao, Kun Zhou

  • retweets: 12, favorites: 39 (08/20/2020 10:24:24)
  • links: abs | pdf
  • cs.CV

Face reenactment aims to animate a source face image to a different pose and expression provided by a driving image. Existing approaches are either designed for a specific identity or suffer from identity preservation problems in one-shot or few-shot scenarios. In this paper, we introduce a method for one-shot face reenactment, which uses the reconstructed 3D meshes (i.e., the source mesh and driving mesh) as guidance to learn the optical flow needed for the reenacted face synthesis. Technically, we explicitly exclude the driving face's identity information from the reconstructed driving mesh. In this way, our network can focus on motion estimation for the source face without interference from the driving face's shape. We propose a motion net, an asymmetric autoencoder, to learn the face motion. The encoder is a graph convolutional network (GCN) that learns a latent motion vector from the meshes, and the decoder produces an optical flow image from the latent vector with CNNs. Compared to previous methods that use sparse keypoints to guide optical flow learning, our motion net learns the optical flow directly from dense 3D meshes, which provide detailed shape and pose information, so it achieves more accurate expression and pose on the reenacted face. Extensive experiments show that our method generates high-quality results and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.
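
To ground the architecture description, here is a hedged PyTorch sketch of such an asymmetric autoencoder. All layer sizes, the 64x64 flow resolution, and the normalized-adjacency input are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: aggregate neighbour features via a fixed
    normalized mesh adjacency, then apply a shared linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                 # x: (B, V, C), adj: (V, V)
        return torch.relu(self.lin(adj @ x))

class MotionNet(nn.Module):
    """Asymmetric autoencoder sketch: a GCN encoder maps source and
    driving mesh vertices to a latent motion vector; a CNN decoder
    upsamples it into a 2-channel optical-flow image."""
    def __init__(self, n_verts, latent=256):
        super().__init__()
        self.enc1 = GCNLayer(6, 32)            # source + driving coords
        self.enc2 = GCNLayer(32, 64)
        self.to_latent = nn.Linear(n_verts * 64, latent)
        self.decode = nn.Sequential(           # latent -> 64x64 flow
            nn.Linear(latent, 4 * 4 * 128), nn.ReLU(),
            nn.Unflatten(1, (128, 4, 4)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 2, 4, stride=2, padding=1),  # (dx, dy)
        )

    def forward(self, src_mesh, drv_mesh, adj):
        x = torch.cat([src_mesh, drv_mesh], dim=-1)   # (B, V, 6)
        h = self.enc2(self.enc1(x, adj), adj)         # (B, V, 64)
        z = self.to_latent(h.flatten(1))              # latent motion vector
        return self.decode(z)                         # (B, 2, 64, 64)
```

The asymmetry mirrors the description above: graph convolutions suit mesh-structured input, while transposed convolutions suit the image-structured flow output.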