All Articles

Hot Papers 2020-08-12

1. Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players

Haotian Zhang, Cristobal Sciutto, Maneesh Agrawala, Kayvon Fatahalian

  • retweets: 421, favorites: 1594 (08/13/2020 09:25:27)
  • links: abs | pdf
  • cs.GR

We present a system that converts annotated broadcast video of tennis matches into interactively controllable video sprites that behave and appear like professional tennis players. Our approach is based on controllable video textures, and utilizes domain knowledge of the cyclic structure of tennis rallies to place clip transitions and accept control inputs at key decision-making moments of point play. Most importantly, we use points from the video collection to model a player’s court positioning and shot selection decisions during points. We use these behavioral models to select video clips that reflect actions the real-life player is likely to take in a given match play situation, yielding sprites that behave realistically at the macro level of full points, not just individual tennis motions. Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences such as the creation of matchups between players who have never competed in real life, or interactive control of players in the Wimbledon final. According to expert tennis players, the rallies generated using our approach are significantly more realistic in terms of player behavior than those produced by video sprite methods that only consider the quality of motion transitions during video synthesis.
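
The core loop is easy to picture: at each decision-making moment, the behavioral model proposes what the player would do, and the system picks the broadcast clip that best realizes that decision. A minimal sketch in Python, where `sample_decision`, `best_clip`, and `advance` are hypothetical stand-ins for the paper’s behavioral model, clip matching, and rally-state update:

```python
def synthesize_point(sample_decision, best_clip, advance, state, max_shots=50):
    """Hypothetical sketch of decision-point clip selection for one rally.

    sample_decision(state): draws the player's likely court position and
      shot selection from the learned behavioral model.
    best_clip(state, decision): returns the annotated broadcast clip whose
      motion best realizes that decision with a smooth transition (or None).
    advance(state, clip): ball/player state after the chosen shot.
    """
    clips = []
    for _ in range(max_shots):
        decision = sample_decision(state)
        clip = best_clip(state, decision)
        if clip is None:  # no plausible continuation: the point ends
            break
        clips.append(clip)
        state = advance(state, clip)
    return clips
```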

2. BREEDS: Benchmarks for Subpopulation Shift

Shibani Santurkar, Dimitris Tsipras, Aleksander Madry

We develop a methodology for assessing the robustness of models to subpopulation shift---specifically, their ability to generalize to novel data subpopulations that were not observed during training. Our approach leverages the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions. This enables us to synthesize realistic distribution shifts whose sources can be precisely controlled and characterized, within existing large-scale datasets. Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity. We then validate that the corresponding shifts are tractable by obtaining human baselines for them. Finally, we utilize these benchmarks to measure the sensitivity of standard model architectures as well as the effectiveness of off-the-shelf train-time robustness interventions. Code and data are available at https://github.com/MadryLab/BREEDS-Benchmarks.
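
To make the idea concrete, here is a minimal sketch of how a class hierarchy can induce a subpopulation shift: each superclass keeps some subclasses for training and holds out the rest for testing, so train and test share labels but not subpopulations. The `hierarchy` layout is a simplifying assumption, not the repository’s actual API:

```python
import random

def make_subpopulation_split(hierarchy, seed=0):
    """Sketch of a BREEDS-style split (hypothetical data layout):
    `hierarchy` maps each superclass to its list of subclasses. Half of
    each superclass's subclasses form the source (training) distribution;
    the other half form the shifted target (test) distribution."""
    rng = random.Random(seed)
    source, target = {}, {}
    for superclass, subclasses in hierarchy.items():
        subs = list(subclasses)
        rng.shuffle(subs)
        half = len(subs) // 2
        source[superclass] = subs[:half]   # subpopulations seen in training
        target[superclass] = subs[half:]   # novel subpopulations at test time
    return source, target

# e.g. hierarchy = {"dog": ["beagle", "husky", "pug", "collie"], ...}
```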

3. Deep Detail Enhancement for Any Garment

Meng Zhang, Tuanfeng Wang, Duygu Ceylan, Niloy J. Mitra

  • retweets: 22, favorites: 93 (08/13/2020 09:25:28)
  • links: abs | pdf
  • cs.GR

Creating fine garment details requires significant effort and huge computational resources. In contrast, a coarse shape may be easy to acquire in many scenarios (e.g., via low-resolution physically-based simulation, linear blend skinning driven by skeletal motion, or portable scanners). In this paper, we show how to add rich yet plausible details, in a data-driven manner, to a coarse garment geometry. Once the parameterization of the garment is given, we formulate the task as a style transfer problem over the space of associated normal maps. In order to facilitate generalization across garment types and character motions, we introduce a patch-based formulation that hallucinates high-resolution geometric details (i.e., wrinkle density and shape) by matching a Gram-matrix-based style loss. We extensively evaluate our method on a variety of production scenarios and show that it is simple, lightweight, efficient, and generalizes across underlying garment types, sewing patterns, and body motions.
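
For reference, the Gram-matrix style loss at the heart of such a formulation can be written in a few lines of PyTorch; the choice of feature layers and the patch handling here are illustrative assumptions rather than the paper’s exact setup:

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    """Channel-wise correlations of a feature map of shape (B, C, H, W)."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(pred_feats, target_feats):
    """Gram-matrix style loss summed over feature layers, matching wrinkle
    statistics between predicted and reference normal-map patches."""
    return sum(F.mse_loss(gram_matrix(p), gram_matrix(t))
               for p, t in zip(pred_feats, target_feats))
```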

4. Visual Imitation Made Easy

Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto

Visual imitation learning provides a framework for learning complex manipulation behaviors by leveraging human demonstrations. However, current interfaces for imitation such as kinesthetic teaching or teleoperation prohibitively restrict our ability to efficiently collect large-scale data in the wild. Obtaining such diverse demonstration data is paramount for the generalization of learned skills to novel scenarios. In this work, we present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots. We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot’s end-effector. To extract action information from these visual demonstrations, we use off-the-shelf Structure from Motion (SfM) techniques in addition to training a finger detection network. We experimentally evaluate our approach on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task. For both tasks, we use standard behavior cloning to learn executable policies from the previously collected offline demonstrations. To improve learning performance, we employ a variety of data augmentations and provide an extensive analysis of their effects. Finally, we demonstrate the utility of our interface by evaluating it in real robotic scenarios with previously unseen objects, achieving an 87% success rate on pushing and a 62% success rate on stacking. Robot videos are available at https://dhiraj100892.github.io/Visual-Imitation-Made-Easy.
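
A minimal behavior-cloning update of the kind described, with `policy` and `augment` as placeholders (the paper’s network architecture, action space, and augmentations differ):

```python
import torch.nn.functional as F

def behavior_cloning_step(policy, optimizer, images, actions, augment):
    """One behavior-cloning update: regress demonstrated actions from
    (augmented) observations. `policy` is any torch.nn.Module mapping an
    image batch to an action batch; `augment` is e.g. a random crop or
    color jitter transform. Tensor shapes are illustrative assumptions."""
    policy.train()
    optimizer.zero_grad()
    pred = policy(augment(images))          # predict actions from augmented views
    loss = F.mse_loss(pred, actions)        # match the demonstrated actions
    loss.backward()
    optimizer.step()
    return loss.item()
```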

5. DTVNet: Dynamic Time-lapse Video Generation via Single Still Image

Jiangning Zhang, Chao Xu, Liang Liu, Mengmeng Wang, Xia Wu, Yong Liu, Yunliang Jiang

  • retweets: 14, favorites: 67 (08/13/2020 09:25:28)
  • links: abs | pdf
  • cs.CV

This paper presents a novel end-to-end dynamic time-lapse video generation framework, named DTVNet, that generates diversified time-lapse videos from a single landscape image, conditioned on normalized motion vectors. The proposed DTVNet consists of two submodules: an \emph{Optical Flow Encoder} (OFE) and a \emph{Dynamic Video Generator} (DVG). The OFE maps a sequence of optical flow maps to a \emph{normalized motion vector} that encodes the motion information of the generated video. The DVG contains motion and content streams that learn from the motion vector and the single image respectively, as well as an encoder to learn shared content features and a decoder to construct video frames with the corresponding motion. Specifically, the \emph{motion stream} introduces multiple \emph{adaptive instance normalization} (AdaIN) layers to integrate multi-level motion information that is processed by linear layers. At test time, videos with the same content but different motion can be generated from different \emph{normalized motion vectors} given only one input image. We further conduct experiments on the Sky Time-lapse dataset, and the results demonstrate that our approach surpasses state-of-the-art methods in generating high-quality, dynamic videos, as well as in the variety of videos it can generate from different motion vectors.
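
AdaIN itself is simple to state: content features are re-normalized to per-channel statistics predicted from the conditioning signal, here the normalized motion vector. A minimal PyTorch sketch, where the shapes and the source of `style_mean`/`style_std` are assumptions:

```python
def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization: re-normalize content features of
    shape (B, C, H, W) to target per-channel statistics. style_mean and
    style_std have shape (B, C), e.g. produced by linear layers from the
    normalized motion vector (names are illustrative)."""
    mean = content.mean(dim=(2, 3), keepdim=True)
    std = content.std(dim=(2, 3), keepdim=True) + eps
    normalized = (content - mean) / std
    return normalized * style_std[..., None, None] + style_mean[..., None, None]
```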

6. GeLaTO: Generative Latent Textured Objects

Ricardo Martin-Brualla, Rohit Pandey, Sofien Bouaziz, Matthew Brown, Dan B Goldman

Accurate modeling of 3D objects exhibiting transparency, reflections, and thin structures is an extremely challenging problem. Inspired by billboards and geometric proxies used in computer graphics, this paper proposes Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low-frequency geometry with learned neural textures, to encode both medium- and fine-scale geometry as well as view-dependent appearance. To generate the proxies’ textures, we learn a joint latent space allowing category-level appearance and geometry interpolation. The proxies are independently rasterized with their corresponding neural texture and composited using a U-Net, which generates a photorealistic output image including an alpha map. We demonstrate the effectiveness of our approach by reconstructing complex objects from a sparse set of views. We show results on a dataset of real images of eyeglasses frames, which are particularly challenging to reconstruct using classical methods. We also demonstrate that these coarse proxies can be handcrafted when the underlying object geometry is easy to model, like eyeglasses, or generated using a neural network for more complex categories, such as cars.
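
For intuition, classical back-to-front “over” compositing of independently rendered proxy layers looks as follows; GeLaTO replaces this fixed operator with a learned U-Net that predicts RGB and alpha from the rasterized neural textures, so this sketch only illustrates the layered-proxy idea:

```python
import torch

def composite_over(layers):
    """Back-to-front 'over' compositing of proxy layers, each a pair of
    tensors: RGB of shape (B, 3, H, W) and straight alpha of shape
    (B, 1, H, W). Layers must be ordered back to front."""
    rgb = torch.zeros_like(layers[0][0])
    alpha = torch.zeros_like(layers[0][1])
    for layer_rgb, layer_alpha in layers:
        # each successive layer is composited over the accumulated result
        rgb = layer_rgb * layer_alpha + rgb * (1 - layer_alpha)
        alpha = layer_alpha + alpha * (1 - layer_alpha)
    return rgb, alpha
```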

7. ProblemChild: Discovering Anomalous Patterns based on Parent-Child Process Relationships

Bobby Filar, David French

  • retweets: 13, favorites: 37 (08/13/2020 09:25:29)
  • links: abs | pdf
  • cs.CR

It is becoming more common that adversary attacks consist of more than a standalone executable or script. Often, evidence of an attack includes conspicuous process heritage that may be ignored by traditional static machine learning models. Advanced attacker techniques, like “living off the land,” that appear normal in isolation become more suspicious when observed in a parent-child context. The context derived from parent-child process chains can help identify and group malware families, as well as discover novel attacker techniques. Adversaries chain these techniques to achieve persistence, bypass defenses, and execute actions. Traditional heuristic-based detections often generate noise or disparate events that in fact belong to a single attack. ProblemChild is a graph-based framework designed to address these issues. ProblemChild applies a supervised learning classifier to derive a weighted graph used to group seemingly disparate events into larger attack sequences. It then applies conditional probability to automatically rank anomalous communities and to suppress commonly occurring parent-child chains. In combination, this framework can be used by analysts to aid in the crafting or tuning of detectors and to reduce false positives over time. We evaluate ProblemChild against the 2018 MITRE ATT&CK(TM) emulation of the APT3 attack to demonstrate its promise in identifying anomalous parent-child process chains.
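
The conditional-probability ranking can be sketched directly: estimate P(child | parent) from event counts and surface the rarest pairs under common parents. This is a simplified illustration of the idea, not the framework’s actual implementation:

```python
from collections import Counter

def rank_rare_chains(events):
    """Rank parent-child process pairs by estimated P(child | parent);
    rare children of common parents (e.g. winword.exe spawning
    powershell.exe) surface first. `events` is a list of
    (parent, child) process-name pairs."""
    pair_counts = Counter(events)
    parent_counts = Counter(parent for parent, _ in events)
    scored = [
        (pair_counts[(parent, child)] / parent_counts[parent], parent, child)
        for (parent, child) in pair_counts
    ]
    return sorted(scored)  # lowest conditional probability = most anomalous

# e.g. events = [("explorer.exe", "chrome.exe"),
#                ("winword.exe", "powershell.exe"), ...]
```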