All Articles

Hot Papers 2020-07-17

1. Implicit Mesh Reconstruction from Unannotated Image Collections

Shubham Tulsiani, Nilesh Kulkarni, Abhinav Gupta

  • retweets: 50, favorites: 224 (07/18/2020 05:56:48)
  • links: abs | pdf
  • cs.CV

We present an approach to infer the 3D shape, texture, and camera pose for an object from a single RGB image, using only category-level image collections with foreground masks as supervision. We represent the shape as an image-conditioned implicit function that transforms the surface of a sphere to that of the predicted mesh, while additionally predicting the corresponding texture. To derive a supervisory signal for learning, we enforce that: a) our predictions when rendered should explain the available image evidence, and b) the inferred 3D structure should be geometrically consistent with learned pixel-to-surface mappings. We empirically show that our approach improves over prior work that leverages similar supervision, and in fact performs competitively with methods that use stronger supervision. Finally, as our method enables learning with limited supervision, we qualitatively demonstrate its applicability across a set of about 30 object categories.
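
Since the abstract only describes the architecture in prose, here is a minimal, hypothetical PyTorch sketch of an image-conditioned implicit function that deforms sphere points to a surface while also predicting texture. All names and dimensions are illustrative; this is not the authors' code.

```python
# Hedged sketch: an MLP maps (sphere point, image feature) -> (surface
# point, RGB texture). Module names and sizes are made up for illustration.
import torch
import torch.nn as nn

class SphereToSurface(nn.Module):
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 3),  # xyz offset + rgb
        )

    def forward(self, sphere_pts, img_feat):
        # sphere_pts: (B, N, 3) points on the unit sphere
        # img_feat:   (B, feat_dim) global encoding of the input RGB image
        feat = img_feat[:, None, :].expand(-1, sphere_pts.shape[1], -1)
        out = self.mlp(torch.cat([sphere_pts, feat], dim=-1))
        xyz = sphere_pts + out[..., :3]    # deformed surface point
        rgb = torch.sigmoid(out[..., 3:])  # predicted texture at that point
        return xyz, rgb

pts = torch.nn.functional.normalize(torch.randn(2, 1024, 3), dim=-1)
xyz, rgb = SphereToSurface()(pts, torch.randn(2, 256))
```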

2. OrbNet: Deep Learning for Quantum Chemistry Using Symmetry-Adapted Atomic-Orbital Features

Zhuoran Qiao, Matthew Welborn, Animashree Anandkumar, Frederick R. Manby, Thomas F. Miller III

We introduce a machine learning method in which energy solutions from the Schrödinger equation are predicted using symmetry-adapted atomic-orbital features and a graph neural network architecture. OrbNet is shown to outperform existing methods in terms of learning efficiency and transferability for the prediction of density functional theory results, while employing low-cost features obtained from semi-empirical electronic structure calculations. For applications to datasets of drug-like molecules, including QM7b-T, QM9, GDB-13-T, DrugBank, and the conformer benchmark dataset of Folmsbee and Hutchison, OrbNet predicts energies within chemical accuracy of DFT at a computational cost reduced by a factor of a thousand or more.
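
As a rough illustration of the general recipe (cheap per-node features from an electronic-structure calculation, fed to a graph network that regresses a molecular energy), here is a hedged sketch. It is not the OrbNet architecture; every name and dimension is invented.

```python
# Generic sketch, not OrbNet: message passing over node features derived
# from a cheap electronic-structure calculation, sum-pooled to an energy.
import torch
import torch.nn as nn

class EnergyGNN(nn.Module):
    def __init__(self, in_dim, hidden=128, layers=3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.msg = nn.ModuleList(
            [nn.Linear(2 * hidden, hidden) for _ in range(layers)])
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # x:   (N, in_dim) node features (e.g., orbital-derived descriptors)
        # adj: (N, N) edge weights (e.g., orbital overlaps)
        h = torch.relu(self.embed(x))
        for lin in self.msg:
            agg = adj @ h                        # aggregate neighbor states
            h = torch.relu(lin(torch.cat([h, agg], dim=-1)))
        return self.readout(h).sum()             # sum-pool to a scalar energy

energy = EnergyGNN(16)(torch.randn(5, 16), torch.rand(5, 5))
```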

3. Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators

Yasuhiro Fujita, Kota Uenishi, Avinash Ummadisingu, Prabhat Nagarajan, Shimpei Masuda, Mario Ynocente Castro

Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems. We take a step towards this broader goal by presenting the first RL-based system, to our knowledge, for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects. The system is informed of the desired target object in the form of a single, arbitrary-pose RGB image of that object, enabling the system to generalize to unseen objects without retraining. To achieve such a system, we combine several advances in deep reinforcement learning and present a large-scale distributed training system using synchronous SGD that seamlessly scales to multi-node, multi-GPU infrastructure to make rapid prototyping easier. We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
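
One concrete ingredient, synchronous SGD across GPUs, can be sketched with PyTorch's DistributedDataParallel. The skeleton below is generic and assumes a standard torchrun-style launch; the RL specifics (policy network, environment, loss) are replaced by stand-ins.

```python
# Minimal synchronous-SGD skeleton (not the paper's system). DDP averages
# gradients across workers at each backward pass, so every rank applies
# the same update. Launch configuration and the actual RL loop are omitted.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 7))
    model = DDP(model.to(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(1000):
        obs = torch.randn(32, 64, device=rank)   # stand-in for observations
        loss = model(obs).pow(2).mean()          # stand-in for the RL loss
        opt.zero_grad()
        loss.backward()                          # gradients all-reduced here
        opt.step()                               # identical step on all ranks
```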

4. Accelerating 3D Deep Learning with PyTorch3D

Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari

Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However, despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges involved in 3D deep learning, such as efficiently processing heterogeneous data and reframing graphics operations to be differentiable. We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning. It includes a fast, modular differentiable renderer for meshes and point clouds, enabling analysis-by-synthesis approaches. Compared with other differentiable renderers, PyTorch3D is more modular and efficient, allowing users to more easily extend it while also gracefully scaling to large meshes and images. We compare the PyTorch3D operators and renderer with other implementations and demonstrate significant speed and memory improvements. We also use PyTorch3D to improve the state-of-the-art for unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet. PyTorch3D is open-source and we hope it will help accelerate research in 3D deep learning.
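
For a flavor of the library, here is a short usage sketch with PyTorch3D's mesh utilities (API names as of the 2020 release; consult the current docs), computing a differentiable Chamfer loss between points sampled from two meshes:

```python
# Sample points from two meshes and compute a differentiable Chamfer
# distance between the point sets, as used in analysis-by-synthesis.
import torch
from pytorch3d.utils import ico_sphere
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance

src = ico_sphere(level=2)                     # coarse unit icosphere mesh
tgt = ico_sphere(level=4)                     # finer icosphere as "target"
src_pts = sample_points_from_meshes(src, num_samples=5000)
tgt_pts = sample_points_from_meshes(tgt, num_samples=5000)
loss, _ = chamfer_distance(src_pts, tgt_pts)  # differentiable w.r.t. verts
```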

5. Quantum algorithms for graph problems with cut queries

Troy Lee, Miklos Santha, Shengyu Zhang

Let G be an n-vertex graph with m edges. When asked a subset S of vertices, a cut query on G returns the number of edges of G that have exactly one endpoint in S. We show that there is a bounded-error quantum algorithm that determines all connected components of G after making O(log(n)^5) many cut queries. In contrast, it follows from results in communication complexity that any randomized algorithm, even just to decide whether the graph is connected or not, must make at least Ω(n/log(n)) many cut queries. We further show that with O(log(n)^7) many cut queries a quantum algorithm can with high probability output a spanning forest for G. En route to proving these results, we design quantum algorithms for learning a graph using cut queries. We show that a quantum algorithm can learn a graph with maximum degree d after O(d log(n)^2) many cut queries, and can learn a general graph with O(√m log(n)^{3/2}) many cut queries. These two upper bounds are tight up to poly-logarithmic factors, and compare to Ω(dn) and Ω(m/log(n)) lower bounds on the number of cut queries needed by a randomized algorithm for the same problems, respectively. The key ingredients in our results are the Bernstein-Vazirani algorithm, approximate counting with "OR queries", and learning sparse vectors from inner products as in compressed sensing.
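
The link between cut queries and inner products can be checked directly: with x the 0/1 indicator vector of S and A the adjacency matrix, cut(S) = x^T A (1 - x), so each cut query is an inner-product measurement against the rows of A, which is the hook for Bernstein-Vazirani-style learning. A small classical sanity check of this identity (not the quantum algorithm itself):

```python
# Verify cut(S) = x^T A (1 - x) on a random graph: the left side counts
# edges with exactly one endpoint in S; the right side is linear algebra.
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = np.triu(rng.integers(0, 2, (n, n)), k=1)
A = A + A.T                                   # symmetric 0/1 adjacency
x = rng.integers(0, 2, n)                     # indicator vector of subset S
direct = sum(A[u, v] for u in range(n) for v in range(u + 1, n)
             if x[u] != x[v])                 # count boundary edges
assert direct == x @ A @ (1 - x)
```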

6. Learning from Noisy Labels with Deep Neural Networks: A Survey

Hwanjun Song, Minseok Kim, Dongmin Park, Jae-Gil Lee

Deep learning has achieved remarkable success in numerous domains with the help of large amounts of data. In many real-world scenarios, however, high-quality labels are hard to obtain, making label quality a serious concern. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 46 state-of-the-art robust training methods, all of which are categorized into seven groups according to their methodological differences, followed by a systematic comparison along six properties used to evaluate them. Subsequently, we summarize the typically used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies.
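
As one representative of the robust-loss family in this literature, the generalized cross-entropy of Zhang and Sabuncu (2018) interpolates between cross-entropy (q → 0) and MAE (q = 1), trading convergence speed for insensitivity to mislabeled examples. The sketch below is generic and not tied to any particular method in the survey.

```python
# Generalized cross-entropy loss: (1 - p_y^q) / q, where p_y is the
# predicted probability of the (possibly noisy) labeled class.
import torch
import torch.nn.functional as F

def generalized_cross_entropy(logits, targets, q=0.7):
    p_y = F.softmax(logits, dim=-1).gather(1, targets[:, None]).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()

loss = generalized_cross_entropy(torch.randn(4, 10),
                                 torch.tensor([3, 1, 0, 9]))
```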

7. D2D: Learning to find good correspondences for image matching and manipulation

Olivia Wiles, Sebastien Ehrhardt, Andrew Zisserman

  • retweets: 18, favorites: 47 (07/18/2020 05:56:50)
  • links: abs | pdf
  • cs.CV

We propose a new approach to determining correspondences between image pairs under large changes in illumination, viewpoint, context, and material. While most approaches seek to extract a set of reliably detectable regions in each image, which are then compared (sparse-to-sparse) using increasingly complicated or specialized pipelines, we propose a simple approach for matching all points between the images (dense-to-dense) and subsequently selecting the best matches. The two key parts of our approach are: (i) to condition the learned features on both images, and (ii) to learn a distinctiveness score which is used to choose the best matches at test time. We demonstrate that our model can be used to achieve state-of-the-art or competitive results on a wide range of tasks: local matching, camera localization, 3D reconstruction, and image stylization.
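
A hypothetical sketch of the dense-to-dense selection idea: compare all descriptors, keep mutual nearest neighbors, and rank them by a learned distinctiveness score. The function names and scoring rule here are illustrative, not the authors' model.

```python
# Illustrative selection step: mutual-nearest-neighbor matching over dense
# descriptors, ranked by per-pixel distinctiveness. Not the D2D network.
import torch

def select_matches(feat_a, feat_b, dist_a, dist_b, top_k=100):
    # feat_*: (N, D) L2-normalized dense descriptors (flattened pixels)
    # dist_*: (N,) learned distinctiveness scores in [0, 1]
    sim = feat_a @ feat_b.t()                 # (N, N) pairwise similarity
    nn_ab = sim.argmax(dim=1)                 # best match a -> b
    nn_ba = sim.argmax(dim=0)                 # best match b -> a
    mutual = nn_ba[nn_ab] == torch.arange(feat_a.shape[0])
    score = dist_a * dist_b[nn_ab]            # weight by distinctiveness
    score[~mutual] = -1.0                     # drop cycle-inconsistent pairs
    best = score.topk(min(top_k, len(score))).indices
    return best, nn_ab[best]                  # matched index pairs

a = torch.nn.functional.normalize(torch.randn(500, 128), dim=1)
b = torch.nn.functional.normalize(torch.randn(500, 128), dim=1)
pairs = select_matches(a, b, torch.rand(500), torch.rand(500))
```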

8. Xiaomingbot: A Multilingual Robot News Reporter

Runxin Xu, Jun Cao, Mingxuan Wang, Jiaze Chen, Hao Zhou, Ying Zeng, Yuping Wang, Li Chen, Xiang Yin, Xijin Zhang, Songcheng Jiang, Yuxuan Wang, Lei Li

This paper presents Xiaomingbot, an intelligent, multilingual, and multimodal software robot with four integrated capabilities: news generation, news translation, news reading, and avatar animation. The system automatically generates Chinese news from data tables and summarizes it. Next, it translates the summary or the full article into multiple languages, and reads the multilingual rendition aloud via synthesized speech. Notably, Xiaomingbot uses voice-cloning technology to synthesize the speech, trained on a real person's voice data in one input language. The proposed system enjoys several merits: it has an animated avatar, and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles and gained over 150,000 followers on social media platforms.
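
The pipeline can be pictured as four stages chained together. In the sketch below, every function is a placeholder standing in for a component the paper only names (generator, translator, voice-cloning TTS, avatar renderer); the stub bodies are pure illustration.

```python
# Schematic four-stage pipeline: generation -> translation -> speech ->
# avatar. Each stub below is a placeholder for a real system component.
def generate_news(table):            return f"[zh article from {len(table)} rows]"
def summarize(article):              return article[:64]
def translate(text, lang):           return f"[{lang}] {text}"
def synthesize_speech(text, voice):  return b"wav-bytes"   # cloned voice
def animate_avatar(audio):           return b"mp4-bytes"   # lip-synced video

def xiaomingbot_pipeline(table, target_langs=("en", "ja")):
    summary = summarize(generate_news(table))
    out = []
    for lang in target_langs:
        text = translate(summary, lang)
        audio = synthesize_speech(text, voice="cloned")
        out.append((text, audio, animate_avatar(audio)))
    return out

results = xiaomingbot_pipeline([{"team": "A", "score": 3}])
```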

9. Controllable Image Synthesis via SegVAE

Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

  • retweets: 9, favorites: 41 (07/18/2020 05:56:50)
  • links: abs | pdf
  • cs.CV | cs.LG

Flexible user controls are desirable for content creation and image editing. A semantic map is a commonly used intermediate representation for conditional image generation. Compared to operating on raw RGB pixels, the semantic map enables simpler user modification. In this work, we specifically target generating semantic maps given a label-set consisting of desired categories. The proposed framework, SegVAE, synthesizes semantic maps in an iterative manner using a conditional variational autoencoder. Quantitative and qualitative experiments demonstrate that the proposed model can generate realistic and diverse semantic maps. We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps. Furthermore, we showcase several real-world image-editing applications including object removal, object insertion, and object replacement.
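
A hedged sketch of one generation step as described: a conditional VAE that, given the canvas built so far and a target category, samples that category's mask. The architecture and sizes below are invented for illustration; this is not the SegVAE model.

```python
# One illustrative CVAE iteration: sample z from the prior, decode a mask
# conditioned on the running canvas and a one-hot category, composite it.
import torch
import torch.nn as nn

class MaskCVAE(nn.Module):
    def __init__(self, n_cats, z_dim=32, size=64):
        super().__init__()
        self.size = size
        cond_dim = size * size + n_cats          # canvas + one-hot category
        # Encoder would be used at training time; only decoding shown below.
        self.enc = nn.Linear(size * size + cond_dim, 2 * z_dim)
        self.dec = nn.Linear(z_dim + cond_dim, size * size)

    def decode(self, z, canvas, cat_onehot):
        cond = torch.cat([canvas.flatten(1), cat_onehot], dim=1)
        logits = self.dec(torch.cat([z, cond], dim=1))
        return torch.sigmoid(logits).view(-1, self.size, self.size)

model = MaskCVAE(n_cats=10)
canvas = torch.zeros(1, 64, 64)                  # empty semantic map
onehot = torch.eye(10)[[3]]                      # category to add next
z = torch.randn(1, 32)                           # sample from the prior
mask = model.decode(z, canvas, onehot)           # new category's mask
canvas = torch.maximum(canvas, mask.round() * 4) # composite onto the map
```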