1. Stylized Neural Painting
Zhengxia Zou, Tianyang Shi, Shuang Qiu, Yi Yuan, Zhenwei Shi
This paper proposes an image-to-painting translation method that generates vivid and realistic painting artworks with controllable styles. Different from previous image-to-image translation methods that formulate the translation as pixel-wise prediction, we deal with such an artistic creation process in a vectorized environment and produce a sequence of physically meaningful stroke parameters that can be further used for rendering. Since a typical vector render is not differentiable, we design a novel neural renderer which imitates the behavior of the vector renderer and then frame the stroke prediction as a parameter searching process that maximizes the similarity between the input and the rendering output. We explored the zero-gradient problem on parameter searching and propose to solve this problem from an optimal transportation perspective. We also show that previous neural renderers have a parameter coupling problem and we re-design the rendering network with a rasterization network and a shading network that better handles the disentanglement of shape and color. Experiments show that the paintings generated by our method have a high degree of fidelity in both global appearance and local textures. Our method can be also jointly optimized with neural style transfer that further transfers visual style from other images. Our code and animated results are available at \url{https://jiupinjia.github.io/neuralpainter/}.
Stylized Neural Painting
— AK (@ak92501) November 17, 2020
pdf: https://t.co/cScDnvemeE
abs: https://t.co/77DbIPGSFf
project page: https://t.co/W7DR7YvKMT
github: https://t.co/LszlCbnCBb
colab: https://t.co/00RGwaDKKp pic.twitter.com/N9mYsI9AT4
2. Self Normalizing Flows
T. Anderson Keller, Jorn W.T. Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, Max Welling
Efficient gradient computation of the Jacobian determinant term is a core problem of the normalizing flow framework. Thus, most proposed flow models either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer’s exact update from to , allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while surpassing the performance of their functionally constrained counterparts.
Excited to share my first paper! https://t.co/LJaWlDPiGT
— Andy Keller (@nd_rs_n) November 17, 2020
Self Normalizing Flows -- An efficient training method for unconstrained normalizing flows.
Joint work w/ the ever supportive @jornpeters, @priyankjaini, @emiel_hoogeboom, Patrick Forré & @wellingmax
1/5 pic.twitter.com/RuhWfb2pkJ
3. Stable View Synthesis
Gernot Riegler, Vladlen Koltun
We present Stable View Synthesis (SVS). Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene. The method operates on a geometric scaffold computed via structure-from-motion and multi-view stereo. Each point on this 3D scaffold is associated with view rays and corresponding feature vectors that encode the appearance of this point in the input images. The core of SVS is view-dependent on-surface feature aggregation, in which directional feature vectors at each 3D point are processed to produce a new feature vector for a ray that maps this point into the new target view. The target view is then rendered by a convolutional network from a tensor of features synthesized in this way for all pixels. The method is composed of differentiable modules and is trained end-to-end. It supports spatially-varying view-dependent importance weighting and feature transformation of source images at each point; spatial and temporal stability due to the smooth dependence of on-surface feature aggregation on the target view; and synthesis of view-dependent effects such as specular reflection. Experimental results demonstrate that SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets, achieving unprecedented levels of realism in free-viewpoint video of challenging large-scale scenes.
Stable View Synthesis
— AK (@ak92501) November 17, 2020
pdf: https://t.co/bd4lzAQw0i
abs: https://t.co/m8TvE3UR2Z pic.twitter.com/O7aU4t9zYA
4. Scaled-YOLOv4: Scaling Cross Stage Partial Network
Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao
We show that the YOLOv4 object detection neural network based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, resolution, but also structure of the network. YOLOv4-large model achieves state-of-the-art results: 55.4% AP (73.3% AP50) for the MS COCO dataset at a speed of 15 FPS on Tesla V100, while with the test time augmentation, YOLOv4-large achieves 55.8% AP (73.2 AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among any published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of 443 FPS on RTX 2080Ti, while by using TensorRT, batch size = 4 and FP16-precision the YOLOv4-tiny achieves 1774 FPS.
Scaled-YOLOv4: Scaling Cross Stage Partial Networkhttps://t.co/L0UtU1Oj8h pic.twitter.com/zSfmrlbTuy
— phalanx (@ZFPhalanx) November 17, 2020
5. Towards Human-Level Learning of Complex Physical Puzzles
Kei Ota, Devesh K. Jha, Diego Romeres, Jeroen van Baar, Kevin A. Smith, Takayuki Semitsu, Tomoaki Oiki, Alan Sullivan, Daniel Nikovski, Joshua B. Tenenbaum
Humans quickly solve tasks in novel systems with complex dynamics, without requiring much interaction. While deep reinforcement learning algorithms have achieved tremendous success in many complex tasks, these algorithms need a large number of samples to learn meaningful policies. In this paper, we present a task for navigating a marble to the center of a circular maze. While this system is very intuitive and easy for humans to solve, it can be very difficult and inefficient for standard reinforcement learning algorithms to learn meaningful policies. We present a model that learns to move a marble in the complex environment within minutes of interacting with the real system. Learning consists of initializing a physics engine with parameters estimated using data from the real system. The error in the physics engine is then corrected using Gaussian process regression, which is used to model the residual between real observations and physics engine simulations. The physics engine equipped with the residual model is then used to control the marble in the maze environment using a model-predictive feedback over a receding horizon. We contrast the learning behavior against the time taken by humans to solve the problem to show comparable behavior. To the best of our knowledge, this is the first time that a hybrid model consisting of a full physics engine along with a statistical function approximator has been used to control a complex physical system in real-time using nonlinear model-predictive control (NMPC). Codes for the simulation environment can be downloaded here https://www.merl.com/research/license/CME . A video describing our method could be found here https://youtu.be/xaxNCXBovpc .
"Towards Human-Level Learning of Complex Physical Puzzles"
— Kei Ohta (@ohtake_i) November 17, 2020
新しい論文を公開しました!物理的に複雑なタスクを人間のように数分間環境と相互作用するだけで解く方法を提案しています。MITのJosh Tenenbaum先生との共同研究です。
論文:https://t.co/MLkPcY0wOM
動画:https://t.co/2xiyaPsoZ9 pic.twitter.com/aK8o3l1eGN
6. Solving Physics Puzzles by Reasoning about Paths
Augustin Harter, Andrew Melnik, Gaurav Kumar, Dhruv Agarwal, Animesh Garg, Helge Ritter
We propose a new deep learning model for goal-driven tasks that require intuitive physical reasoning and intervention in the scene to achieve a desired end goal. Its modular structure is motivated by hypothesizing a sequence of intuitive steps that humans apply when trying to solve such a task. The model first predicts the path the target object would follow without intervention and the path the target object should follow in order to solve the task. Next, it predicts the desired path of the action object and generates the placement of the action object. All components of the model are trained jointly in a supervised way; each component receives its own learning signal but learning signals are also backpropagated through the entire architecture. To evaluate the model we use PHYRE - a benchmark test for goal-driven physical reasoning in 2D mechanics puzzles.
Solving Physics Puzzles by Reasoning about Paths
— AK (@ak92501) November 17, 2020
pdf: https://t.co/nF3YH8JqDp
abs: https://t.co/QYtXJOaRUC
github: https://t.co/tC1T8b7tOl pic.twitter.com/oimbKIMWyi
7. Good proctor or “Big Brother”? AI Ethics and Online Exam Supervision Technologies
Simon Coghlan, Tim Miller, Jeannie Paterson
This article philosophically analyzes online exam supervision technologies, which have been thrust into the public spotlight due to campus lockdowns during the COVID-19 pandemic and the growing demand for online courses. Online exam proctoring technologies purport to provide effective oversight of students sitting online exams, using artificial intelligence (AI) systems and human invigilators to supplement and review those systems. Such technologies have alarmed some students who see them as `Big Brother-like’, yet some universities defend their judicious use. Critical ethical appraisal of online proctoring technologies is overdue. This article philosophically analyzes these technologies, focusing on the ethical concepts of academic integrity, fairness, non-maleficence, transparency, privacy, respect for autonomy, liberty, and trust. Most of these concepts are prominent in the new field of AI ethics and all are relevant to the education context. The essay provides ethical considerations that educational institutions will need to carefully review before electing to deploy and govern specific online proctoring technologies.
Thinking of using online exam proctoring in your organisation? Our pre-print (https://t.co/MpWctQHeNC) analyses the ethical considerations of these technologies, arguing they are neither a silver bullet nor completely evil. A fun collaboration with @CoghSimon and @JMPaters pic.twitter.com/PJr19djELk
— Tim Miller (@tmiller_unimelb) November 17, 2020
8. Cycle-Consistent Generative Rendering for 2D-3D Modality Translation
Tristan Aumentado-Armstrong, Alex Levinshtein, Stavros Tsogkas, Konstantinos G. Derpanis, Allan D. Jepson
For humans, visual understanding is inherently generative: given a 3D shape, we can postulate how it would look in the world; given a 2D image, we can infer the 3D structure that likely gave rise to it. We can thus translate between the 2D visual and 3D structural modalities of a given object. In the context of computer vision, this corresponds to a learnable module that serves two purposes: (i) generate a realistic rendering of a 3D object (shape-to-image translation) and (ii) infer a realistic 3D shape from an image (image-to-shape translation). In this paper, we learn such a module while being conscious of the difficulties in obtaining large paired 2D-3D datasets. By leveraging generative domain translation methods, we are able to define a learning algorithm that requires only weak supervision, with unpaired data. The resulting model is not only able to perform 3D shape, pose, and texture inference from 2D images, but can also generate novel textured 3D shapes and renders, similar to a graphics pipeline. More specifically, our method (i) infers an explicit 3D mesh representation, (ii) utilizes example shapes to regularize inference, (iii) requires only an image mask (no keypoints or camera extrinsics), and (iv) has generative capabilities. While prior work explores subsets of these properties, their combination is novel. We demonstrate the utility of our learned representation, as well as its performance on image generation and unpaired 3D shape inference tasks.
Cycle-Consistent Generative Rendering for 2D-3D Modality Translation
— AK (@ak92501) November 17, 2020
pdf: https://t.co/a1ZqDuSmM3
abs: https://t.co/9aCujXGSpy
project page: https://t.co/HLjfEIbGu6 pic.twitter.com/p5e1vyCcGj
9. Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra
Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini
Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect memory lookups. In this work, we enhance a memory-streaming RISC-V ISA extension to accelerate sparse-dense products through streaming indirection. We present efficient dot, matrix-vector, and matrix-matrix product kernels using our hardware, enabling single-core FPU utilizations of up to 80% and speedups of up to 7.2x over an optimized baseline without extensions. A matrix-vector implementation on a multi-core cluster is up to 5.8x faster and 2.7x more energy-efficient with our kernels than an optimized baseline. We propose further uses for our indirection hardware, such as scatter-gather operations and codebook decoding, and compare our work to state-of-the-art CPU, GPU, and accelerator approaches, measuring a 2.8x higher peak FP64 utilization in CSR matrix-vector multiplication than a GTX 1080 Ti GPU running a cuSPARSE kernel.
In our latest paper "Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra" we present an efficient dot product, CsrMV and CsrMM kernels with our Snitch. Single-core FPU utilizations up to 80% and speedups up to 7.2x https://t.co/cd29g0Zezy pic.twitter.com/X1wHecSj8j
— PULP Platform (@pulp_platform) November 17, 2020
10. DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru, Han Guo, Mohit Bansal
Policy gradients-based reinforcement learning has proven to be a promising approach for directly optimizing non-differentiable evaluation metrics for language generation tasks. However, optimizing for a specific metric reward leads to improvements in mostly that metric only, suggesting that the model is gaming the formulation of that metric in a particular way without often achieving real qualitative improvements. Hence, it is more beneficial to make the model optimize multiple diverse metric rewards jointly. While appealing, this is challenging because one needs to manually decide the importance and scaling weights of these metric rewards. Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time. Considering the above aspects, in our work, we automate the optimization of multiple metric rewards simultaneously via a multi-armed bandit approach (DORB), where at each round, the bandit chooses which metric reward to optimize next, based on expected arm gains. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks: question generation and data-to-text generation, including on an unseen-test transfer setup. Finally, we present interpretable analyses of the learned bandit curriculum over the optimized rewards.