Hot Papers 2020-07-08

1. Predicting Afrobeats Hit Songs Using Spotify Data

Adewale Adeagbo

This study approached the Hit Song Science problem with the aim of predicting which songs in the Afrobeats genre will become popular among Spotify listeners. A dataset of 2063 songs, together with the audio features Spotify provides, was generated through the Spotify Web API. Random Forest and Gradient Boosting classifiers proved successful, achieving F1 scores of approximately 86%.
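As a rough illustration of this modeling setup, the sketch below trains both classifiers on a table of Spotify audio features using scikit-learn. The file name, feature columns, and binary `hit` label are assumptions for illustration, not the paper's exact dataset or labeling scheme.

```python
# Sketch of a hit-song classifier over Spotify audio features.
# Column names and the "hit" label are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

df = pd.read_csv("afrobeats_tracks.csv")  # hypothetical export of the 2063-song dataset
features = ["danceability", "energy", "loudness", "speechiness", "acousticness",
            "instrumentalness", "liveness", "valence", "tempo"]
X, y = df[features], df["hit"]  # "hit" = popularity label (assumed binary)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

for model in (RandomForestClassifier(n_estimators=200, random_state=42),
              GradientBoostingClassifier(random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__, f1_score(y_test, model.predict(X_test)))
```

The feature names above are the per-track descriptors Spotify's audio-features endpoint returns, so most of the modeling work lies in how the popularity label is defined.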

2. The Go Transformer: Natural Language Modeling for Game Play

David Noever, Matthew Ciolino, Josh Kalin

  • retweets: 19, favorites: 88 (07/09/2020 08:45:39)
  • links: abs | pdf
  • cs.CL | cs.LG

This work applies natural language modeling to generate plausible strategic moves in the ancient game of Go. We train the Generative Pretrained Transformer (GPT-2) to mimic the style of Go champions as archived in Smart Game Format (SGF), which offers a text description of move sequences. The trained model further generates valid but previously unseen strategies for Go. Because GPT-2 preserves punctuation and spacing, the raw output of the text generator provides inputs to game visualization and creative patterns, such as the Sabaki project’s (2020) game engine using auto-replays. Results demonstrate that language modeling can capture both the sequencing format of championship Go games and their strategic formations. Compared to random game boards, the GPT-2 fine-tuning shows efficient opening move sequences favoring corner play over less advantageous center and side play. Game generation as a language modeling task offers novel approaches to more than 40 other board games where historical text annotation provides training data (e.g., Amazons & Connect 4/6).
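A minimal sketch of the generation step, assuming a GPT-2 checkpoint that has already been fine-tuned on SGF game records (the checkpoint path and prompt below are hypothetical):

```python
# Sampling SGF-style Go move sequences from a fine-tuned GPT-2.
# "./gpt2-go-sgf" is an assumed local checkpoint, not the authors' release.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("./gpt2-go-sgf")
model = GPT2LMHeadModel.from_pretrained("./gpt2-go-sgf")

# SGF header plus an opening move as the prompt; GPT-2's byte-level
# tokenizer round-trips the brackets and semicolons SGF parsing needs.
prompt = "(;GM[1]FF[4]SZ[19];B[pd]"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_length=200, do_sample=True,
                     top_k=50, top_p=0.95,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0]))
```

Because punctuation and spacing survive generation intact, any sample that parses as SGF can be replayed directly in a viewer such as Sabaki.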

3. Off-Policy Evaluation via the Regularized Lagrangian

Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data. While these estimators all perform some form of stationary distribution correction, they arise from different derivations and objective functions. In this paper, we unify these estimators as regularized Lagrangians of the same linear program. The unification allows us to expand the space of DICE estimators to new alternatives that demonstrate improved performance. More importantly, by analyzing the expanded space of estimators both mathematically and empirically, we find that dual solutions offer greater flexibility in navigating the tradeoff between optimization stability and estimation bias, and generally provide superior estimates in practice.
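In rough terms, using notation assumed from the DICE literature rather than taken verbatim from this paper, the underlying linear program and its Lagrangian look like this:

```latex
% Assumed notation: \mu_0 initial state distribution, \pi target policy,
% d^D the off-policy data distribution, \gamma the discount factor.
% Policy value as a linear program over the occupancy d:
\begin{align*}
\rho(\pi) = \max_{d \ge 0}\; &\mathbb{E}_{(s,a)\sim d}\,[r(s,a)] \\
\text{s.t.}\;\; & d(s,a) = (1-\gamma)\,\mu_0(s)\,\pi(a\mid s)
  + \gamma \textstyle\sum_{s',a'} P(s\mid s',a')\,\pi(a\mid s)\,d(s',a').
\end{align*}
% Introducing multipliers Q for the flow constraints and writing the
% stationary distribution correction \zeta = d / d^D gives
\begin{align*}
L(Q,\zeta) = (1-\gamma)\,\mathbb{E}_{s\sim\mu_0,\,a\sim\pi}[Q(s,a)]
  + \mathbb{E}_{(s,a,s')\sim d^D,\,a'\sim\pi}\big[\zeta(s,a)\,
    \big(r(s,a) + \gamma\,Q(s',a') - Q(s,a)\big)\big].
\end{align*}
```

Different DICE estimators then correspond to adding convex regularizers to this saddle-point objective and reading the value off the primal ($Q$) or dual ($\zeta$) solution; at the unregularized saddle point, both recover $\rho(\pi)$.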

4. Provably Safe PAC-MDP Exploration Using Analogies

Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of “safe exploration,” most existing techniques either (1) do not guarantee safety during the actual exploration process or (2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. ASE also guides exploration towards the most task-relevant states, which empirically yields significant improvements in sample efficiency compared to existing methods.
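The sketch below is a schematic of the safe-exploration pattern the abstract describes, not the ASE algorithm itself; the environment interface and the `analogy_classes` mapping are entirely hypothetical.

```python
# Schematic illustration of analogy-based safe exploration (not ASE itself).
# `env` and `analogy_classes` are hypothetical stand-ins: analogy_classes maps
# a (state, action) pair to the set of pairs judged equivalent to it.
def explore_safely(env, analogy_classes, n_steps=1000):
    """Take only actions whose safety is already established; propagate
    safety evidence to analogous state-action pairs after each step."""
    known_safe = set(env.initial_safe_pairs())   # conservatively safe seed set
    state = env.reset()
    for _ in range(n_steps):
        safe_actions = [a for a in env.actions(state)
                        if (state, a) in known_safe]
        if not safe_actions:        # no provably safe action here:
            state = env.reset()     # retreat rather than risk failure
            continue
        # Prefer the least-explored safe action to expand the known-safe set.
        action = min(safe_actions, key=lambda a: env.visit_count(state, a))
        next_state, reward, safe = env.step(action)
        if safe:
            # Analogy step: equivalent pairs inherit this safety evidence.
            known_safe.update(analogy_classes.get((state, action), ()))
        state = next_state
    return known_safe
```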