1. Predicting Afrobeats Hit Songs Using Spotify Data
Adewale Adeagbo
This study approaches the Hit Song Science problem with the aim of predicting which songs in the Afrobeats genre will become popular among Spotify listeners. A dataset of 2,063 songs, together with their audio features, was generated through the Spotify Web API. Random Forest and Gradient Boosting classifiers proved successful, achieving F1 scores of approximately 86%.
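As a rough illustration of the setup the abstract describes (not the authors' code), the sketch below fits the two classifier types on Spotify audio features with scikit-learn. The CSV filename, the "hit" label column, and the exact feature list are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code): train Random Forest and Gradient
# Boosting on Spotify audio features to predict whether a track is a "hit".
# Assumes a hypothetical CSV exported from the Spotify Web API with one row
# per track, the standard audio-feature columns, and a binary "hit" label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

FEATURES = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]  # audio features returned by Spotify's /audio-features endpoint

df = pd.read_csv("afrobeats_tracks.csv")  # hypothetical export
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["hit"], test_size=0.2, stratify=df["hit"], random_state=0
)

for model in (RandomForestClassifier(n_estimators=300, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, f1_score(y_test, model.predict(X_test)))
```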
My paper is now on arxiv! Predicting Afrobeats hit songs using Spotify data https://t.co/6C2AWSNJ3H . It is an experiment on Hit Song Science (HSS) for the Afrobeats genre. Because why not? We global now.
— AA (@Adxpillar) July 8, 2020
2. The Go Transformer: Natural Language Modeling for Game Play
David Noever, Matthew Ciolino, Josh Kalin
This work applies natural language modeling to generate plausible strategic moves in the ancient game of Go. We train the Generative Pretrained Transformer (GPT-2) to mimic the style of Go champions as archived in Smart Game Format (SGF), which offers a text description of move sequences. The trained model further generates valid but previously unseen strategies for Go. Because GPT-2 preserves punctuation and spacing, the raw output of the text generator provides inputs to game visualization and creative patterns, such as the Sabaki project’s (2020) game engine using auto-replays. Results demonstrate that language modeling can capture both the sequencing format of championship Go games and their strategic formations. Compared to random game boards, the GPT-2 fine-tuning shows efficient opening move sequences favoring corner play over less advantageous center and side play. Game generation as a language modeling task offers novel approaches to more than 40 other board games where historical text annotation provides training data (e.g., Amazons & Connect 4/6).
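To make the idea concrete, here is a minimal, hypothetical sketch of sampling SGF-style continuations from GPT-2 with the Hugging Face transformers library. It is not the authors' pipeline; the checkpoint name, prompt, and sampling settings are placeholders.

```python
# Minimal sketch (not the authors' released code): treat SGF move records as
# plain text and sample continuations from GPT-2. A model fine-tuned on
# championship SGF files is assumed; "gpt2" below is only the base checkpoint,
# and the prompt is just an SGF header plus a few opening corner moves.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "gpt2"  # swap in a checkpoint fine-tuned on SGF game records
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

prompt = "(;GM[1]FF[4]SZ[19];B[pd];W[dp];B[pq]"  # partial SGF move sequence
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=128,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0]))  # SGF-like text, viewable in e.g. Sabaki
```

Because the model emits plain SGF text, the sampled output can be dropped into any SGF-aware viewer for replay, which is essentially the visualization route the abstract mentions.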
The Go Transformer: Natural Language Modeling for Game Play
pdf: https://t.co/PRZSVi2so2
abs: https://t.co/WDnhNRCFBX pic.twitter.com/X4CgV1KAeW
— AK (@ak92501) July 8, 2020
3. Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans
The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data. While these estimators all perform some form of stationary distribution correction, they arise from different derivations and objective functions. In this paper, we unify these estimators as regularized Lagrangians of the same linear program. The unification allows us to expand the space of DICE estimators to new alternatives that demonstrate improved performance. More importantly, by analyzing the expanded space of estimators both mathematically and empirically, we find that dual solutions offer greater flexibility in navigating the tradeoff between optimization stability and estimation bias, and generally provide superior estimates in practice.
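As a sketch of the construction the abstract refers to (notation is mine and may differ from the paper's), the policy value can be written as a linear program over the discounted occupancy measure, and taking its Lagrangian, with convex regularizers added to the primal and dual variables, yields a family of estimators of this kind:

```latex
% Sketch (notation mine): the occupancy LP for the policy value and its
% Lagrangian. DICE-style estimators reweight d(s,a) against the behavior data
% distribution and add convex regularizers to d and/or Q; where those
% regularizers go is what distinguishes members of the family.
\begin{align*}
\rho(\pi) \;=\; \max_{d \ge 0}\; \sum_{s,a} d(s,a)\, r(s,a)
\quad \text{s.t.}\quad
d(s,a) = (1-\gamma)\,\mu_0(s)\,\pi(a \mid s)
       + \gamma \sum_{s',a'} P(s \mid s',a')\,\pi(a \mid s)\, d(s',a'),
\\[4pt]
L(d, Q) \;=\; (1-\gamma)\, \mathbb{E}_{s_0 \sim \mu_0,\, a_0 \sim \pi}\big[Q(s_0,a_0)\big]
       + \sum_{s,a} d(s,a)\,\Big( r(s,a)
       + \gamma\, \mathbb{E}_{s' \sim P,\, a' \sim \pi}\big[Q(s',a')\big] - Q(s,a) \Big).
\end{align*}
```

Different choices of regularizer, and of whether the primal or dual solution is used to form the final estimate, recover the existing DICE estimators as well as the new alternatives the abstract mentions.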
Policy evaluation via duality/Lagrangian methods presents a lot of choices (how to setup the LPs, regularize them, etc). In https://t.co/Ics1296x4U we examine how these choices affect accuracy of final eval. Lots of insights in this paper, many of which I didn't expect.... pic.twitter.com/0HkEeuSEPE
— Ofir Nachum (@ofirnachum) July 8, 2020
4. Provably Safe PAC-MDP Exploration Using Analogies
Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of “safe exploration,” most existing techniques either 1) do not guarantee safety during the actual exploration process, and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. ASE also guides exploration towards the most task-relevant states, which empirically yields significant improvements in sample efficiency compared to existing methods.
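The toy sketch below is only meant to convey the general "grow a safe set by analogy, and never act outside it" idea from the abstract; it is not the ASE algorithm, and the similarity test, state space, and seed safe set are invented for illustration.

```python
# Illustrative sketch only -- a generic safe-set-plus-analogy exploration loop
# in the spirit of the abstract, not the paper's ASE algorithm. States,
# actions, the similarity test, and the seed safe set are all placeholders.
import random

def similar(sa1, sa2):
    """Hypothetical analogy measure between two state-action pairs."""
    (s1, a1), (s2, a2) = sa1, sa2
    return a1 == a2 and abs(s1 - s2) <= 1  # neighbouring states, same action

def safe_explore(states, actions, known_safe, episodes=100):
    """Explore only actions currently believed safe; grow the set by analogy."""
    safe = set(known_safe)                 # state-action pairs certified safe
    for _ in range(episodes):
        s = random.choice(states)
        candidates = [a for a in actions if (s, a) in safe]
        if not candidates:
            continue                       # never act outside the safe set
        a = random.choice(candidates)
        # After observing the outcome, transfer the safety certificate to
        # analogous state-action pairs (stand-in for the analogy mechanism).
        for s2 in states:
            for a2 in actions:
                if similar((s, a), (s2, a2)):
                    safe.add((s2, a2))
    return safe

if __name__ == "__main__":
    states, actions = list(range(10)), ["left", "right"]
    print(sorted(safe_explore(states, actions, known_safe={(0, "right")})))
```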
I’m really excited to release our work on a provably safe and optimal reinforcement learning method: Analogous Safe-state Exploration (with @_vaishnavh and @zicokolter). Paper: https://t.co/j38yaYxPNF Code: https://t.co/yOJcghXCDN
— Melrose Roderick (@roderickmelrose) July 8, 2020