Posts

Note

Soft Performance Difference Lemma

8 minute read

This note presents the soft performance difference lemma, which is a fundamental result in regularized reinforcement learning. Nowadays, regularized reinforc...


Survey

Multi-armed Bandits: A Short Survey

less than 1 minute read

Here is a short survey covering the most commonly used multi-armed bandit (MAB) algorithms. You can download the full survey here.
