Posts


Note

Soft Performance Difference Lemma

8 minute read

This note presents the soft performance difference lemma, a fundamental result in regularized reinforcement learning. Nowadays, regularized reinforc...

(TRAVEL) Provably Efficient Learning of Transferable Rewards

4 minute read

This is a recap of the regret analysis in (TRAVEL) Provably Efficient Learning of Transferable Rewards.

(Inverse Bandit) Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

2 minute read

This is a recap of the “inverse bandit” problem first proposed in Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits.

(SCAL) Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

6 minute read

This is a recap of the regret analysis in SCAL.

(UCRL2) Near-optimal Regret Bounds for Reinforcement Learning

9 minute read

This is a recap of the regret analysis in UCRL2.

Survey

Multi-armed Bandits: A Short Survey

less than 1 minute read

Here is a short survey of the most common multi-armed bandit (MAB) algorithms. You can download the full survey here.