Soft Performance Difference Lemma
This note presents the soft performance difference lemma, which is a fundamental result in regularized reinforcement learning. Nowadays, regularized reinforc...
This is a recap of the regret analysis in (TRAVEL) Provably Efficient Learning of Transferable Rewards.
This is a recap of the “inverse bandit” problem first proposed in this paper.
This is a recap of the regret analysis in SCAL.
This is a recap of the regret analysis in UCRL2.