(TRAVEL) Provably Efficient Learning of Transferable Rewards
This is a recap of the regret analysis in (TRAVEL) Provably Efficient Learning of Transferable Rewards.
This is a recap of the “inverse bandit” problem first proposed in this paper.
This is a recap of the regret analysis in SCAL.
This is a recap of the regret analysis in UCRL2.
Here is a short survey covering the most commonly seen multi-armed bandit (MAB) algorithms. You can download the full survey here.