(Inverse Bandit) Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
This is a recap of the “inverse bandit” problem first proposed in this paper.
This is a recap of the regret analysis in SCAL.
This is a recap of the regret analysis in UCRL2.
Here is a short survey covering the most commonly seen multi-armed bandit (MAB) algorithms. You can download the full survey here.
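As a quick illustration of one of the standard MAB algorithms such surveys cover, here is a minimal sketch of UCB1 on Bernoulli arms. The function name `ucb1`, the reward setup, and the arm means are all hypothetical choices for this example, not taken from the survey or the papers above.

```python
import math
import random

def ucb1(reward_fn, n_arms, horizon, seed=0):
    """Minimal UCB1 sketch: pull each arm once, then pick the arm
    maximizing empirical mean plus an exploration bonus."""
    rng = random.Random(seed)
    counts = [0] * n_arms        # pulls per arm
    sums = [0.0] * n_arms        # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1          # initialization: try every arm once
        else:
            # UCB index: mean reward + sqrt(2 ln t / n_a)
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = reward_fn(arm, rng)
        counts[arm] += 1
        sums[arm] += r
    return counts

# Hypothetical setup: two Bernoulli arms with means 0.2 and 0.8
means = [0.2, 0.8]
counts = ucb1(
    lambda a, rng: 1.0 if rng.random() < means[a] else 0.0,
    n_arms=2,
    horizon=2000,
)
```

With a sufficiently long horizon, the better arm (mean 0.8) accumulates far more pulls than the worse one, which is the exploration-exploitation trade-off these algorithms formalize.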