KL-MS
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards.
Hao Qin, Kwang-Sung Jun, Chicheng Zhang
Conference on Neural Information Processing Systems (NeurIPS), 2023
arXiv | Code