Hao Qin

PhD student in Statistics at the University of Arizona

About Me

Hi, I'm Hao Qin, a final-year PhD student in the GIDP-STATS program at the University of Arizona, where I am fortunate to be advised by Dr. Chicheng Zhang. Before that, I received a Bachelor's degree in Applied Mathematics from Shandong University and a Master's degree in Data Science from the University of Wisconsin-Madison.

Research Interests

My research is in machine learning, with a focus on reinforcement learning. I am particularly interested in identifying the key factors that quantify the exploration-exploitation trade-off, a fundamental problem in reinforcement learning. A clear understanding of this trade-off paves the way for more efficient algorithms in applications such as recommendation systems, robotics, and autonomous driving.

Currently, I am working on the following directions:

Reinforcement Learning
Inverse Reinforcement Learning
Wireless Communications

Selected Publications

Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits
Hao Qin, Chicheng Zhang
Conference on Learning Theory (COLT) 2026
Paper | Slides | Poster

Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications
Hao Qin*, Thang Duong*, Ming F. Li, Chicheng Zhang
International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt) 2026
Paper | Slides

Achieving Adaptivity and Optimality for Multi-Armed Bandits using Exponential-Kullback Leibler Maillard Sampling
Hao Qin, Kwang-Sung Jun, Chicheng Zhang
Transactions on Machine Learning Research (TMLR) 2026 — Featured Certification
Paper

Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
Hao Qin, Kwang-Sung Jun, Chicheng Zhang
Conference on Neural Information Processing Systems (NeurIPS) 2023
Paper | Code

News

Soft Performance Difference Lemma

(TRAVEL) Provably Efficient Learning of Transferable Rewards

(Inverse Bandit) Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

(SCAL) Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

(UCRL2) Near-optimal Regret Bounds for Reinforcement Learning