Thompson Sampling

Definition / 释义

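Thompson sampling: a Bayesian decision-making method for the exploration-exploitation trade-off, most often seen in multi-armed bandit problems. The core procedure is to compute the posterior distribution over the unknown parameters from the data observed so far, draw one random sample of the parameters from that posterior, and choose the action that looks optimal under the sampled parameters. Each action is therefore chosen with the probability that it is the best one under the current posterior, which balances trying uncertain options against exploiting options already known to be good.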
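The procedure in the definition can be sketched for the simplest case, a Bernoulli bandit with a Beta posterior per arm. This is a minimal illustration, not a reference implementation: the uniform Beta(1, 1) prior, the simulated reward rates, and the function names `thompson_step` and `run_bandit` are all assumptions made for the example.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample each arm's Beta posterior once and return the index of the best sample."""
    # With a Beta(1, 1) prior (an assumption of this sketch), the posterior after
    # s successes and f failures is Beta(s + 1, f + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate Thompson sampling on Bernoulli arms with hypothetical success rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_rates[arm].
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.5, 0.9], rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

As the posteriors sharpen, the arm with the highest true rate should receive most of the pulls, while weaker arms are still tried occasionally for as long as their posteriors leave room for doubt.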

Pronunciation / 发音

/ˈtɑːmpsən ˈsæmplɪŋ/

Examples / 例句

We used Thompson sampling to choose which ad to show each user.

In a contextual bandit setting, Thompson sampling can adapt to changing user preferences by updating the posterior after each interaction.

Etymology / 词源

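"Thompson" refers to William R. Thompson, whose 1933 paper proposed using posterior probabilities to guide sequential decisions; "sampling" refers to drawing parameters or model hypotheses at random from the posterior distribution. The method was later widely adopted and popularized in machine learning and in applications such as online recommendation and ad placement.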

Related Words / 相关词

Contextual Bandit

Literary Works / 文学与著作例证

William R. Thompson, On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples (1933)
Shipra Agrawal & Navin Goyal, Further Optimal Regret Bounds for Thompson Sampling (2013)
Daniel Russo & Benjamin Van Roy, An Information-Theoretic Analysis of Thompson Sampling (2014)
Tor Lattimore & Csaba Szepesvári, Bandit Algorithms (2020; a systematic treatment of bandit methods that covers Thompson sampling)