A Contextual-Bandit Approach to Personalized News Article Recommendation

Source

Core Claim

The Yahoo! Front Page news recommendation work uses logged traffic collected under randomized article selection to evaluate contextual-bandit policies offline, with the candidate articles as the actions.
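
As an illustration of how randomized logs can support offline policy evaluation, the following is a minimal sketch of a replay-style estimator. It assumes the logged action was chosen uniformly at random, handles only a fixed (non-learning) policy, and uses hypothetical names (`Event`, `replay_evaluate`) that do not come from the source.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical record layout: (context features, logged action id, click reward).
Event = Tuple[Sequence[float], int, float]


def replay_evaluate(
    policy: Callable[[Sequence[float]], int],
    events: Iterable[Event],
) -> Tuple[float, int]:
    """Estimate a fixed policy's average reward from uniformly randomized logs.

    Only events where the policy picks the same action as the log are kept;
    all other events are skipped. Returns (estimate, number of matched events).
    """
    matched = 0
    total_reward = 0.0
    for context, logged_action, reward in events:
        if policy(context) == logged_action:
            matched += 1
            total_reward += reward
    estimate = total_reward / matched if matched else float("nan")
    return estimate, matched


# Tiny hand-made log (illustrative data only, not real traffic).
toy_events = [
    ([1.0], 0, 1.0),
    ([1.0], 1, 0.0),
    ([0.0], 0, 0.0),
    ([0.0], 1, 1.0),
]


def toy_policy(context: Sequence[float]) -> int:
    # Hypothetical rule: show article 0 when the first feature is high.
    return 0 if context[0] > 0.5 else 1


estimate, n_matched = replay_evaluate(toy_policy, toy_events)
```

Under the uniform-logging assumption, the matched events behave like a sample of what the policy would have seen online, which is why discarding the mismatches does not bias the estimate. A learning policy would additionally update itself on each matched event, which this sketch omits.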

Action-Time-Series Notes

  • The sequence is a temporal log of recommendation decisions, contexts, actions, and click rewards.
  • It is valuable for action-response modeling, but each record typically ends at the click reward and often lacks a rich next-state observation.
  • It belongs in a weak-time-series / bandit category for world-model comparison.
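
To make the notes above concrete, here is a hedged sketch of how such a log might be represented when placed alongside world-model datasets. The field names and the `to_transitions` helper are hypothetical, chosen only to show that the next-state slot stays empty for bandit-style records.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LoggedDecision:
    """One recommendation decision (hypothetical schema)."""
    timestamp: int               # ordering key for the temporal log
    context: Tuple[float, ...]   # user/article features at decision time
    action: int                  # id of the article that was shown
    reward: float                # 1.0 for a click, 0.0 otherwise


# (context, action, reward, next_state) in the usual transition layout.
Transition = Tuple[Tuple[float, ...], int, float, Optional[Tuple[float, ...]]]


def to_transitions(log: Iterable[LoggedDecision]) -> List[Transition]:
    """Order the log by time and emit transitions with no next state.

    The next_state slot is always None here: the log records the response
    to an action, not an observation of the state that follows it.
    """
    ordered = sorted(log, key=lambda d: d.timestamp)
    return [(d.context, d.action, d.reward, None) for d in ordered]


# Illustrative records only, not real traffic.
toy_log = [
    LoggedDecision(timestamp=2, context=(0.0, 1.0), action=3, reward=0.0),
    LoggedDecision(timestamp=1, context=(1.0, 0.0), action=7, reward=1.0),
]
transitions = to_transitions(toy_log)
```

Keeping the four-slot layout with an explicit `None` lets the same loading code serve both full-state datasets and bandit logs, while making the missing next-state observation visible rather than silently imputed.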