CSpace
Robust and efficient algorithms for conversational contextual bandit
Gu, Haoran1; Xia, Yunni1; Xie, Hong2; Shi, Xiaoyu2; Shang, Mingsheng2
2024-02-01
Abstract: The conversational contextual bandit is a notable variant of the contextual bandit that has shown superior performance in recommendation applications. Its core idea is to use conversational feedback from users to speed up the learning of user preferences. We show that in real-world applications conversational feedback can be imbalanced, and that such feedback causes the latest conversational contextual bandit algorithm to conduct many conversations yet learn more slowly than the baseline algorithm without conversational feedback. How should imbalanced conversational feedback be handled? How should conversations be scheduled across the learning horizon? An in-depth analysis of the limitations of one representative conversational contextual bandit algorithm yields insights for designing the ICF-UCB (Imbalanced Conversational Feedback Upper Confidence Bound) algorithm, which maintains a fast learning speed under imbalanced feedback. ICF-UCB achieves this by adaptively eliminating conversations that may slow down learning. Furthermore, ICF-UCB adaptively schedules conversations to the decision rounds where suboptimal actions may trap the decision maker, and adaptively selects appropriate conversations to avoid such traps. The algorithm is shown to have sublinear regret. Extensive experiments on synthetic datasets and public real-world datasets (from Yelp and TripAdvisor) validate the superior performance of ICF-UCB for recommendation tasks.
Keywords: Conversational contextual bandit; Imbalanced conversation feedback; Upper confidence bound; Regret analysis
DOI: 10.1016/j.ins.2023.119993
Journal: INFORMATION SCIENCES
ISSN: 0020-0255
Volume: 657; Pages: 17
Corresponding author: Xia, Yunni (xiayunni@hotmail.com)
Indexed in: SCI
WOS accession number: WOS:001141722700001
Language: English
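
The abstract compares ICF-UCB against a baseline contextual bandit that uses no conversational feedback and reports sublinear regret. As a rough illustration of that setting only, the sketch below implements a plain linear UCB baseline and tracks its cumulative regret on synthetic data. It is not the ICF-UCB algorithm from the paper: the class name `LinUCB`, the helper `run_simulation`, the exploration weight `alpha`, the regularizer `lam`, and the synthetic unit-norm arms are all assumptions made for this example, and conversational (key-term) feedback is not modeled.

```python
import numpy as np


class LinUCB:
    """Plain linear UCB baseline (hypothetical sketch, no conversational feedback)."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha              # exploration weight (assumed value)
        self.M = lam * np.eye(dim)      # regularized Gram matrix
        self.b = np.zeros(dim)          # reward-weighted sum of chosen contexts

    def select(self, arms):
        """Return the index of the arm with the largest upper confidence bound."""
        M_inv = np.linalg.inv(self.M)
        theta_hat = M_inv @ self.b                      # ridge estimate
        means = arms @ theta_hat
        # Confidence width sqrt(x^T M^{-1} x) for every arm x.
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, M_inv, arms))
        return int(np.argmax(means + self.alpha * widths))

    def update(self, x, reward):
        """Fold the observed (context, reward) pair into the statistics."""
        self.M += np.outer(x, x)
        self.b += reward * x


def run_simulation(horizon=2000, n_arms=20, dim=5, noise=0.1, seed=0):
    """Run LinUCB on synthetic unit-norm arms; return cumulative regret per round."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)
    theta_star /= np.linalg.norm(theta_star)            # unknown preference vector
    arms = rng.normal(size=(n_arms, dim))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    best = np.max(arms @ theta_star)                    # best expected reward

    agent = LinUCB(dim)
    regret = np.zeros(horizon)
    for t in range(horizon):
        a = agent.select(arms)
        x = arms[a]
        reward = x @ theta_star + noise * rng.normal()  # noisy linear reward
        agent.update(x, reward)
        regret[t] = best - x @ theta_star
    return np.cumsum(regret)
```

Calling `run_simulation()` returns an array whose last entry is the total regret over the horizon; plotting it against the round index is one way to check visually whether the growth flattens, which is what sublinear regret would look like for this baseline.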