ABSTRACT
We study a model of crowdsourced information discovery, arising in restaurant and product reviews on Yelp and Amazon, in collaborative content discovery on YouTube, and in citizen science on Galaxy Zoo and eBird. This model builds on the classical Bayesian multi-armed bandit problem, but supposes that arms are pulled by selfish, myopic individuals. Each individual pulls the arm with the highest expected posterior reward (i.e., they exploit and do not explore), unless incentivized by a principal or central planner to explore.
We explore the tradeoff between the principal's total expected time-discounted incentive payments and the total expected time-discounted rewards realized across individuals. Specifically, with time-discount factor gamma, and letting OPT denote the reward achievable by a principal who pulls arms directly without having to incentivize individuals, we characterize which pairs (b, p) allow the principal to achieve a reward of at least p*OPT within an incentive budget of b*OPT, uniformly across all problem instances. Our main result is an essentially complete characterization:
If sqrt(b) + sqrt(1-p) > sqrt(gamma), then reward p*OPT is achievable;
If sqrt(b) + sqrt(1-p) < sqrt(gamma), then reward p*OPT is not achievable.
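To illustrate the characterization above, here is a minimal sketch (not from the paper itself) that rearranges the inequality sqrt(b) + sqrt(1-p) >= sqrt(gamma) to compute the smallest budget fraction b under which a target reward fraction p is achievable; the function name is our own.

```python
import math

def min_budget_fraction(gamma: float, p: float) -> float:
    """Smallest b (as a multiple of OPT) satisfying
    sqrt(b) + sqrt(1 - p) >= sqrt(gamma),
    i.e., the cheapest budget at which reward p*OPT is achievable.
    Rearranging: sqrt(b) >= sqrt(gamma) - sqrt(1 - p),
    so b = max(0, sqrt(gamma) - sqrt(1 - p))^2.
    """
    slack = math.sqrt(gamma) - math.sqrt(1.0 - p)
    return max(0.0, slack) ** 2

# With gamma = 0.9, extracting 90% of OPT requires a budget of
# (sqrt(0.9) - sqrt(0.1))^2 = 1 - 2*sqrt(0.09) = 0.4 times OPT.
print(min_budget_fraction(0.9, 0.9))

# If p <= 1 - gamma, the inequality already holds at b = 0:
# myopic play alone suffices, and no incentive payments are needed.
print(min_budget_fraction(0.5, 0.5))
```

Note how the required budget grows quadratically in the shortfall sqrt(gamma) - sqrt(1-p): pushing p close to 1 under heavy discounting (gamma near 1) is disproportionately expensive.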
We will also comment on applications of crowdsourced information discovery at Uber.