10/7/2017 Colin Robertson, CS @ ILLINOIS
As part of the CS @ ILLINOIS Distinguished Lecture Series, Dr. Eva Tardos will present her research on learning algorithms. The lecture will take place at 4 pm on October 9, in 2405 Siebel Center.
Learning with Low Approximate Regret with Partial Feedback
We consider the adversarial multi-armed bandit problem with partial feedback, minimizing a non-negative loss function using the graph-based feedback framework introduced by Mannor and Shamir in 2011. We offer algorithms that attain small loss bounds, as well as low approximate regret against a shifting comparator.
Classical learning algorithms add a low level of uniform noise to the algorithm's choice to limit the variance of the loss estimator used in importance sampling; this noise also helps the algorithm shift to a new arm quickly enough when the comparator changes. However, such an approach poses significant hurdles to proving small-loss or low approximate regret bounds. We show that a different general technique, freezing arms rather than adding random noise, does much better in this setting. The idea of freezing arms was proposed by Allenberg et al. in 2006 in the context of bandit learning with multiplicative weights. We show the broad applicability of this technique by extending it to partial-information feedback (via a novel dual freezing thresholding technique) and to shifting comparators. This is joint work with Thodoris Lykouris and Karthik Sridharan.
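To illustrate the contrast the abstract draws, here is a minimal sketch of a multiplicative-weights bandit learner that freezes low-probability arms instead of mixing in uniform exploration noise. The function name, parameters (`eta`, `gamma`), and the simple single-threshold freezing rule are illustrative assumptions, not the exact algorithm or thresholds from the talk:

```python
import math
import random

def freezing_mw(losses, eta=0.1, gamma=0.01, rng=None):
    """Multiplicative-weights bandit sketch with arm freezing.

    Arms whose current probability falls below the threshold `gamma`
    are frozen: they get zero play probability and their weights are
    not updated, rather than being propped up by uniform noise.
    `losses[t][i]` is the loss (in [0, 1]) of arm i at round t; only
    the played arm's loss is observed (bandit feedback).
    Parameter names and the threshold rule are illustrative only.
    """
    rng = rng or random.Random(0)
    k = len(losses[0])
    weights = [1.0] * k
    total_loss = 0.0
    for round_losses in losses:
        w_sum = sum(weights)
        probs = [w / w_sum for w in weights]
        # Freeze arms whose probability is below the threshold gamma.
        active = [i for i in range(k) if probs[i] >= gamma]
        active_sum = sum(probs[i] for i in active)
        play_probs = [probs[i] / active_sum if i in active else 0.0
                      for i in range(k)]
        # Play one arm; bandit feedback reveals only its loss.
        arm = rng.choices(range(k), weights=play_probs)[0]
        loss = round_losses[arm]
        total_loss += loss
        # Importance-weighted estimate for the played arm only.
        est = loss / play_probs[arm]
        weights[arm] *= math.exp(-eta * est)
    return total_loss
```

On an instance where one arm is clearly bad, the bad arm's weight decays until it crosses the threshold and is frozen, after which it contributes no further loss; uniform-noise exploration would keep sampling it at a constant rate, which is one intuition for why freezing helps small-loss bounds.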