Improved regret for zeroth-order adversarial bandit convex optimisation

Tor Lattimore
Google UK, London, UK

doi:10.4171/msl/17

Mathematical Statistics and Learning, Vol. 2, No. 3/4, pp. 311–334

Abstract

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the bound of $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
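
Ignoring the constants hidden in the $O(\cdot)$ notation (which the two results need not share), the improvement factor over the earlier bound can be worked out directly:

$$\frac{d^{9.5}\sqrt{n}\,\log(n)^{7.5}}{d^{2.5}\sqrt{n}\,\log(n)} = d^{7}\log(n)^{6.5}.$$

Both bounds scale as $\sqrt{n}$ in the number of interactions, so the gain lies entirely in the dimension and logarithmic factors. For example, with $d = 10$ the dimension factor alone is $10^{7}$.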

Cite this article

Tor Lattimore, Improved regret for zeroth-order adversarial bandit convex optimisation. Math. Stat. Learn. 2 (2019), no. 3/4, pp. 311–334