# Sample complexity of the distinct elements problem

### Yihong Wu

Yale University, New Haven, USA### Pengkun Yang

University of Illinois at Urbana-Champaign, USA

## Abstract

We consider the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $k$ balls based on $n$ samples drawn with replacements. Based on discrete polynomial approximation and interpolation, we propose an estimator with additive error guarantee that achieves the optimal sample complexity within $O(\mathrm {log\:log}\: k)$ factors, and in fact within constant factors for most cases. The estimator can be computed in $O(n)$ time for an accurate estimation. The result also applies to sampling without replacement provided the sample size is a vanishing fraction of the urn size.

One of the key auxiliary results is a sharp bound on the minimum singular values of a real rectangular Vandermonde matrix, which might be of independent interest.

## Cite this article

Yihong Wu, Pengkun Yang, Sample complexity of the distinct elements problem. Math. Stat. Learn. 1 (2018), no. 1, pp. 37–72

DOI 10.4171/MSL/1-1-2