Learning Theory and Approximation

Kurt Jetter; Steve Smale; Ding-Xuan Zhou

doi:10.4171/owr/2008/30

JournalsowrVol. 5, No. 3pp. 1655–1706

Learning Theory and Approximation

Kurt Jetter
Universität Hohenheim, Stuttgart, Germany
- zbMATH Open
Steve Smale
City University of Hong Kong, China
- zbMATH Open
Ding-Xuan Zhou
City University of Hong Kong, China
- zbMATH Open

Download PDF

Abstract

The workshop Learning Theory and Approximation, organised by Kurt Jetter (Stuttgart-Hohenheim), Steve Smale (Berkeley) and Ding-Xuan Zhou (Hong Kong), was held June 29–July 5, 2008. The meeting was attended by 22 participants from Europe, North America and Asia. It provided an excellent platform for a fruitful interaction of scientists from learning theory and approximation theory. Discussions with participants in the parallel workshop Computational Algebraic Topology were encouraged by the program to include plenary sessions for both workshops on subjects of common interest. Among these, the workshop addressed here has contributed six plenary lectures presented by Smale, Suykens, Tsybakov, Sauer, Temlyakov, and Schölkopf.

The scientific program started with Smale's lecture which was perfectly within the scope of both workshops. His idea was to apply various orders of boundary and co-boundary operators in Hodge theory to develop learning algorithms for pattern analysis on general probability spaces without Lebesgue structure, similar to the graph Laplacian. To this end, approximation theory is needed for the study of approximating integral operators associated with positive kernels by finite-rank operators and of the space of harmonic functions on general spaces. A useful demonstration of this type of approximation of integral operators by finite-rank operators was given by Tarres in his talk dealing with online learning algorithms for regression.

Various central branches of learning theory have raised new approximation theory problems. For example, kernel-based learning has become an indispensable tool, and two plenary lectures have addressed this subject: Suykens surveyed various support vector machine type kernel-based learning algorithms and pointed to approximation theory problems about kernel canonical correlation analysis, independent component analysis and kernel PCA for sparsity. Schölkopf gave a general introductory survey on kernel methods and discussed several interesting approximation theory problems on data dependent kernels, how to measure variable dependence and covariance by kernel means, and learning surfaces from normals. However, kernel-based methods have been used also in approximation theory for some time, where explicit error estimates have been derived for the approximation of smooth classes of functions. Examples include the use of frame decompositions, see the contribution by Han on wavelet frames, or the use of a class of Bernstein-Durrmeyer positive linear operators associated with general probability measures and their applications to kernel learning algorithms, see the talks by Jetter and by Berdysheva.

A second central topic in learning theory is sparsity, which is an important property for dimension reduction, data representation and analysis, and information retrieval. In this workshop, some statisticians discussed sparsity for various purposes and raised the interesting problem of how to characterize functions with sparse representations: Tsybakov discussed how to study the sparsity in statistical learning theory by sparsity oracle inequalities. Wahba discussed LASSO algorithm and presented a separable approximation algorithm with projected Newton acceleration for variable selection and clustering. Pontil surveyed some multi-task learning with spectral regularization. Mukherjee talked about some predictive models inferring structure and dependencies. And Boucheron discussed model selection for Wilks phenomenon.

On the other hand, sparsity comes up in approximation theory with the development of algorithms for non-linear approximation. Binev presented a mathematical analysis of an adaptive sparse tree algorithm for regression problems in learning theory by such non-linear methods. Temlyakov talked about optimal error bounds of some universal estimators in learning theory by means of properties of $N$ -term approximation. Sauer considered a multivariate polynomial interpolation problem to learn an algebraic variety containing a finite set of points for the purpose of dimension reduction. Zhou described some classification and regression learning schemes generated by Parzen windows and least squares regularization from a new viewpoint of scaling in the time and the frequency domain. And Rosasco introduced some learning algorithms associated with elastic net regularization by means of some classes of wavelets and libraries from approximation theory.

Two further talks have dealt with application of non-parametric regression: Kohler applied smooth splines to non-parametric regression estimators by Monte Carlo methods, with an interesting application to pricing of options. And Hein studied non-parametric regression between manifolds and asked for properties of thin-plate splines and higher order energy functionals including the so-called Eells energy.

The workshop has provided an excellent overview on actual subjects where learning theory and approximation theory meet, and where the two fields can benefit from each other. It has also raised some questions to be settled in the future. To address two of them: First, the usual estimates for variance involve essentially capacity of function classes used in learning processes which relies on properties of various functions spaces from approximation theory. However, smoothness of functions is usually measured in a sophisticated way which makes such results less applicable, in particular, if high-dimensional data are considered. This question is important to topics and learning algorithms on dimension reduction. Second, learning theory considers robustness of its methods, so far, only concerning statistical errors. However, from the standpoint of numerical analysis, also the condition of the used algorithms should be incorporated.

The organizers acknowledge the friendly atmosphere provided by the Oberwolfach institute, and express their thanks to the entire staff.

Cite this article

Kurt Jetter, Steve Smale, Ding-Xuan Zhou, Learning Theory and Approximation. Oberwolfach Rep. 5 (2008), no. 3, pp. 1655–1706

DOI 10.4171/OWR/2008/30