Uniform Learnability, Model Selection, and Neural Networks
Andrew Barron (Yale)
A variety of pattern recognition methods, including polynomial discriminants
and neural networks, possess the property of universal statistical consistency for arbitrary
joint distributions of inputs and outputs. Indeed, the probability of error of
the estimated discriminant converges to the Bayes optimal probability of error
in the limit of large sample size when the size of the model is chosen adaptively.
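(For concreteness, the notation in the following display is ours and does not appear in the abstract. If \( \hat{g}_n \) denotes the discriminant fitted to \( n \) i.i.d. observations of the pair \( (X, Y) \), with the model size selected adaptively from the data, the consistency claim says that
\[
P\{\hat{g}_n(X) \neq Y\} \;\longrightarrow\; L^{*} := \inf_{g} P\{g(X) \neq Y\}
\quad \text{as } n \to \infty,
\]
for every joint distribution of the input \( X \) and the label \( Y \), where \( L^{*} \) is the Bayes-optimal probability of error.)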
An index of resolvability quantifies the rate of convergence in terms of a
complexity versus approximation trade-off. Though the convergence is not uniform
over all distributions, this index of resolvability identifies interesting
nonparametric classes for which the convergence is uniform at a polynomial rate.
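(As a sketch of the trade-off, in the minimum-complexity estimation framework from which the term comes, the index of resolvability of a target distribution \( p \), relative to a countable list \( \Gamma_n \) of candidate models with assigned codelengths \( L_n(q) \), is commonly written
\[
R_n(p) \;=\; \min_{q \in \Gamma_n} \left\{ \frac{L_n(q)}{n} + D(p \,\|\, q) \right\},
\]
where \( D(p \,\|\, q) \) is the Kullback-Leibler divergence measuring approximation error and \( L_n(q)/n \) is the per-sample cost of describing the model. Richer models reduce the divergence term but increase the complexity term, and the rate of convergence of the adaptively selected model is governed by \( R_n(p) \). The notation here is ours, not the abstract's.)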
We discuss the relationship of these conclusions to computational learning theory.