Generative models in supervised statistical learning with applications
to digital image categorization and structural reliability.
Guillaume Bouchard
PhD thesis
This page gives an overview of my PhD thesis, which will be defended in January 2005.
The thesis was supervised by Gilles Celeux and Bill Triggs at INRIA.
Supervised learning techniques predict an output y from an input x.
The term generative refers to methods based on modelling
the full joint distribution of x and y. This type of approach is
sometimes referred to as Informative Classifiers or Bayesian Classifiers, although
the parameter estimation is not Bayesian from a statistical point of view.
Chapter 1 - Introduction
Chapter 2 - Generative models in supervised learning
This introduction motivates the use of generative models in supervised learning.
This introductory chapter focuses on the theoretical distinction between generative and discriminative
learning. The advantages of both approaches are summarized.
We emphasize the importance of the structure of the data in the parameter estimation.
The main drawback of generative methods is their asymptotic bias, and
we discuss existing techniques that improve their efficiency.
These include Boosting, Maximum Entropy Discrimination and kernel techniques
using the tangent space of a parametric classifier (the Fisher Kernel).
Chapter 3 - Generative learning with latent class models
We present latent class models for classification and regression to motivate the
choice of the generative approach for supervised learning tasks.
Classification with mixtures of spherical Gaussian distributions
Linear Discriminant Analysis (LDA) is a reference method in supervised
classification. In cases where it performs poorly,
alternative methods giving non-linear decision boundaries are
required. The critical properties of such a method are
flexibility, simplicity and parsimony. We propose a method based
on estimating the density of each group using a mixture of
spherical Gaussian distributions. This model is easy to use and
interpret and has good generalization performance. We study
various ways of choosing the numbers of components in the
mixtures.
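As a rough illustration of this kind of classifier (a minimal sketch, not the thesis code), the following fits one mixture of spherical Gaussians per class with scikit-learn and classifies by Bayes' rule; the toy data and the number of components per class are assumptions made for the example.

```python
# Sketch: generative classification where each class density is a mixture of
# spherical (isotropic-covariance) Gaussians, fitted here with scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: class 1 is bimodal, so LDA's single-Gaussian-per-class
# assumption would give a poor decision boundary.
X0 = rng.normal([0, 0], 0.5, size=(100, 2))
X1 = np.vstack([rng.normal([3, 0], 0.5, size=(50, 2)),
                rng.normal([-3, 0], 0.5, size=(50, 2))])

# One spherical mixture per class; component counts chosen by hand here
# (the chapter studies how to choose them automatically).
g0 = GaussianMixture(n_components=1, covariance_type="spherical",
                     random_state=0).fit(X0)
g1 = GaussianMixture(n_components=2, covariance_type="spherical",
                     random_state=0).fit(X1)

def classify(x):
    # Bayes rule with equal class priors: pick the class whose
    # estimated density (log-likelihood) is larger.
    scores = np.column_stack([g0.score_samples(x), g1.score_samples(x)])
    return scores.argmax(axis=1)

print(classify(np.array([[0.0, 0.0], [3.0, 0.0], [-3.0, 0.0]])))  # → [0 1 1]
```

The decision boundary is non-linear because each class density is itself multimodal, which is exactly the flexibility the chapter argues for.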
Generative estimation of a mixture of regressions
An alternative to Mixtures of Experts (ME), called
the localised mixture of experts, is studied.
It corresponds to an ME model in which the experts are linear regressions
and the gating network is a Gaussian classifier.
The distribution of the regressors can itself be taken to be
Gaussian, so that the joint distribution is a Gaussian mixture.
The parameters are then estimated generatively instead of discriminatively.
This yields a substantial speed-up of the EM algorithm for
localised ME. Conversely, when studying Gaussian mixtures
with specific constraints,
one can use the standard EM algorithm for mixtures of experts to carry
out maximum likelihood estimation.
Some constrained models are useful,
and the corresponding modifications to the EM algorithm
are described. This type of model is particularly suitable for multimodal output prediction.
[pdf]
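The generative view can be sketched as follows (an illustration under simplifying assumptions, not the thesis code): fit an ordinary Gaussian mixture on the joint (x, y) data, then read off a localised mixture of experts, since each joint Gaussian component induces a linear regression of y on x and a Gaussian gating weight.

```python
# Sketch: generative estimation of a mixture of regressions via a Gaussian
# mixture fitted on the JOINT (x, y) sample (scikit-learn used for the fit).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two regression regimes, one for x < 0 and one for x > 0.
x = rng.uniform(-2, 2, size=(400, 1))
y = np.where(x[:, 0] < 0, 2 * x[:, 0] + 1, -x[:, 0]) + rng.normal(0, 0.1, 400)
Z = np.column_stack([x, y])                      # joint data (x, y)

gm = GaussianMixture(n_components=2, random_state=0).fit(Z)

def predict(x_new):
    """E[y | x] under the joint Gaussian mixture."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    out = np.empty_like(x_new)
    for i, xv in enumerate(x_new):
        num = den = 0.0
        for k in range(gm.n_components):
            mu, S = gm.means_[k], gm.covariances_[k]
            # Gating weight: pi_k * N(x | mu_x, S_xx).
            w = gm.weights_[k] * np.exp(-0.5 * (xv - mu[0]) ** 2 / S[0, 0]) \
                / np.sqrt(2 * np.pi * S[0, 0])
            # Expert k: conditional mean mu_y + S_yx / S_xx * (x - mu_x).
            m = mu[1] + S[1, 0] / S[0, 0] * (xv - mu[0])
            num += w * m
            den += w
        out[i] = num / den
    return out

print(predict([-1.0, 1.0]))
```

A single EM run on the joint Gaussian mixture estimates gates and experts at once, which is the source of the speed-up discussed above; keeping the full mixture of conditionals (rather than only the mean) handles the multimodal-output case.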
Chapter 4 - Choosing the complexity of a generative model in a classification framework
Model choice for generative classification has received little attention. Existing
criteria such as AIC and BIC are suboptimal for this task, and we propose an
alternative to expensive cross-validation.
Many classification tasks are performed by a
generative classifier that models the class-conditional distributions. The
learning is done by maximizing the joint likelihood instead of the conditional one,
and standard model selection procedures can select models with suboptimal
error rates. In this chapter, a new criterion that selects the model giving the
smallest classification entropy is proposed. It is based on an
approximation of the integrated entropy (in the same way that BIC approximates
the integrated likelihood).
Numerical experiments on simulated and real data show
that this new criterion compares favorably with the classical BIC criterion for
choosing a model that minimizes the classification error rate.
[pdf]
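For reference, standard BIC-based selection looks like the sketch below (toy data and candidate range are assumptions; this illustrates the baseline criterion, not the new one proposed in the chapter). BIC scores the joint fit, which is precisely why it need not minimize the classification error rate.

```python
# Sketch: selecting the number of mixture components by BIC = -2 log L + d log n,
# the classical criterion the chapter compares against.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Two well-separated clusters, so the "true" number of components is 2.
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(4, 1, (150, 2))])

best_k, best_bic = None, np.inf
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gm.bic(X)                 # penalized joint log-likelihood
    if bic < best_bic:
        best_k, best_bic = k, bic

print(best_k)  # → 2
```

The chapter's criterion replaces the joint-likelihood score with an approximation of the integrated classification entropy, so that the selected model targets classification performance rather than density fit.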
Chapter 5 - The trade-off between generative and discriminative learning
There have been several papers on the topic of generative vs. discriminative models.
This work continues this line of research and tries to go beyond past attempts
at contrasting generative and discriminative models.
Given any generative classifier based on an inexact density model,
we can define a discriminative counterpart that reduces its asymptotic error
rate. We introduce a family of classifiers that interpolate the two approaches,
thus providing a new way to compare them and giving an estimation procedure
whose classification performance is well balanced between the bias of
generative classifiers and the variance of discriminative ones. We show that an
intermediate trade-off between the two strategies is often preferable, both
theoretically and in experiments on real data.
[pdf]
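One plausible form of such an interpolation (a sketch under assumptions, not necessarily the estimator defined in the chapter) is to maximize a convex combination of the joint and conditional log-likelihoods, with lambda = 1 giving the generative estimate and lambda = 0 the discriminative one:

```python
# Sketch: a family of training objectives interpolating generative (lam = 1)
# and discriminative (lam = 0) estimation of a two-class Gaussian classifier
# with unit spherical covariances and equal priors.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 1, (100, 1)), rng.normal(1, 1, (100, 1))])
y = np.repeat([0, 1], 100)

def objective(theta, lam):
    mu = theta.reshape(2, 1)                       # one mean per class
    # log N(x | mu_k, I) up to an additive constant, plus log prior 1/2.
    logp = -0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2) \
           + np.log(0.5)                           # shape (n, 2)
    log_joint = logp[np.arange(len(y)), y].sum()   # sum_i log p(x_i, y_i)
    log_cond = (logp[np.arange(len(y)), y]         # sum_i log p(y_i | x_i)
                - logsumexp(logp, axis=1)).sum()
    return -((1 - lam) * log_cond + lam * log_joint)

# lam = 1 recovers the generative (maximum joint likelihood) estimate,
# i.e. the per-class sample means.
mu_gen = minimize(objective, x0=np.zeros(2), args=(1.0,)).x
print(mu_gen)
```

Sweeping lambda between 0 and 1 traces out the bias/variance trade-off described above: intermediate values accept a little generative bias in exchange for lower variance than the purely discriminative fit.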
Chapter 6 - A hierarchical part-based model for object categorization
As an example of a generative model, we show how a full probabilistic model of
the distribution of interest points leads to a classifier with high recognition rates.
We propose a hierarchical generative model for coding the geometry and
appearance of visual object categories. The model is a collection of
loosely connected parts containing more rigid assemblies of
subparts. It is optimized for domains where there are relatively large
numbers of somewhat informative subparts, such as the features
returned by local feature methods from computer vision. The model is
learned quickly by an EM procedure. Experiments on real images
show its ability to fit complex natural object classes.
[pdf]
Chapter 7 - A Bayesian model for statistical structural reliability
This chapter presents another example of a generative model, here used to predict failures.
The study received an award at the Lambda-Mu 14 conference in Bourges.
Studies carried out by the French company Électricité de
France on the failure process of metallic components showed that the
expected date of failure largely exceeds the components' period of use. In the
case of a failure before this date, we wish to know how the physical model is
contradicted. A graphical model combining knowledge of the physical variables
with the experience feedback is proposed. The parameters of the model are
estimated by Gibbs sampling. The importance of updating the model according to
the failure scenario is quantified and the role of each parameter is identified.
[pdf (in French)]
Chapter 8 - A kernel estimator for the frontier estimation problem
This chapter is not related to generative classification but
is a typical example of the efficiency of discriminative learning.
We propose new methods for estimating the frontier of a set of points.
The estimates are defined as kernel functions
covering all the points, whose associated support has
the smallest surface. They are written as linear combinations
of kernel functions applied to the points of the sample.
The weights of the linear combination are then computed by solving
a linear programming problem. In the general case, the solution
of the optimization problem is sparse, that is, only a few coefficients
are non-zero. The corresponding points play the role of the support vectors
of statistical learning theory.
In the case of uniform bivariate densities, the $L_1$ error between the estimated and the true frontiers is
shown to converge to zero almost surely, and the rate of convergence
is provided. The behaviour of the estimates in finite-sample situations
is illustrated through simulations.
[pdf]
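The linear-programming formulation can be sketched as follows (a simplified illustration, not the thesis code; the toy data, Gaussian kernel and bandwidth are assumptions): minimize the area under the kernel envelope, which for a fixed-bandwidth kernel is proportional to the sum of the weights, subject to the envelope covering every sample point.

```python
# Sketch: frontier estimation as a linear program. The frontier estimate is
# f(x) = sum_j alpha_j K(x, x_j); we minimize sum_j alpha_j (proportional to
# the area under f) subject to f(x_i) >= y_i for every sample point.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 60)
y = rng.uniform(0, 1, 60) * np.sqrt(x)       # support bounded above by sqrt(x)

h = 0.2                                       # kernel bandwidth (assumption)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # Gaussian kernel matrix

# min 1^T alpha   s.t.   K alpha >= y,  alpha >= 0
# (linprog expects <= constraints, hence the sign flips)
res = linprog(c=np.ones(len(x)), A_ub=-K, b_ub=-y, bounds=(0, None))
alpha = res.x

# Sparsity: only a few weights are non-zero; those sample points act as the
# "support vectors" of the frontier.
print((alpha > 1e-8).sum(), "of", len(alpha), "weights are non-zero")
```

Only the points that actually touch the estimated frontier receive non-zero weight, which is the sparsity property noted above.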
Chapter 9 - Conclusion