Generative models in supervised statistical learning with applications to digital image categorization and structural reliability.

Guillaume Bouchard

PhD thesis


This page gives an overview of my PhD thesis, which will be defended in January 2005. The PhD was supervised by Gilles Celeux and Bill Triggs from INRIA.

Supervised learning techniques predict an output y from an input x. The term generative refers to methods based on modelling the full joint distribution of x and y. This type of approach is sometimes referred to as informative classification or Bayesian classification, although the parameter estimation is not Bayesian from a statistical point of view.
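To make this definition concrete, here is a minimal sketch (not code from the thesis) of the generative recipe: fit a class prior p(y) and class-conditional densities p(x|y), here single Gaussians on hypothetical toy data, and predict with Bayes' rule.

import numpy as np
from scipy.stats import multivariate_normal

# Toy data: x in R^2, binary labels y (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[+1.5, 1.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Generative learning: model the joint p(x, y) = p(y) p(x | y).
priors, means, covs = {}, {}, {}
for c in (0, 1):
    Xc = X[y == c]
    priors[c] = len(Xc) / len(X)        # p(y = c)
    means[c] = Xc.mean(axis=0)          # mean of p(x | y = c)
    covs[c] = np.cov(Xc, rowvar=False)  # covariance of p(x | y = c)

def predict(x):
    # Bayes' rule: argmax_c p(y = c) p(x | y = c).
    scores = {c: priors[c] * multivariate_normal.pdf(x, means[c], covs[c])
              for c in (0, 1)}
    return max(scores, key=scores.get)

print(predict([2.0, 1.0]))  # most likely class under the fitted joint model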

Chapter 1 - Introduction

Chapter 2 - Generative models in supervised learning

This introductory chapter motivates the use of generative models in supervised learning and focuses on the theoretical distinction between generative and discriminative learning. The advantages of both approaches are summarized. We emphasize the importance of the structure of the data in parameter estimation. The main drawback of generative methods is their asymptotic bias, and we discuss existing techniques that improve their efficiency. These include Boosting, Maximum Entropy Discrimination and kernel techniques based on the tangent space of a parametric classifier (the Fisher kernel).
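The Fisher-kernel idea mentioned above can be sketched for a one-dimensional Gaussian model. The code below is only an illustration under simplifying assumptions (a hand-picked toy model, and the identity matrix in place of the Fisher information, a common shortcut), not the construction studied in the thesis.

import numpy as np

# Fisher-kernel sketch for a 1-D Gaussian p(x | mu, sigma).
# The Fisher score U_x = grad_theta log p(x | theta) maps each point into the
# tangent space of the generative model; the kernel is a dot product there.

mu, sigma = 0.0, 1.0  # hypothetical fitted parameters of the generative model

def fisher_score(x):
    d_mu = (x - mu) / sigma**2                       # d/d mu of log N(x; mu, sigma^2)
    d_sigma = -1.0 / sigma + (x - mu)**2 / sigma**3  # d/d sigma
    return np.array([d_mu, d_sigma])

def fisher_kernel(x, x_prime):
    return float(fisher_score(x) @ fisher_score(x_prime))

print(fisher_kernel(0.5, -0.3))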

Chapter 3 - Generative learning with latent class models

We present latent class models for classification and regression to motivate the choice of the generative approach to perform supervised learning tasks.

Classification with mixtures of spherical Gaussian distributions

Linear Discriminant Analysis (LDA) is a reference method in supervised classification. In cases where it performs poorly, alternative methods giving non-linear decision boundaries are required. The critical properties of such a method are flexibility, simplicity and parsimony. We propose a method based on estimating the density of each class with a mixture of spherical Gaussian distributions. This model is easy to use and interpret and has good generalization performance. We study various ways of choosing the number of components in the mixtures.
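A minimal sketch of this kind of classifier, assuming scikit-learn's GaussianMixture with spherical covariances as a stand-in for the thesis's estimator; the number of components per class is fixed by hand here, whereas the chapter studies how to choose it.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_spherical_mixture_classifier(X, y, n_components=3):
    """Fit one mixture of spherical Gaussians per class (toy stand-in)."""
    classes = np.unique(y)
    models, log_priors = {}, {}
    for c in classes:
        Xc = X[y == c]
        models[c] = GaussianMixture(n_components=n_components,
                                    covariance_type="spherical",
                                    random_state=0).fit(Xc)
        log_priors[c] = np.log(len(Xc) / len(X))
    return classes, models, log_priors

def predict(X, classes, models, log_priors):
    # Bayes' rule on the class-conditional mixture densities.
    scores = np.column_stack([log_priors[c] + models[c].score_samples(X)
                              for c in classes])
    return classes[np.argmax(scores, axis=1)]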

Generative estimation of a mixture of regressions

An alternative to Mixture of Experts (ME) called localised mixture of experts is studied. It corresponds to an ME in which the experts are linear regressions and the gating network is a Gaussian classifier. The underlying distribution of the regressors can be taken to be Gaussian, so that the joint distribution is a Gaussian mixture. The parameters are then estimated generatively rather than discriminatively, which yields a substantial speed-up of the EM algorithm for localised ME. Conversely, when studying Gaussian mixtures with specific constraints, one can use the standard EM algorithm for mixtures of experts to carry out maximum likelihood estimation. Some constrained models are useful, and the corresponding modifications of the EM algorithm are described. This type of model is particularly suitable for multimodal output prediction. [pdf]
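A sketch of the generative route described above, assuming a scikit-learn GaussianMixture fitted on the joint vector (x, y); the localised mixture-of-experts prediction E[y | x] then follows from standard Gaussian conditioning. Function names and data are illustrative, not from the thesis.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_mixture(X, y, n_components=3):
    # Fit a Gaussian mixture on the joint vector z = (x, y).
    Z = np.column_stack([X, y])
    return GaussianMixture(n_components=n_components, random_state=0).fit(Z)

def predict_conditional_mean(gmm, X):
    d = X.shape[1]  # x-dimension; the last coordinate of z is y
    preds = np.zeros(len(X))
    for i, x in enumerate(X):
        weights, means = [], []
        for k in range(gmm.n_components):
            m, S = gmm.means_[k], gmm.covariances_[k]
            m_x, m_y = m[:d], m[d:]
            S_xx, S_yx = S[:d, :d], S[d:, :d]
            # Gating weight: pi_k * N(x; m_x, S_xx).
            w = gmm.weights_[k] * multivariate_normal.pdf(x, m_x, S_xx)
            # Expert prediction: a linear regression, by Gaussian conditioning.
            mu_y_given_x = m_y + S_yx @ np.linalg.solve(S_xx, x - m_x)
            weights.append(w)
            means.append(mu_y_given_x.item())
        weights = np.array(weights) / np.sum(weights)
        preds[i] = weights @ np.array(means)
    return preds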

Chapter 4 - Choosing the complexity of a generative model in a classification framework

Model choice for generative classification has received little attention. Existing criteria such as AIC and BIC are suboptimal in this setting, and we propose an alternative to expensive cross-validation. Many classification tasks are performed by a generative classifier modelling the class-conditional distributions. Learning is done by maximizing the joint likelihood rather than the conditional one, and standard model selection procedures can select models with suboptimal error rates. In this chapter, a new criterion is proposed that selects the model with the smallest classification entropy. It is based on an approximation of the integrated entropy, in the spirit of BIC. Numerical experiments on simulated and real data show that this new criterion compares favorably with the classical BIC criterion for choosing a model that minimizes the classification error rate. [pdf]
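The issue can be reproduced with a small experiment. The sketch below (scikit-learn, hypothetical data X, y, and BIC as the joint-likelihood criterion) compares the number of mixture components preferred by BIC with the one minimizing cross-validated error; the entropy-based criterion proposed in the chapter is not implemented here.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold

def total_bic(X, y, k):
    """BIC of the generative classifier with k spherical components per class."""
    return sum(GaussianMixture(k, covariance_type="spherical", random_state=0)
               .fit(X[y == c]).bic(X[y == c]) for c in np.unique(y))

def cv_error(X, y, k, folds=5):
    """Cross-validated error rate of the same classifier."""
    errs = []
    for train, test in StratifiedKFold(folds, shuffle=True, random_state=0).split(X, y):
        classes = np.unique(y[train])
        gmms = {c: GaussianMixture(k, covariance_type="spherical", random_state=0)
                   .fit(X[train][y[train] == c]) for c in classes}
        logpri = {c: np.log(np.mean(y[train] == c)) for c in classes}
        scores = np.column_stack([logpri[c] + gmms[c].score_samples(X[test])
                                  for c in classes])
        errs.append(np.mean(classes[scores.argmax(axis=1)] != y[test]))
    return np.mean(errs)

# For each candidate k, the model preferred by BIC (a joint-likelihood criterion)
# need not be the one with the smallest classification error:
# for k in range(1, 6):
#     print(k, total_bic(X, y, k), cv_error(X, y, k))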

Chapter 5 - The trade-off between generative and discriminative learning

Several papers have addressed the topic of generative versus discriminative models. This work continues that line of research and attempts to go beyond past comparisons of generative and discriminative models. Given any generative classifier based on an inexact density model, we can define a discriminative counterpart that reduces its asymptotic error rate. We introduce a family of classifiers that interpolates between the two approaches, thus providing a new way to compare them and an estimation procedure whose classification performance is well balanced between the bias of generative classifiers and the variance of discriminative ones. We show that an intermediate trade-off between the two strategies is often preferable, both theoretically and in experiments on real data. [pdf]
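The interpolation can be written down explicitly. The sketch below is a toy one-dimensional version under strong simplifying assumptions (two classes, Gaussian class-conditionals with shared variance, scipy's generic optimizer, and a single interpolation weight lam that is not necessarily the thesis's exact parameterization): it maximizes a convex combination of the joint and conditional log-likelihoods, so lam = 1 recovers the generative fit and lam = 0 the discriminative one.

import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, expit

def neg_hybrid_loglik(theta, x, y, lam):
    """- [ lam * log p(x, y) + (1 - lam) * log p(y | x) ] for a toy model."""
    mu0, mu1, log_sigma, logit_pi = theta
    sigma2 = np.exp(2 * log_sigma)
    pi1 = expit(logit_pi)                      # p(y = 1)
    log_pi = np.log([1 - pi1, pi1])
    mus = np.array([mu0, mu1])
    # log p(x, y = c) for every point and both classes.
    log_joint_all = (log_pi[None, :]
                     - 0.5 * np.log(2 * np.pi * sigma2)
                     - (x[:, None] - mus[None, :]) ** 2 / (2 * sigma2))
    log_joint = log_joint_all[np.arange(len(x)), y]          # log p(x_i, y_i)
    log_cond = log_joint - logsumexp(log_joint_all, axis=1)  # log p(y_i | x_i)
    return -(lam * log_joint.sum() + (1 - lam) * log_cond.sum())

def fit(x, y, lam):
    theta0 = np.array([x[y == 0].mean(), x[y == 1].mean(), np.log(x.std()), 0.0])
    return minimize(neg_hybrid_loglik, theta0, args=(x, y, lam)).x

# lam = 1.0 -> purely generative fit; lam = 0.0 -> purely discriminative;
# intermediate values trade the bias of the generative model against the
# variance of the discriminative one.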

Chapter 6 - A hierarchical part-based model for object categorization

As an example of a generative model, we show how a full probabilistic model of the distribution of interest points leads to a classifier with high recognition rates. We propose a hierarchical generative model for coding the geometry and appearance of visual object categories. The model is a collection of loosely connected parts containing more rigid assemblies of subparts. It is optimized for domains where there are relatively large numbers of somewhat informative subparts, such as the features returned by local feature methods from computer vision. The model is learned quickly by an EM procedure. Experiments on real images show its ability to fit complex natural object classes. [pdf]
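The chapter's model is hierarchical, with parts and subparts; as a drastically simplified, non-hierarchical sketch of the broader idea that a probabilistic model of local features yields a classifier, one can model the descriptors of each category with a flat Gaussian mixture and score a new image by total log-likelihood. Everything below (names, data layout) is illustrative only and much cruder than the model in the chapter.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_category_models(descriptors_per_category, n_components=10):
    # descriptors_per_category: dict  category -> array (n_features, descriptor_dim)
    return {c: GaussianMixture(n_components=n_components, random_state=0).fit(D)
            for c, D in descriptors_per_category.items()}

def classify_image(image_descriptors, models):
    # image_descriptors: array (n_points, descriptor_dim) from an interest-point detector.
    scores = {c: m.score_samples(image_descriptors).sum() for c, m in models.items()}
    return max(scores, key=scores.get)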

Chapter 7 - A Bayesian model for statistical structural reliability

This chapter presents work on structural reliability, another example of a generative model used to predict failures. The study received an award at the Lambda-Mu 14 conference in Bourges. Studies carried out by the French company Électricité de France on the failure process of metallic components showed that the expected date of failure largely exceeds their period of use. In the case of a failure before this date, we wish to know how the physical model is contradicted. A graphical model combining knowledge of the physical variables with the experience feedback is proposed. The parameters of the model are estimated by Gibbs sampling. The importance of updating according to the failure scenario is quantified and the role of each parameter is identified. [pdf (in French)]
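The actual graphical model and its full conditionals are specific to the EDF study; the sketch below only illustrates Gibbs sampling itself on a deliberately simple, hypothetical model (Normal observations with unknown mean and precision, conjugate priors), not the reliability model of the chapter.

import numpy as np

# Generic Gibbs-sampling illustration (NOT the thesis's reliability model):
# observations x_i ~ Normal(mu, 1/tau) with conjugate priors
# mu ~ Normal(m0, 1/t0) and tau ~ Gamma(a0, rate=b0).
def gibbs_normal(x, n_iter=5000, m0=0.0, t0=1e-3, a0=1e-3, b0=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n, xbar = len(x), np.mean(x)
    mu, tau = xbar, 1.0 / np.var(x)
    samples = np.zeros((n_iter, 2))
    for it in range(n_iter):
        # Conditional of mu given tau: Normal.
        prec = t0 + n * tau
        mean = (t0 * m0 + n * tau * xbar) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))
        # Conditional of tau given mu: Gamma.
        a = a0 + 0.5 * n
        b = b0 + 0.5 * np.sum((x - mu) ** 2)
        tau = rng.gamma(a, 1.0 / b)  # numpy parameterizes scale = 1 / rate
        samples[it] = mu, tau
    return samples  # posterior draws of (mu, tau); discard a burn-in before use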

Chapter 8 - A kernel estimator for the frontier estimation problem

This chapter is not related to generative classification but is a typical example of the efficiency of discriminative learning. We propose new methods for estimating the frontier of a set of points. The estimates are defined as kernel functions covering all the points and whose associated support is of smallest surface. They are written as linear combinations of kernel functions applied to the points of the sample, and the weights of the linear combination are computed by solving a linear programming problem. In the general case, the solution of the optimization problem is sparse, that is, only a few coefficients are non-zero; the corresponding points play the role of support vectors in statistical learning theory. In the case of uniform bivariate densities, the $L_1$ error between the estimated and the true frontier is shown to converge to zero almost surely, and the rate of convergence is provided. The behaviour of the estimates in a finite-sample situation is illustrated on simulations. [pdf]
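A sketch of the linear-programming formulation described above, under simplifying assumptions: the estimated frontier is f(x) = sum_j alpha_j K(x, x_j) with nonnegative weights, the constraint f(x_i) >= y_i forces the estimate to cover every sample point, and minimizing sum_j alpha_j is proportional to minimizing the area under f when all kernels have the same integral. The Gaussian kernel, bandwidth and variable names are illustrative, not necessarily those of the chapter.

import numpy as np
from scipy.optimize import linprog

def estimate_frontier(x, y, bandwidth=0.1):
    """Toy frontier estimate f(x) = sum_j alpha_j K(x, x_j) via linear programming."""
    # Gaussian kernel matrix K[i, j] = K(x_i, x_j).
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    n = len(x)
    # Minimize sum_j alpha_j (proportional to the area under f, since every
    # kernel has the same integral), subject to f(x_i) >= y_i and alpha >= 0.
    res = linprog(c=np.ones(n),
                  A_ub=-K, b_ub=-y,        # -K @ alpha <= -y  <=>  K @ alpha >= y
                  bounds=[(0, None)] * n,
                  method="highs")
    alpha = res.x
    support = np.flatnonzero(alpha > 1e-8)  # sparse solution: the "support vectors"
    return alpha, support

def frontier_value(x_new, x, alpha, bandwidth=0.1):
    return np.exp(-0.5 * ((x_new - x) / bandwidth) ** 2) @ alpha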

Chapter 9 - Conclusion