I was a PhD student under the French-German co-supervision of Cordelia Schmid in the LEAR (Learning and Recognition in Vision) team at INRIA Grenoble in France, and Rüdiger Westermann at the chair for Computer Graphics and Visualization at TU München in Germany; the PhD was part of an industry co-operation with EADS Innovation Works in München, Germany. My work focused on the use of synthetic 3D CAD models for model-based object class detection and pose estimation; see below for details. I started my PhD in October 2006 and graduated in October 2010.
I completed my diploma thesis in 2005 as part of a French-German double degree programme between the ENSIMAG (INP) in Grenoble, France, and the Universität Karlsruhe (TH) in Germany. The thesis focused on improving the robustness of 2D Active Appearance Models for facial action recognition by combining 2D intensity images with 3D stereo information; it was published at CVPR 2006 (see IEEE Xplore, PDF).
My PhD thesis extends object class detection and pose estimation on single 2D images with a 3D model-based approach, building on existing CAD models and rendering techniques from computer graphics.
The first contribution is a 3D approach to multi-view object class detection. Most existing approaches recognize object classes for a particular viewpoint or combine classifiers for a few discrete views. We propose instead to build 3D representations of object classes that can handle viewpoint changes and intra-class variability. Our approach extracts a set of pose- and class-discriminative features from synthetic 3D object models using a filtering procedure, evaluates their suitability for matching to real image data, and represents them by their appearance and 3D position. We term these representations 3D Feature Maps. To recognize an object class in an image, we match the synthetic descriptors to the real ones in a 3D voting scheme. Geometric coherence is reinforced by a robust pose estimation, which yields a 3D bounding box in addition to the 2D localization. This work was published at CVPR 2008 (see IEEE Xplore, PDF).
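The matching-and-voting step can be illustrated with a minimal sketch. This is not the implementation from the paper: it assumes that each feature map entry is reduced to a descriptor plus a discrete viewpoint label, that descriptors are compared by Euclidean distance, and that a fixed threshold rejects poor matches. The names `nearest`, `vote_for_view` and `max_dist` are hypothetical.

```python
import math
from collections import Counter

def nearest(desc, feature_map):
    """Return the (descriptor, view) entry closest to desc (Euclidean distance)."""
    return min(feature_map, key=lambda entry: math.dist(desc, entry[0]))

def vote_for_view(image_descs, feature_map, max_dist=0.5):
    """Let each image descriptor vote for the view of its nearest synthetic match.

    Matches farther away than max_dist are discarded (assumed threshold).
    """
    votes = Counter()
    for desc in image_descs:
        entry_desc, view = nearest(desc, feature_map)
        if math.dist(desc, entry_desc) <= max_dist:
            votes[view] += 1
    return votes

# Toy data: 2D descriptors stand in for real high-dimensional ones.
feature_map = [((0.0, 0.0), "front"), ((1.0, 0.0), "front"), ((0.0, 1.0), "side")]
image_descs = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.9), (5.0, 5.0)]
votes = vote_for_view(image_descs, feature_map)
```

In this toy example the last descriptor has no nearby synthetic match and casts no vote, so the view with the most votes is "front". A full system would vote in a continuous 3D pose space rather than over view labels.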
The second contribution extends the previous approach to multi-view object class detection by introducing discriminative part classifiers and a probabilistic pose estimation method; it also allows training data from synthetic and real images to be combined. Appearance and geometry are treated as separate learning tasks with different training data. A part model discriminatively learns the object appearance with spatial pyramids from a database of real images, and encodes the 3D geometry of the object class with a generative representation built from a database of synthetic models. The geometric information is linked to the 2D training data and makes it possible to perform an approximate 3D pose estimation for generic object classes. The pose estimation provides an efficient way to evaluate the likelihood of groups of 2D part detections with respect to a full 3D geometry model, in order to disambiguate and prune 2D detections and to handle occlusions. This work was published at CVPR 2010 (see IEEE Xplore, PDF).
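As a rough illustration of scoring a group of 2D part detections against a 3D geometry model, the sketch below makes several simplifying assumptions that are not taken from the paper: the pose is a single yaw angle, the projection is orthographic, and each part position follows an isotropic 2D Gaussian around its projected location. The part names, `MODEL`, `sigma` and all function names are hypothetical.

```python
import math

# Hypothetical 3D part positions (x, y, z) of a toy object model.
MODEL = {
    "wheel_front": (1.0, 0.0, 0.0),
    "wheel_rear": (-1.0, 0.0, 0.0),
    "roof": (0.0, 1.0, 0.5),
}

def project(point, yaw):
    """Orthographic projection after rotating by yaw about the vertical axis."""
    x, y, z = point
    return (x * math.cos(yaw) + z * math.sin(yaw), y)

def log_likelihood(detections, model, yaw, sigma=0.1):
    """Sum of isotropic 2D Gaussian log-densities of the detected parts.

    Parts absent from detections (for example, occluded ones) are skipped.
    """
    total = 0.0
    for name, (u, v) in detections.items():
        pu, pv = project(model[name], yaw)
        sq_dist = (u - pu) ** 2 + (v - pv) ** 2
        total += -sq_dist / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)
    return total

def best_yaw(detections, model, candidates):
    """Pick the candidate yaw under which the detections are most likely."""
    return max(candidates, key=lambda yaw: log_likelihood(detections, model, yaw))
```

Because missing parts are simply skipped, the same score can still rank pose hypotheses when some parts are occluded, as long as all hypotheses are compared on the same set of detections.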
The third contribution addresses a limitation of the previous methods, which provide only approximate 3D pose estimates. Starting from an initial object category and approximate 3D pose, as provided for example by the previous methods, a registration scheme aligns arbitrary standard 3D models to optical and Synthetic Aperture Radar (SAR) images in order to recover the full 6 degrees of freedom of the object. We propose a novel similarity measure that combines perspective contour matching with an appearance-based Mutual Information (MI) measure; it is optimized using an evolutionary Particle Swarm Optimization strategy, parallelized to exploit the hardware acceleration offered by current-generation graphics processors (GPUs). We show that our approach leads to precise registration results, even under significant image noise, small object dimensions and partial occlusion, where other methods would fail. This work was published at CVPR 2007 (see IEEE Xplore, PDF).
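The appearance term of such a similarity measure can be sketched as a histogram-based Mutual Information estimate between two equally sized lists of pixel intensities. This is a generic textbook estimator under assumed parameters (the bin count `bins` and intensity range `max_value` are hypothetical), not the measure from the paper; the contour term and the particle swarm optimizer are left out.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=4, max_value=256):
    """Estimate MI (in nats) between two intensity sequences via a joint histogram."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    width = max_value / bins
    # Quantize intensities into bins; clamp so max_value itself stays in range.
    qa = [min(int(v / width), bins - 1) for v in a]
    qb = [min(int(v / width), bins - 1) for v in b]
    n = len(a)
    joint = Counter(zip(qa, qb))
    marg_a = Counter(qa)
    marg_b = Counter(qb)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((marg_a[i] / n) * (marg_b[j] / n)))
    return mi
```

In a registration loop, `a` would hold the intensities of a rendering of the 3D model under a candidate pose and `b` the intensities of the corresponding image region; a higher value indicates a statistically stronger dependence between the two, which the optimizer would try to maximize.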