Gyuri Dorkó

Selection of Discriminative Regions and Local Descriptors for Generic Object Class Recognition

PhD Thesis - Abstract

Object category recognition is one of the most difficult problems in computer vision. It involves recognizing objects despite intra-class variations, viewpoint changes and background clutter. The goal of this thesis is to investigate robust, invariant local image description and the selection of discriminative features. We show that class-discriminative scale-invariant features achieve excellent results for image-level categorization and object localization. We present solutions for two key problems: (i) we improve the quality of the image description with a novel scale-invariant keypoint detection method, and (ii) we integrate feature filtering techniques into our object models.

Our novel scale-invariant detector is based on the idea of a ``maximally stable description'', i.e., the descriptor should remain stable even in the presence of minor variations of the detector. The technique performs scale selection based on a region descriptor, here SIFT, and chooses the regions for which this descriptor is maximally stable, i.e., for which the difference between descriptors extracted at consecutive scales reaches a minimum. This scale selection technique is applied to multi-scale Harris and Laplacian points. Experiments show that the resulting detector outperforms existing ones in image matching, category and texture classification, and object localization.
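
As a rough illustration of the scale selection idea described above, the following Python sketch picks, for a single image location, the scales at which the descriptor changes least between consecutive scales. The function names, the Euclidean distance, the strict local-minimum test, and the choice to report the lower scale of each pair are assumptions made for this sketch. The descriptors are taken as given (for example, SIFT vectors computed elsewhere), and the code should not be read as the implementation used in the thesis.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_stable_scales(scales, descriptors):
    """Return the scales at which the descriptor is locally most stable.

    scales:      increasing list of candidate scales for one image location
    descriptors: one descriptor vector per scale (computed elsewhere)
    """
    if len(scales) != len(descriptors):
        raise ValueError("need exactly one descriptor per scale")

    # change[i] measures how much the descriptor differs between
    # scale i and scale i + 1.
    change = [
        descriptor_distance(descriptors[i], descriptors[i + 1])
        for i in range(len(descriptors) - 1)
    ]

    selected = []
    for i in range(1, len(change) - 1):
        # Keep strict local minima of the change signal.
        if change[i] < change[i - 1] and change[i] < change[i + 1]:
            # Assumption: attribute the minimum to the lower scale of the pair.
            selected.append(scales[i])
    return selected


# Toy example with 2-D stand-ins for real descriptors.
scales = [1.0, 1.2, 1.44, 1.73, 2.07]
descriptors = [[0, 0], [3, 0], [3, 1], [3, 4], [7, 4]]
print(select_stable_scales(scales, descriptors))  # [1.2]
```

In the toy example the consecutive differences are 3, 1, 3 and 4, so the only local minimum lies between the second and third scale.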

To construct object models based on discriminative features, we first cluster the scale-invariant descriptors and obtain a set of ``visual words''. We then estimate the discriminative information of these clusters based on different feature selection techniques---several of which are traditionally used in text retrieval. We discuss their properties---feature frequency, discriminative power, and redundancy---and analyze their performance in the context of image classification and object localization. We show that each task has different requirements, and indicate which selection techniques are the most appropriate. Experimental results for recognition on challenging large datasets demonstrate the performance of the approach.
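
To make the idea of ranking ``visual words'' by discriminative information more concrete, here is a minimal Python sketch. It assumes the cluster centers have already been computed by some clustering step, assigns each descriptor to its nearest center, and scores every word with a smoothed likelihood ratio between object and background occurrences. The likelihood ratio is only one possible criterion, chosen here for simplicity; the abstract does not name the selection techniques that were compared, and all function and parameter names below are hypothetical.

```python
from collections import Counter


def nearest_word(descriptor, centers):
    """Index of the cluster center (visual word) closest to the descriptor."""
    def sq_dist(center):
        return sum((x - y) ** 2 for x, y in zip(descriptor, center))
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k]))


def rank_words(descriptors, labels, centers, smoothing=1.0):
    """Rank visual words by a smoothed likelihood ratio.

    labels[i] is True if descriptor i comes from the object class and
    False if it comes from background. Returns a list of
    (word, ratio, count) tuples, most object-discriminative first.
    """
    pos, neg = Counter(), Counter()
    for d, is_object in zip(descriptors, labels):
        (pos if is_object else neg)[nearest_word(d, centers)] += 1

    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    k = len(centers)
    ranked = []
    for w in range(k):
        # Additive smoothing avoids division by zero for unseen words.
        p_obj = (pos[w] + smoothing) / (n_pos + smoothing * k)
        p_bg = (neg[w] + smoothing) / (n_neg + smoothing * k)
        # count is the word's overall frequency in the training data.
        ranked.append((w, p_obj / p_bg, pos[w] + neg[w]))
    ranked.sort(key=lambda t: t[1], reverse=True)
    return ranked


# Toy example: two visual words, five labeled 2-D descriptors.
centers = [[0, 0], [10, 10]]
descriptors = [[0, 1], [1, 0], [9, 9], [10, 11], [0, 0]]
labels = [True, True, False, False, False]
for word, ratio, count in rank_words(descriptors, labels, centers):
    print(word, round(ratio, 3), count)
```

The returned tuples keep the ratio (discriminative power) and the count (feature frequency) side by side, since a word can be highly discriminative yet too rare to be useful, or frequent yet uninformative.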