Student Projects


Year 2007-2008

Efficient and discriminative visual vocabularies for image categorization


Supervision: Jakob Verbeek and Diane Larlus

Description:
This project focuses on image categorization: deciding whether an image contains one or more object categories (such as bicycles, cars, people, dogs, etc.). Image categorization has applications in image and video search and indexing. The challenge is to build categorization systems automatically from a collection of training images for which it is known which categories they contain. The system's performance is measured on separate test images by checking how often it correctly predicts the presence or absence of each class.

Current state-of-the-art image categorization systems (as developed by Lear) represent an image as a collection of local features that describe the content of small image patches. The local features are then grouped, or 'clustered', so that each patch can be represented by the group to which it belongs. The collection of groups is often referred to as a visual vocabulary, since patches represented in terms of their groups are, in a sense, similar to words in a text. An image is then represented by a histogram that indicates how many patches of each group, or visual word, it contains. A function is then learned that maps these histograms over visual words to predictions on the presence or absence of categories.

Two drawbacks of this approach are that (i) finding the group to which a patch belongs is costly (the cost is linear in the number of groups), and (ii) the groups are formed in a generic way that does not take the subsequent categorization task into account, and are thus likely to be sub-optimal. The goal of this project is to address these two issues by (i) using a fast hierarchical grouping structure (logarithmic cost in the number of groups), and (ii) constructing this structure in a way that takes the final categorization goal into account, so as to yield histograms that are near-optimal for categorization.
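The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: it assumes NumPy, uses random vectors as stand-ins for real local descriptors, builds the flat vocabulary with plain k-means, and builds the hierarchical one as a binary tree by recursive 2-means. All function names are hypothetical. The tree here is built without class labels, so it illustrates only goal (i), the cheaper lookup; the discriminative construction of goal (ii) is not shown.

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest center for every row of X (squared Euclidean)."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, rng, iters=10):
    """Plain Lloyd k-means; returns (centers, labels)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, nearest(X, centers)

def assign_flat(x, centers):
    """Flat vocabulary: compares x with every center (linear cost)."""
    return int(((centers - x) ** 2).sum(axis=1).argmin())

def build_tree(X, depth, rng):
    """Recursive 2-means; each leaf of the tree is one visual word."""
    if depth == 0 or len(X) < 2:
        return {"leaf": True}
    centers, labels = kmeans(X, 2, rng)
    if labels.min() == labels.max():      # degenerate split, stop here
        return {"leaf": True}
    children = [build_tree(X[labels == j], depth - 1, rng) for j in (0, 1)]
    return {"leaf": False, "centers": centers, "children": children}

def number_leaves(node, counter=None):
    """Give every leaf a visual-word index; returns the number of leaves."""
    if counter is None:
        counter = [0]
    if node["leaf"]:
        node["word"] = counter[0]
        counter[0] += 1
    else:
        for child in node["children"]:
            number_leaves(child, counter)
    return counter[0]

def assign_tree(x, node):
    """Descend the tree: two distance computations per level."""
    comparisons = 0
    while not node["leaf"]:
        j = int(((node["centers"] - x) ** 2).sum(axis=1).argmin())
        comparisons += 2
        node = node["children"][j]
    return node["word"], comparisons

def histogram(patches, assign, n_words):
    """Count how many patches fall on each visual word."""
    h = np.zeros(n_words, dtype=int)
    for x in patches:
        h[assign(x)] += 1
    return h

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 8))        # stand-in for training descriptors
image = rng.normal(size=(100, 8))         # stand-in for patches of one image

centers, _ = kmeans(train, 16, rng)
h_flat = histogram(image, lambda x: assign_flat(x, centers), 16)

tree = build_tree(train, depth=4, rng=rng)
n_leaves = number_leaves(tree)
h_tree = histogram(image, lambda x: assign_tree(x, tree)[0], n_leaves)
max_comparisons = max(assign_tree(x, tree)[1] for x in image)
```

With a flat vocabulary of 16 words, each patch is compared with all 16 centers. With the depth-4 binary tree, each patch needs at most 8 distance computations, and the gap widens as the vocabulary grows. The resulting histograms `h_flat` and `h_tree` are what a classifier would then take as input.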


Visual words and corresponding patches

Year 2006-2007