Student Projects
Year 2007-2008
Efficient and discriminative visual vocabularies for image categorization
Supervision: Jakob Verbeek and Diane Larlus
Description:
This project focuses on image categorization, that is, deciding whether an image
contains objects from one or more categories (such as bicycles, cars, people, or dogs).
Image categorization finds applications in image and video search and indexing.
The challenge is to create categorization systems automatically from a collection of
training images, for which it is known which categories they contain.
The system's performance is measured on separate test images, by checking how often
the system makes the correct predictions about class presence/absence.
Current state-of-the-art image categorization systems (as developed by
Lear) represent an image as a collection of local features that
describe the content of small image patches. The local features are then
grouped, or 'clustered', so that each patch can be represented by the group to
which it belongs. The collection of groups is often referred to as a
visual vocabulary, as patches in an image represented in terms of the
groups are in a sense similar to words in a text. An image is then represented
by a histogram that indicates how many patches of each group, or visual word,
it contains. These histograms over visual words are then used to learn a
classifier, that is, a function that maps a histogram to predictions of the
presence/absence of categories.
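The pipeline described above can be sketched in a few lines. This is only an illustrative sketch, assuming NumPy, plain k-means as the clustering method, and random vectors standing in for real local descriptors; the function names (`kmeans`, `assign`, `bow_histogram`) are hypothetical and not taken from the project. The final classifier step is left out.

```python
import numpy as np

def assign(features, centres):
    """Index of the nearest centre for each feature.

    Every feature is compared with every centre, so the cost per
    patch is linear in the number of visual words.
    """
    d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means (an assumed choice of clustering method).

    Returns a (k, d) array of cluster centres: the visual vocabulary.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centres = features[start].astype(float)
    for _ in range(n_iter):
        labels = assign(features, centres)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster where it is
                centres[j] = members.mean(axis=0)
    return centres

def bow_histogram(features, centres):
    """Count how many patches of one image fall into each visual word."""
    return np.bincount(assign(features, centres), minlength=len(centres))
```

For example, `centres = kmeans(training_descriptors, k=10)` would build a ten-word vocabulary from an array of training descriptors, and `bow_histogram(image_descriptors, centres)` would return the ten-bin histogram for one image. Its entries sum to the number of patches in that image.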
Two drawbacks of this approach are that (i) finding the group to
which a patch belongs is costly (linear cost in the number of groups), and (ii)
the groups are formed in a generic way that does not take into account the
subsequent categorization task, and is thus likely to be sub-optimal. The goal
of this project is to solve these two issues by (i) using a fast
hierarchical grouping structure (logarithmic cost in the number of
groups), and (ii) constructing this structure in a way that takes the final
categorization goal into account, so as to yield histograms that are near-optimal
for categorization.
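To illustrate point (i) only, the sketch below builds a tree by applying k-means recursively, so that a patch is assigned to a visual word by descending from the root to a leaf. It assumes NumPy and unsupervised k-means at every node, which is one common way to obtain a hierarchical vocabulary and not necessarily the structure developed in the project; all names (`build_vocabulary_tree`, `lookup`, `tree_histogram`) are hypothetical. Point (ii), building the tree with the categorization goal in mind, is not covered here.

```python
import numpy as np

def nearest(features, centres):
    """Index of the nearest centre for each feature."""
    d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(features, k, n_iter=10, seed=0):
    """Plain k-means; returns the centres and the final assignment."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centres = features[start].astype(float)
    labels = nearest(features, centres)
    for _ in range(n_iter):
        for j in range(k):
            if np.any(labels == j):
                centres[j] = features[labels == j].mean(axis=0)
        labels = nearest(features, centres)
    return centres, labels

def _build(features, branching, depth, counter):
    # Stop at the requested depth, or when too few points remain to split.
    if depth == 0 or len(features) < branching:
        node = {"word": counter[0]}  # a leaf is one visual word
        counter[0] += 1
        return node
    centres, labels = kmeans(features, branching)
    children = [_build(features[labels == j], branching, depth - 1, counter)
                for j in range(branching)]
    return {"centres": centres, "children": children}

def build_vocabulary_tree(features, branching=4, depth=3):
    """Returns the tree and the number of visual words (leaves)."""
    counter = [0]
    tree = _build(features, branching, depth, counter)
    return tree, counter[0]

def lookup(tree, feature):
    """Descend to a leaf; returns the word id and the number of
    distance computations used (at most branching * depth)."""
    node, comparisons = tree, 0
    while "word" not in node:
        d2 = ((node["centres"] - feature) ** 2).sum(axis=1)
        comparisons += len(d2)
        node = node["children"][int(d2.argmin())]
    return node["word"], comparisons

def tree_histogram(tree, n_words, features):
    """Histogram over visual words for the patches of one image."""
    words = [lookup(tree, f)[0] for f in features]
    return np.bincount(words, minlength=n_words)
```

With a branching factor of 4 and a depth of 3, the tree has at most 64 visual words, yet a lookup needs at most 12 distance computations, compared with 64 for a flat vocabulary of that size. The price is that the greedy descent may not return the leaf that is closest overall.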

Visual words and corresponding patches
Year 2006-2007