INRIA Visual Recognition and Machine Learning Summer School
Grenoble, France, 9-13 July 2012
LEAR's research on learning-based approaches for visual scene interpretation can be divided into the following areas. For more information, see also our annual report and publications.
In example-based image search, the essential problem is establishing robust correspondences between observed images and reference images,
despite large differences in viewpoint or malicious attacks on the images.
When scaling up to databases of one million images or more, efficient indexing is a second requirement.
Visual search is a key component of many applications, e.g. navigation through image and video databases, and copyright protection.
Recent work in this direction is described in our papers in PAMI'10, CIVR'09 and IJCV'10.
Visual recognition requires the construction of visual models of
particular objects and of object and scene categories. Achieving good
invariance to viewpoint, lighting, occlusion and background is
challenging even for exactly known rigid objects, and these
difficulties are compounded when reliable generalization across object
categories is needed. Our research combines advanced image
descriptors with machine learning to obtain effective models.
Human activity is among the most important content in videos, but it is also among the hardest to analyze,
owing to variations in pose, clothing and movement. Our
research aims to develop robust video descriptors that characterize objects and actions.
We also study how text associated with videos can be used to automatically acquire large quantities of training data for action recognition.
A huge amount of visual data is available with loosely structured textual annotation,
e.g. on user-generated content sites such as YouTube and Flickr, news websites, and general web pages with images and text.
We investigate methods for visual learning that leverage the potential of loosely annotated material to
enable semantic-level image and video querying with little or no manual labeling.