Research Directions @ LEAR

LEAR's research on learning-based approaches for visual scene interpretation can be divided into the following areas. For more information see also our annual report and publications.

Large scale image retrieval

The essence of example-based image search is robust correspondence between observed images and reference ones, despite large differences in viewpoint or malicious attacks on the images. When scaling up to databases of one million images or more, efficient indexing becomes a second requirement. Visual search is a key component of many applications, e.g. navigation through image and video databases, and copyright protection.
Recent work in this direction is described in our papers in PAMI'10, CIVR'09, IJCV'10.

Object recognition and localization

Visual recognition requires the construction of visual models of particular objects and of object and scene categories. Achieving good invariance to viewpoint, lighting, occlusion and background is challenging even for exactly known rigid objects, and these difficulties are compounded when reliable generalization across object categories is needed. Our research combines advanced image descriptors with learning to obtain successful models.
For more information see e.g. our recent papers in IJCV'10, ICCV'09, CVPR'09.

Video interpretation

Human activity ranks among the most important content of videos, but is also one of the hardest to analyze owing to variations in pose, clothing, and movement. Our research aims at developing robust video descriptors to characterize objects and actions. We also study how text associated with videos can be used to automatically acquire large quantities of training data for action recognition.
For more information see e.g. our recent papers in CVPR'09, BMVC'09, CVPR'08.

Learning from weak supervision

A huge amount of visual data is available with loosely structured textual annotation, e.g. on user-generated content sites such as YouTube and Flickr, news websites, and general webpages with images and text. We investigate methods for visual learning that leverage the potential of loosely annotated material to enable semantic-level image and video querying with little or no manual labeling.
For more information see e.g. our recent papers in BMVC'09, ICCV'09, ECCV'08.