PhD thesis: Large-scale machine learning for video analysis

Supervisors:

Jakob Verbeek and Cordelia Schmid

Duration:

3 years, preferably starting September 2011.

Topics:

statistical machine learning, computer vision

Keywords:

classification, ranking, local descriptors, compression

Expected skills:

strong knowledge of machine learning and/or computer vision, good programming skills in Python and/or C, ability to make things work

Context:

Video interpretation and understanding is one of the long-term research goals in computer vision. Realistic videos such as movies [LMSR08, MLS09, KMSZ10] present a variety of challenging machine learning problems, such as action classification/action retrieval, human tracking, human/object interaction classification, etc.

Recently, robust visual descriptors for video classification have been developed, showing that it is possible to learn visual classifiers in realistic, difficult settings [GMS09, WUK+09]. However, to deploy visual recognition systems at large scale in practice, it becomes important to address the scalability of these techniques.

Goals:

The main goal of this thesis is to develop scalable methods for video content analysis (e.g., for ranking or classification). To address scalability, a variety of topics are of interest. All of these topics require the design of novel machine learning methods and large-scale experimental evaluation (for which we have the required infrastructure). Therefore, it is important that applicants have both a very good understanding of diverse machine learning techniques and excellent programming skills.

Application:

Please send applications by email to both Jakob Verbeek and Cordelia Schmid (firstname.lastname@inria.fr), along with:

References:

[GMS09] A. Gaidon, M. Marszalek, and C. Schmid. Mining visual actions from movies. In BMVC, 2009.

[KMSZ10] A. Klaeser, M. Marszalek, C. Schmid, and A. Zisserman. Human focused action localization in video. In International Workshop on Sign, Gesture, and Activity (SGA) in conjunction with ECCV, 2010.

[LMSR08] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008.

[LSP06a] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.

[MLS09] M. Marszalek, I. Laptev, and C. Schmid. Actions in contexts. In CVPR, 2009.

[WUK+09] H. Wang, M. M. Ullah, A. Klaeser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009.