I successfully defended my thesis on January 6, 2014.
You can find my thesis manuscript here and my PhD defense slides here.
Abstract
Building algorithms that classify images on a large scale is essential, given how difficult it is to search the massive amounts of unlabeled visual data available on the Internet. We aim to classify images based on their content to make such large-scale collections easier to manage. Large-scale image classification is a difficult problem because datasets are large with respect to both the number of images and the number of classes. Some of these classes are fine-grained, and some may not have any labeled examples at all. In this thesis, we use state-of-the-art image representations and focus on efficient learning methods. Our contributions are (1) a benchmark of learning algorithms for large-scale image classification, and (2) a novel learning algorithm based on label embedding for learning with scarce training data.
Firstly, we propose a benchmark of learning algorithms for large-scale image classification in the fully supervised setting. It compares several objective functions for learning linear classifiers, namely one-vs-rest, multiclass, ranking and weighted average ranking, all optimized with stochastic gradient descent (SGD). The outcome of this benchmark is a set of recommendations for large-scale learning. We show experimentally that online learning is well suited to large-scale image classification. With simple data rebalancing, one-vs-rest performs better than all other methods. Moreover, in online learning, using a small enough step size with respect to the learning rate is sufficient for state-of-the-art performance. Finally, regularization through early stopping results in fast training and good generalization.
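The fully supervised recipe summarized above (one-vs-rest linear classifiers trained with SGD, data rebalancing, a small fixed step size, and early stopping instead of an explicit regularizer) can be illustrated with a short sketch. This is a hypothetical, minimal plain-Python example, not the benchmark code: the function names, the hinge loss, the rebalancing probability `rho`, the step size `eta` and the toy data are all assumptions made for illustration.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(W, x):
    """One-vs-rest decision: the class whose linear score w_c . x is highest."""
    scores = [dot(w, x) for w in W]
    return max(range(len(W)), key=lambda c: scores[c])


def accuracy(W, X, y):
    return sum(predict(W, x) == t for x, t in zip(X, y)) / len(X)


def train_ovr_sgd(X, y, n_classes, X_val, y_val,
                  eta=0.1, rho=0.5, epochs=50, seed=0):
    """Hypothetical sketch: one-vs-rest hinge-loss SGD with rebalancing
    and early stopping (assumed design, not the benchmark's code).

    eta: fixed step size; rho: probability of drawing a positive example
    for each binary problem. There is no explicit L2 penalty: the only
    regularization is keeping the epoch with the best validation accuracy.
    Assumes every class has at least one example and n_classes >= 2.
    """
    rng = random.Random(seed)
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    pos = [[i for i, t in enumerate(y) if t == c] for c in range(n_classes)]
    neg = [[i for i, t in enumerate(y) if t != c] for c in range(n_classes)]
    best_W, best_acc = [w[:] for w in W], -1.0
    for _epoch in range(epochs):
        for c in range(n_classes):
            for _step in range(len(X)):
                # Rebalancing: draw a positive with probability rho,
                # however rare class c is in the training set.
                if rng.random() < rho:
                    i, label = rng.choice(pos[c]), 1.0
                else:
                    i, label = rng.choice(neg[c]), -1.0
                # Hinge loss max(0, 1 - label * w_c . x): step only if active.
                if label * dot(W[c], X[i]) < 1.0:
                    W[c] = [wj + eta * label * xj for wj, xj in zip(W[c], X[i])]
        acc = accuracy(W, X_val, y_val)
        if acc > best_acc:  # early stopping: remember the best epoch so far
            best_acc, best_W = acc, [w[:] for w in W]
    return best_W, best_acc


# Toy usage: 3 well-separated classes in 2-D; the last feature is a constant
# bias. The training set doubles as validation set only to keep this short.
X = [[5, 0, 1], [6, 1, 1], [4, -1, 1],
     [0, 5, 1], [1, 6, 1], [-1, 4, 1],
     [-5, -5, 1], [-6, -4, 1], [-4, -6, 1]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
W, best_acc = train_ovr_sgd(X, y, 3, X, y)
print(best_acc)
```

Sampling positives with a fixed probability keeps each binary problem from being swamped by negatives, which matters when there are thousands of classes; in practice the validation set would be held out rather than reused.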
Secondly, when dealing with thousands of classes, it is difficult to collect sufficient labeled training data for each class, and for some classes we might not have a single training example. We propose a novel algorithm for this zero-shot learning scenario. Our algorithm uses side information, such as attributes, to embed classes in a Euclidean space. We also introduce a function that measures the compatibility between an image and a label; its parameters are learned with a ranking objective. Our algorithm outperforms the state of the art in zero-shot learning. It is flexible and can accommodate other sources of side information, such as hierarchies. It also allows for a smooth transition from zero-shot to few-shot learning.
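The zero-shot idea summarized above can be illustrated with a small sketch. It assumes a bilinear compatibility F(x, y) = x^T W phi(y), where x is the image representation and phi(y) is the attribute vector of class y, trained with a simple pairwise hinge ranking loss; both are illustrative choices, and the thesis's exact objective may differ. All function names and the toy attribute vectors are hypothetical. Unseen classes need no training images: they are scored through their attribute vectors alone.

```python
def compatibility(W, x, phi):
    """Bilinear compatibility F(x, y) = x^T W phi(y) (assumed form)."""
    return sum(x[i] * W[i][j] * phi[j]
               for i in range(len(x)) for j in range(len(phi)))


def train_ranking(X, labels, attributes, eta=0.5, epochs=10):
    """Hypothetical sketch: learn W with a pairwise hinge ranking loss.

    For each image x of class y and every other seen class y', require
    F(x, y) >= F(x, y') + 1; when violated, take an SGD step on
    1 + F(x, y') - F(x, y), i.e. W += eta * x (phi(y) - phi(y'))^T.
    """
    n_feat = len(X[0])
    n_attr = len(next(iter(attributes.values())))
    W = [[0.0] * n_attr for _ in range(n_feat)]
    for _epoch in range(epochs):
        for x, y in zip(X, labels):
            for other in attributes:
                if other == y:
                    continue
                margin = (compatibility(W, x, attributes[y])
                          - compatibility(W, x, attributes[other]))
                if margin < 1.0:
                    diff = [p - q for p, q in zip(attributes[y], attributes[other])]
                    for i in range(n_feat):
                        for j in range(n_attr):
                            W[i][j] += eta * x[i] * diff[j]
    return W


def predict_zero_shot(W, x, candidates):
    """Pick the candidate class whose attribute vector is most compatible."""
    return max(candidates, key=lambda c: compatibility(W, x, candidates[c]))


# Toy usage (hypothetical data): 3 binary attributes per class. Images are
# assumed to be noiseless attribute detections, so x equals phi(y) here.
seen = {"A": [1, 1, 0], "B": [0, 1, 1], "C": [1, 0, 1]}
unseen = {"D": [1, 0, 0], "E": [0, 0, 1]}  # no training images at all
W = train_ranking([seen[c] for c in seen], list(seen), seen)
print(predict_zero_shot(W, [1, 0, 0], unseen))  # D
print(predict_zero_shot(W, [0, 0, 1], unseen))  # E
```

In this toy run, the seen images are ranked correctly after a few epochs, and the unseen classes D and E are recognized from their attribute vectors alone.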
After completing my MSc in the Media Informatics program at RWTH Aachen University in December 2010, I joined the LEAR team in January 2011. My PhD was a collaboration between INRIA Rhone-Alpes and Xerox Research Centre Europe (XRCE). It was funded by a CIFRE grant from the French National Research and Technology Association (ANRT).