Course information

Statistical learning is about the construction and study of systems that can automatically learn from data. With the massive datasets commonly encountered today, the need for powerful machine learning methods is acute. Examples of successful applications include effective web search, anti-spam software, computer vision, robotics, practical speech recognition, and a deeper understanding of the human genome. This course gives an introduction to this exciting field, with a strong focus on kernels as a versatile tool to represent data, in combination with (un)supervised learning techniques that are agnostic to the type of data they learn from. The learning techniques covered include regression, classification, clustering, and dimension reduction. We will cover both the theoretical underpinnings of kernels and a series of kernels that are important in practical applications.
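As a small illustration of what "agnostic to the type of data" can mean, here is a minimal sketch of kernel ridge regression in plain Python. It is not taken from the course material: the function names, the toy data, and the choice of a Gaussian kernel and a simple 2-mer spectrum kernel are assumptions made for this example. The point is that the learner only ever sees kernel values, so the same code handles real numbers and strings.

```python
import math
from collections import Counter

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on real numbers."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def spectrum_kernel(s, t, k=2):
    """Inner product of length-k substring counts (a simple string kernel)."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[g] * ct[g] for g in cs))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def fit_kernel_ridge(xs, ys, kernel, lam=0.1):
    """Return alpha solving (K + lam * n * I) alpha = y, with K the Gram matrix."""
    n = len(xs)
    K = [[kernel(a, b) for b in xs] for a in xs]
    for i in range(n):
        K[i][i] += lam * n
    return solve(K, ys)

def predict(xs, alpha, kernel, x_new):
    """Evaluate f(x_new) = sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * kernel(x, x_new) for a, x in zip(alpha, xs))

# Real-valued inputs with a Gaussian kernel (toy data, assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
alpha = fit_kernel_ridge(xs, ys, gaussian_kernel, lam=1e-3)
print(predict(xs, alpha, gaussian_kernel, 1.5))

# String inputs with a spectrum kernel: the learner itself is unchanged.
words = ["abab", "baba", "cdcd", "dcdc"]
labels = [1.0, 1.0, -1.0, -1.0]
beta = fit_kernel_ridge(words, labels, spectrum_kernel, lam=1e-3)
print(predict(words, beta, spectrum_kernel, "abba"))
```

Only the `kernel` argument changes between the two calls to `fit_kernel_ridge`. The hand-written linear solver is there just to keep the sketch dependency-free; in practice a numerical library would be the more sensible choice.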

Evaluation

  • For UJF: homeworks 1, 2, 3 (1/2) + project (1/2)
  • For ENSIMAG: homework 1 (1/4) + project (3/4)

Course outline

Introduction

  • Motivating example applications
  • Empirical risk minimization
  • Bias-variance trade-off, and risk bounds

Supervised learning with linear models and kernels

  • Risk convexification and regularization
  • Ridge regression
  • Logistic regression
  • Support vector machines
  • Kernels for non-linear models

Unsupervised learning

  • Principal component analysis
  • Data clustering
  • Other methods: canonical correlation analysis, sparse coding, etc.

Kernels for probabilistic models

  • Fisher kernels
  • Probability product kernels

Reading material

Machine Learning and Statistics

  • Vapnik, The nature of statistical learning theory. Springer
  • Hastie, Tibshirani, Friedman, The elements of statistical learning. (free online)
  • Devroye, Gyorfi, Lugosi, A probabilistic theory of pattern recognition. Springer
  • J Shawe-Taylor, N Cristianini. Kernel methods for pattern analysis. 2004.
  • Bishop, Pattern recognition & machine learning. 2006.
  • Slides by Jean-Philippe Vert on kernel methods.

Optimization

  • S. Boyd and L. Vandenberghe. Convex Optimization. 2004. (free online)
  • D. Bertsekas. Nonlinear Programming. 2003.

Calendar

Date | Room | Lecturer | Topic | Homework
07/10 | H104 | JV | Introduction + bias-variance trade-off (slides) |
14/10 | H201 | JV | Penalized empirical risk minimization, linear classifiers, introduction to kernels (slides) | Homework 1
21/10 | H201 | JM | Reproducing kernel Hilbert spaces (RKHS) |
04/11 | H201 | JV | The kernel trick, supervised kernel methods, and Fisher kernels (slides) | Homework 2
25/11 | H201 | JM | |
02/12 | H201 | JM | |

Homeworks

There will be three homeworks given during the course, at lectures 2, 4, and 6. Each of them should be returned within three weeks. Either use LaTeX, or make sure you write very clearly. Homework has to be done individually. ENSIMAG students only have to hand in the first homework, since they get fewer credits for the course.

Projects

The project consists of implementing the method from an article, doing some experiments, and writing a short report (fewer than 10 pages). It is also possible to study a theoretical paper instead of implementing a method. All reports should be written in LaTeX, and a PDF should be sent to the lecturers before January 5th. Projects can be done alone or in groups of two. You can either come up with your own idea and discuss it with us, or we can give you some suggestions. To give you an idea, the projects below are from a related course.
Project | Student(s) | Coach
Supervised classification of text documents (material) | Vera Shalaeva and Manon Lukas |
Predicting Molecular Activity with Graph Kernels (material) | Phivos Valougeorgis |
Speaker Recognition (material) | Li Liu |
Supervised classification of Flickr images (material) | Leonardo Gutierrez Gomez |
Fast string kernels using inexact matching for protein sequences (material) | |
Semigroup kernels on measures (material) | Julien Alapetite |
Kernel change-point analysis (material) | |
Fast global alignment kernels (material) | |
Multiple kernel learning, conic duality, and the SMO algorithm (material) | |
Predictive low-rank decomposition for kernel methods (material) | |
Image classification with segmentation graph kernels (material) | |
Image Classification with the Fisher Vector: Theory and Practice (material) | Jerome Lesaint |

Job / Internship Opportunities

We have various internship and PhD opportunities in machine learning, image processing, bioinformatics, and computer vision. It is best to discuss this with us early, since the number of places is limited.