Machine learning and applications

Course information

Machine learning is the construction and study of systems that can automatically learn from data. With the massive datasets commonly encountered today, the need for powerful machine learning methods is acute. Examples of successful applications include effective web search, anti-spam software, computer vision, robotics, practical speech recognition, and a deeper understanding of the human genome. This course gives an introduction to this exciting field. We will focus on supervised learning problems, such as classification and ranking, and unsupervised learning problems, such as clustering and dimension reduction. We will study classical algorithms and introduce tools to measure their performance and their computational complexity.

Related courses

CR07: Algorithms for Molecular Biology
CR08: Combinatorial Scientific Computing
CR16: Data analysis and processing for networks

Evaluation

Homework (1/3)
Project (2/3)

Course outline

Introduction

Empirical risk minimization
Risk convexification and regularization
Bias-variance trade-off, and risk bounds

Supervised learning

Ridge regression
Logistic regression
Perceptron and neural networks
Support vector machines
Other methods: nearest-neighbors, kernel methods, etc.

Unsupervised learning

Principal component analysis
Data clustering
Other methods: canonical correlation analysis, sparse coding, etc.

Reading material

Machine Learning and Statistics

Vapnik, The nature of statistical learning theory. Springer
Hastie, Tibshirani, Friedman, The elements of statistical learning. (free online)
Devroye, Györfi, Lugosi, A probabilistic theory of pattern recognition. Springer
Dubhashi, Panconesi, Concentration of measure for the analysis of randomized algorithms. Cambridge University Press
J Shawe-Taylor, N Cristianini. Kernel methods for pattern analysis. 2004.
Slides by Jean-Philippe Vert on kernel methods.

Optimization

S. Boyd and L. Vandenberghe. Convex Optimization. 2004. (free online)
D. Bertsekas. Nonlinear Programming. 2003.

Calendar

Date	Lecturer	Topic	Scribes
12/09	LJ	Introduction + Bias-variance tradeoff. slides Scribe notes	Sebastien Jonglez, Stephane Durand
19/09	LJ	Supervised Learning - SVM slides Scribe notes	Raphael Bournhonesque
26/09	LJ	Empirical risk minimization - cross validation slides Scribe notes	Martin Privat
03/10	JM	Convex optimization principles - Non-parametric estimation Scribe notes	Antoine Pouille
10/10	JM	Introduction to kernels and RKHS Scribe notes	Sebastian Scheibner
24/10	JM	Kernel methods and kernel examples Scribe notes	Guinard Brieuc, Emma Prudent
14/11	JS	Unsupervised learning	Mouhcine Mendil
21/11	JS	Unsupervised learning Scribe notes	Aurore Alcolei

Scribe notes

For each lecture, a pair of students commits to typesetting their notes in LaTeX. A handy package, written by students from last year, can be found here.

Homeworks

Three homework assignments will be given during the course. Each should be returned within three weeks (no need to use LaTeX here).

Homework 1: due October 24th: pdf, code in R, data.
Homework 2: due November 25th: pdf.
Homework 3: due December 17th: pdf.

Projects

The project consists of implementing the method from an article, running some experiments, and writing a short report (fewer than 10 pages). It is also possible to study a theoretical paper instead of implementing a method. All reports should be written in LaTeX, and a pdf should be sent to the lecturers before January 5th. You can either come with your own idea and discuss it with us, or we can give you some suggestions. You can also pick an article from the list below, or look at the projects from last year.

So far, the projects chosen by the students are:

Project	Student(s)	Coach
Supervised text classification	Aurore Alcolei	JM
Prediction in social networks article	Guinard Brieuc	LJ
Distributed robust learning	Mendil Mouhcine	JS
Audio processing and machine learning	Emma Prudent	JS
Latency prediction in TCP networks	Baptiste Jonglez	JM
Speaker Recognition	Raphael Bournhonesque	LJ
Text analysis	Martin Privat	JM
Safe feature elimination for the Lasso	Antoine Pouille	JS
Graph kernel for biology	Sebastian Scheibner	LJ

Job / Internship Opportunities

We have several internship and PhD opportunities in machine learning, image processing, bioinformatics and computer vision. It is best to discuss this with us early, since the number of places is limited.