### Course information

Machine learning is the study and construction of systems that can automatically learn from data. With the massive datasets commonly encountered today, powerful machine learning methods are of acute importance. Examples of successful applications include effective web search, anti-spam software, computer vision, robotics, practical speech recognition, and a deeper understanding of the human genome. This course gives an introduction to this exciting field. We will focus on supervised learning problems, such as classification and ranking, and on unsupervised learning problems, such as clustering and dimension reduction. We will study classical algorithms and introduce tools to measure both their statistical performance and their computational complexity.

#### Related courses

- CR07: Algorithms for Molecular Biology
- CR08: Combinatorial Scientific Computing
- CR16: Data analysis and processing for networks

#### Evaluation

- homeworks (1/3)
- project (2/3)

### Course outline

#### Introduction

- Empirical risk minimization
- Risk convexification and regularization
- Bias-variance trade-off, and risk bounds
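To give a taste of the first topic, here is a minimal sketch of empirical risk minimization's central quantity: the empirical risk of a classifier under the 0-1 loss. The dataset and the linear classifier below are illustrative assumptions, not course material.

```python
import numpy as np

def empirical_risk(w, X, y):
    """Empirical 0-1 risk of the linear classifier h(x) = sign(w . x):
    R_hat(h) = (1/n) * sum_i 1[h(x_i) != y_i]."""
    predictions = np.sign(X @ w)
    return np.mean(predictions != y)

# Tiny illustrative dataset with labels in {-1, +1}
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])        # this direction separates the two classes
print(empirical_risk(w, X, y))  # prints 0.0
```

Empirical risk minimization picks, within a fixed class of classifiers, the one minimizing this quantity; the lectures study when this empirical minimizer also has small true (expected) risk.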

#### Supervised learning

- Ridge regression
- Logistic regression
- Perceptron and neural networks
- Support vector machines
- Other methods: nearest-neighbors, kernel methods, etc.
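As a concrete example of the first item in the list, here is a minimal NumPy sketch of ridge regression via its closed-form solution. The synthetic data and the regularization value `lam` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 0.1

# Synthetic regression data: y = X w* + small noise
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(n)

# Ridge regression minimizes ||X w - y||^2 / 2 + lam ||w||^2 / 2,
# whose closed-form solution is w = (X^T X + lam I)^{-1} X^T y.
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

Solving the regularized normal equations with `np.linalg.solve` (rather than forming an explicit inverse) is the standard numerically stable choice; the regularizer `lam` trades bias for variance, connecting back to the introduction.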

#### Unsupervised learning

- Principal component analysis
- Data clustering
- Other methods: canonical correlation analysis, sparse coding, etc.
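To illustrate the first item above, here is a minimal sketch of principal component analysis via the singular value decomposition; the synthetic data and the helper name `pca` are illustrative assumptions.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal directions."""
    Xc = X - X.mean(axis=0)                # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                    # principal directions, shape (k, d)
    scores = Xc @ components.T             # coordinates in the new basis, shape (n, k)
    return components, scores

# Synthetic data whose coordinates have decreasing variance
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
components, scores = pca(X, 2)
```

Since the SVD returns singular values in decreasing order, the successive score columns capture decreasing shares of the variance, which is exactly the dimension-reduction property studied in the lectures.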

### Reading material

#### Machine Learning and Statistics

- Vapnik, The Nature of Statistical Learning Theory. Springer.
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning. (free online)
- Devroye, Györfi, Lugosi, A Probabilistic Theory of Pattern Recognition. Springer.
- Dubhashi, Panconesi, Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press.
- Shawe-Taylor, Cristianini, Kernel Methods for Pattern Analysis. 2004.
- Slides by Jean-Philippe Vert on kernel methods.

#### Optimization

- Boyd, Vandenberghe, Convex Optimization. 2004. (free online)
- Bertsekas, Nonlinear Programming. 2003.

### Calendar

Date | Lecturer | Topic | Scribes
---|---|---|---
12/09 | LJ | Introduction + bias-variance trade-off (slides, scribe notes) | Sebastien Jonglez, Stephane Durand
19/09 | LJ | Supervised learning: SVM (slides, scribe notes) | Raphael Bournhonesque
26/09 | LJ | Empirical risk minimization, cross-validation (slides, scribe notes) | Martin Privat
03/10 | JM | Convex optimization principles, non-parametric estimation (scribe notes) | Antoine Pouille
10/10 | JM | Introduction to kernels and RKHS (scribe notes) | Sebastian Scheibner
24/10 | JM | Kernel methods and kernel examples (scribe notes) | Brieuc Guinard, Emma Prudent
14/11 | JS | Unsupervised learning | Mouhcine Mendil
21/11 | JS | Unsupervised learning (scribe notes) | Aurore Alcolei

### Scribe notes

For each course, a pair of students commits to turning their notes into LaTeX format. A nice LaTeX package, created by students from last year, can be found here.

### Homeworks

There will be three homeworks given during the course. Each of them should be returned within three weeks (no need to use LaTeX here).

- **Homework 1, due October 24th**: pdf, code in R, data.
- **Homework 2, due November 25th**: pdf.
- **Homework 3, due December 17th**: pdf.

### Projects

The project consists of implementing an article, running some experiments, and writing a short report (less than 10 pages). It is also possible to study a theoretical paper instead of implementing a method. All reports should be written in LaTeX, and a pdf should be sent to the lecturers before **January 5th**. You can either come with your own idea and discuss it with us, or we can give you some suggestions. You can also pick one article from the list below, or look at the projects from last year.

- Learning using large datasets
- Spectral ranking using seriation
- Graphical lasso
- Intersecting singularities for multi-structured estimation
- Convolution and local alignment kernel
- Distributed robust learning
- Optimization with quadratic penalties
- Safe feature elimination for the Lasso
- Bayesian model averaging
- Speaker recognition
- Graph kernels for biology
- Text categorization

Project | Student(s) | Coach
---|---|---
Supervised text classification | Aurore Alcolei | JM
Prediction in social networks (article) | Brieuc Guinard | LJ
Distributed robust learning | Mouhcine Mendil | JS
Audio processing and machine learning | Emma Prudent | JS
Latency prediction in TCP networks | Baptiste Jonglez | JM
Speaker recognition | Raphael Bournhonesque | LJ
Text analysis | Martin Privat | JM
Safe feature elimination for the Lasso | Antoine Pouille | JS
Graph kernels for biology | Sebastian Scheibner | LJ