Optimization, machine learning, and pluri-disciplinarity workshop

Tuesday, March 21st, 2017


Grand Amphithéâtre, Inria Grenoble - Rhône-Alpes (Montbonnot/Inovallée site: Directions)


Online registration is closed. You can still attend the workshop, but you will have no guarantee to have some food for the lunch.


Nathalie Gillot and Julien Mairal (firstname.lastname@inria.fr)


9:00 - 9:30Room open + coffee
9:30 - 9:50   Bikash Joshi (UGA / LIG)
Aggressive Double Sampling for Multi-class to Binary Reduction
Abstract [slides]
We address the problem of multiclass classification in the case where the number of classes is very large. We propose a multiclass to binary reduction strategy, in which we transform the original problem into a binary classification one over pairs of examples. We derive generalization bounds for the error of the classifier of pairs using local Rademacher complexity, and a double sampling strategy (in the terms of examples and classes) that speeds up the training phase while maintaining a very low memory usage. Experiments are carried for text classification on DMOZ and Wikipedia collections with up to 20,000 classes in order to show the efficiency of the proposed method.
9:50 - 10:40   Robert M. Gower (Inria Sierra)
Stochastic Variance Reduced Methods Based on Sketching and Projecting the Jacobian
Abstract [slides]
We present a new perspective on stochastic variance reduced (SVR) methods as methods that maintain an estimate of the Jacobian of an auxiliary vector valued function. This auxiliary vector valued function is formed by stacking the individual data functions from the empirical risk minimization problem. Through this observation we extend the class of SVR methods by updating the Jacobian estimate using randomized sparse sketches of the true Jacobian. By choosing different randomized sketches we recover know methods: the SAG and SAGA method, their mini-batch variants and even non-uniform sampling variants. These new SVR methods all converge linearly, as dictated by a single convergence theorem. When specialized to known methods, our convergence theorem recovers the best known convergence results for SAGA, and furthermore, we obtain new results for mini-batch and non-uniform sampling variants of SAGA. Thus our work unites all SAGA variants under one framework. Furthermore, this is the first work where the non-uniform sampling variants of SAGA are directly analysed.
10:40 - 11:10   Coffee
11:10 - 12:00 Chloé-Agathe Azencott (Institut Curie / INSERM / Mines ParisTech)
Feature Selection in High Dimension for Precision Medicine
Abstract [slides]

Differences in disease predisposition or response to treatment can be explained in great part by genomic differences between individuals. This realization has given birth to precision medicine, where treatment is tailored to the genome of patients. This field depends on collecting considerable amounts of molecular data for large numbers of individuals, which is being enabled by thriving developments in genome sequencing and other high-throughput experimental technologies.

Unfortunately, we still lack effective methods to reliably detect, from this data, which of the genomic features determine a phenotype such as disease predisposition or response to treatment. One of the major issues is that the number of features that can be measured is large (easily reaching tens of millions) with respect to the number of samples for which they can be collected (more usually of the order of hundreds or thousands), posing both computational and statistical difficulties.

In my talk I will discuss several ways to address this problem, from reducing the dimensionality of the feature space by imposing on it a structure derived from prior knowledge, to increasing the number of samples via multi-task approaches.

Related references:
C.-A. Azencott, D. Grimm, M. Sugiyama, Y. Kawahara and K. M. Borgwardt. Efficient network- guided multi-locus association mapping with graph cuts. Bioinformatics, 29(13), 2013.
M. Sugiyama, C.-A. Azencott, D. Grimm, Y. Kawahara and K. Borgwardt. Multi-task feature selection on multiple networks via maximum flows. SIAM ICDM, 2014.
Victor Bellon, Véronique Stoven, and Chloé-Agathe Azencott. Multitask feature selection with task descriptors, Pacific Symposium on Biocomputing, 2016.

Speaker bio:
Chloé-Agathe Azencott is a junior research faculty at Mines ParisTech (Paris, France). She belongs to the Centre for Computational Biology, a joint research group between Mines ParisTech, Institut Curie and INSERM focusing on machine learning for cancer research. She holds a PhD in computer science from University of California, Irvine (USA), which she obtained in 2010. From 2011 to 2013 she was a postdoctoral fellow at the Max Planck Institutes in Tübingen (Germany). Her research interests revolve around developing machine learning approaches for therapeutic research. This ranges from chemoinformatics methods for drug discovery to the analysis of large-scale, heterogeneous, whole-genome data for precision medicine. For more details see http://cazencott.info.

12:00 - 12:20   Alberto Bietti (Inria Thoth)
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Abstract [slides]
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. However, in the context of empirical risk minimization, it is often helpful to augment the training set by considering random perturbations of input examples. In this case, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for this setting when the objective is strongly convex. After an initial linearly convergent phase, the algorithm achieves a O(1/t) convergence rate in expectation like SGD, but with a constant factor that is typically much smaller, depending on the variance of gradient estimates due to perturbations on a single example. Extensions of the algorithm to composite objectives and non-uniform sampling are also studied.
12:20 - 14:00   Lunch (for registered participants, for others, cantine / restaurants are at 1 min walking)
14:00 - 14:50   Alexandre Gramfort (Telecom ParisTech / CEA NeuroSpin)
Statistical Learning and optimization for MRI data mining
Abstract [slides]

Over the last 30 years, functional MRI (fMRI) has allowed to image the active brain non-invasively. Yet, the development of processing tools to extract knowledge from fMRI data is still an active topic of research. For example, despite the common usage of a canonical, data-independent, hemodynamic response function (HRF), it is known that the shape of the HRF varies across brain regions and subjects. In the first part of my talk I will explain how a data-driven estimation of this function leads to more statistical power in the context of supervised learning with fMRI [0]. In a second part of my talk I will detail how one can learn a subject specific forward model of fMRI data in visual areas, allowing us to replicate neuroscience findings without actual task specific measurements. This is achieved using features obtained via a deep convolutional network applied to input stimuli, features that we use to predict fMRI data [1]. Finally, if time permits I will detail how data smoothing for group analysis could some day be unecessary. The approach I will detail is based on optimal transport and new algorithmic findings making the method tractable on full brain data [2].

Some references

[0] Data-driven HRF estimation for encoding and decoding models, Fabian Pedregosa, Michael Eickenberg, Philippe Ciuciu, Bertrand Thirion and Alexandre Gramfort, Neuroimage 2015
[1] Seeing it all: Convolutional network layers map the function of the human visual system Michael Eickenberg, Alexandre Gramfort, Gaël Varoquaux, Bertrand Thirion, Neuroimage 2016
[2] Fast Optimal Transport Averaging of Neuroimaging Data A Gramfort, G Peyre, M Cuturi, IPMI Conf. 2015

14:50 - 15:10   Emilie Neveu (Université de Lausanne)
PEPSI-Dock: Learning atom to atom interaction potentials to predict 3D structures of protein-protein complexes
Abstract [slides]

Proteins are a part of all the complex molecular machineries involved in our cells to perform communication, storage, transport as well as defense. Many of these functions rely on the recognition of specific 3D molecular shapes associated with their biochemical characteristics. Indeed, in drug design, we aim at finding proteins that could inhibit viruses or bacteria by binding to these. Thus, the selection of putative candidates for drug development starts with docking prediction algorithms. Those algorithms predict the conformation of a complex of proteins from knowledge of their individual structures, including possible conformational changes that each protein could undertake.


Docking algorithms rely on a combination of sampling and scoring methods, adapted to different scales. Polynomial Expansion of Protein Structures and Interactions for Docking (PEPSI-Dock) improves the accuracy of the first stage of the docking pipeline, which will sharpen up the final predictions. Indeed, PEPSI-Dock benefits from the precision of a very detailed data-driven model of the binding free energy used with a global exhaustive search in the rigid-body (6D) space. As well as being accurate, our computations are among the fastest by virtue of the sparse representation of the pre-computed potentials and FFT-accelerated sampling techniques. Overall, this is the first demonstration of a FFT- accelerated docking method coupled with an arbitrary-shaped distance-dependent interaction potential.


Here, I will first explain the biological context and present the learning process to compute data-driven distant- dependent pairwise potentials. The potential coefficients are learned by combining machine-learning techniques (Support Vector Machine) with physically interpretable descriptors. Then, I will describe the integration of the deduced potentials into a FFT-accelerated spherical sampling provided by the Hex library to test the power of prediction of our method.

Related references: https://doi.org/10.1093/bioinformatics/btw443

Speaker bio:

Emilie Neveu is a research assistant in Dirk Fasshauer group in the Department of Fundamental Neurosciences at the University of Lausanne. After working at the development of prediction methods using structural information with Sergei Grudinin in Inria Nano-d Team, she is now exploring what can protein sequence analyses reveal about interactions and specificities in vesicle trafficking and in the fusion machinery.

15:10 - 15:40   Coffee
15:40 - 16:30Olivier Fercoq (Telecom ParisTech)
Restarting Accelerated Gradient Methods with a Rough Strong Convexity Estimate
Abstract [slides]
We propose new restarting strategies for accelerated gradient and accelerated coordinate descent methods. Our main contribution is to show that under a local error bound condition, the restarted method has a geometric rate of convergence for any restarting frequency. Hence, it allows us to take profit of restarting even when we do not know the strong convexity coefficient. The scheme can be combined with adaptive restarting, leading to the first provable convergence for adaptive restarting schemes with accelerated coordinate descent methods. Finally, we illustrate the properties of the algorithm on a regularized logistic regression problem and on a Lasso problem.
macaron  macaron