M2MO: Modélisation Aléatoire, Finance et Data Science

Master's program in statistics, probability and finance - Université Paris 7 - Paris Diderot


Optimization for machine learning

- Lecturer: G. Garrigos
- Term: 1
- Hours: 24h


## Prerequisites

- Calculus: differentiability, convexity, optimality conditions
- Linear algebra: eigenvalues, singular values
- Probability: conditional expectation

## Course objectives

- Understand the optimization algorithms used to train machine learning models
- Understand the assumptions under which these algorithms converge, and their convergence properties
- Build enough background to understand algorithms that will appear in the future
- Be able to implement these algorithms numerically

## Program

1. Gradient Descent for smooth problems
   - convexity, strong convexity, smoothness
   - convergence rates, notion of adaptivity
2. Stochastic Gradient Descent for smooth problems
   - notions of expected smoothness, interpolation
   - various flavors of SGD: minibatches, importance sampling
   - complexity vs. rates
3. Towards better methods
   - Notion of optimal rates for an algorithm
   - Inertial methods : Nesterov, Heavy Ball, Momentum methods
   - Variance reduced methods : SAGA, SVRG
   - Adaptive learning rates : Adagrad, ADAM
4. Nonconvex optimization
   - Convergence of the methods under Łojasiewicz inequalities
   - Asymptotic properties for large Neural Networks
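To make the first two items of the program concrete, here is a minimal sketch (not course material; all names and parameter values are illustrative) of plain gradient descent and stochastic gradient descent on a least-squares problem f(x) = (1/2n)·||Ax − b||², where b is chosen so that interpolation holds:

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iters):
    """Gradient descent: x_{k+1} = x_k - step * grad(x_k)."""
    x = x0.copy()
    for _ in range(n_iters):
        x -= step * grad(x)
    return x

def sgd(grad_i, n_samples, x0, step, n_iters, seed=0):
    """SGD: each step uses the gradient of one uniformly sampled term f_i."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iters):
        i = rng.integers(n_samples)
        x -= step * grad_i(x, i)
    return x

# Illustrative least-squares instance: f(x) = (1/2n) * ||A x - b||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_star = rng.standard_normal(5)
b = A @ x_star  # b is in the range of A, so interpolation holds: f(x_star) = 0

grad = lambda x: A.T @ (A @ x - b) / len(b)     # full gradient of f
grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])  # gradient of the i-th term

# Constant steps chosen small relative to the smoothness constants
x_gd = gradient_descent(grad, np.zeros(5), step=0.1, n_iters=500)
x_sgd = sgd(grad_i, len(b), np.zeros(5), step=0.02, n_iters=5000)
```

Under interpolation, SGD with a small constant step size converges to the solution, which is exactly one of the regimes studied in part 2 of the program; without interpolation it would only reach a neighborhood of the solution.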

## Evaluation

- 1 homework: a practical session in Python (done at home), at mid-term
- 1 written exam at the end of the term; most questions come from the exercises seen in class or from the exercise sheets.

## References

- Borwein, J.M. & Lewis, A.S. (2006). Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer.
- Nesterov, Y. (2004). Introductory Lectures on Convex Optimization. Springer.
- Peypouquet, J. (2015). Convex Optimization in Normed Spaces. Springer.