- Lecturer: G. Garrigos

- Period: Term 1

- ECTS: 3

- Schedule: 24h

## Prerequisites

- Calculus: differentiability, convexity, optimality conditions

- Linear algebra: eigenvalues, singular values

- Probability: conditional expectation

## Course objectives

- Understanding the optimization algorithms used to train machine learning models

- Understanding the convergence guarantees of these algorithms and the conditions under which they hold

- Building enough background to understand algorithms that will appear in the future

- Being able to implement these algorithms numerically

## Program

1. Gradient Descent for smooth problems

- convexity, strong convexity, smoothness

- convergence rates, notion of adaptivity

2. Stochastic Gradient Descent for smooth problems

- notions of expected smoothness, interpolation

- Various flavors of SGD: minibatches, importance sampling (see the first sketch after this program)

- complexity vs rates

3. Towards better methods

- Notion of optimal rates for an algorithm

- Inertial methods: Nesterov, Heavy Ball, momentum methods

- Variance-reduced methods: SAGA, SVRG

- Adaptive learning rates: Adagrad, ADAM (see the second sketch after this program)

4. Nonconvex optimization

- Convergence of the methods under Łojasiewicz inequalities

- Asymptotic properties of large neural networks
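
For illustration, here is a minimal NumPy sketch of the two baseline methods from parts 1 and 2, plain gradient descent and minibatch SGD, on a least-squares problem. The data, step sizes, and iteration counts below are arbitrary assumptions, not part of the course material.

```python
# Illustrative sketch only: gradient descent and minibatch SGD on
# f(x) = (1/2n) * ||A x - b||^2. All constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

def grad(x, idx=None):
    """Gradient of the least-squares loss, restricted to the rows in idx."""
    if idx is None:
        idx = np.arange(n)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

# Smoothness constant: largest eigenvalue of (1/n) A^T A, which gives the
# classical step size 1/L for gradient descent.
L = np.linalg.eigvalsh(A.T @ A / n).max()

# Plain gradient descent with constant step size 1/L.
x = np.zeros(d)
for _ in range(500):
    x = x - (1.0 / L) * grad(x)

# Minibatch SGD with a decreasing step size (one common schedule).
y = np.zeros(d)
for t in range(5000):
    batch = rng.choice(n, size=10, replace=False)
    y = y - (1.0 / (L * (1 + t / 100))) * grad(y, batch)

print("GD  error:", np.linalg.norm(x - x_true))
print("SGD error:", np.linalg.norm(y - x_true))
```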
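
In the same spirit, a sketch of two of the acceleration ideas from part 3: heavy-ball (Polyak) momentum and an AdaGrad-style per-coordinate step size. Again, all constants (momentum 0.9, base step 0.5, etc.) are illustrative assumptions.

```python
# Illustrative sketch only: heavy-ball momentum and an AdaGrad-style
# diagonal step size on a least-squares problem. Constants are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)

def grad(x):
    """Full gradient of the least-squares loss (1/2n) * ||A x - b||^2."""
    return A.T @ (A @ x - b) / n

L = np.linalg.eigvalsh(A.T @ A / n).max()

# Heavy-ball momentum: reuse a fraction of the previous displacement.
x, x_prev = np.zeros(d), np.zeros(d)
for _ in range(500):
    x, x_prev = x - (1.0 / L) * grad(x) + 0.9 * (x - x_prev), x

# AdaGrad-style update: per-coordinate steps scaled by the square root
# of the accumulated squared gradients (full gradients for simplicity).
y, acc = np.zeros(d), np.zeros(d)
for _ in range(500):
    g = grad(y)
    acc += g ** 2
    y = y - 0.5 * g / (np.sqrt(acc) + 1e-8)

print("heavy-ball residual:", np.linalg.norm(A @ x - b))
print("AdaGrad residual:   ", np.linalg.norm(A @ y - b))
```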

## Evaluation

- 1 homework: a practical session in Python (at home), mid-period

- 1 written exam at the end of the period. Most questions come from the exercises covered in class or from the exercise sheets.

## References

- Borwein, J.M. & Lewis, A.S. (2006). Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer.

- Nesterov, Y. (2004). Introductory Lectures on Convex Optimization. Springer.

- Peypouquet, J. (2016). Convex Optimization in Normed Spaces. Springer.