| Lecturer | G. Garrigos |
| --- | --- |
| Period | Term 1 |
| ECTS | 3 |
| Schedule | 24h |
## Prerequisites
- Calculus: differentiability, convexity, optimality conditions
- Linear Algebra: eigenvalues, singular values
- Probability: conditional expectation
## Course objectives
- Understanding the optimization algorithms used to train machine learning models
- Understanding the assumptions and convergence properties of these algorithms
- Building enough background to understand algorithms that will appear in the future
- Being able to implement these algorithms numerically
## Program
1. Gradient Descent for smooth problems (a Python sketch is given after the program)
- convexity, strong convexity, smoothness
- convergence rates, notion of adaptivity
2. Stochastic Gradient Descent for smooth problems (see the SGD sketch after the program)
- notions of expected smoothness, interpolation
- various flavors of SGD: minibatches, importance sampling
- complexity vs. rates
3. Towards better methods (see the momentum/Adagrad sketch after the program)
- Notion of optimal rates for an algorithm
- Inertial methods: Nesterov, Heavy Ball, momentum methods
- Variance-reduced methods: SAGA, SVRG
- Adaptive learning rates: Adagrad, ADAM
4. Nonconvex optimization
- Convergence of the methods under Łojasiewicz inequalities
- Asymptotic properties of large neural networks
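
To make part 1 of the program concrete, here is a minimal sketch of gradient descent on a smooth convex least-squares problem. The data `A`, `b`, the number of iterations, and the constant step size `1/L` (with `L` the largest eigenvalue of `A.T @ A`, i.e. the smoothness constant) are illustrative choices, not material prescribed by the course.

```python
import numpy as np

def gradient_descent(A, b, n_iters=500):
    """Gradient descent on f(x) = 0.5 * ||A x - b||^2, which is convex and L-smooth."""
    L = np.linalg.norm(A, ord=2) ** 2      # smoothness constant = largest eigenvalue of A.T @ A
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)           # gradient of f at the current iterate
        x = x - grad / L                   # constant step size 1/L
    return x

# illustrative random data
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
x_gd = gradient_descent(A, b)
```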
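
In the same spirit, for part 2 here is a sketch of minibatch stochastic gradient descent on a finite-sum least-squares objective, with an optional importance-sampling distribution proportional to the per-sample smoothness constants `||a_i||^2`. The batch size and the constant step size `1/L_max` are illustrative assumptions.

```python
import numpy as np

def sgd(A, b, batch_size=5, n_iters=2000, importance=False, seed=0):
    """Minibatch SGD on f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    row_sq_norms = np.linalg.norm(A, axis=1) ** 2      # per-sample smoothness constants ||a_i||^2
    # sampling distribution: uniform, or proportional to ||a_i||^2 (importance sampling)
    probs = row_sq_norms / row_sq_norms.sum() if importance else np.full(n, 1.0 / n)
    x = np.zeros(d)
    for _ in range(n_iters):
        idx = rng.choice(n, size=batch_size, p=probs)
        residuals = A[idx] @ x - b[idx]
        weights = 1.0 / (n * probs[idx])               # reweighting keeps the gradient estimator unbiased
        grad = (A[idx] * (weights * residuals)[:, None]).mean(axis=0)
        x = x - grad / row_sq_norms.max()              # constant step size 1/L_max
    return x

# usage on the same illustrative data as above
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
x_sgd = sgd(A, b, importance=True)
```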
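
For part 3, here is a sketch of two of the update rules listed there: a heavy-ball (momentum) step and an Adagrad-style step with per-coordinate learning rates. The hyperparameters are illustrative assumptions rather than the tuned values discussed in class.

```python
import numpy as np

def heavy_ball(grad_f, x0, step, momentum=0.9, n_iters=500):
    """Heavy-ball iteration: x_{k+1} = x_k - step * grad_f(x_k) + momentum * (x_k - x_{k-1})."""
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(n_iters):
        x, x_prev = x - step * grad_f(x) + momentum * (x - x_prev), x
    return x

def adagrad(grad_f, x0, step=0.5, eps=1e-8, n_iters=500):
    """Adagrad: per-coordinate step sizes built from the accumulated squared gradients."""
    x, acc = x0.copy(), np.zeros_like(x0)
    for _ in range(n_iters):
        g = grad_f(x)
        acc += g ** 2                                  # running sum of squared gradients
        x = x - step * g / (np.sqrt(acc) + eps)
    return x

# usage with the least-squares gradient from the sketches above (illustrative)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
grad_f = lambda x: A.T @ (A @ x - b)
x_hb = heavy_ball(grad_f, np.zeros(10), step=1.0 / np.linalg.norm(A, ord=2) ** 2)
x_ada = adagrad(grad_f, np.zeros(10))
```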
## Evaluation
- One homework: a practical session in Python (done at home), due mid-period
- One written exam at the end of the period. Most questions come from the exercises seen in class or from the exercise sheets.
## References
- Borwein, J. M. & Lewis, A. S. (2006). Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer.
- Nesterov, Y. (2004). Introductory Lectures on Convex Optimization. Springer.
- Peypouquet, J. (2016). Convex Optimization in Normed Spaces. Springer.