Research Colloquium with Botond Szabó (Bocconi University) and Jonas Latz (University of Manchester)

Botond Szabó & Jonas Latz, Bocconi University & University of Manchester, 10:00-11:45

Title of talk by Botond Szabó: Deep Vecchia Gaussian Processes

Abstract: Deep Gaussian processes have several advantages over standard Gaussian processes (GPs), including the ability to learn local and compositional structures of the signal. However, their practical applicability is hindered by their computational complexity and by the instability of approximation methods. In this work, we propose a novel Vecchia approximation of deep GPs, derive optimal contraction rates for structured (compositional) functions, and discuss algorithmic aspects. We also demonstrate the applicability of the proposed method on synthetic data sets. Based on ongoing joint work with Ismael Castillo (Sorbonne, Paris), Thibault Randrianarisoa (Vector Institute, Toronto) and Yichen Zhou (University of Hong Kong).
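
To give a rough sense of the objects involved (the notation below is illustrative and not taken from the talk): a deep GP models the signal as a composition $f = g_L \circ \cdots \circ g_1$ of GP layers, and a Vecchia approximation replaces the joint density of the process at the $n$ observation points by a product of low-dimensional conditionals,

\[
  p(f_1,\dots,f_n) \;\approx\; \prod_{i=1}^{n} p\big(f_i \mid f_{c(i)}\big),
  \qquad c(i)\subset\{1,\dots,i-1\},\quad |c(i)|\le m \ll n,
\]

so that each value is conditioned only on a small set of previously ordered points, replacing the cubic cost of exact GP inference by a cost that, for fixed conditioning-set size $m$, scales linearly in $n$.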

 

Title of talk by Jonas Latz: Losing momentum in continuous-time stochastic optimisation

Abstract: The training of modern machine learning models often amounts to solving high-dimensional, non-convex optimisation problems that depend on large-scale data. In this context, momentum-based stochastic optimisation algorithms have become particularly widespread. The stochasticity arises from data subsampling, which reduces the computational cost. Both momentum and stochasticity help the algorithm to converge globally. In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum. This model is a piecewise-deterministic Markov process that represents the optimiser by an underdamped dynamical system and the data subsampling by stochastic switching. We investigate long-time limits, the subsampling-to-no-subsampling limit, and the momentum-to-no-momentum limit. We are particularly interested in the case of reducing the momentum over time. Under convexity assumptions, we show convergence of our dynamical system to the global minimiser when reducing momentum over time and letting the subsampling rate go to infinity. We then propose a stable, symplectic discretisation scheme to construct an algorithm from our continuous-time dynamical system. In experiments, we study our scheme on convex and non-convex test problems, and we additionally train a convolutional neural network on an image classification problem. Our algorithm attains competitive results compared to stochastic gradient descent with momentum.
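
A minimal sketch of the kind of dynamics in question (the notation is illustrative; the talk's precise formulation may differ): writing $\theta$ for the parameters, $v$ for the velocity/momentum variable, and $f_{i(t)}$ for the loss on the currently selected data subsample, the optimiser follows an underdamped system

\[
  d\theta_t = v_t\, dt, \qquad
  dv_t = -\gamma(t)\, v_t\, dt - \nabla f_{i(t)}(\theta_t)\, dt,
\]

where the index $i(t)$ switches between subsamples at random times, so that $(\theta_t, v_t, i(t))$ is a piecewise-deterministic Markov process. In such a model, "reducing the momentum over time" can be read as letting the friction $\gamma(t)$ grow, and the no-subsampling limit as letting the switching rate tend to infinity.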

Joint work with Kexin Jin, Chenguang Liu, and Alessandro Scagliotti.

Associated paper: Jin et al. (2025), Journal of Machine Learning Research, 26(148):1-55.