# Past Talks in Joint Analysis Seminar

There weren't any events in the past six months.

# Past Talks in Post Graduate Seminar

13.05.2024, 16:00

**Adit Radhakrishnan** (Harvard University): **How do neural networks learn features from data?**

Understanding how neural networks learn features, or relevant patterns in data, for prediction is necessary for their reliable use in technological and scientific applications. We propose a unifying mechanism that characterizes feature learning in neural network architectures. Namely, we show that features learned by neural networks are captured by a statistical operator known as the average gradient outer product (AGOP). Empirically, we show that the AGOP captures features across a broad class of network architectures including convolutional networks and large language models. Moreover, we use AGOP to enable feature learning in general machine learning models through an algorithm we call Recursive Feature Machine (RFM). We show that RFM automatically identifies sparse subsets of features relevant for prediction and explicitly connects feature learning in neural networks with classical sparse recovery and low rank matrix factorization algorithms. Overall, this line of work advances our fundamental understanding of how neural networks extract features from data, leading to the development of novel, interpretable, and effective models for use in scientific applications.
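For intuition, the AGOP of a model $f$ is the matrix $\frac{1}{n}\sum_i \nabla f(x_i)\,\nabla f(x_i)^\top$, where the gradients are taken with respect to the inputs. A minimal numpy sketch (the two-layer network with random weights is a stand-in for a trained model, not the setup of the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained network: a two-layer MLP f(x) = v^T tanh(W x).
# The weights are random here purely for illustration.
d, h, n = 5, 16, 200
W = rng.normal(size=(h, d))
v = rng.normal(size=h)
X = rng.normal(size=(n, d))

def f_grad(x):
    """Gradient of f(x) = v @ tanh(W @ x) with respect to the input x."""
    pre = W @ x
    return W.T @ (v * (1.0 - np.tanh(pre) ** 2))

# Average gradient outer product (AGOP):
#   AGOP = (1/n) * sum_i grad f(x_i) grad f(x_i)^T
agop = sum(np.outer(g, g) for g in (f_grad(x) for x in X)) / n

# AGOP is symmetric positive semi-definite; its top eigenvectors
# indicate the input directions the model is most sensitive to.
eigvals = np.linalg.eigvalsh(agop)
```

The dominant eigenvectors of `agop` play the role of the learned features; RFM iterates this construction by reweighting the inputs with the AGOP and refitting.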

06.05.2024, 16:00

**Dominik Stöger** (KU Eichstätt-Ingolstadt): **Breaking the quadratic rank bottleneck in non-convex matrix sensing: Recovery guarantees with (near-)optimal sample complexity**

Low-rank matrix recovery problems are ubiquitous in many areas of science and engineering. Most of the methods that have been studied for these problems can be divided
into two categories: Convex optimization approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent.

While the latter promises to be computationally much less expensive, basically all existing recovery guarantees for factorized gradient descent are much more pessimistic with respect
to the number of samples required. In particular, they require the number of samples to scale quadratically with the rank of the ground truth matrix. This is in stark contrast to
empirical observations which suggest that the non-convex approaches perform as well as the convex ones with respect to the sample complexity.

In this talk, we resolve this issue and present, to the best of our knowledge, the first theoretical guarantees for matrix sensing showing that factorized gradient descent recovers
the ground truth matrix with a sample size that is optimal in the number of degrees of freedom. Our proof is based on new probabilistic decoupling arguments, which we expect to be of
independent interest. Joint work with Yizhe Zhu (UC Irvine).
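The factorized approach under discussion is gradient descent on $f(U)=\frac{1}{2m}\sum_i\big(\langle A_i, UU^\top\rangle - y_i\big)^2$ over the low-rank factor $U$. A small numpy illustration (dimensions, step size, and iteration count are arbitrary choices for the sketch, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, m = 10, 2, 200                 # dimension, rank, number of measurements

# Ground truth: a rank-r PSD matrix M* = U* U*^T
U_star = rng.normal(size=(d, r))
M_star = U_star @ U_star.T

# Gaussian sensing matrices and linear measurements y_i = <A_i, M*>
A = rng.normal(size=(m, d, d))
y = np.einsum('mij,ij->m', A, M_star)

def loss(U):
    """f(U) = (1/2m) * sum_i (<A_i, U U^T> - y_i)^2"""
    return 0.5 * np.mean((np.einsum('mij,ij->m', A, U @ U.T) - y) ** 2)

# Factorized gradient descent from a small random initialization
U = 0.01 * rng.normal(size=(d, r))
eta = 0.005
loss0 = loss(U)
for _ in range(5000):
    res = np.einsum('mij,ij->m', A, U @ U.T) - y     # residuals
    G = np.einsum('m,mij->ij', res, A) / m           # (1/m) sum_i res_i A_i
    U -= eta * (G + G.T) @ U                         # gradient step on f

rel_err = np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star)
```

Here $m = 200$ is comfortably above the $dr = 20$ degrees of freedom; the talk concerns how small $m$ can be while such iterations still provably succeed.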

29.04.2024, 10:00

**Massimo Fornasier** (TU Munich): **Wasserstein Sobolev functions and their numerical approximations**

We start the talk by presenting general results on the strong density of sub-algebras of bounded Lipschitz functions in metric Sobolev spaces. We apply such results to show the
density of smooth cylinder functions in Sobolev spaces of functions on the Wasserstein space $\mathcal P_2$ endowed with a finite positive Borel measure. As a byproduct, we obtain the
infinitesimal Hilbertianity of Wasserstein Sobolev spaces. Taking advantage of these results, we further address the challenging problem of the numerical approximation of Wasserstein
Sobolev functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body
of literature focused on efficiently approximating pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches:

1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials.

2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces.

3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation.

As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric
Sobolev spaces introduced above and combine it with techniques from optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness
appropriately designed neural networks to serve as basis functions. Consequently, our constructive solutions significantly enhance the evaluation speed at equal accuracy, surpassing
that of state-of-the-art methods by several orders of magnitude.

The talk presents a collection of results with Pascal Heid, Giacomo Sodini, and Giuseppe Savaré.
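The first of the three approaches can be illustrated in the discrete setting, where optimal transport between two equally weighted point clouds reduces to an assignment problem. A minimal sketch (using scipy's `linear_sum_assignment`; the point clouds are toy data, not the training setup of the talk):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)

def w2_empirical(x, y):
    """2-Wasserstein distance between two uniform empirical measures
    on equal-size point clouds x, y, via an optimal assignment."""
    n = len(x)
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(cost)     # solve the discrete OT problem
    return np.sqrt(cost[rows, cols].sum() / n)

# Toy 1-D example: for sorted 1-D samples the optimal coupling is monotone,
# so W2^2 equals the mean squared difference of the sorted samples.
x = rng.normal(size=(50, 1))
y = rng.normal(loc=2.0, size=(50, 1))
d = w2_empirical(x, y)
```

Pairs of measures together with such computed distances (and the associated potentials) would then serve as training data for a functional approximant.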

08.02.2024, 16:15

**Laura Paul** (RWTH Aachen University): **Covariance Estimation for Massive MIMO**

Massive multiple-input multiple-output (MIMO) communication systems are very promising for wireless communication and fifth generation (5G) cellular networks. In massive MIMO, a large number of antennas are employed at the base station (BS), which provides a high degree of spatial freedom and enables the BS to simultaneously communicate with multiple user terminals. Due to the limited angular spread, the user channel vectors lie on low-dimensional subspaces. For each user, we aim to find a low-dimensional beamforming subspace that captures a large amount of the power of its channel vectors. We will see that this signal subspace estimation problem can be reduced to finding a good estimator of the covariance matrix in terms of a truncated version of the nuclear norm. Since the channel covariance matrix is not a priori known in practice, it has to be estimated from the observed data samples. In this talk, theoretical guarantees for signal covariance and subspace estimation from compressed measurements are investigated. We derive improved bounds on the estimation error in terms of the number of observed time samples, the truncation level, and the noise level.
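The subspace estimation step can be sketched in a few lines: form a sample covariance from channel snapshots and take its dominant eigenvectors as the beamforming subspace. A toy numpy illustration (the low-rank-plus-noise channel model, the dimensions, and the noise level are illustrative assumptions, not the compressed-measurement setting of the talk):

```python
import numpy as np

rng = np.random.default_rng(3)
M, n, k = 32, 500, 4        # antennas, time samples, subspace dimension

# Synthetic channel model: snapshots concentrated on a k-dimensional
# signal subspace plus a small amount of noise (illustration only).
Q, _ = np.linalg.qr(rng.normal(size=(M, k)))     # true signal subspace
H = Q @ rng.normal(size=(k, n)) + 0.05 * rng.normal(size=(M, n))

# Sample covariance estimate from the observed channel snapshots
C_hat = H @ H.T / n

# Beamforming subspace: span of the top-k eigenvectors of C_hat
eigvals, eigvecs = np.linalg.eigh(C_hat)
U = eigvecs[:, -k:]                              # dominant subspace

# Fraction of the channel power captured by the estimated subspace
captured = np.trace(U.T @ C_hat @ U) / np.trace(C_hat)
```

The quality of `U` is governed by how well `C_hat` approximates the true covariance, which is exactly what the bounds in the talk quantify.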

01.02.2024, 16:15

**Arinze Folarin** (RWTH Aachen University): **Tensor Recovery: Exploring Hierarchical Tensor Representation in ISLET Algorithm**

This talk introduces the Importance Sketching Low-rank Estimation for Tensors (ISLET) algorithm by Anru Zhang. The algorithm utilizes the High-Order Orthogonal Iteration tensor decomposition method to derive importance sketching directions, which are valuable for the tensor estimates produced by the ISLET algorithm. I will introduce the hierarchical tensor representation as an alternative way to derive sketching directions, generating a variant of the ISLET algorithm. These algorithms produce tensor estimates using the responses and tensor covariates with randomized designs from a given low-rank tensor regression model, enabling the recovery of the unknown low-rank tensor.

25.01.2024, 16:15

**Robert Kunsch** (RWTH Aachen University): **Monte Carlo quadrature with optimal confidence**

We study the numerical integration of smooth functions using finitely many function evaluations within randomized algorithms, aiming for the smallest possible error guarantees with high probability (the so-called confidence level). There are different strategies for constructing robust estimators, for example, taking the median of several repeated realizations of a basic method where the number of required repetitions depends on the desired confidence level. For Sobolev classes of continuous functions, however, we can find linear integration methods that have optimal error bounds for all confidence levels. Numerical experiments show that the tails of the error distribution are significantly smaller than with previously known methods.
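The median trick mentioned above is easy to sketch: run the basic Monte Carlo estimator several times and return the median, which sharpens the confidence level relative to a single run. A minimal illustration (the integrand and the parameters `n`, `k` are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def mc_estimate(f, n):
    """Basic Monte Carlo estimate of the integral of f over [0, 1]."""
    return np.mean(f(rng.random(n)))

def median_of_means(f, n, k):
    """Median of k independent basic estimates: boosts the confidence
    level at the cost of k-fold more function evaluations."""
    return np.median([mc_estimate(f, n) for _ in range(k)])

f = lambda x: np.exp(x)          # integral over [0, 1] is e - 1
est = median_of_means(f, n=2000, k=15)
```

The point of the talk is that for Sobolev classes one can do better: suitable linear methods achieve optimal error bounds for all confidence levels without this repetition overhead.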

11.01.2024, 16:15

**Johannes Müller** (RWTH Aachen University): **Natural Gradients for Scientific Machine Learning**

Neural network based PDE solvers have received rapidly growing attention within the scientific machine learning community as they offer the promise to work effectively in high dimensions. However, even for low-dimensional problems, common approaches are well documented to produce unsatisfactory accuracy. We present an error estimate decomposing the error into an approximation, optimization and penalization error, arising from approximately enforced boundary conditions. Further, we propose energy natural gradient descent, a natural gradient method with respect to a Hessian-induced Riemannian metric. As a main motivation we show that the update direction in function space resulting from the energy natural gradient corresponds to the Newton direction modulo an orthogonal projection onto the model's tangent space. We demonstrate experimentally that energy natural gradient descent yields highly accurate solutions with errors several orders of magnitude smaller than what is obtained when training PINNs with standard optimizers like gradient descent, Adam or BFGS, even when those are allowed significantly more computation time.
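For intuition, on a linear-in-parameters least-squares model a natural gradient step preconditions the Euclidean gradient with the Gram matrix of the model's parameter Jacobian, i.e. it coincides with a Gauss-Newton step. A toy numpy sketch (the two-parameter model is an illustrative assumption, far simpler than the PINN setting of the talk):

```python
import numpy as np

# Toy least-squares fit: model u(x; theta) = theta_0 * sin(x) + theta_1 * cos(x),
# loss L(theta) = (1/2) * sum_i (u(x_i; theta) - y_i)^2  (illustration only).
x = np.linspace(0, np.pi, 20)
y = 2.0 * np.sin(x) - 0.5 * np.cos(x)
theta = np.zeros(2)

Phi = np.stack([np.sin(x), np.cos(x)], axis=1)   # Jacobian of u w.r.t. theta
residual = Phi @ theta - y
grad = Phi.T @ residual                          # Euclidean gradient of L

# Natural gradient step: precondition with the Gram matrix G = J^T J induced
# by the parametrization. For this linear model it is Gauss-Newton, so a
# single preconditioned step solves the least-squares problem exactly.
G = Phi.T @ Phi
direction = np.linalg.solve(G, grad)
theta_new = theta - direction
```

For a nonlinear model such as a PINN, the Gram matrix changes with the parameters and only a local correspondence to the Newton direction holds, which is the content of the result described above.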

21.12.2023, 16:15

**Robert M. Gower** (Flatiron Institute): **Analysing stochastic gradient descent with adaptive stepsize and under interpolation**

Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. In this talk I will cover new adaptive step sizes for SGD (stochastic gradient descent). I will start by introducing an "optimal" step size that relies on interpolation. By interpolation, I mean that we can perfectly fit our model to the data at hand. I will then present some new elegant theory for these optimal step sizes. I will then move on to even more practical considerations by showing how such step sizes can be used in conjunction with any momentum method. This will lead to a model-based adaptive learning rate for SGD-M (stochastic gradient descent with momentum) which we call MoMo. MoMo uses momentum estimates of the batch losses and gradients sampled at each iteration to build a model of the loss function. Our model also makes use of any known lower bound of the loss function by using truncation, e.g. most losses are lower-bounded by zero. We then approximately minimize this model at each iteration to compute the next step. We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam - which is Adam with our new model-based adaptive learning rate. Through extensive numerical experiments, we demonstrate that MoMo and MoMo-Adam improve over SGD-M and Adam in terms of accuracy and robustness to hyperparameter tuning for training image classifiers on MNIST, CIFAR10, CIFAR100, ImageNet, recommender systems on the Criteo dataset, and a transformer model on the translation task IWSLT14.
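The interpolation-based "optimal" step size can be illustrated with a Polyak-type rule: divide the current per-sample loss (minus its known lower bound, here zero) by the squared gradient norm. A toy numpy sketch on noiseless linear regression (the model and constants are illustrative choices; this is the basic step-size idea, not MoMo itself):

```python
import numpy as np

rng = np.random.default_rng(6)

# Interpolation setting: noiseless linear regression, so every
# per-sample loss can be driven to its lower bound of zero.
n, d = 100, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star

w = np.zeros(d)
for _ in range(3000):
    i = rng.integers(n)                       # sample one data point
    err = X[i] @ w - y[i]
    loss_i = 0.5 * err ** 2                   # per-sample loss, lower bound 0
    grad_i = err * X[i]
    g2 = grad_i @ grad_i
    if g2 > 0:
        # Polyak-type step: (f_i(w) - lower bound) / ||grad f_i(w)||^2
        w -= (loss_i / g2) * grad_i
```

Because each step size adapts to the sampled loss and gradient, no learning-rate tuning is needed under interpolation; MoMo extends this idea with momentum-based model building and truncation.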

14.12.2023, 16:15

**Maxime Nguegnang** (RWTH Aachen University): **Analysis of gradient descent and stochastic gradient descent for learning linear neural networks**

In this talk we will present our work on understanding the training phase of deep linear neural networks. First, we studied the dynamics of gradient descent (GD) for learning deep linear neural networks, established bounds on the GD iterates, and proved convergence to a critical point of the square loss. We then extended the convergence results towards a global minimum of Bah et al. (2020) from gradient flow to GD. In contrast to the gradient flow result, our work provides precise conditions that ensure convergence for both constant and decreasing step sizes. In the second part, we will talk about our work in progress on extending the insights of this study from GD to SGD. Based on stochastic approximation theory, we use an analytical approach that combines SGD iterates and gradient flow trajectories to study the convergence properties of SGD for learning deep linear neural networks.
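The object of study can be sketched in the simplest case: gradient descent on a two-layer linear network under the square loss. A toy numpy illustration (a whitened-data formulation; sizes, initialization scale, and step size are arbitrary choices, not the conditions of the talk):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 5
M = rng.normal(size=(d, d))           # target linear map

# Two-layer linear network f(x) = W2 @ W1 @ x, trained on the square loss
# L(W1, W2) = (1/2) * ||W2 @ W1 - M||_F^2  (whitened-data formulation).
W1 = 0.1 * rng.normal(size=(d, d))    # small random initialization
W2 = 0.1 * rng.normal(size=(d, d))

eta = 0.05                            # constant step size
for _ in range(5000):
    E = W2 @ W1 - M                   # residual in matrix space
    gW1 = W2.T @ E                    # gradient of L w.r.t. W1
    gW2 = E @ W1.T                    # gradient of L w.r.t. W2
    W1 -= eta * gW1
    W2 -= eta * gW2
```

Although the end-to-end map `W2 @ W1` is linear, the loss is non-convex in the factors; the results described above give conditions under which such iterations nevertheless converge, including towards a global minimum.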

07.12.2023, 16:15

**Yaim Cooper** (University of Notre Dame): **Tradeoffs in Machine Learning**

In this talk, I'll discuss three classical and influential tradeoffs in machine learning: the bias-variance tradeoff, accuracy-interpretability tradeoff, and tradeoffs between different definitions of fairness. No prior background is assumed - I will describe each tradeoff, highlight work from the past decade revisiting each, and invite consideration of the role of these tradeoffs in our work.

30.11.2023, 16:15

**Nathan Srebro** (Toyota Technological Institute at Chicago): **Interpolation Learning and Overfitting with Linear Predictors and Short Programs**

Classical theory, conventional wisdom, and all textbooks tell us to avoid reaching zero training error and overfitting the noise, and instead balance model fit and complexity. Yet, recent empirical and theoretical results suggest that in many cases overfitting is benign, and even interpolating the training data can lead to good generalization. Can we characterize and understand when overfitting is indeed benign, and when it is catastrophic as classic theory suggests? And can existing theoretical approaches be used to study and explain benign overfitting and the "double descent" curve? I will discuss interpolation learning in linear (and kernel) methods, as well as using the universal "minimum description length" or "shortest program" learning rule.