Vergangene Vorträge im Joint Analysis Seminar
In den vergangenen sechs Monaten haben keine Veranstaltungen stattgefunden.
Vergangene Vorträge im Oberseminar
Andrea Paudice (Aarhus University, online):
General Tail Bounds for Non-Smooth Stochastic Mirror Descent
We study the problem of minimizing a convex, non-smooth Lipschitz function over a convex domain when only noisy stochastic subgradient estimates are available. We analyze
the classical Stochastic Mirror Descent (SMD) algorithm and derive new tail bounds on its optimization error, for both the averaged and the last iterate. Our results extend existing
analyses - traditionally limited to light-tailed, sub-Gaussian noise - to heavier-tailed noise distributions. We specialize our general bounds to two important families of noise: one
with exponential tails and another with polynomial tails. Notably, our bounds for the averaged iterate reveal a distinct two-regime behavior, highlighting new insights into the
interplay between noise tails and convergence rates.
(Abstract ein-/ausblenden)
Christian Kühn (TUM, hybrid):
Dynamics on Deep Neural Networks
In this talk, I am going to focus on the dynamics of information propagation on neural networks. In particular, we shall study the universal embedding problem and illuminate the role of bottlenecks, augmentation, Morse functions, and delay in the design of deep neural networks using methods from nonlinear dynamics.
(Abstract ein-/ausblenden)
Alexander Posch (University of Wien, hybrid):
Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias
In this talk I will discuss the application of entropic mirror descent to solving linear systems and analyze the resulting $\ell_1$-norm implicit bias. I begin by establishing bounds
on the $\ell_1$-norm of the solution when the algorithm is initialized close to zero. I then discuss the convergence analysis of this scheme, where the main challenge stems from the
unboundedness of the domain. To address this without imposing restrictive assumptions, we introduce a variant of the Polyak stepsize. For this stepsize we establish both sublinear and
linear convergence rates, which also extend to arbitrary convex $L$-smooth functions. Finally, we propose an alternative algorithm that avoids exponentiation, resembling gradient
descent with Hadamard overparametrization, while maintaining provable convergence guarantees.
(Abstract ein-/ausblenden)
Johannes Hertrich (Université Paris Dauphine, online):
Fast Summations via Kernel Slicing
The fast computation of large kernel sums is a challenging task which arises as a subproblem in any kernel method. Initially, this problem has complexity O(N²), where N
is the number of considered data points. In this talk, we propose an approxmation algorithm which reduces this complexity to O(N). Our approach is based on two ideas. First, we prove
that under mild assumptions radial kernels can be represented as sliced version of some one-dimensional kernel and derive an analytic formula for the one-dimensional counterpart.
Hence, we can reduce the d-dimensional kernel summation to a one-dimensional setting. Second, for solving these one-dimensional problems efficiently, we apply fast Fourier summations
on non-equispaced data or a sorting algorithm. We prove bounds for the slicing error, employ quasi-Monte Carlo methods for improved error rates and demonstrate the advantages of our
approach by numerical examples. Finally, we present an application in image generation.
(Abstract ein-/ausblenden)
David Holzmueller (INRIA Paris, hybrid):
Generalization of Neural Networks and Kernel Methods: Theory and Practice
Training neural networks on vectors of continuous or discrete data is a classical and highly relevant problem, yet there are still many questions and new developments
from both a theoretical and applied standpoint. From a theoretical viewpoint, I will discuss how neural networks are connected to kernel methods and can be studied using linearization.
In some cases, this allows us to prove whether they generalize to new data with or even without regularization. Finally, I will provide a quick summary of what matters in practice, and
how we managed to create kernel methods and neural networks achieving state-of-the-art results on applied benchmarks.
(Abstract ein-/ausblenden)
Jason Altschuler (University of Pennsylvania, online):
Negative Stepsizes Make Gradient-Descent-Ascent Converge
Solving min-max problems is a central question in optimization, games, learning, and controls. Arguably the most natural algorithm is Gradient-Descent-Ascent (GDA), however since the 1970s,
conventional wisdom has argued that it fails to converge even on simple problems. This failure spurred the extensive literature on modifying GDA with extragradients, optimism, momentum, anchoring,
etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes.
The key innovation is the proposal of unconventional stepsize schedules that are time-varying, asymmetric, and (most surprisingly) periodically negative. We show that all three properties are
necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). The core intuition is that although
negative stepsizes make backward progress, they de-synchronize the min/max variables (overcoming the cycling issue of GDA) and lead to a slingshot phenomenon in which the forward progress in the
other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps
cancel to first order, yielding a second-order net movement in a new direction that leads to convergence and is otherwise impossible for GDA to move in. Joint work with Henry Shugart.
(Abstract ein-/ausblenden)