# Past Talks in Joint Analysis Seminar

There weren't any events in the past six months.

# Past Talks in Post Graduate Seminar

22.07.2024, 16:00

**Sakirudeen Abdulsalaam** (LMU Munich):**Phase Recovery from Antenna Measurements with Random Masks**

The radiation characteristics of an antenna under test (AUT) are among its most important properties. Spherical Near Field (SNF) measurements are known to be the most accurate characterization method. Despite their accuracy, SNF measurements pose several challenges, including the need for a significant number of samples and a complicated mathematical transformation to derive the AUT's far-field (FF) radiation pattern from the complex near-field (NF) measurements. Furthermore, phase acquisition becomes more challenging at higher frequencies. Therefore, research into AUT characterization with measurements and transformation techniques based on amplitude information only has gained traction. The key challenge in this case is to compute coefficients describing the AUT's radiation behaviour from amplitude NF measurements. PhaseLift has been shown in the literature to be one of the successful methods for phase recovery from discrete Fourier Transform (DFT) measurements with random masks. In this talk, we present an application of this technique to antenna measurements. We show the promising results of our numerical experiments and lay some of the foundations for establishing theoretical guarantees for the success of PhaseLift in phase recovery from antenna measurements.

15.07.2024, 16:00

**El Mehdi Achour** (RWTH Aachen):**The Energy Landscape of Predictive Coding Networks**

Predictive coding (PC) is a brain-inspired learning algorithm that performs local updates of network activities as well as weights. Recent work has begun to study the
properties of PC compared to backpropagation (BP) with gradient descent (GD), but its training dynamics still remain poorly understood. It is known that the loss landscape of deep
neural networks abounds with "non-strict" saddle points where the Hessian is positive semi-definite, which can lead to vanishing gradients and exponentially slow GD convergence.
Here, we present theoretical and empirical evidence that the energy at the PC equilibrium of the network activities might only have "strict" saddles with negative curvature. For deep
linear networks, we prove that the saddle at the origin of the energy is strict, in contrast to the mean squared error (MSE) loss where it is non-strict for any network with more than
one hidden layer. We support our theory with experiments on both linear and non-linear networks, showing that when initialised close to the origin, PC converges substantially faster
than BP with stochastic GD. In addition, we prove that non-strict saddles of the MSE other than the origin become strict in the equilibrated energy. Overall, these results highlight
the higher robustness to initialisation of PC and raise further questions about the relationship between the loss and the energy landscape.

Joint work with Francesco Innocenti, Ryan Singh, and Christopher L. Buckley.

08.07.2024, 16:00

**Sjoerd Dirksen** (Utrecht University):**The separation and memorization capacities of neural networks**

In this talk, I will first consider the separation capacity of neural nets - the ability to transform two classes (with positive distance) into linearly separable ones.
I will show that already sufficiently large two-layer ReLU nets with random weights and biases, which arise as the initialization of ReLU nets, have this capacity with high
probability. In the second part of this talk I will build on the geometric insights behind this result to shed new light on the memorization capacity of neural nets - their ability to
perfectly fit given labeled data. I will present a simple randomized algorithm to produce a small interpolating neural net for given data with two classes. This method yields new
memorization capacity results for neural nets that move beyond worst case lower bounds in the literature. In both results, the sufficient size of the network is linked to new
complexity measures that describe both the geometric properties of the two classes and their mutual arrangement.

Based on joint works with Patrick Finke (UU), Martin Genzel (UU, Merantix Momentum), Laurent Jacques (UC Louvain), and Alexander Stollenwerk (UC Louvain).

24.06.2024, 16:00

**Debarghya Ghoshdastidar** (TU Munich):**When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?**

Contrastive learning is a paradigm for learning representations from unlabelled data that has been highly successful for image and text data. Several recent works have examined contrastive losses to claim that contrastive models effectively learn spectral embeddings, while few works show relations between (wide) contrastive models and kernel principal component analysis (PCA). However, it is not known if trained contrastive models indeed correspond to kernel methods or PCA. In this work, we analyze the training dynamics of two-layer contrastive models, with non-linear activation, and answer when these models are close to PCA or kernel methods. It is well known in the supervised setting that neural networks are equivalent to neural tangent kernel (NTK) machines, and that the NTK of infinitely wide networks remains constant during training. We provide the first convergence results of the NTK for contrastive losses, and present a nuanced picture: the NTK of wide networks remains almost constant for cosine similarity based contrastive losses, but not for losses based on dot product similarity. We further study the training dynamics of contrastive models with orthogonality constraints on the output layer, which are implicitly assumed in works relating contrastive learning to spectral embedding. Our deviation bounds suggest that representations learned by contrastive models are close to the principal components of a certain matrix computed from random features. We empirically show that our theoretical results possibly hold beyond two-layer networks.

17.06.2024, 16:00

**Reinhard Heckel** (TU Munich):**Data-centric Deep Learning Based Imaging**

Artificial intelligence (AI) based image reconstruction yields significant performance boosts for magnetic resonance imaging and computed tomography, and enables the imaging of dynamic objects such as the moving heart. AI can also automate the imaging process and enable accurate imaging under challenging conditions such as patient movement. However, with the current wave of generative AI, a concern is that AIs trained on large datasets might cause hallucinations and robustness issues in medical imaging. Perhaps surprisingly, the opposite is true: Large amounts of diverse training data are key to building robust and reliable AI-based imaging systems. This talk is about the opportunities and challenges that AI-based imaging offers in modern medical imaging.

07.06.2024, 14:00

**Hans Feichtinger** (University of Vienna):**Signals ARE mild distributions: An alternative approach to Fourier Analysis**

Since the presentation of the Segal algebra $S_0(G)$ (for LCA groups) in February 1979 this space, meanwhile known as "Feichtinger's Algebra", and its dual, more
recently popularized under the name of "mild distributions", have been a tremendously useful tool for Harmonic Analysis.

In contrast to the usual approach we want to discuss the following thesis:

Mild distributions are a suitable model for signals as understood in the application domain. Signals are described by measurements, very much like a picture is not an
$L^2$-function on a square well defined almost everywhere, but rather something that allows one to take digital images using fine sensors.
Trying to describe signals as elements of $S_0'(\mathbb{R}^d)$ and measurements as test functions from $S_0(\mathbb{R}^d)$ we can build up the Banach Gelfand Triple $(S_0, L^2, S_0')(\mathbb{R}^d)$ which should be taken as the
basis for Time-Frequency and Gabor Analysis, but also for Classical Fourier Analysis or the theory of pseudo-differential operators.
The extended (fractional) Fourier transform leaving the space of mild distributions invariant supports the physicist's intuition that signals can be described either in the time or in the frequency domain, and of course in many different ways as
(bounded, continuous) functions on phase space.

03.06.2024, 16:00

**Mark Peletier** (TU Eindhoven):**Singular-limit analysis of training with noise injection**

Many training algorithms inject some form of noise in the training. The classical example is the mini-batch noise in Stochastic Gradient Descent, but other examples are dropout, data augmentation, 'noise nodes', 'label noise', and input-data noise. While the additional noise is generally believed to improve generalisation performance, there is little mathematical understanding of how this is achieved. In this talk I will describe recent work, together with Anna Shalova (TU/e) and André Schlichting (Münster -> Ulm), in which we analyse a fairly general class of iterative training schemes with noise injection. In the limit of small noise, we prove convergence of the appropriately rescaled time courses to solutions of an auxiliary evolution equation. This auxiliary equation is a gradient flow driven by a functional for which we obtain an explicit expression, thus opening the door to understanding the different types of regularisation generated by different types of noise injection.

27.05.2024, 16:00

**Arthur Jacot** (NYU Courant Institute):**Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets**

We describe 'representation geodesics' $A_p$ in ResNets: continuous paths in representation space (similar to NeuralODEs) from input ($p=0$) to output ($p=1$) that minimize the parameter norm of the network. We give a Lagrangian and Hamiltonian reformulation, which highlights the importance of two terms: a kinetic energy which favors small layer derivatives $\partial_{p}A_{p}$ and a potential energy that favors low-dimensional representations, as measured by the 'Cost of Identity'. The balance between these two forces offers an intuitive understanding of feature learning in ResNets. We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work: for large $\tilde{L}$ the potential energy dominates and leads to a separation of timescales, where the representation jumps rapidly from the high-dimensional inputs to a low-dimensional representation, moves slowly inside the space of low-dimensional representations, and then jumps back to the high-dimensional outputs.

13.05.2024, 16:00

**Adit Radhakrishnan** (Harvard University):**How do neural networks learn features from data?**

Understanding how neural networks learn features, or relevant patterns in data, for prediction is necessary for their reliable use in technological and scientific applications. We propose a unifying mechanism that characterizes feature learning in neural network architectures. Namely, we show that features learned by neural networks are captured by a statistical operator known as the average gradient outer product (AGOP). Empirically, we show that the AGOP captures features across a broad class of network architectures including convolutional networks and large language models. Moreover, we use AGOP to enable feature learning in general machine learning models through an algorithm we call Recursive Feature Machine (RFM). We show that RFM automatically identifies sparse subsets of features relevant for prediction and explicitly connects feature learning in neural networks with classical sparse recovery and low rank matrix factorization algorithms. Overall, this line of work advances our fundamental understanding of how neural networks extract features from data, leading to the development of novel, interpretable, and effective models for use in scientific applications.
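As a rough illustration of the AGOP mentioned in the abstract, the following minimal sketch computes $\frac{1}{n}\sum_i \nabla f(x_i)\,\nabla f(x_i)^\top$ for a toy predictor. Everything here is an assumption made for illustration and is not taken from the talk: the hypothetical `toy_predictor` stands in for a trained network, gradients are estimated by central finite differences, and the inputs are random Gaussian samples.

```python
import random


def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def agop(f, samples):
    """Average over the samples of the outer product grad f(x) grad f(x)^T."""
    d = len(samples[0])
    n = len(samples)
    M = [[0.0] * d for _ in range(d)]
    for x in samples:
        g = numerical_gradient(f, x)
        for a in range(d):
            for b in range(d):
                M[a][b] += g[a] * g[b] / n
    return M


def toy_predictor(x):
    # Hypothetical predictor: depends only on x[0] + 2*x[1]; x[2] is irrelevant.
    return (x[0] + 2.0 * x[1]) ** 2


rng = random.Random(0)
samples = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
M = agop(toy_predictor, samples)

for row in M:
    print(["%.3f" % v for v in row])
```

In this toy setting the row and column of `M` belonging to the irrelevant coordinate `x[2]` vanish, and the remaining block is proportional to the outer product of the direction `(1, 2)` with itself, which is the sense in which the matrix picks out the coordinates relevant for prediction.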

06.05.2024, 16:00

**Dominik Stöger** (KU Eichstätt-Ingolstadt):**Breaking the quadratic rank bottleneck in non-convex matrix sensing: Recovery guarantees with (near-)optimal sample complexity**

Low-rank matrix recovery problems are ubiquitous in many areas of science and engineering. Most of the methods that have been studied for these problems can be divided
into two categories: Convex optimization approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent.

While the latter promises to be computationally much less expensive, basically all existing recovery guarantees for factorized gradient descent are much more pessimistic with respect
to the number of samples required. In particular, they require the number of samples to scale quadratically with the rank of the ground truth matrix. This is in stark contrast to
empirical observations which suggest that the non-convex approaches perform as well as the convex ones with respect to the sample complexity.

In this talk, we resolve this issue and present what are, to the best of our knowledge, the first theoretical guarantees for matrix sensing showing that factorized gradient descent recovers
the ground truth matrix with a sample size that is optimal in the number of degrees of freedom. Our proof is based on new probabilistic decoupling arguments, which we expect to be of
independent interest. Joint work with Yizhe Zhu (UC Irvine).

29.04.2024, 10:00

**Massimo Fornasier** (TU Munich):**Wasserstein Sobolev functions and their numerical approximations**

We start the talk by presenting general results of strong density of sub-algebras of bounded Lipschitz functions in metric Sobolev spaces. We apply such results to show the
density of smooth cylinder functions in Sobolev spaces of functions on the Wasserstein space $\mathcal P_2$ endowed with a finite positive Borel measure. As a byproduct, we obtain the
infinitesimal Hilbertianity of Wasserstein Sobolev spaces. By taking advantage of these results, we further address the challenging problem of the numerical approximation of Wasserstein
Sobolev functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body
of literature focused on approximating efficiently pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches:

1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials.

2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces.

3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation.

As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric
Sobolev spaces introduced above and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness
appropriately designed neural networks to serve as basis functions. Consequently, our constructive solutions significantly enhance the evaluation speed at equal accuracy, surpassing
that of state-of-the-art methods by several orders of magnitude.

The talk presents a collection of results with Pascal Heid, Giacomo Sodini, and Giuseppe Savaré.

08.02.2024, 16:15

**Laura Paul** (RWTH Aachen University):**Covariance Estimation for Massive MIMO**

Massive multiple-input multiple-output (MIMO) communication systems are very promising for wireless communication and fifth generation (5G) cellular networks. In massive MIMO, a large number of antennas are employed at the base station (BS), which provides a high degree of spatial freedom and enables the BS to simultaneously communicate with multiple user terminals. Due to the limited angular spread, the user channel vectors lie on low-dimensional subspaces. For each user, we aim to find a low-dimensional beamforming subspace that captures a large amount of the power of its channel vectors. We will see that this signal subspace estimation problem can be reduced to finding a good estimator of the covariance matrix in terms of a truncated version of the nuclear norm. Since the channel covariance matrix is not known a priori in practice, it has to be estimated from the observed data samples. In this talk, theoretical guarantees for signal covariance and subspace estimation from compressed measurements are investigated. We derive improved bounds on the estimation error in terms of the number of observed time samples, the truncation level, and the noise level.

01.02.2024, 16:15

**Arinze Folarin** (RWTH Aachen University):**Tensor Recovery: Exploring Hierarchical Tensor Representation in ISLET Algorithm**

This talk is intended to introduce you to the Importance Sketching Low-rank Estimation for Tensors (ISLET) Algorithm by Anru Zhang. The algorithm utilizes the High-Order Orthogonal Iteration tensor decomposition method to derive important sketching directions, which are valuable for the tensor estimates produced by the ISLET algorithm. I will introduce the Hierarchical Tensor representation to derive sketching directions, generating a variant of the ISLET algorithm. These algorithms produce tensor estimates using the responses and tensor covariates with randomized designs from a given low-rank tensor regression model, enabling the recovery of the unknown required low-rank tensor.