Abstract
In this talk, we present an analysis of deep neural networks through convex optimization, sparse signal processing, and geometric (Clifford) algebra. We begin by introducing exact convex optimization formulations for ReLU neural networks, showing that deep networks can be trained to global optimality via convex programs. Our results further establish an equivalent characterization of neural networks as high-dimensional convex Lasso models, traditionally employed in sparse signal processing. These models fit data using a discrete dictionary of features together with sparsity-inducing convex regularization. Our framework provides an intuitive geometric interpretation in which the optimal neurons represent signed volumes of parallelotopes formed by data vectors. Specifically, we show that the Lasso dictionary is constructed from a discrete set of wedge products of input samples, with deeper architectures leading to geometric reflections of these features. This analysis also reveals that standard convolutional neural networks can be globally optimized in fully polynomial time. Numerical simulations validate our claims, illustrating that the proposed convex approach is faster and more reliable than standard local search heuristics such as stochastic gradient descent and its variants. We also discuss extensions to batch normalization, generative adversarial networks, transformers, and diffusion models.
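To make the reformulation concrete, below is a minimal sketch of the two-layer convex program described above, written in Python with numpy and cvxpy (both assumed available). It is an illustration under simplifying assumptions, not the speaker's implementation: the ReLU activation patterns are enumerated by randomly sampling hyperplane directions, a common surrogate for the exhaustive enumeration used in the exact formulation, and the toy data and regularization strength are arbitrary.

import numpy as np
import cvxpy as cp

# Toy data (illustrative only): n samples, d features, scalar targets.
np.random.seed(0)
n, d = 20, 3
X = np.random.randn(n, d)
y = np.random.randn(n)

# Enumerate candidate ReLU activation patterns D_i = diag(1[X u >= 0]).
# Random sampling of directions u is an assumption made here for brevity;
# exhaustive enumeration is possible but grows with the number of
# hyperplane arrangements of the data.
patterns = set()
for _ in range(500):
    u = np.random.randn(d)
    patterns.add(tuple((X @ u >= 0).astype(int)))
D = [np.diag(p) for p in patterns]

# Convex group-Lasso program equivalent to two-layer ReLU training:
# minimize 0.5 * || sum_i D_i X (v_i - w_i) - y ||^2
#          + beta * sum_i (||v_i||_2 + ||w_i||_2)
# subject to sign constraints keeping each activation pattern consistent.
beta = 0.1  # illustrative regularization strength
V = [cp.Variable(d) for _ in D]
W = [cp.Variable(d) for _ in D]
residual = sum(Di @ X @ (v - w) for Di, v, w in zip(D, V, W)) - y
constraints = []
for Di, v, w in zip(D, V, W):
    S = 2 * Di - np.eye(n)
    constraints += [S @ X @ v >= 0, S @ X @ w >= 0]
objective = 0.5 * cp.sum_squares(residual) + beta * sum(
    cp.norm(v, 2) + cp.norm(w, 2) for v, w in zip(V, W)
)
prob = cp.Problem(cp.Minimize(objective), constraints)
prob.solve()
print("optimal value:", prob.value)

Each optimal pair (v_i, w_i) corresponds to a ReLU neuron whose activation pattern is D_i; the group-Lasso penalty drives most pairs to zero, so the recovered network is sparse.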
Bio
Mert Pilanci is an assistant professor of Electrical Engineering at Stanford University. He earned his Ph.D. in Electrical Engineering and Computer Science from UC Berkeley in 2016. In 2017, he was a Math+X postdoctoral fellow working with Emmanuel Candès at Stanford University. Before joining Stanford's faculty, he served as an assistant professor of Electrical Engineering and Computer Science at the University of Michigan. Mert's research interests include neural networks, machine learning, optimization, and signal processing. His group focuses on developing theory and algorithms to solve large-scale optimization problems in machine learning. Additionally, his research aims to create safe and interpretable artificial intelligence while exploring the information-theoretic foundations of distributed computing.
