The Department of Applied Mathematics weekly seminar features talks by scholars and researchers working in applied mathematics, broadly interpreted.
Title: Facets of regularization in overparameterized machine learning
Abstract: Modern machine learning often operates in an overparameterized regime in which the number of parameters far exceeds the number of observations. In this regime, models can exhibit surprising generalization behaviors: (1) Models can interpolate the training data, achieving zero training error, yet still generalize well (benign overfitting); moreover, in some cases, optimally tuning explicit regularization can favor no regularization at all (obligatory overfitting). (2) The generalization error can vary non-monotonically with the model size or the sample size (double/multiple descent). These behaviors challenge classical notions of overfitting and the role of explicit regularization.
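Though the abstract is conceptual, the double/multiple descent behavior in (2) is easy to reproduce in simulation. Below is a minimal sketch (my own toy Gaussian setup, not material from the talk) in which the test error of minimum-norm least squares peaks near the interpolation threshold p = n and descends again beyond it:

```python
# Toy double descent demo (illustrative only; assumptions: isotropic
# Gaussian features, dense signal, misspecified submodels of size p).
import numpy as np

rng = np.random.default_rng(0)
n, p_total, sigma = 100, 400, 0.5                   # samples, max features, noise
beta = rng.normal(size=p_total) / np.sqrt(p_total)  # true signal, ||beta|| ~ 1

X = rng.normal(size=(n, p_total))
y = X @ beta + sigma * rng.normal(size=n)
X_test = rng.normal(size=(2000, p_total))
y_test = X_test @ beta + sigma * rng.normal(size=2000)

for p in [20, 50, 80, 100, 120, 200, 400]:
    # Min-norm least squares on the first p features ("ridgeless" fit);
    # pinv gives OLS for p < n and the min-norm interpolator for p > n.
    beta_hat = np.linalg.pinv(X[:, :p]) @ y
    mse = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    print(f"p = {p:3d}   test MSE = {mse:.3f}")
```

In runs of this sketch, the test error typically spikes near p = n = 100 and descends again as p grows past it, which is the double descent shape the abstract refers to.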
In this talk, I will present theoretical and methodological results related to these behaviors, focusing primarily on the concrete case of ridge regularization. First, I will identify conditions under which the optimal ridge penalty is zero (or even negative) and show that standard techniques such as leave-one-out and generalized cross-validation, when analytically continued, remain uniformly consistent for the generalization error and thus yield the optimal penalty, whether positive, negative, or zero. Second, I will introduce a general framework for mitigating double/multiple descent in the sample size based on subsampling and ensembling, and show its intriguing connection to ridge regularization. As an implication of this connection, I will show that the generalization error of optimally tuned ridge regression is monotone in the sample size (under mild data assumptions), thereby mitigating double/multiple descent. Key to both parts is the role of implicit regularization, either self-induced by the overparameterized data or externally induced by subsampling and ensembling. Finally, I will briefly mention some extensions and variants beyond ridge regularization.
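For readers curious what "analytically continued" cross-validation might look like in practice, here is a hedged sketch (my own toy implementation, not code from the papers; the penalty convention and formulas are my assumptions). The GCV criterion for ridge stays well defined for negative penalties as long as the smoother matrix exists, so one can minimize it over a grid that dips below zero:

```python
# Toy sketch: ridge GCV on a penalty grid analytically continued to
# negative values (conventions here are assumptions, not the papers').
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 100, 300, 0.5                  # overparameterized: p > n
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + sigma * rng.normal(size=n)

G = X @ X.T / n                              # n x n Gram matrix, full rank a.s.
eigmin = np.linalg.eigvalsh(G)[0]            # smallest eigenvalue of G

def gcv(lam):
    # Smoother S_lam = G (G + lam I)^{-1}: defined for all lam > -eigmin,
    # including negative lam; this is the analytic continuation in lam.
    S = G @ np.linalg.inv(G + lam * np.eye(n))
    resid = y - S @ y
    # (lam = 0 is a removable 0/0 point when p > n; the grid avoids it.)
    return (resid @ resid / n) / (1.0 - np.trace(S) / n) ** 2

grid = np.linspace(-0.9 * eigmin, 2.0, 200)  # dips below zero, stays > -eigmin
lam_hat = grid[np.argmin([gcv(lam) for lam in grid])]
print(f"GCV-selected ridge penalty: {lam_hat:.4f} (can be negative)")
```

And a similarly hedged sketch of the subsampling-and-ensembling device: each ensemble member is a ridgeless (min-norm) fit on a random subsample of size k < n, and the members are averaged; the ridge equivalences in the talk relate the large-ensemble limit of such a procedure to ridge regression at a penalty determined by the subsample aspect ratio. The function name and interface below are hypothetical:

```python
# Toy subsample-and-ensemble sketch (hypothetical helper, my own naming).
import numpy as np

def ensemble_ridgeless(X, y, k, m, rng):
    """Average m min-norm least-squares fits over random subsamples of size k."""
    coef = np.zeros(X.shape[1])
    for _ in range(m):
        idx = rng.choice(X.shape[0], size=k, replace=False)
        coef += np.linalg.pinv(X[idx]) @ y[idx]  # ridgeless fit on subsample
    return coef / m

# Example: coef = ensemble_ridgeless(X, y, k=50, m=100, rng=np.random.default_rng(2))
```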
The talk will feature joint work with the following collaborators (in alphabetical order): Pierre Bellec, Jin-Hong Du, Takuya Koriyama, Arun Kumar Kuchibhotla, Alessandro Rinaldo, Kai Tan, Ryan Tibshirani, Yuting Wei. The corresponding papers (in talk-chronological order) are: optimal ridge landscape, ridge cross-validation, risk monotonization, ridge equivalences, and extensions and variants.