Towards Theoretical Understanding of Overparametrization in Deep Learning

Published in 2018

Guest Speaker: Jason Lee, Assistant Professor, University of Southern California

Date/Time: Thursday, October 4, 2018, 2:00 pm

Location:  3403 Siebel

Sponsored by the Department of Computer Science and DAIS


Abstract:  We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a k hidden node shallow network with quadratic activation and n training data points, we show as long as k>=sqrt(2n), over-parametrization enables local search algorithms to find a \emph{globally} optimal solution for general smooth and convex loss functions. Further, despite that the number of parameters may exceed the sample size, we show with weight decay, the solution also generalizes well.

Next, we analyze the implicit regularization effects of various optimization algorithms. In particular we prove that for least squares with mirror descent, the algorithm converges to the closest solution in terms of the bregman divergence. For linearly separable classification problems, we prove that the steepest descent with respect to a norm solves SVM with respect to the same norm. For over-parametrized non-convex problems such as matrix sensing or neural net with quadratic activation, we prove that gradient descent converges to the minimum nuclear norm solution, which allows for both meaningful optimization and generalization guarantees.


Bio: Jason Lee is an assistant professor in Data Sciences and Operations at the University of Southern California. Prior to that, he was a postdoctoral researcher at UC Berkeley working with Michael Jordan. Jason received his PhD at Stanford University advised by Trevor Hastie and Jonathan Taylor. His research interests are in statistics, machine learning, and optimization. Lately, he has worked on high dimensional statistical inference, analysis of non-convex optimization algorithms, and theory for deep learning.

This is joint work with Suriya Gunasekar, Mor Shpigel, Daniel Soudry, Nati Srebro, and Simon Du.

Xiang Ren, PhD Candidate, “Minimal-Effort StructMine: Turning Massive Corpora into Structures”

Published in 2017

Time: 11am-12pm

Location:  3401 Siebel Center


While “Big Data” technologies are gaining great successes in unlocking knowledge from structured data, real-world data are largely unstructured and in the form of natural-language text. One of the grand challenges is to turn such massive text data into machine-actionable structures. Yet, most existing systems have heavy reliance on human efforts when dealing with text corpora of various kinds, slowing down the development of downstream applications.

In this talk, I will introduce a data-driven framework, minimal-effort StructMine, that extracts factual structures from massive text corpora with minimal human involvement. In particular, I will discuss how to apply Minimal-Effort StructMine to solve three subtasks: from identifying typed entities in text, to refining entity types into more fine-grained levels, to understanding the typed relationships between entities. Together, these three solutions form a clear roadmap for turning a massive corpus into a structured network to represent factual knowledge. Finally, I will share some directions towards mining corpus-specific structured networks for knowledge discovery.


Xiang Ren is a Computer Science PhD candidate at University of Illinois at Urbana-Champaign, working with Jiawei Han and the Data and Information System Lab. Xiang’s research develops data-driven methods for turning unstructured text data into machine-actionable structures. More broadly, his research interests span data mining, machine learning, and natural language processing, with a focus on making sense of massive text corpora. His research has been recognized with a Google PhD Fellowship, Yahoo!-DAIS Research Excellence Award, C. W. Gear Outstanding Graduate Student Award, and has been transferred to US Army Research Lab and Microsoft Bing.