April 15th, 2014, 4:00pm
CSE 691 (The Gates Commons)
Sanjeev Arora, Department of Computer Science, Princeton.
Overcoming Intractability in Unsupervised Learning
Abstract: Unsupervised learning – i.e., learning with unlabeled data – is increasingly important given today’s data deluge. Most natural problems in this domain – e.g. for models such as mixture models, HMMs, graphical models, topic models and sparse coding/dictionary learning, deep learning – are NP-hard. Therefore researchers in practice use either heuristics or convex relaxations with no concrete approximation bounds. Several nonconvex heuristics work well in practice, which is also a mystery.
The talk will describe a sequence of recent results whereby rigorous approaches leading to polynomial running time are possible for several problems in unsupervised learning. The proof of polynomial running time usually relies upon nondegeneracy assumptions on the data and the model parameters, and often also on stochastic properties of the data (average-case analysis). Some of these new algorithms are very efficient and practical – e.g. for topic modeling.
Biosketch: Sanjeev Arora is the Charles C. Fitzmorris Professor of Computer Science at Princeton University. His research spans Theoretical Computer Science, including foundational work in Algorithms and Computational Complexity. His recent work focuses on provable bounds for Machine Learning. He has received numerous awards and distinction. Most recently, these include the 2012 Fulkerson Prize, the 2011 ACM-Infosys Foundation Award in Computing Sciences, the 2010 Goedel Prize, and the Best Paper Award at the 2010 Annual Symposium on Foundations of Computing. He is an ACM Fellow and recipient of the 1995 ACM Doctoral Dissertation Award.