Category Archives: CORE

Nina Balcan; Distributed Machine Learning

CORE Series
Tuesday, May 24, 2016, 4:00 pm
Electrical Engineering Building (EEB) 125

Maria-Florina Balcan, Carnegie Mellon University

TITLE: Distributed Machine Learning

ABSTRACT: We consider the problem of learning from distributed data and analyze fundamental algorithmic and communication complexity questions involved. Broadly, we consider a framework where information is distributed between several locations, and our goal is to learn a low-error hypothesis with respect to the overall data by using as little communication, and as few rounds of communication, as possible. As an example, suppose k research groups around the world have collected large scientific datasets, such as genomic sequence data or sky survey data, and we wish to perform learning over the union of all these different datasets without too much communication.

In this talk, I will first discuss a general statistical or PAC style framework for analyzing communication complexity issues involved when doing distributed supervised machine learning, i.e., learning from annotated data distributed across multiple locations. I will discuss general lower bounds on the amount of communication needed to learn a given class and broadly-applicable techniques for achieving communication-efficient supervised learning.

I will also discuss algorithms with good communication complexity for unsupervised learning and dimensionality reduction problems, with interesting connections to efficient distributed coreset construction.

BIO: Maria-Florina Balcan is an Associate Professor in tNinaBalcanhe School of Computer Science at Carnegie Mellon University. Her main research interests are machine learning, computational aspects in economics and game theory, and algorithms. Her honors include the CMU SCS Distinguished Dissertation Award, an NSF CAREER Award, a Microsoft Fculty Research Fellowship, a Sloan Research Fellowship, and several paper awards. She was a Program Committee Co-chair for COLT 2014 and a board memberof the International Machine Learning Society, and is currently a Program Committee Co-chair for ICML 2016.

Sébastien Bubeck; New Results at the Crossroads of Convexity, Learning and Information Theory

CORE Series
Tuesday, Apr 19, 2016, 4pm

LOW 102
Sébastien Bubeck, Microsoft Research

ABSTRACT: I will present three new results: (i) the Cramer transform of
the uniform measure on a convex body is a universal self-concordant
barrier; (ii) projected gradient descent with Gaussian noise allows to
sample from a log-concave measure in polynomial time; and (iii) Thompson
sampling combined with a multi-scale exploration solves the Bayesian
convex bandit problem. The unifying theme in these results is the
interplay between concepts from convex geometry, learning and
information theory.

bubeckBIO: Sébastien Bubeck is a Researcher at Microsoft Research (Theory Group) in Redmond, WA. Prior to Microsoft Research, he was an Assistant Professor in the Department of Operations Research and Financial Engineering at Princeton University. He received his MS in Mathematics from the Ecole Normale Supérieure de Chachan and his PhD from the University of Lille 1, France, where his PhD thesis won several prizes including the Jacques Neveu prize for the best French PhD thesis in Probability and Statistics. He was awarded the Sloan Research Fellowship in Computer Science in 2015, and a Best Paper Award at COLT (Conference on Learning Theory) in 2009. He was the general co-chair for COLT 2013 and 2014, and serves on the steering committee for COLT. He is also the author of the recent monograph, Convex Optimization: Algorithms and Complexity, published in 2015 as a part of Foundations and Trends in Machine Learning.

Katya Scheinberg; Using randomized models in black-box, derivative free and stochastic optimization

CORE Series
Tuesday, March 8, 2016, 4:00 pm
SMI 304
Katya Scheinberg,Lehigh University.
Using randomized models in black-box, derivative free and stochastic optimization

ABSTRACT: Derivative free optimization (DFO) is the field that addresses optimization of black-box functions – that is functions whose value can be computed (possibly approximately) but whose derivatives cannot be approximated directly. The applications of DFO range from aircraft engineering to hyperparameter tuning in machine learning. All derivative free methods rely on sampling the objective function at one or more points at each iteration. Constructing and maintaining these sample sets has been one of the most essential issues in DFO. Majority of the existing results rely on deterministic sampling techniques.
We will discuss the new developments for using randomized sampled sets within the DFO framework. Randomized sample sets have many advantages over the deterministic sets. In particular, it is often easier to enforce “good” properties of the models with high probability, rather than in the worst case. In addition, randomized sample sets can help automatically discover a good local low dimensional approximation to the high dimensional objective function. We will demonstrate how compressed sensing results can be used to show that reduced size random sample sets can provide full second order information under the assumption of the sparsity of the Hessian.

Bio: Katya Scheinberg is the Harvey E. Wagner Endowed Chair Professor at the Industrial and Systems Engineering Department at Lehigh University. Prior to her current position, Katya was a staff member at the IBM T.J. Watson Research Center for over a decade. Her main research interests lie broadly in continuous optimization, focusing on convex optimization, derivative free optimization, and large-scale methods for Big Data and Machine Learning applications. Katya is currently the Editor-in-Chief of the SIAM-MOS Series on Optimization and an associate editor of the SIAM Journal on Optimization. She is a recent receipt of the prestigious Lagrange Prize, along with Andrew R. Conn and Luis Nunes Vicente, for their highly influential book “Introduction to Derivative-Free Optimization”. Katya’s research is supported by grants from AFOSR, DARPA, NSF, and Yahoo.

Andrew Fitzgibbon; Learning about Shape

CORE Series
Monday, March 30, 2015, 4pm
EEB 125
Andrew Fitzgibbon, Microsoft Research, Cambridge.
Learning about Shape

Abstract: Vision is naturally concerned with shape. If we could recover a stable and compact representation of object shape from images, we would hope it might aid with numerous vision tasks. Just the silhouette of an object is a strong cue to its identity, and the silhouette is generated by its 3D shape. In computer vision, many representations have been explored: collections of points, “simple” shapes like ellipsoids or polyhedra, algebraic surfaces and other implicit surfaces, generalized cylinders and ribbons, and piecewise (rational) polynomial representations like NURBS and subdivision surfaces. Many of these can be embedded more or less straightforwardly into probabilistic shape spaces, and recovery (a.k.a. “learning”) of one such space is the goal of the experimental part of this talk.

When recovering shape from measurements, there is at first sight a natural hierarchy of stability: straight lines can represent very little but may be robustly recovered from data, then come conic sections, splines with fixed knots, and general piecewise representations. I will show, however, that one can pass almost immediately to piecewise representations without loss of robustness. In particular, I shall show how a popular representation in computer graphics—subdivision curves and surfaces—may readily be fit to a variety of image data using the technique for ellipse fitting introduced by Gander, Golub, and Strebel in 1994. I show how we can address the previously-difficult problem of recovering 3D shape from multiple silhouettes, and the considerably harder problem which arises when the silhouettes are not from the same object instance, but from members of an object class, for example 30 images of different dolphins each in different poses. This requires that we simultaneously learn the shape space and the projections from each instance into its image. This simultaneous optimization is reminiscent of the bundle adjustment problem in computer vision, and indeed our most recent application, to tracking the human hand, makes good use of the Ceres Solver.

Joint work with Tom Cashman and others.

Bio: Andrew Fitzgibbon is a principal researcher in the computer vision group at Microsoft Research Cambridge. He is best known for his work on 3D vision, having been a core contributor to the Emmy-award-winning 3D camera tracker “boujou” ( and to the human body tracking of Kinect for Xbox 360, but his interests are broad, spanning computer vision, graphics, machine learning, and even a little neuroscience. He has published numerous highly-cited papers, and received many awards for his work, including 9 conference prizes for best paper or runner-up, the Silver medal of the Royal Academy of Engineering, and the British Computer Society’s Roger Needham award. He is a fellow of the Royal Academy of Engineering, the British Computer Society, and the International Association for Pattern Recognition. He has been program chair of CVPR and ECCV. Before joining Microsoft in 2005, he was a Royal Society University Research Fellow at Oxford University, having previously studied at Edinburgh University, Heriot-Watt University, and University College, Cork.

Dan Spielman; Spectral Sparsification of Graphs

May 8th, 2014, 3:30pm
EEB 105
Dan Spielman, Department of Computer Science, Yale.
Spectral Sparsification of Graphs
Math Across Campus Seminar

Abstract: We introduce a notion of what it means for one graph to be a good spectral approximation of another, and prove that every graph can be well-approximated by a graph with few edges.

We ask how well a given graph can be approximated by a sparse graph. Expander graphs can be viewed as sparse approximations of complete graphs, with Ramanujan expanders providing the best possible approximations. We prove that every graph can be approximated by a sparse graph almost as well as the complete graphs are approximated by the Ramanujan expanders: our approximations employ at most twice as many edges to achieve the same approximation factor.

We also present an efficient randomized algorithm for constructing sparse approximations that only uses a logarithmic factor more edges than optimal.

Our algorithms follow from the solution of a problem in linear algebra. Given any expression of a rank-n symmetric matrix A as a sum of rank-1 symmetric matrices, we show that A can be well approximated by a weighted sum of only O(n) of those rank-1 matrices.

This is joint work with Joshua Batson, Nikhil Srivastava and Shang-Hua Teng.

Biosketch: Daniel Alan Spielman received his B.A. in Mathematics and Computer Science from Yale in 1992, and his Ph.D in Applied Mathematics from M.I.T. in 1995. He spent a year as a NSF Mathematical Sciences Postdoc in the Computer Science Department at U.C. Berkeley, and then taught in the Applied Mathematics Department at M.I.T. until 2005. Since 2006, he has been a Professor at Yale University. He is presently the Henry Ford II Professor of Computer Science, Mathematics, and Applied Mathematics.

He has received many awards, including the 1995 ACM Doctoral Dissertation Award, the 2002 IEEE Information Theory Paper Award, the 2008 Godel Prize, the 2009 Fulkerson Prize, the 2010 Nevanlinna Prize, an inaugural Simons Investigator Award, and a MacArthur Fellowship. He is a Fellow of the Association for Computing Machinery and a member of the Connecticut Academy of Science and Engineering. His main research interests include the design and analysis of algorithms, graph theory, machine learning, error-correcting codes and combinatorial scientific computing.

Sanjeev Arora; Overcoming Intractability in Unsupervised Learning

April 15th, 2014, 4:00pm
CSE 691 (The Gates Commons)
Sanjeev Arora, Department of Computer Science, Princeton.
Overcoming Intractability in Unsupervised Learning

Abstract: Unsupervised learning – i.e., learning with unlabeled data – is increasingly important given today’s data deluge. Most natural problems in this domain – e.g. for models such as mixture models, HMMs, graphical models, topic models and sparse coding/dictionary learning, deep learning – are NP-hard. Therefore researchers in practice use either heuristics or convex relaxations with no concrete approximation bounds. Several nonconvex heuristics work well in practice, which is also a mystery.

The talk will describe a sequence of recent results whereby rigorous approaches leading to polynomial running time are possible for several problems in unsupervised learning. The proof of polynomial running time usually relies upon nondegeneracy assumptions on the data and the model parameters, and often also on stochastic properties of the data (average-case analysis). Some of these new algorithms are very efficient and practical – e.g. for topic modeling.

Biosketch: Sanjeev Arora is the Charles C. Fitzmorris Professor of Computer Science at Princeton University. His research spans Theoretical Computer Science, including foundational work in Algorithms and Computational Complexity. His recent work focuses on provable bounds for Machine Learning. He has received numerous awards and distinction. Most recently, these include the 2012 Fulkerson Prize, the 2011 ACM-Infosys Foundation Award in Computing Sciences, the 2010 Goedel Prize, and the Best Paper Award at the 2010 Annual Symposium on Foundations of Computing. He is an ACM Fellow and recipient of the 1995 ACM Doctoral Dissertation Award.

David Blei; Probabilistic Topic Models of Text and Users

Monday, March 31st, 2014, 3:30pm
EEB 125
David Beli, Department of Computer Science, Princeton.
Probabilistic Topic Models of Text and Users

Abstract: Probabilistic topic models provide a suite of tools for analyzing large document collections. Topic modeling algorithms discover the latent themes that underlie the documents and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents.

Traditional topic modeling algorithms analyze a document collection and estimate its latent thematic structure. However, many collections contain an additional type of data: how people use the documents. For example, readers click on articles in a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection of bills. Behavior data is essential both for making predictions about users (such as for a recommendation system) and for understanding how a collection and its users are organized.

In this talk, I will review the basics of topic modeling and describe our recent research on collaborative topic models, models that simultaneously analyze a collection of texts and its corresponding user behavior. We studied collaborative topic models on 80,000 scientists’ libraries, a collection that contains 250,000 articles.

With this analysis, I will show how we can build interpretable recommendation systems that point scientists to articles they will like. Further, the same analysis lets us organize the scientific literature according to discovered patterns of readership. For example, we can identify articles important within a field and articles that transcend disciplinary boundaries.

More broadly, topic modeling is a case study in the large field of applied probabilistic modeling. Finally, I will survey some recent advances in this field. I will show how modern probabilistic modeling gives data scientists a rich language for expressing statistical assumptions and scalable algorithms for uncovering hidden patterns in massive data.

Biosketch: David Blei is an associate professor of Computer Science at Princeton University. He earned his Bachelor’s degree in Computer Science and Mathematics from Brown University and his PhD in Computer Science from the University of California, Berkeley. His research focuses on probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data.