Nina Balcan; Distributed Machine Learning

CORE Series
Tuesday, May 24, 2016, 4:00 pm
Electrical Engineering Building (EEB) 125

Maria-Florina Balcan, Carnegie Mellon University

TITLE: Distributed Machine Learning

ABSTRACT: We consider the problem of learning from distributed data and analyze the fundamental algorithmic and communication complexity questions involved. Broadly, we consider a framework where information is distributed among several locations, and our goal is to learn a low-error hypothesis with respect to the overall data by using as little communication, and as few rounds of communication, as possible. As an example, suppose k research groups around the world have collected large scientific datasets, such as genomic sequence data or sky survey data, and we wish to perform learning over the union of all these different datasets without too much communication.

In this talk, I will first discuss a general statistical or PAC-style framework for analyzing the communication complexity issues involved in distributed supervised machine learning, i.e., learning from annotated data distributed across multiple locations. I will discuss general lower bounds on the amount of communication needed to learn a given class and broadly applicable techniques for achieving communication-efficient supervised learning.

I will also discuss algorithms with good communication complexity for unsupervised learning and dimensionality reduction problems, with interesting connections to efficient distributed coreset construction.

BIO: Maria-Florina Balcan is an Associate Professor in the School of Computer Science at Carnegie Mellon University. Her main research interests are machine learning, computational aspects in economics and game theory, and algorithms. Her honors include the CMU SCS Distinguished Dissertation Award, an NSF CAREER Award, a Microsoft Faculty Research Fellowship, a Sloan Research Fellowship, and several paper awards. She was a Program Committee Co-chair for COLT 2014 and a board member of the International Machine Learning Society, and is currently a Program Committee Co-chair for ICML 2016.