February 26, 2013, 3:30pm
Ben Recht, Computer Sciences Department, University of Wisconsin
How to make predictions when you’re short on information
Abstract: With the advent of massive social networks, exascale computing, and high-throughput biology, researchers in every scientific department now face profound challenges in analyzing, manipulating, and identifying behavior from a deluge of noisy, incomplete data. In this talk, I will present a unifying framework that makes such data analysis tasks less sensitive to corrupted and missing data by exploiting domain-specific knowledge and prior information about structure. Specifically, I will show that when a signal or system of interest can be represented by a combination of a few simple building blocks, called atoms, it can be identified with dramatically fewer sensors and accelerated acquisition times. For example, a few principal factors can determine preferences across a user base, a small number of genes may constitute the signature of a disease, and a sum of a few permutations can summarize the ranking of sports teams. In each application, the challenge lies not only in defining the appropriate set of atoms, but also in estimating the most parsimonious combination of atoms that agrees with a small set of measurements.

This talk advances a framework for transforming notions of simplicity and latent low-dimensionality into convex optimization problems. My approach builds on the recent success of generalizing compressed sensing to matrix completion, creating a unified framework that greatly extends the catalog of objects and structures recoverable from partial information. This framework provides a standardized methodology to sharply bound the number of observations required to robustly estimate a variety of structured models. It also enables focused algorithmic development that can be deployed in many different applications, a variety of which I will detail in this talk. I will close by demonstrating how this framework provides the abstractions necessary to scale these optimization algorithms to the massive data sets we now commonly acquire.
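As a concrete illustration of the kind of recovery problem the abstract describes, consider the simplest instance: a signal that is a sum of a few coordinate atoms (i.e., a sparse vector), observed through far fewer random linear measurements than unknowns. The convex program that recovers it is basis pursuit, minimizing the l1 norm subject to the measurements. The sketch below is illustrative only (the problem sizes, random seed, and the choice of solving the l1 problem as a linear program via SciPy are my assumptions, not details from the talk):

```python
# Minimal sketch: recover a k-sparse vector from m < n random linear
# measurements by solving the convex basis-pursuit program
#     minimize ||x||_1  subject to  A x = b.
# Sizes and the LP reformulation are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 40, 20, 2              # ambient dimension, measurements, sparsity

# Ground-truth signal: a combination of k coordinate "atoms".
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.normal(size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)   # random Gaussian measurement map
b = A @ x_true

# LP reformulation: write x = u - v with u, v >= 0, so that at the
# optimum ||x||_1 = sum(u) + sum(v).
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
x_hat = res.x[:n] - res.x[n:]

print("recovery error:", np.linalg.norm(x_hat - x_true))
```

With Gaussian measurements and m comfortably above the order of k log(n/k), this program recovers the sparse signal exactly with high probability. Swapping the l1 norm for the nuclear norm of a matrix yields the matrix-completion analogue mentioned in the abstract, where the atoms are rank-one matrices.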