Category Archives: CORE

Zaid Harchaoui; Catalyst, Generic Acceleration Scheme for Gradient-based Optimization

CORE Series
Tuesday, April 18, 2017
EEB 125, 4:00-5:00PM 
Zaid Harchaoui, University of Washington

TITLE: Catalyst, Generic Acceleration Scheme for Gradient-based Optimization

ABSTRACT: We introduce a generic scheme called Catalyst for accelerating first-order optimization methods in the sense of Nesterov, which builds upon a new analysis of the accelerated proximal point algorithm. The proposed approach consists of minimizing a convex objective by approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. This strategy applies to a large class of algorithms, including gradient descent, block coordinate descent, SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. Furthermore, the approach can be extended to venture into possibly nonconvex optimization problems without sacrificing the rate of convergence to stationary points. We present experimental results showing that the Catalyst acceleration scheme is effective in practice, especially for ill-conditioned problems where we measure significant improvements.
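To make the outer loop of the abstract concrete, here is a minimal Python sketch of the accelerated proximal point idea behind Catalyst: repeatedly minimize the auxiliary problem f(x) + κ/2·||x − y||² approximately, then extrapolate. The inner solver, parameter values, and fixed iteration counts below are illustrative assumptions, not the method as published.

```python
import numpy as np

def catalyst(grad_f, x0, kappa=1.0, mu=0.1, n_outer=50, n_inner=100, lr=0.01):
    """Sketch of a Catalyst-style outer loop: approximately solve the
    regularized subproblem min_z f(z) + kappa/2 ||z - y||^2, then extrapolate."""
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    y = x.copy()
    q = mu / (mu + kappa)                       # strong-convexity ratio (an estimate)
    beta = (1 - np.sqrt(q)) / (1 + np.sqrt(q))  # extrapolation weight
    for _ in range(n_outer):
        z = x.copy()
        for _ in range(n_inner):                # inner solver: plain gradient descent
            z = z - lr * (grad_f(z) + kappa * (z - y))
        x_prev, x = x, z
        y = x + beta * (x - x_prev)             # Nesterov-style extrapolation
    return x

# Toy ill-conditioned quadratic: f(x) = 1/2 x'Ax, minimized at the origin.
A = np.diag([100.0, 1.0])
x_star = catalyst(lambda x: A @ x, x0=[5.0, 5.0])
```

The added quadratic κ/2·||z − y||² improves the conditioning of each subproblem, so the cheap inner solver converges quickly; the extrapolation step then recovers Nesterov-type acceleration across outer iterations.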

BIO: Zaid Harchaoui is currently a Provost’s Initiative in Data-driven Discovery Assistant Professor in the Department of Statistics and a Data Science Fellow in the eScience Institute at the University of Washington. He completed his Ph.D. at ParisTech (now part of Univ. Paris-Saclay), working with Eric Moulines, Stephane Canu and Francis Bach. Before joining the University of Washington, he was a visiting assistant professor at the Courant Institute of Mathematical Sciences at New York University (2015 – 2016). Prior to this, he was a permanent researcher on the LEAR team of Inria (2010 – 2015). He was a postdoctoral fellow in the Robotics Institute of Carnegie Mellon University in 2009.

He received the Inria award for scientific excellence and the NIPS reviewer award. He gave a tutorial on “Frank-Wolfe, greedy algorithms, and friends” at ICML’14, on “Large-scale visual recognition” at CVPR’13, and on “Machine learning for computer vision” at MLSS Kyoto 2015. He recently co-organized the “Future of AI” symposium at New York University, the workshop on “Optimization for Machine Learning” at NIPS’14, and the “Optimization and statistical learning” workshop in 2013 and 2015 at the Ecole de Physique des Houches (France). He served/will serve as Area Chair for ICML 2015, ICML 2016, NIPS 2016, and ICLR 2016. He is currently an associate editor of IEEE Signal Processing Letters.

Liza Levina; Interpretable Prediction Models for Network-Linked Data

CORE Series
Tuesday, April 11, 2017
EEB 125, 4:00-5:00PM 
Liza Levina, University of Michigan

TITLE: Interpretable Prediction Models for Network-Linked Data

ABSTRACT: Prediction problems typically assume the training data are independent samples, but in many modern applications samples come from individuals connected by a network. For example, in adolescent health studies of risk-taking behaviors, information on the subjects’ social networks is often available and plays an important role through network cohesion, the empirically observed phenomenon of friends behaving similarly. Taking cohesion into account should allow us to improve prediction. Here we propose a regression-based framework with a network penalty on individual node effects to encourage similarity between predictions for linked nodes, and show that it outperforms traditional models both theoretically and empirically when network cohesion is present. The framework is easily extended to other models, such as the generalized linear model and Cox’s proportional hazard model. Applications to predicting teenagers’ behavior based on both demographic covariates and their friendship networks from the AddHealth data are discussed in detail.
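The regression framework with a network penalty on node effects can be sketched in a few lines. The following Python toy (a plain least-squares variant with a graph-Laplacian penalty; the function name, penalty form, and small ridge term are illustrative choices of mine, not the paper's formulation) fits y ≈ Xb + α while shrinking α toward being constant on connected groups of nodes:

```python
import numpy as np

def network_regression(X, y, adj, lam=1.0):
    """Least squares with per-node effects alpha penalized by the graph
    Laplacian: min_{alpha,b} ||y - X b - alpha||^2 + lam * alpha' L alpha.
    Linked nodes are thereby encouraged to receive similar effects."""
    n, p = X.shape
    L = np.diag(adj.sum(axis=1)) - adj          # graph Laplacian
    M = np.hstack([np.eye(n), X])               # design for theta = [alpha; b]
    P = np.zeros((n + p, n + p))
    P[:n, :n] = lam * L                         # penalize only the node effects
    P[n:, n:] = 1e-8 * np.eye(p)                # tiny ridge keeps the system nonsingular
    theta = np.linalg.solve(M.T @ M + P, M.T @ y)
    return theta[:n], theta[n:]                 # alpha, b

# Toy network cohesion: two friend groups with opposite baseline behavior.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
b_true = np.array([1.0, -2.0])
alpha_true = np.r_[np.ones(10), -np.ones(10)]
y = X @ b_true + alpha_true
adj = np.zeros((20, 20))
adj[:10, :10] = 1
adj[10:, 10:] = 1
np.fill_diagonal(adj, 0)
alpha_hat, b_hat = network_regression(X, y, adj, lam=1.0)
```

Because the true node effects here are constant within each friend group, the Laplacian penalty α'Lα vanishes at the truth, and the fit recovers both the covariate effects and the group-level node effects.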

BIO: Liza Levina received her PhD in Statistics from UC Berkeley in 2002 and joined the University of Michigan the same year. Her research interests include networks, high-dimensional data, and sparsity. She has worked on estimating large covariance matrices, graphical models, and other topics in inference for high-dimensional data. She also works on statistical inference for network data, including problems of community detection and link prediction. Her research covers methodology, theory, and applications, especially to spectroscopy, remote sensing and, in the past, computer vision. She received the junior Noether Award from the ASA in 2010 and was elected a member of ISI in 2011.

Jon Lee; Mixed-Integer Nonlinear Optimization: Let’s get crazy!

CORE Series
Tuesday, Jan 31, 2017
LOW 105, 4:00-5:00PM 
Jon Lee, University of Michigan

TITLE: Mixed-Integer Nonlinear Optimization: Let’s get crazy!

ABSTRACT: Mixed-Integer Nonlinear Optimization (MINLP) is the mother of all (deterministic) optimization problems. As such, it is too broad a class to tackle effectively in full generality. Still, there are broad classes for which we have positive complexity results, as well as narrow classes for which we have negative complexity results. I will sample some of the landscape.

Given such damning complexity results, we should have modest goals on the computational side. Nonetheless, there has been substantial success with “general-purpose” algorithms and software for MINLP, applicable to rather broad (and practically relevant) formulations. In particular, for formulations with convex relaxations we have Bonmin (which implements NLP-based branch-and-bound and outer approximation), and for “factorable” formulations with non-convex relaxations we have Baron, Scip, Antigone, and Couenne (which implement spatial branch-and-bound). Still, we do not yet have truly reliable or scalable technology (in sharp contrast with LP, and in less sharp contrast with MILP). Much of the research effort in modeling and algorithm improvement relates to finding tight convexifications. I will present recent mathematical efforts that I have been involved in to make algorithmic approaches to MINLP more efficient and robust. In doing so, many areas of mathematics surprisingly pop up. Substantial parts of this are joint work with Daphne Skipper (U.S. Naval Academy) and with Emily Speakman (U. Michigan).

BIO: Jon Lee is the G. Lawton and Louise G. Johnson Professor of Engineering at the University of Michigan. He received his Ph.D. from Cornell University. Jon has previously been a faculty member at Yale University and the University of Kentucky, and an adjunct professor at New York University. He was a Research Staff member at the IBM T.J. Watson Research Center, where he managed the mathematical programming group. Jon is author of ~120 papers, the text “A First Course in Combinatorial Optimization” (Cambridge University Press), and the open-source book “A First Course in Linear Optimization” (Reex Press). He was the founding Managing Editor of the journal Discrete Optimization (2004-06), he is currently co-Editor of the journal Mathematical Programming, and he is on the editorial boards of the journals Optimization and Engineering, and Discrete Applied Mathematics. Jon was Chair of the Executive Committee of the Mathematical Optimization Society (2008-10), and Chair of the INFORMS Optimization Society (2010-12). He was awarded the INFORMS Computing Society Prize (2010), and he is a Fellow of INFORMS (since 2013).

David Shmoys; Models and Algorithms for the Operation and Design of Bike-Sharing Systems

CORE Series (Joint with CSE Theory Seminar)
**Thursday**, Dec 1, 2016
11:00AM-12:00PM in CSE 403
David Shmoys, Cornell University

TITLE: Models and Algorithms for the Operation and Design of Bike-Sharing Systems

ABSTRACT: New York launched the largest bike-sharing system in North America in May 2013. We have worked with Citibike, using analytics and optimization to change how they manage the system. Huge rush-hour usage imbalances the system; we address two questions: where should bikes be at the start of a day, and how can we mitigate the imbalances that develop? We will survey the analytics we have employed for the former question, where we developed an approach based on continuous-time Markov chains combined with integer programming models to compute daily stocking levels for the bikes, as well as methods employed for optimizing the capacity of the stations. For the question of mitigating the imbalances that result, we will describe heuristic methods and approximation algorithms that guide both mid-rush-hour and overnight rebalancing, as well as the positioning of corrals, which have been one of the most effective means of creating adaptive capacity in the system.

This is joint work with Daniel Freund, Shane Henderson, Nanjing Jian, Ashkan Nourozi-Fard, Eoin O’Mahoney, and Alice Paul.

BIO: David Shmoys is the Laibe/Acheson Professor in the School of Operations Research and Information Engineering and the Department of Computer Science at Cornell University, where he is currently Director of the School of Operations Research and Information Engineering. Shmoys’s research has focused on the design and analysis of efficient algorithms for discrete optimization problems, with applications including scheduling, inventory theory, computational biology, and most recently, stochastic optimization models and algorithms in computational sustainability. His graduate-level text, The Design of Approximation Algorithms, co-authored with David Williamson, was awarded the 2013 INFORMS Lanchester Prize. He is an INFORMS Fellow, a Fellow of the ACM, a SIAM Fellow, and was an NSF Presidential Young Investigator; he has served on numerous editorial boards, and is currently Editor-in-Chief of Research in the Mathematical Sciences (for theoretical computer science) and an Associate Editor of Mathematics of Operations Research.


Nina Balcan; Distributed Machine Learning

CORE Series
Tuesday, May 24, 2016, 4:00 pm
Electrical Engineering Building (EEB) 125

Maria-Florina Balcan, Carnegie Mellon University

TITLE: Distributed Machine Learning

ABSTRACT: We consider the problem of learning from distributed data and analyze fundamental algorithmic and communication complexity questions involved. Broadly, we consider a framework where information is distributed between several locations, and our goal is to learn a low-error hypothesis with respect to the overall data by using as little communication, and as few rounds of communication, as possible. As an example, suppose k research groups around the world have collected large scientific datasets, such as genomic sequence data or sky survey data, and we wish to perform learning over the union of all these different datasets without too much communication.

In this talk, I will first discuss a general statistical or PAC style framework for analyzing communication complexity issues involved when doing distributed supervised machine learning, i.e., learning from annotated data distributed across multiple locations. I will discuss general lower bounds on the amount of communication needed to learn a given class and broadly-applicable techniques for achieving communication-efficient supervised learning.

I will also discuss algorithms with good communication complexity for unsupervised learning and dimensionality reduction problems, with interesting connections to efficient distributed coreset construction.

BIO: Maria-Florina Balcan is an Associate Professor in the School of Computer Science at Carnegie Mellon University. Her main research interests are machine learning, computational aspects of economics and game theory, and algorithms. Her honors include the CMU SCS Distinguished Dissertation Award, an NSF CAREER Award, a Microsoft Faculty Research Fellowship, a Sloan Research Fellowship, and several paper awards. She was a Program Committee Co-chair for COLT 2014 and a board member of the International Machine Learning Society, and is currently a Program Committee Co-chair for ICML 2016.

Sébastien Bubeck; New Results at the Crossroads of Convexity, Learning and Information Theory

CORE Series
Tuesday, Apr 19, 2016, 4pm

LOW 102
Sébastien Bubeck, Microsoft Research

TITLE: New Results at the Crossroads of Convexity, Learning and Information Theory

ABSTRACT: I will present three new results: (i) the Cramer transform of the uniform measure on a convex body is a universal self-concordant barrier; (ii) projected gradient descent with Gaussian noise allows one to sample from a log-concave measure in polynomial time; and (iii) Thompson sampling combined with a multi-scale exploration solves the Bayesian convex bandit problem. The unifying theme in these results is the interplay between concepts from convex geometry, learning, and information theory.

BIO: Sébastien Bubeck is a Researcher at Microsoft Research (Theory Group) in Redmond, WA. Prior to Microsoft Research, he was an Assistant Professor in the Department of Operations Research and Financial Engineering at Princeton University. He received his MS in Mathematics from the Ecole Normale Supérieure de Cachan and his PhD from the University of Lille 1, France, where his PhD thesis won several prizes including the Jacques Neveu prize for the best French PhD thesis in Probability and Statistics. He was awarded the Sloan Research Fellowship in Computer Science in 2015, and a Best Paper Award at COLT (Conference on Learning Theory) in 2009. He was the general co-chair for COLT 2013 and 2014, and serves on the steering committee for COLT. He is also the author of the recent monograph, Convex Optimization: Algorithms and Complexity, published in 2015 as a part of Foundations and Trends in Machine Learning.

Katya Scheinberg; Using randomized models in black-box, derivative free and stochastic optimization

CORE Series
Tuesday, March 8, 2016, 4:00 pm
SMI 304
Katya Scheinberg, Lehigh University

TITLE: Using randomized models in black-box, derivative free and stochastic optimization

ABSTRACT: Derivative free optimization (DFO) is the field that addresses optimization of black-box functions – that is, functions whose values can be computed (possibly approximately) but whose derivatives cannot be approximated directly. The applications of DFO range from aircraft engineering to hyperparameter tuning in machine learning. All derivative free methods rely on sampling the objective function at one or more points at each iteration. Constructing and maintaining these sample sets has been one of the most essential issues in DFO. The majority of existing results rely on deterministic sampling techniques.
We will discuss new developments in using randomized sample sets within the DFO framework. Randomized sample sets have many advantages over deterministic sets. In particular, it is often easier to enforce “good” properties of the models with high probability, rather than in the worst case. In addition, randomized sample sets can help automatically discover a good local low-dimensional approximation to the high-dimensional objective function. We will demonstrate how compressed sensing results can be used to show that reduced-size random sample sets can provide full second order information under the assumption of sparsity of the Hessian.
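As a minimal illustration of the model-based step that such methods build on, the Python sketch below fits an affine model to the objective on a randomized sample set and steps along the model's negative gradient. This is a generic toy with illustrative sample size and step rule, not the compressed-sensing construction from the talk:

```python
import numpy as np

def dfo_step(f, x, radius, rng):
    """One model-based DFO step: sample random points on a sphere around x,
    fit an affine model f(y) ~ c + g.(y - x) by least squares, and move
    along the model gradient's negative direction."""
    d = len(x)
    m = 2 * d                                    # >= d + 1 points for an affine fit
    U = rng.standard_normal((m, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    Y = x + radius * U                           # randomized sample set
    A = np.hstack([np.ones((m, 1)), Y - x])      # intercept + linear terms
    coef, *_ = np.linalg.lstsq(A, np.array([f(y) for y in Y]), rcond=None)
    g = coef[1:]                                 # model gradient
    if np.linalg.norm(g) < 1e-12:
        return x                                 # model is flat; stay put
    return x - radius * g / np.linalg.norm(g)

# Minimize f(x) = ||x||^2 without derivatives, shrinking the sampling radius.
f = lambda x: x @ x
rng = np.random.default_rng(0)
x, radius = np.array([3.0, 4.0]), 1.0
for _ in range(100):
    x = dfo_step(f, x, radius, rng)
    radius *= 0.95
```

Here each iteration only evaluates the black-box function; the quality of the fitted model depends on the geometry of the random sample set, which is exactly the kind of property the talk's analysis controls with high probability.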

BIO: Katya Scheinberg is the Harvey E. Wagner Endowed Chair Professor in the Industrial and Systems Engineering Department at Lehigh University. Prior to her current position, Katya was a staff member at the IBM T.J. Watson Research Center for over a decade. Her main research interests lie broadly in continuous optimization, focusing on convex optimization, derivative free optimization, and large-scale methods for Big Data and machine learning applications. Katya is currently the Editor-in-Chief of the SIAM-MOS Series on Optimization and an associate editor of the SIAM Journal on Optimization. She is a recent recipient of the prestigious Lagrange Prize, along with Andrew R. Conn and Luis Nunes Vicente, for their highly influential book “Introduction to Derivative-Free Optimization”. Katya’s research is supported by grants from AFOSR, DARPA, NSF, and Yahoo.