**Fri Nov 30** (CORE Series)

2:30pm, MEB 248

Jesús De Loera, *Department of Mathematics, UC Davis*
*Variations on a theme by G. Dantzig: Revisiting the principles of the Simplex Algorithm*

# Jesús De Loera; Variations on a theme by G. Dantzig: Revisiting the principles of the Simplex Algorithm

CORE Series

Jesús De Loera, *UC Davis*

**Friday, November 30, 2018**

**MEB 248, 2:30pm**

**TITLE:** Variations on a theme by G. Dantzig: Revisiting the principles of the Simplex Algorithm

**ABSTRACT:**

Linear programs (LPs) are, without any doubt, at the core of both the theory and the practice of modern applied and computational optimization (for example, in discrete optimization LPs are used in practical branch-and-bound computations and in approximation algorithms, e.g., in rounding schemes). Fast algorithms are indispensable.

George Dantzig’s simplex method is one of the most famous algorithms for solving LPs, and SIAM even elected it as one of the top 10 most influential algorithms of the 20th century. But despite its key importance, many simple, easy-to-state mathematical properties of the simplex method and its geometry remain unknown. The geometry of the simplex method is a topic in the convex-combinatorial geometry of polyhedra. Perhaps the most famous geometric-combinatorial challenge is to determine a worst-case upper bound for the graph diameter of polyhedra.

In this talk, I will look at how abstractions of the simplex method provide useful insight into the properties of this famous algorithm. The first type of abstraction removes coordinates entirely and is related to combinatorial topology; the second generalizes the pivoting moves. This survey lecture includes joint work with Steve Klee, Raymond Hemmecke, and Jon Lee.

**BIO:**

Jesús A. De Loera received his Bachelor of Science degree in Mathematics from the National University of Mexico in 1989, and a Ph.D. in Applied Mathematics from Cornell University in 1995. He arrived at UC Davis in 1999, where he is now a professor of Mathematics, as well as a member of the Graduate Groups in Computer Science and Applied Mathematics. He has held visiting positions at the University of Minnesota, the Swiss Federal Institute of Technology (ETH Zürich), the Mathematical Sciences Research Institute at Berkeley (MSRI), Universität Magdeburg (Germany), the Institute for Pure and Applied Mathematics at UCLA (IPAM), the Newton Institute of Cambridge Univ. (UK), and the Technische Universität München.

His research covers a wide range of topics, including Combinatorics, Algorithms, Convex Geometry, Applied Algebra, and Optimization. He received an Alexander von Humboldt Fellowship in 2004 and won the 2010 INFORMS Computing Society Prize for his work on algebraic algorithms in Optimization. For his contributions to Discrete Geometry and Combinatorial Optimization, as well as for service to the profession, including mentoring and diversity, he was elected a fellow of the American Mathematical Society in 2014. For his mentoring and teaching he received the 2013 Chancellor’s award for mentoring undergraduate research and, in 2017, the Mathematical Association of America Golden Section Teaching Award. He has supervised twelve Ph.D. students and over 50 undergraduate research projects. He is currently an associate editor for the *SIAM Journal on Discrete Mathematics*, the *SIAM Journal on Applied Algebra and Geometry*, and the *Boletín de la Sociedad Matemática Mexicana*.

# Francis Bach; Can machine learning survive the artificial intelligence revolution?

CORE Series

Francis Bach, *Inria and Ecole Normale Supérieure*

**Thursday, November 8, 2018**

**Electrical Engineering Building (EEB) 105, 11:00am**

**TITLE:** Can machine learning survive the artificial intelligence revolution?

**ABSTRACT:**

Data and algorithms are ubiquitous in all scientific, industrial and personal domains. Data now come in multiple forms (text, image, video, web, sensors, etc.), are massive, and require more and more complex processing beyond their mere indexation or the computation of simple statistics, such as recognizing objects in images or translating texts. For all of these tasks, commonly referred to as artificial intelligence (AI), significant recent progress has allowed algorithms to reach performances that were deemed unreachable a few years ago and that make these algorithms useful to everyone.

Many scientific fields contribute to AI, but most of the visible progress comes from machine learning and tightly connected fields such as computer vision and natural language processing. Indeed, many of the recent advances are due to the availability of massive data to learn from, large computing infrastructures and new machine learning models (in particular deep neural networks).

Beyond the well-publicized visibility of some advances, machine learning has always been a field characterized by constant exchanges between theory and practice, with a stream of algorithms that exhibit both good empirical performance on real-world problems and some form of theoretical guarantees. Is this still possible?

In this talk, I will present recent illustrative machine learning successes and propose some answers to the question above.

*Francis Bach is the Distinguished Visiting Faculty of the NSF-TRIPODS Algorithmic Foundations of Data Science Institute. The seminar is part of the CORE Seminar Series, the Data Science Seminar Series, and the ML Seminar Series.*

**BIO:**

Francis Bach is a researcher at Inria, where since 2011 he has led the machine learning team, which is part of the Computer Science Department at Ecole Normale Supérieure. He graduated from Ecole Polytechnique in 1997 and completed his Ph.D. in Computer Science at U.C. Berkeley in 2005, working with Professor Michael Jordan. He spent two years in the Mathematical Morphology group at Ecole des Mines de Paris, then was a member of the computer vision project-team at Inria/Ecole Normale Supérieure from 2007 to 2010. Francis Bach is primarily interested in machine learning, and especially in graphical models, sparse methods, kernel-based learning, large-scale convex optimization, computer vision and signal processing. He obtained a Starting Grant in 2009 and a Consolidator Grant in 2016 from the European Research Council, and received the Inria young researcher prize in 2012, the ICML test-of-time award in 2014, as well as the Lagrange prize in continuous optimization in 2018. In 2015, he was program co-chair of the International Conference on Machine Learning (ICML), and general chair in 2018; he is now co-editor-in-chief of the Journal of Machine Learning Research.

# Yurii Nesterov; Relative smoothness condition and its application to third-order methods

CORE Series

**Monday, May 21, 2018**

**SMI 205, 4:00PM**

Yurii Nesterov (CORE/INMA, UCL, Belgium)

**TITLE: Relative smoothness condition and its application to third-order methods**

ABSTRACT: In this talk, we show that the recently developed relative smoothness condition can be used for constructing implementable third-order methods for Unconstrained Convex Optimization. At each iteration of these methods, we need to solve an auxiliary problem of minimizing a convex multivariate polynomial, which is a sum of the third-order Taylor approximation and a regularization term. It appears that this nontrivial nonlinear optimization problem can be solved very efficiently by a gradient-type minimization method based on the relative smoothness condition. Its linear rate of convergence depends only on an absolute constant. This result opens the possibility of practical implementation of third-order methods.

BIO: Yurii Nesterov is a professor at the Center for Operations Research and Econometrics (CORE) at the Catholic University of Louvain (UCL), Belgium, where he has worked since 1993. He received his Ph.D. degree (Applied Mathematics) in 1984 at the Institute of Control Sciences, Moscow.

His research interests are related to complexity issues and efficient methods for solving various optimization problems. His main results are in Convex Optimization (optimal methods for smooth problems, polynomial-time interior-point methods, smoothing technique for structural optimization, complexity theory for second-order methods, optimization methods for huge-scale problems). He is an author of 5 monographs and more than 100 refereed papers in leading optimization journals. He has received several international prizes, among which are the Dantzig Prize from SIAM and the Mathematical Programming Society (2000), the von Neumann Theory Prize from INFORMS (2009), the SIAM Outstanding Paper Award (2014), and the Euro Gold Medal from the Association of European Operations Research Societies (2016). In 2018 he also won an Advanced Grant from the European Research Council.

# Mark Schmidt; Let’s Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

**May 29th, 2018, 4:00pm**

SAV 264

Mark Schmidt, *University of British Columbia*

**Abstract:** Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can lead to significantly faster BCD methods. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues like how to implement the new rules when using “variable” blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with a sparse dependency between variables; and (iv) consider optimal active manifold identification, which leads to bounds on the “active set complexity” of BCD methods and leads to superlinear convergence for certain problems with sparse solutions (and in some cases finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.
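For readers unfamiliar with the baseline mentioned in the abstract, here is a minimal sketch of coordinate descent with the classic Gauss-Southwell selection rule on a least-squares problem. It is not the speaker's method: the blocks are single coordinates, the new greedy rules from the talk are not implemented, and the function name `gauss_southwell_cd` and the small test matrix are assumptions made for illustration.

```python
def gauss_southwell_cd(A, b, iters):
    """Greedy coordinate descent for min_x 0.5 * ||A x - b||^2.

    Each step selects the coordinate with the largest absolute partial
    derivative (Gauss-Southwell rule) and minimizes exactly along it.
    A is a list of rows, b a list of floats (illustrative data layout).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-bi for bi in b]  # residual A x - b at the starting point x = 0
    # Squared column norms: the curvature of the objective along each coordinate.
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(iters):
        # Full gradient A^T r; computed naively here for clarity.
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        j = max(range(n), key=lambda k: abs(grad[k]))  # greedy selection
        if grad[j] == 0.0:
            break  # every partial derivative is zero: stationary point
        delta = -grad[j] / col_sq[j]  # exact line minimization along e_j
        x[j] += delta
        for i in range(m):
            r[i] += delta * A[i][j]  # keep the residual in sync with x
    return x


# Hypothetical consistent system whose exact solution is x = [1, 1].
A = [[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]]
b = [3.0, 4.0, 1.0]
x_cd = gauss_southwell_cd(A, b, iters=200)
```

The sketch recomputes the full gradient at every step, which defeats the cheap iteration cost that makes BCD attractive; practical implementations maintain the gradient incrementally or exploit sparsity.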

**Biography:** Mark Schmidt has been an assistant professor in the Department of Computer Science at the University of British Columbia since 2014, and is a Canada Research Chair and Alfred P. Sloan Fellow. His research focuses on developing faster algorithms for large-scale machine learning, with an emphasis on methods with provable convergence rates and that can be applied to structured prediction problems. From 2011 through 2013 he worked at the École Normale Supérieure in Paris on inexact and stochastic convex optimization methods. He finished his M.Sc. in 2005 at the University of Alberta working as part of the Brain Tumor Analysis Project, and his Ph.D. in 2010 at the University of British Columbia working on graphical model structure learning with L1-regularization. He has also worked at Siemens Medical Solutions on heart motion abnormality detection, with Michael Friedlander in the Scientific Computing Laboratory at the University of British Columbia on semi-stochastic optimization methods, and with Anoop Sarkar at Simon Fraser University on large-scale training of natural language models.

# Aaron Sidford; Faster Algorithms for Computing the Stationary Distribution

CORE Series

**Tuesday, May 8, 2018**

**SAV 264, 4:00PM**

**Aaron Sidford,** Stanford University (Management Science and Engineering)

**TITLE: Faster Algorithms for Computing the Stationary Distribution**

ABSTRACT: Computing the stationary distribution of a Markov chain is one of the most fundamental problems in optimization. It lies at the heart of numerous computational tasks including computing personalized PageRank vectors, evaluating the utility of policies in Markov decision processes, and solving asymmetric diagonally dominant linear systems. Despite the ubiquity of these problems, until recently the fastest known running times for computing the stationary distribution either depended polynomially on the mixing time or desired accuracy or appealed to generic linear system solving machinery, and therefore ran in super-quadratic time.
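As a point of reference for the problem statement, the sketch below computes a stationary distribution by plain power iteration, a textbook approach whose iteration count grows with the mixing time of the chain. It is not one of the algorithms from the talk; the function name `stationary_power_iteration` and the two-state chain are assumptions chosen for illustration.

```python
def stationary_power_iteration(P, iters=1000, tol=1e-12):
    """Approximate pi with pi = pi P for a row-stochastic matrix P.

    Repeatedly applies pi <- pi P starting from the uniform distribution
    and stops once successive iterates differ by less than tol in L1 norm.
    Convergence assumes the chain is irreducible and aperiodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        diff = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if diff < tol:
            break
    return pi


# Hypothetical two-state chain; balancing the flow between the states
# (0.1 * pi[0] = 0.5 * pi[1]) gives the stationary distribution [5/6, 1/6].
P = [[0.9, 0.1], [0.5, 0.5]]
pi = stationary_power_iteration(P)
```

On slowly mixing chains this loop needs many iterations, which is the dependence on mixing time that the abstract describes.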

In this talk I will present recent results showing that the stationary distribution and related quantities can all be computed in almost linear time. I will present new iterative methods for extracting structure from directed graphs and show how they can be tailored to achieve this new running time. Moreover, I will discuss connections between this work and recent developments in solving Laplacian systems and emerging trends in combining numerical and combinatorial techniques in the design of optimization algorithms.

This talk reflects joint work with Michael B. Cohen (MIT), Jonathan Kelner (MIT), Rasmus Kyng (Harvard), John Peebles (MIT), Richard Peng (Georgia Tech), Anup Rao (Adobe Research), and Adrian Vladu (Boston University).

BIO: Aaron completed his PhD at MIT (CSAIL), advised by Jon Kelner, and has since been an Assistant Professor at Stanford (MSE). He works on fast algorithms for various optimization problems. His work received best paper awards at SODA 2014 and FOCS 2014 and a best student paper award at FOCS 2015. In particular, he became broadly known in the community for the development of an Interior Point Algorithm that solves LPs in sqrt(rank) many iterations (also improving the total running time compared to previous algorithms). This had been the most prominent open problem in the field of interior point methods since the work of Nesterov and Nemirovski in 1994.

# Walid Krichene; Continuous-time dynamics for convex optimization

**Feb 20th, 2018, 12:00pm**

CSE 403

Walid Krichene, *Google Research*

**Abstract:** Many optimization algorithms can be viewed as a discretization of a continuous-time process (described using an ordinary differential equation, or a stochastic differential equation in the presence of noise). The continuous-time point of view can be useful for simplifying the analysis, drawing connections with physics, and streamlining the design of new algorithms and heuristics. We will review results from continuous-time optimization, from the simple case of gradient flow, to more recent results on accelerated methods. In particular, we give simple interpretations of acceleration, and show how these interpretations motivate heuristics (restarting and adaptive averaging) which, empirically, can significantly improve the speed of convergence. We will then focus on the stochastic case, and study the interaction between acceleration and noise, and their effect on the convergence rates. We will conclude with a brief review of how the same tools can be applied in other problems, such as approximate sampling and non-convex optimization.
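To make the discretization viewpoint concrete, here is a minimal sketch of the simplest case named in the abstract: applying forward Euler with step size `step` to the gradient flow x'(t) = -∇f(x(t)) gives exactly the gradient descent update. The function name `gradient_descent` and the quadratic test objective are assumptions for illustration; the accelerated and stochastic dynamics covered in the talk are not shown.

```python
def gradient_descent(grad, x0, step, iters):
    """Forward-Euler discretization of the gradient flow x'(t) = -grad f(x(t)).

    One Euler step of length `step` is x <- x - step * grad(x), which is
    the standard gradient descent update.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quad_grad(x):
    """Gradient of the hypothetical objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return [x[0], 10.0 * x[1]]


# The step must stay below 2 / 10 (twice the inverse of the largest
# curvature) for the discretization to remain stable on this quadratic.
x_final = gradient_descent(quad_grad, [1.0, 1.0], step=0.05, iters=500)
```

The continuous flow converges for this objective regardless of time scale, whereas the discrete iteration diverges if the step is too large; the stability condition in the comment is one place where the two views differ.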

**Bio:** Walid Krichene is at Google Research, where he works on large-scale optimization and recommendation. He received his Ph.D. in EECS in 2016 from U.C. Berkeley, where he was advised by Alex Bayen and Peter Bartlett, an M.A. in Mathematics from U.C. Berkeley, and an M.S. in Engineering and Applied Math from the Ecole des Mines ParisTech. He received the Leon Chua Award and two outstanding GSI awards from U.C. Berkeley. His research interests include convex optimization, stochastic approximation, recommender systems, and online learning.