Fri Nov 30 (CORE Series)

Jesus De Loera, *Department of Mathematics, UC Davis
*

# Yurii Nesterov; Relative smoothness condition and its application to third-order methods

CORE Series

**Monday, May 21, 2018**

**SMI 205, 4:00PM **

Yurii Nesterov (CORE/INMA, UCL, Belgium)

**TITLE: Relative smoothness condition and its application to third-order methods**

ABSTRACT: In this talk, we show that the recently developed relative smoothness condition can be used for constructing implementable third-order methods for Unconstrained Convex Optimization. At each iteration of these methods, we need to solve an auxiliary problem of minimizing a convex multivariate polynomial, which is a sum of the third-order Taylor approximation and a regularization term. It appears that this nontrivial nonlinear optimization problem can be solved very efficiently by a gradient-type minimization method based on the relative smoothness condition. Its linear rate of convergence depends only on absolute constant. This result opens a possibility for practical implementation of the third-order methods.

BIO: Yurii Nesterov is a professor at the Center for Operations Research and Econometrics (CORE) in Catholic University of Louvain (UCL), Belgium. He received his Ph.D. degree (Applied Mathematics) in 1984 at the Institute of Control Sciences, Moscow. Starting from 1993 he works at the Center of Operations Research and Econometrics (Catholic University of Louvain, Belgium).

His research interests are related to complexity issues and efficient methods for solving various optimization problems. The main results are obtained in Convex Optimization (optimal methods for smooth problems, polynomial-time interior-point methods, smoothing technique for structural optimization, complexity theory for second-order methods, optimization methods for huge-scale problems). He is an author of 5 monographs and more than 100 refereed papers in leading optimization journals. He has received several international prizes, among which are the Dantzig Prize from SIAM and Mathematical Programming society (2000), the von Neumann Theory Prize from INFORMS (2009), the SIAM Outstanding Paper Award (2014), and the Euro Gold Medal from the Association of European Operations Research Societies (2016). In 2018 he also won an Advanced Grant from the European Research Council.

# Mark Schmidt; Let’s Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

**May 29th, 2018, 4:00pm
**SAV 264

Mark Schmidt,

*University of British Columbia*

**Abstract: **Block coordinate descent (BCD) methods are widely-used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can lead to significantly faster BCD methods. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues like how to implement the new rules when using “variable” blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with a sparse dependency between variables; and (iv) consider optimal active manifold identification, which leads to bounds on the “active set complexity” of BCD methods and leads to superlinear convergence for certain problems with sparse solutions (and in some cases finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.

**Biography:** Mark Schmidt has been an assistant professor in the Department of Computer Science at the University of British Columbia since 2014, and is a Canada Research Chair and Alfred P. Sloan Fellow. His research focuses on developing faster algorithms for large-scale machine learning, with an emphasis on methods with provable convergence rates and that can be applied to structured prediction problems. From 2011 through 2013 he worked at the École Normale Supérieure in Paris on inexact and stochastic convex optimization methods. He finished his M.Sc. in 2005 at the University of Alberta working as part of the Brain Tumor Analysis Project, and his Ph.D. in 2010 at the University of British Columbia working on graphical model structure learning with L1-regularization. He has also worked at Siemens Medical Solutions on heart motion abnormality detection, with Michael Friedlander in the Scientific Computing Laboratory at the University of British Columbia on semi-stochastic optimization methods, and with Anoop Sarkar at Simon Fraser University on large-scale training of natural language models.

# Aaron Sidford; Faster Algorithms for Computing the Stationary Distribution

CORE Series

**Tuesday, May 8, 2018**

**SAV 264, 4:00PM **

**Aaron Sidford, **Stanford University (Management Science and Engineering)

**TITLE: Faster Algorithms for Computing the Stationary Distribution**

ABSTRACT: Computing the stationary distribution of a Markov Chain is one of the most fundamental problems in optimization. It lies at the heart of numerous computational tasks including computing personalized PageRank vectors, evaluating the utility of policies in Markov decision process, and solving asymmetric diagonally dominant linear systems. Despite the ubiquity of these problems, until recently the fastest known running times for computing the stationary distribution either depended polynomially on the mixing time or desired accuracy or appealed to generic linear system solving machinery, and therefore ran in super-quadratic time.

In this talk I will present recent results showing that the stationary distribution and related quantities can all be computed in almost linear time. I will present new iterative methods for extracting structure from directed graphs and and show how they can be tailored to achieve this new running time. Moreover, I will discuss connections between this work and recent developments in solving Laplacian systems and emerging trends in combining numerical and combinatorial techniques in the design of optimization algorithms.

This talk reflects joint work with Michael B. Cohen (MIT), Jonathan Kelner (MIT), Rasmus Kyng (Harvard), John Peebles (MIT), Richard Peng (Georgia Tech), Anup Rao (Adobe Research), and Adrian Vladu (Boston University).

BIO: Aaron completed his PhD at MIT (CSAIL) advised by Jon Kelner and since then is an Assistant Professor at Stanford (MSE). He works on fast algorithms for various optimization problems. His work received best paper awards at SODA 2014 and FOCS 2014 and a best student paper award at FOCS 2015. In particular he become broadly known in the community for the development of an Interior Point Algorithm that solves LPs in sqrt(rank) many iterations (also improving the total running time compared to previous algorithms). This had been the most prominent open problem in the field of interior point methods since the work of Nesterov and Nemirovski in 1994.

# Walid Krichene; Continuous-time dynamics for convex optimization

**Feb 20th, 2018, 12:00pm
**CSE 403

Walid Krichene,

**Abstract:** Many optimization algorithms can be viewed as a discretization of a continuous-time process (described using an ordinary differential equation, or a stochastic differential equation in the presence of noise). The continuous-time point of view can be useful for simplifying the analysis, drawing connections with physics, and streamlining the design of new algorithms and heuristics. We will review results from continuous-time optimization, from the simple case of gradient flow, to more recent results on accelerated methods. In particular, we give simple interpretations of acceleration, and show how these interpretations motivate heuristics (restarting and adaptive averaging) which, empirically, can significantly improve the speed of convergence. We will then focus on the stochastic case, and study the interaction between acceleration and noise, and their effect on the convergence rates. We will conclude with a brief review of how the same tools can be applied in other problems, such as approximate sampling and non-convex optimization.

**Bio:** Walid Krichene is at Google Research, where he works on large-scale optimization and recommendation. He received his Ph.D. in EECS in 2016 from UC Berkeley, where he was advised by Alex Bayen and Peter Bartlett, a M.A. in Mathematics from U.C. Berkeley, and a M.S. in Engineering and Applied Math from the Ecole des Mines ParisTech. He received the Leon Chua Award and two outstanding GSI awards from U.C. Berkeley. His research interests include convex optimization, stochastic approximation, recommender systems, and online learning.

# Jakub Konečný; Federated Learning: Privacy-Preserving Collaborative Machine Learning without Centralized Training Data

**Jan 30th, 2018, 12:00pm
**CSE 403

Jakub Konečný,

**Abstract:** Federated Learning is a machine learning setting where the goal is to

train a high quality centralized model while training data remains

distributed over a large number of clients each with unreliable and

relatively slow network connections. We consider learning algorithms

for this setting where on each round, each client independently

computes an update to the current model based on its local data, and

communicates this update to a central server, where the client-side

updates are aggregated to compute a new global model.

In this talk, I will introduce the underlying algorithms, and present

several ideas for improving the overall system in terms of

communication efficiency, security, and differential privacy.

**Bio: **Jakub Konečný is a research scientist at Google working on Federated

Learning, an effort to decentralize machine learning. Prior to joining

Google, Jakub completed his PhD at University of Edinburgh focusing on

optimization algorithms for machine learning.

# Zeyuan Allen-Zhu; How to Swing By Saddle Points: Faster Non-Convex Optimization Than SGD

**Feb 13th, 2018, 4pm
**LOW 105

Zeyuan Allen-Zhu,

*Microsoft Research*

“The diverse world of deep learning research has given rise to thousands of architectures for neural networks. However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants.

In this talk, we present a new stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^{-3.25}) backpropagations. The best provable result was O(ε^{-4}) by SGD before this work.

More broadly, it finds ε-approximate local minima of any smooth nonconvex function using only O(ε^{-3.25}) stochastic gradient computations.”