Spring 2017 Calendar

Apr 11 [CORE]
Liza Levina, Department of Statistics, University of Michigan
Interpretable Prediction Models for Network-Linked Data

Apr 18 [CORE]
Zaid Harchaoui, Department of Statistics, University of Washington
Catalyst, Generic Acceleration Scheme for Gradient-based Optimization

Apr 25
Andrew Pryhuber, Department of Mathematics, University of Washington
A QCQP Approach for Triangulation

May 2
Scott Roy, Department of Mathematics, University of Washington
An Optimal First-order Method Based on Optimal Quadratic Averaging

May 9
Peng Zheng, Department of Applied Mathematics, University of Washington
What’s the shape of your penalty?

May 30
Kellie MacPhee, Department of Mathematics, University of Washington
Gauge and perspective duality

Jun 1
Madeleine Udell, Department of Operations Research and Information Engineering, Cornell University
Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage

Jun 6
Hongzhou Lin, Inria Grenoble
A Generic Quasi-Newton Algorithm for Faster Gradient-Based Optimization

Felix Ye; Estimate exponential memory decay in Hidden Markov Model and its applications

Nov 14, 2017, 4pm
PDL C-401
Felix Ye, Department of Applied Mathematics

Abstract: Inference in hidden Markov models is difficult to scale because of the dependencies in the observation data. In this work we exploit the inherent memory decay of hidden Markov models, so that the forward and backward probabilities can be computed over subsequences, enabling efficient inference on long observation sequences. We formulate the forward filtering process as a random dynamical system, a product of i.i.d. random matrices, which admits Lyapunov exponents; the rate of memory decay is, almost surely, the gap $\lambda_2-\lambda_1$ between the top two Lyapunov exponents. We propose an efficient and accurate algorithm to estimate this gap numerically after a soft-max parametrization. The subsequence length $B$ required to control the error to $\epsilon$ is $B=\log(\epsilon)/(\lambda_2-\lambda_1)$. We prove the validity of the algorithm and demonstrate its effectiveness on numerical examples. The method developed here can be applied to widely used algorithms such as the mini-batch stochastic gradient method. Moreover, the continuity of the Lyapunov spectrum ensures that the estimated $B$ can be reused for nearby parameters during inference.
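
As a rough illustration of the quantity being estimated, the sketch below approximates the top two Lyapunov exponents of a product of i.i.d. random matrices by periodic QR re-orthonormalization, then reads off the subsequence length $B=\log(\epsilon)/(\lambda_2-\lambda_1)$ from the formula in the abstract. The matrix ensemble, dimension, and tolerance are placeholders, not the HMM filtering matrices or the soft-max parametrization from the talk.

```python
import numpy as np

def lyapunov_top2(sample_matrix, d, n_steps=20000, seed=0):
    """Estimate the top two Lyapunov exponents of a product of i.i.d. random
    matrices by tracking two directions and re-orthonormalizing with QR."""
    rng = np.random.default_rng(seed)
    Q = np.linalg.qr(rng.standard_normal((d, 2)))[0]  # two orthonormal directions
    sums = np.zeros(2)
    for _ in range(n_steps):
        Q, R = np.linalg.qr(sample_matrix(rng) @ Q)
        sums += np.log(np.abs(np.diag(R)))            # per-step growth factors
    return sums / n_steps                             # (lambda_1, lambda_2)

# Placeholder ensemble: small random perturbations of the identity.
d = 5
lam1, lam2 = lyapunov_top2(lambda rng: np.eye(d) + 0.1 * rng.standard_normal((d, d)), d)

eps = 1e-6                          # illustrative target error level
B = np.log(eps) / (lam2 - lam1)     # subsequence length from the abstract's formula
print(f"lambda_1 = {lam1:.4f}, lambda_2 = {lam2:.4f}, B = {B:.1f}")
```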

John Duchi; Composite modeling and optimization, with applications to phase retrieval and nonlinear observation modeling

CORE Series
Friday, Oct 20, 2017
SMI 211, 3:30 PM
John Duchi, Stanford University (Departments of Statistics and Electrical Engineering)

TITLE: Composite modeling and optimization, with applications to phase retrieval and nonlinear observation modeling

ABSTRACT: We consider minimization of stochastic functionals that are compositions of a (potentially) non-smooth convex function h and a smooth function c. We develop two stochastic methods, a stochastic prox-linear algorithm and a stochastic (generalized) subgradient procedure, and prove that, under mild technical conditions, each converges to first-order stationary points of the stochastic objective. Additionally, we analyze this problem class in the context of phase retrieval and more generic nonlinear modeling problems, showing that we can solve these problems (even with faulty measurements) with extremely high probability under appropriate random measurement models. We provide substantial experiments investigating our methods, indicating the practical effectiveness of the procedures.
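
For concreteness, here is a minimal sketch of a stochastic subgradient method applied to the robust phase-retrieval objective $\frac{1}{m}\sum_i |\langle a_i, x\rangle^2 - b_i|$, which is a composition $h(c(x))$ with $h=|\cdot|$ non-smooth and convex and $c$ smooth. The step-size schedule, warm start, and problem sizes below are ad hoc choices for a toy run, not the settings analyzed in the talk.

```python
import numpy as np

def stochastic_subgradient_pr(A, b, x0, n_iters=20000, step0=0.5, seed=0):
    """Stochastic subgradient method for f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|,
    the composition of h = |.| with the smooth maps c_i(x) = <a_i, x>^2 - b_i."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    x = x0.copy()
    for k in range(1, n_iters + 1):
        i = rng.integers(m)
        a, r = A[i], (A[i] @ x) ** 2 - b[i]
        g = np.sign(r) * 2.0 * (a @ x) * a   # subgradient via the chain rule
        x = x - (step0 / np.sqrt(k)) * g     # simple decaying step size
    return x

# Toy instance (sizes are arbitrary illustrations).
rng = np.random.default_rng(1)
n, m = 20, 400
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = (A @ x_true) ** 2
x_hat = stochastic_subgradient_pr(A, b, x0=x_true + 0.5 * rng.standard_normal(n))  # warm start for the toy
# The sign of x is unrecoverable from b, so compare up to sign.
err = min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true))
print("relative error:", err / np.linalg.norm(x_true))
```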

BIO: John Duchi is an assistant professor of Statistics and Electrical Engineering and (by courtesy) Computer Science at Stanford University, with graduate degrees from UC Berkeley and undergraduate degrees from Stanford. His work focuses on large-scale optimization problems arising in statistics and machine learning, robustness and uncertain data problems, and information-theoretic aspects of statistical learning. He has won a number of awards and fellowships, including a best paper award at the International Conference on Machine Learning, an NSF CAREER award, and a Sloan Fellowship in Mathematics.

Hongzhou Lin; A Generic Quasi-Newton Algorithm for Faster Gradient-Based Optimization

June 6, 2017, 4pm
PDL C-401
Hongzhou Lin, Inria Grenoble

Abstract: In this talk, we propose a generic approach to accelerating gradient-based optimization algorithms with quasi-Newton principles. The proposed scheme, called QuickeNing, can be applied to incremental first-order methods such as the stochastic variance-reduced gradient method (SVRG) or incremental surrogate optimization (MISO). It is also compatible with composite objectives, meaning that it can provide exactly sparse solutions when the objective involves a sparsity-inducing regularizer. QuickeNing relies on limited-memory BFGS rules, making it appropriate for solving high-dimensional optimization problems. Moreover, it enjoys a worst-case linear convergence rate for strongly convex problems. We present experimental results showing that QuickeNing gives significant improvements over competing methods on large-scale, high-dimensional machine learning problems.
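
The limited-memory BFGS rules the scheme builds on amount, at their core, to the standard two-loop recursion sketched below, which turns a short history of curvature pairs $(s_i, y_i)$ into an approximate inverse-Hessian-vector product. This is only the generic building block, written as a sketch; as we understand it, QuickeNing applies such rules to a smoothed (Moreau-Yosida-type) version of the objective rather than to raw gradients.

```python
import numpy as np

def lbfgs_direction(grad, s_hist, y_hist):
    """Standard L-BFGS two-loop recursion: returns H @ grad, where H is the
    limited-memory inverse-Hessian approximation built from the pairs (s, y).
    s_hist and y_hist are lists ordered from oldest to newest."""
    if not s_hist:
        return grad.copy()                    # no curvature information yet
    rhos = [1.0 / (y @ s) for s, y in zip(s_hist, y_hist)]
    q = grad.copy()
    alphas = []
    # First loop: newest to oldest.
    for s, y, rho in zip(reversed(s_hist), reversed(y_hist), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    # Initial scaling from the most recent pair.
    s, y = s_hist[-1], y_hist[-1]
    r = (s @ y) / (y @ y) * q
    # Second loop: oldest to newest.
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r = r + (a - b) * s
    return r                                  # quasi-Newton step direction is -r
```

In a limited-memory scheme only the most recent handful of pairs is kept, so each call costs a few vector operations, which is what makes the approach viable in high dimensions.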

Kellie MacPhee; Gauge and perspective duality

May 30, 2017, 4pm
PDL C-401
Kellie MacPhee, Department of Mathematics, University of Washington

Abstract: Fenchel-Young duality is widely used in convex optimization and relies on the conjugacy operation for convex functions; however, alternative notions of duality, built on parallel operations, exist as well. In particular, gauge and perspective duality are defined via the polarity operation on gauge functions. We present a perturbation argument for deriving gauge duality, thus placing it on equal footing with Fenchel-Young duality. This approach also yields explicit optimality conditions (analogous to KKT conditions) and a simple primal-from-dual recovery method based on existing algorithms. Numerical results confirm the usefulness of this approach in certain contexts (e.g., optimization over piecewise linear-quadratic (PLQ) functions).
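
For readers unfamiliar with the objects involved, the following recalls the polar operation and the dual pair in the form commonly used in the gauge duality literature, stated loosely and without the regularity conditions needed for strong duality. The polar of a gauge $\kappa$ is

$$\kappa^{\circ}(y) \;=\; \sup\{\, \langle x, y\rangle \;:\; \kappa(x) \le 1 \,\},$$

so that, for example, the polar of $\|\cdot\|_1$ is $\|\cdot\|_\infty$. For a primal problem of the form

$$\min_x \;\kappa(x) \quad \text{subject to} \quad \rho(b - Ax) \le \sigma,$$

the gauge dual is itself a minimization problem,

$$\min_y \;\kappa^{\circ}(A^{T} y) \quad \text{subject to} \quad \langle b, y\rangle - \sigma\,\rho^{\circ}(y) \ge 1,$$

in contrast with the max/min pairing familiar from Fenchel-Young (Lagrangian) duality.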

Peng Zheng; What’s the shape of your penalty?

May 9, 2017, 4pm
PDL C-401
Peng Zheng, Department of Applied Mathematics, University of Washington

Abstract: The performance of machine learning approaches is strongly influenced by the choice of misfit penalty and by the correct setting of penalty parameters, such as the threshold of the Huber function. These parameters are typically chosen using expert knowledge, cross-validation, or black-box optimization, all of which are time-consuming for large-scale applications.

We present a data-driven approach that simultaneously solves inference problems and learns error structure and penalty parameters. We discuss theoretical properties of these joint problems, and present algorithms for their solution. We show numerical examples from the piecewise linear-quadratic (PLQ) family of penalties.
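
As a toy version of the idea (not the algorithm from the talk), one can fold the penalty parameter into a maximum-likelihood objective: treating the residual density as proportional to $\exp(-\rho_\kappa(r))$ adds a log-partition term that depends on the Huber threshold $\kappa$, so the threshold can be fit jointly with the regression coefficients. The sketch below assumes unit noise scale and uses generic scipy routines; all names and sizes are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def huber(r, kappa):
    """Huber penalty: quadratic for |r| <= kappa, linear beyond."""
    return np.where(np.abs(r) <= kappa, 0.5 * r**2, kappa * np.abs(r) - 0.5 * kappa**2)

def log_partition(kappa):
    """log Z(kappa) for the density p(r) proportional to exp(-huber(r, kappa))."""
    Z, _ = quad(lambda r: float(np.exp(-huber(r, kappa))), -np.inf, np.inf)
    return np.log(Z)

def joint_nll(params, A, y):
    """Negative log-likelihood in (x, kappa); the log-partition term is what
    allows the threshold to be learned rather than fixed a priori."""
    x, kappa = params[:-1], params[-1]
    r = y - A @ x
    return np.sum(huber(r, kappa)) + len(y) * log_partition(kappa)

# Toy regression with a few gross errors (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, m = 5, 200
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = A @ x_true + 0.1 * rng.standard_normal(m)
y[:10] += 5.0                                   # outliers

x0 = np.zeros(n + 1); x0[-1] = 1.0              # initial guess, kappa = 1
bounds = [(None, None)] * n + [(1e-3, None)]    # keep kappa positive
res = minimize(joint_nll, x0, args=(A, y), bounds=bounds, method="L-BFGS-B")
print("estimated Huber threshold:", round(res.x[-1], 3))
```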

Scott Roy; An Optimal First-order Method Based on Optimal Quadratic Averaging

May 2, 2017, 4pm
PDL C-401
Scott Roy, Department of Mathematics, University of Washington

Abstract: In a recent paper, Bubeck, Lee, and Singh introduced a new first order method for minimizing smooth strongly convex functions. Their geometric descent algorithm, largely inspired by the ellipsoid method, enjoys the optimal linear rate of convergence. We show that the same iterate sequence is generated by a scheme that in each iteration computes an optimal average of quadratic lower-models of the function. Indeed, the minimum of the averaged quadratic approaches the true minimum at an optimal rate. This intuitive viewpoint reveals clear connections to the original fast-gradient methods and cutting plane ideas, and leads to limited-memory extensions with improved performance.

Joint work with Dmitriy Drusvyatskiy and Maryam Fazel.
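
As a sketch of the averaging step described above (our paraphrase, not the paper's exact presentation): each lower model is a quadratic with common curvature $\mu$, namely $Q_i(x) = v_i + \frac{\mu}{2}\|x - c_i\|^2$, where $v_i$ is its minimum value and $c_i$ its minimizer. A convex combination of two such models is again a quadratic of the same form:

$$\lambda Q_1(x) + (1-\lambda) Q_2(x) \;=\; \Big(\lambda v_1 + (1-\lambda) v_2 + \tfrac{\mu}{2}\,\lambda(1-\lambda)\,\|c_1 - c_2\|^2\Big) \;+\; \tfrac{\mu}{2}\,\bigl\|x - \lambda c_1 - (1-\lambda) c_2\bigr\|^2,$$

and the optimal average picks $\lambda \in [0,1]$ to maximize the new minimum value (the term in parentheses), a one-dimensional concave quadratic maximization with a closed-form solution. Maintaining the best such averaged quadratic as a certified lower bound on the function is, as we read the abstract, what drives the optimal convergence rate.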