Zeyuan Allen-Zhu; How to Swing By Saddle Points: Faster Non-Convex Optimization Than SGD

Feb 13th, 2018, 4pm
LOW 105
Zeyuan Allen-Zhu, Microsoft Research

“The diverse world of deep learning research has given rise to thousands of neural network architectures. To this day, however, the underlying training algorithms for neural networks remain stochastic gradient descent (SGD) and its heuristic variants.

In this talk, we present a new stochastic algorithm that trains any smooth neural network to ε-approximate local minima using O(ε^{-3.25}) backpropagations. Prior to this work, the best provable bound, achieved by SGD, was O(ε^{-4}).

More broadly, the algorithm finds ε-approximate local minima of any smooth nonconvex function using only O(ε^{-3.25}) stochastic gradient computations.”
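To give a concrete sense of what escaping a saddle point means, here is a minimal toy sketch (not the speaker's algorithm, whose details are in the talk) of a perturbed gradient method: when the gradient norm falls below ε at a stationary point, injecting a small random perturbation lets the iterate slide off a saddle and descend toward a genuine local minimum. The test function and all parameter values below are illustrative assumptions.

```python
import numpy as np

def f(w):
    # Nonconvex toy objective: minima at (±1, 0), a saddle at (0, 0)
    x, y = w
    return (x**2 - 1)**2 + y**2

def grad(w):
    x, y = w
    return np.array([4 * x * (x**2 - 1), 2 * y])

def perturbed_gd(w, lr=0.05, eps=1e-3, radius=1e-2, steps=10_000, seed=0):
    """Gradient descent with random perturbations near stationary points."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        g = grad(w)
        if np.linalg.norm(g) < eps:
            # Near a stationary point (possibly a saddle): perturb randomly.
            # At a saddle, the negative-curvature direction amplifies the
            # perturbation, so subsequent gradient steps escape it.
            w = w + rng.uniform(-radius, radius, size=2)
        else:
            w = w - lr * g
    return w

# Start exactly at the saddle (0, 0), where the gradient vanishes;
# plain gradient descent would stay stuck there forever.
w = perturbed_gd(np.zeros(2))
```

Starting from the saddle, the run ends near one of the two local minima (±1, 0); which one depends on the random perturbation.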