Rapid Stochastic Gradient Descent: Accelerating Machine Learning
The incorporation of online learning capabilities into real-time computing systems has been hampered by a lack of efficient, scalable optimization algorithms for this purpose: second-order methods are too expensive for large, nonlinear models, conjugate gradient does not tolerate the noise inherent in online learning, and simple gradient descent, evolutionary algorithms, etc., are unacceptably slow to converge. I am addressing this problem by developing new ways to accelerate stochastic gradient descent, using second-order gradient information obtained through the efficient computation of curvature matrix-vector products. In the stochastic meta-descent (SMD) algorithm, this cheap curvature information is built up iteratively into a stochastic approximation of Levenberg-Marquardt second-order gradient steps, which are then used to adapt individual gradient step sizes. SMD handles noisy, correlated, non-stationary signals well, and approaches the rapid convergence of second-order methods at only linear cost per iteration, thus scaling up to extremely large nonlinear systems. To date it has enabled new adaptive techniques in computational fluid dynamics and computer vision. Our most recent development is a version of SMD operating in reproducing kernel Hilbert space.
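To make the mechanism concrete, the following is a minimal sketch of an SMD-style update in JAX. It is not the authors' implementation: the hyper-parameter names (mu, lam, rho), the toy quadratic loss, and the initial values are illustrative assumptions. It follows the commonly published form of SMD, where each parameter carries its own gain that is adapted from the correlation between the current gradient and a gradient trace, and the trace is maintained with a curvature matrix-vector product obtained at linear cost via forward-over-reverse differentiation (the R-operator). The full method described in the abstract uses a damped Gauss-Newton / Levenberg-Marquardt curvature matrix; this sketch substitutes the plain Hessian for brevity.

```python
# Minimal SMD-style optimizer sketch (illustrative, not the original implementation).
import jax
import jax.numpy as jnp

def hessian_vector_product(loss, w, batch, v):
    # Curvature matrix-vector product H v at linear cost, computed without ever
    # forming H, via forward-over-reverse differentiation (Pearlmutter's R-operator).
    grad_fn = lambda w_: jax.grad(loss)(w_, batch)
    _, hv = jax.jvp(grad_fn, (w,), (v,))
    return hv

def smd_step(loss, w, p, v, batch, mu=0.05, lam=0.99, rho=0.5):
    g = jax.grad(loss)(w, batch)
    # Meta-step: adapt each gain multiplicatively from the correlation of the
    # current gradient with the gradient trace v (a safeguarded linearisation
    # of exp(-mu * g * v)); rho bounds how fast a gain can shrink.
    p = p * jnp.maximum(rho, 1.0 - mu * g * v)
    # Parameter step: plain stochastic gradient descent with per-parameter gains.
    w_new = w - p * g
    # Trace update: exponentially averaged effect of the gains on the parameters,
    # corrected with a curvature (Hessian-vector) term; cost stays linear per step.
    hv = hessian_vector_product(loss, w, batch, v)
    v_new = lam * v - p * (g + lam * hv)
    return w_new, p, v_new

# Toy usage: fit w to noisy samples of a target vector under a quadratic loss.
def loss(w, batch):
    return 0.5 * jnp.sum((w - batch) ** 2)

key = jax.random.PRNGKey(0)
w = jnp.zeros(5)
p = jnp.full(5, 0.1)   # initial per-parameter step sizes
v = jnp.zeros(5)
for t in range(100):
    key, sub = jax.random.split(key)
    batch = 1.0 + 0.1 * jax.random.normal(sub, (5,))  # noisy target samples
    w, p, v = smd_step(loss, w, p, v, batch)
print(w)  # should approach the mean target, roughly 1.0 in each coordinate
```

In this form, each iteration costs only a gradient plus one curvature matrix-vector product, which is the property that lets the method scale to very large nonlinear models while approximating second-order convergence behaviour.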
Video Length: 3549 seconds (59:09)
Date Found: October 13, 2010
Date Produced: February 25, 2007
View Count: 0