Statistical inference for the population landscape via moment-adjusted stochastic gradients

Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint informs us only how well the solution is approximated numerically but overlooks the sampling nature of the data. In contrast, recognizing the r...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:Journal of the Royal Statistical Society. Series B (Statistical Methodology). - Blackwell Publishers. - 81(2019), 2, Seite 431-456
1. Verfasser: Liang, Tengyuan (VerfasserIn)
Weitere Verfasser: Su, Weijie J.
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2019
Zugriff auf das übergeordnete Werk:Journal of the Royal Statistical Society. Series B (Statistical Methodology)
Beschreibung
Zusammenfassung:Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint informs us only how well the solution is approximated numerically but overlooks the sampling nature of the data. In contrast, recognizing the randomness in the data, statisticians are keen to provide uncertainty quantification, or confidence, for the solution obtained by using iterative optimization methods. The paper makes progress along this direction by introducing moment-adjusted stochastic gradient descent: a new stochastic optimization method for statistical inference. We establish non-asymptotic theory that characterizes the statistical distribution for certain iterative methods with optimization guarantees. On the statistical front, the theory allows for model mis-specification, with very mild conditions on the data. For optimization, the theory is flexible for both convex and non-convex cases. Remarkably, the moment adjusting idea motivated from ‘error standardization’ in statistics achieves a similar effect to acceleration in first-order optimization methods that are used to fit generalized linear models. We also demonstrate this acceleration effect in the non-convex setting through numerical experiments.
ISSN:14679868