Sparse Learning with Stochastic Composite Optimization

In this paper, we study Stochastic Composite Optimization (SCO) for sparse learning that aims to learn a sparse solution from a composite function. Most of the recent SCO algorithms have already reached the optimal expected convergence rate O(1/λT), but they often fail to deliver sparse solutions at...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence. - 1979. - 39(2017), 6 vom: 01. Juni, Seite 1223-1236
1. Verfasser: Zhang, Weizhong (VerfasserIn)
Weitere Verfasser: Zhang, Lijun, Jin, Zhongming, Jin, Rong, Cai, Deng, Li, Xuelong, Liang, Ronghua, He, Xiaofei
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2017
Zugriff auf das übergeordnete Werk:IEEE transactions on pattern analysis and machine intelligence
Schlagworte:Journal Article Research Support, Non-U.S. Gov't
Beschreibung
Zusammenfassung:In this paper, we study Stochastic Composite Optimization (SCO) for sparse learning that aims to learn a sparse solution from a composite function. Most of the recent SCO algorithms have already reached the optimal expected convergence rate O(1/λT), but they often fail to deliver sparse solutions at the end either due to the limited sparsity regularization during stochastic optimization (SO) or due to the limitation in online-to-batch conversion. Even when the objective function is strongly convex, their high probability bounds can only attain O(√{log(1/δ)/T}) with δ is the failure probability, which is much worse than the expected convergence rate. To address these limitations, we propose a simple yet effective two-phase Stochastic Composite Optimization scheme by adding a novel powerful sparse online-to-batch conversion to the general Stochastic Optimization algorithms. We further develop three concrete algorithms, OptimalSL, LastSL and AverageSL, directly under our scheme to prove the effectiveness of the proposed scheme. Both the theoretical analysis and the experiment results show that our methods can really outperform the existing methods at the ability of sparse learning and at the meantime we can improve the high probability bound to approximately O(log(log(T)/δ)/λT)
Beschreibung:Date Completed 25.10.2018
Date Revised 25.10.2018
published: Print-Electronic
Citation Status PubMed-not-MEDLINE
ISSN:1939-3539
DOI:10.1109/TPAMI.2016.2578323