Adaptive Batch Size Time Evolving Stochastic Gradient Descent for Federated Learning

Variance reduction has been shown to improve the performance of Stochastic Gradient Descent (SGD) in centralized machine learning. However, when it is extended to federated learning systems, many issues may arise, including (i) mega-batch size settings; (ii) additional noise introduced by the gradient difference between the current iteration and the snapshot point; and (iii) gradient (statistical) heterogeneity. In this paper, we propose a lightweight algorithm termed federated adaptive batch size time evolving variance reduction (FedATEVR) to tackle these issues, consisting of an adaptive batch size setting scheme and a time-evolving variance reduction gradient estimator. In particular, we use the historical gradient information to set an appropriate mega-batch size for each client, which can steadily accelerate the local SGD process and reduce the computation cost. The historical information involves both global and local gradients, which mitigates unstable variation in the mega-batch size introduced by gradient heterogeneity among clients. For each client, the gradient difference between the current iteration and the snapshot point is used to tune the time-evolving weight of the variance reduction term in the gradient estimator. This can avoid meaningless variance reduction caused by the out-of-date snapshot point gradient. We theoretically prove that our algorithm can achieve a linear speedup of $\mathcal {O}(\frac{1}{\sqrt{SKT}})$ for non-convex objective functions under partial client participation. Extensive experiments demonstrate that our proposed method can achieve higher test accuracy than the baselines and greatly reduce the number of communication rounds.
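
To make the two mechanisms described in the abstract concrete, the following Python sketch illustrates one plausible reading of them: an SVRG-style local gradient estimator whose variance-reduction term is scaled by a time-evolving weight driven by the drift between the current gradient and the snapshot gradient, and a mega-batch size rule built from historical local and global gradient norms. The function names, the weighting rule, and the batch-size rule below are illustrative assumptions, not the exact formulas of FedATEVR.

import numpy as np

# Illustrative sketch only (assumed formulas, not the paper's exact rules).

def time_evolving_vr_gradient(grad_current, grad_snapshot, full_grad_snapshot, decay=0.9):
    # The further the current stochastic gradient has drifted from the
    # snapshot gradient, the smaller the weight on the correction term,
    # so an out-of-date snapshot contributes little variance reduction.
    # The decay/(1 + drift) form is an assumption for illustration.
    drift = np.linalg.norm(grad_current - grad_snapshot)
    weight = decay / (1.0 + drift)
    return grad_current - weight * (grad_snapshot - full_grad_snapshot)

def adaptive_mega_batch(prev_batch, local_grad_hist, global_grad_hist, b_min=32, b_max=4096):
    # Heuristic rule (assumed): grow the mega-batch when historical local
    # and global gradients agree, shrink it otherwise; mixing both
    # histories damps client-to-client swings caused by heterogeneity.
    local_norm = np.mean([np.linalg.norm(g) for g in local_grad_hist])    # histories assumed non-empty
    global_norm = np.mean([np.linalg.norm(g) for g in global_grad_hist])
    ratio = global_norm / (local_norm + 1e-12)
    new_batch = prev_batch * float(np.clip(ratio, 0.5, 2.0))
    return int(np.clip(new_batch, b_min, b_max))

Under this reading, a stale snapshot (large drift) automatically down-weights the correction term, while agreement between local and global gradient histories lets the mega-batch grow steadily rather than oscillate across heterogeneous clients.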

Detailed description

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2025), 15 Sept.
First author: An, Xuming (Author)
Other authors: Shen, Li; Luo, Yong; Hu, Han; Tao, Dacheng
Format: Online article
Language: English
Published: 2025
Parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652c 4500
001 NLM392639866
003 DE-627
005 20250917000139.0
007 cr uuu---uuuuu
008 250917s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2025.3610169  |2 doi 
028 5 2 |a pubmed25n1570.xml 
035 |a (DE-627)NLM392639866 
035 |a (NLM)40953432 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a An, Xuming  |e verfasserin  |4 aut 
245 1 0 |a Adaptive Batch Size Time Evolving Stochastic Gradient Descent for Federated Learning 
264 1 |c 2025 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 15.09.2025 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Variance reduction has been shown to improve the performance of Stochastic Gradient Descent (SGD) in centralized machine learning. However, when it is extended to federated learning systems, many issues may arise, including (i) mega-batch size settings; (ii) additional noise introduced by the gradient difference between the current iteration and the snapshot point; and (iii) gradient (statistical) heterogeneity. In this paper, we propose a lightweight algorithm termed federated adaptive batch size time evolving variance reduction (FedATEVR) to tackle these issues, consisting of an adaptive batch size setting scheme and a time-evolving variance reduction gradient estimator. In particular, we use the historical gradient information to set an appropriate mega-batch size for each client, which can steadily accelerate the local SGD process and reduce the computation cost. The historical information involves both global and local gradients, which mitigates unstable variation in the mega-batch size introduced by gradient heterogeneity among clients. For each client, the gradient difference between the current iteration and the snapshot point is used to tune the time-evolving weight of the variance reduction term in the gradient estimator. This can avoid meaningless variance reduction caused by the out-of-date snapshot point gradient. We theoretically prove that our algorithm can achieve a linear speedup of $\mathcal {O}(\frac{1}{\sqrt{SKT}})$ for non-convex objective functions under partial client participation. Extensive experiments demonstrate that our proposed method can achieve higher test accuracy than the baselines and greatly reduce the number of communication rounds.
650 4 |a Journal Article 
700 1 |a Shen, Li  |e verfasserin  |4 aut 
700 1 |a Luo, Yong  |e verfasserin  |4 aut 
700 1 |a Hu, Han  |e verfasserin  |4 aut 
700 1 |a Tao, Dacheng  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g PP(2025) vom: 15. Sept.  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:PP  |g year:2025  |g day:15  |g month:09 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2025.3610169  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2025  |b 15  |c 09