Adaptive Batch Size Time Evolving Stochastic Gradient Descent for Federated Learning

Variance reduction has been shown to improve the performance of Stochastic Gradient Descent (SGD) in centralized machine learning. However, when it is extended to federated learning systems, many issues may arise, including (i) mega-batch size settings; (ii) additional noise introduced by the gradient difference between the current iteration and the snapshot point; and (iii) gradient (statistical) heterogeneity. In this paper, we propose a lightweight algorithm termed federated adaptive batch size time evolving variance reduction (FedATEVR) to tackle these issues, consisting of an adaptive batch size setting scheme and a time-evolving variance reduction gradient estimator. In particular, we use the historical gradient information to set an appropriate mega-batch size for each client, which can steadily accelerate the local SGD process and reduce the computation cost. The historical information involves both global and local gradients, which mitigates unstable variation in the mega-batch size introduced by gradient heterogeneity among clients. For each client, the gradient difference between the current iteration and the snapshot point is used to tune the time-evolving weight of the variance reduction term in the gradient estimator. This can avoid meaningless variance reduction caused by the out-of-date snapshot point gradient. We theoretically prove that our algorithm can achieve a linear speedup of $\mathcal {O}(\frac{1}{\sqrt{SKT}})$ for non-convex objective functions under partial client participation. Extensive experiments demonstrate that our proposed method can achieve higher test accuracy than the baselines and greatly reduce the number of communication rounds.
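
To make the two mechanisms described in the abstract concrete, the following Python sketch illustrates one plausible reading of them: an SVRG-style local gradient estimator whose variance-reduction term is scaled by a time-evolving weight driven by the drift between the current gradient and the snapshot gradient, and a mega-batch size rule built from historical local and global gradient norms. The function names, the weighting rule, and the batch-size rule below are illustrative assumptions, not the exact formulas of FedATEVR.

import numpy as np

# Illustrative sketch only (assumed formulas, not the paper's exact rules).

def time_evolving_vr_gradient(grad_current, grad_snapshot, full_grad_snapshot, decay=0.9):
    # The further the current stochastic gradient has drifted from the
    # snapshot gradient, the smaller the weight on the correction term,
    # so an out-of-date snapshot contributes little variance reduction.
    # The decay/(1 + drift) form is an assumption for illustration.
    drift = np.linalg.norm(grad_current - grad_snapshot)
    weight = decay / (1.0 + drift)
    return grad_current - weight * (grad_snapshot - full_grad_snapshot)

def adaptive_mega_batch(prev_batch, local_grad_hist, global_grad_hist, b_min=32, b_max=4096):
    # Heuristic rule (assumed): grow the mega-batch when historical local
    # and global gradients agree, shrink it otherwise; mixing both
    # histories damps client-to-client swings caused by heterogeneity.
    local_norm = np.mean([np.linalg.norm(g) for g in local_grad_hist])    # histories assumed non-empty
    global_norm = np.mean([np.linalg.norm(g) for g in global_grad_hist])
    ratio = global_norm / (local_norm + 1e-12)
    new_batch = prev_batch * float(np.clip(ratio, 0.5, 2.0))
    return int(np.clip(new_batch, b_min, b_max))

Under this reading, a stale snapshot (large drift) automatically down-weights the correction term, while agreement between local and global gradient histories lets the mega-batch grow steadily rather than oscillate across heterogeneous clients.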

Detailed description

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2025), 15 Sept.
First author: An, Xuming (Author)
Other authors: Shen, Li; Luo, Yong; Hu, Han; Tao, Dacheng
Format: Online article
Language: English
Published: 2025
Parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652c 4500
001 NLM392639866
003 DE-627
005 20250917000139.0
007 cr uuu---uuuuu
008 250917s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2025.3610169  |2 doi 
028 5 2 |a pubmed25n1570.xml 
035 |a (DE-627)NLM392639866 
035 |a (NLM)40953432 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a An, Xuming  |e verfasserin  |4 aut 
245 1 0 |a Adaptive Batch Size Time Evolving Stochastic Gradient Descent for Federated Learning 
264 1 |c 2025 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 15.09.2025 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Variance reduction has been shown to improve the performance of Stochastic Gradient Descent (SGD) in centralized machine learning. However, when it is extended to federated learning systems, many issues may arise, including (i) mega-batch size settings; (ii) additional noise introduced by the gradient difference between the current iteration and the snapshot point; and (iii) gradient (statistical) heterogeneity. In this paper, we propose a lightweight algorithm termed federated adaptive batch size time evolving variance reduction (FedATEVR) to tackle these issues, consisting of an adaptive batch size setting scheme and a time-evolving variance reduction gradient estimator. In particular, we use the historical gradient information to set an appropriate mega-batch size for each client, which can steadily accelerate the local SGD process and reduce the computation cost. The historical information involves both global and local gradients, which mitigates unstable variation in the mega-batch size introduced by gradient heterogeneity among clients. For each client, the gradient difference between the current iteration and the snapshot point is used to tune the time-evolving weight of the variance reduction term in the gradient estimator. This can avoid meaningless variance reduction caused by the out-of-date snapshot point gradient. We theoretically prove that our algorithm can achieve a linear speedup of $\mathcal {O}(\frac{1}{\sqrt{SKT}})$ for non-convex objective functions under partial client participation. Extensive experiments demonstrate that our proposed method can achieve higher test accuracy than the baselines and greatly reduce the number of communication rounds.
650 4 |a Journal Article 
700 1 |a Shen, Li  |e verfasserin  |4 aut 
700 1 |a Luo, Yong  |e verfasserin  |4 aut 
700 1 |a Hu, Han  |e verfasserin  |4 aut 
700 1 |a Tao, Dacheng  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g PP(2025) vom: 15. Sept.  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:PP  |g year:2025  |g day:15  |g month:09 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2025.3610169  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2025  |b 15  |c 09