Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space

In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. However, in practical tasks the action space is usually composite, composed of multiple sub-actions with different functions, and time-varying. The existi...

Detailed description

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), no. 11, 04 Nov., pages 8618-8634
Main author:	Li, Wenhao (author)
Other authors:	Wang, Xiangfeng, Jin, Bo, Luo, Dijun, Zha, Hongyuan
Format:	Online article
Language:	English
Published:	2022
Access to the parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM328919950
003	DE-627
005	20250302075424.0
007	cr uuu---uuuuu
008	231225s2022 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TPAMI.2021.3102140 \|2 doi
028	5	2	\|a pubmed25n1096.xml
035			\|a (DE-627)NLM328919950
035			\|a (NLM)34347595
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Li, Wenhao \|e verfasserin \|4 aut
245	1	0	\|a Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space
264		1	\|c 2022
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 05.10.2022
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. However, in practical tasks the action space is usually composite, composed of multiple sub-actions with different functions, and time-varying. The existing sub-actions might be temporarily invalid due to the external environment, while unseen sub-actions can be added to the current system. To solve the robustness and transferability problems in time-varying composite action spaces, we propose a structured cooperative reinforcement learning algorithm based on the centralized critic and decentralized actor framework, called SCORE. We model the single-agent problem with composite action space as a fully cooperative partially observable stochastic game and further employ a graph attention network to capture the dependencies between heterogeneous sub-actions. To promote tighter cooperation between the decomposed heterogeneous agents, SCORE introduces a hierarchical variational autoencoder, which maps the heterogeneous sub-action space into a common latent action space. We also incorporate an implicit credit assignment structure into SCORE to overcome the multi-agent credit assignment problem in the fully cooperative partially observable stochastic game. Performance experiments on a proof-of-concept task and a precision agriculture task show that SCORE has significant advantages in robustness and transferability for time-varying composite action spaces.
650		4	\|a Journal Article
700	1		\|a Wang, Xiangfeng \|e verfasserin \|4 aut
700	1		\|a Jin, Bo \|e verfasserin \|4 aut
700	1		\|a Luo, Dijun \|e verfasserin \|4 aut
700	1		\|a Zha, Hongyuan \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on pattern analysis and machine intelligence \|d 1979 \|g 44(2022), 11 vom: 04. Nov., Seite 8618-8634 \|w (DE-627)NLM098212257 \|x 1939-3539 \|7 nnas
773	1	8	\|g volume:44 \|g year:2022 \|g number:11 \|g day:04 \|g month:11 \|g pages:8618-8634
856	4	0	\|u http://dx.doi.org/10.1109/TPAMI.2021.3102140 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 44 \|j 2022 \|e 11 \|b 04 \|c 11 \|h 8618-8634