|
|
|
|
LEADER |
01000naa a22002652 4500 |
001 |
NLM328919950 |
003 |
DE-627 |
005 |
20231225203959.0 |
007 |
cr uuu---uuuuu |
008 |
231225s2022 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TPAMI.2021.3102140
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1096.xml
|
035 |
|
|
|a (DE-627)NLM328919950
|
035 |
|
|
|a (NLM)34347595
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Li, Wenhao
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space
|
264 |
|
1 |
|c 2022
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 05.10.2022
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. However, in practical tasks the action space is usually composite, composed of multiple sub-actions with different functions, and time-varying. Existing sub-actions might be temporarily invalid due to the external environment, while unseen sub-actions can be added to the current system. To solve the robustness and transferability problems in time-varying composite action spaces, we propose a structured cooperative reinforcement learning algorithm based on the centralized critic and decentralized actor framework, called SCORE. We model the single-agent problem with composite action space as a fully cooperative partially observable stochastic game and further employ a graph attention network to capture the dependencies between heterogeneous sub-actions. To promote tighter cooperation between the decomposed heterogeneous agents, SCORE introduces a hierarchical variational autoencoder, which maps the heterogeneous sub-action space into a common latent action space. We also incorporate an implicit credit assignment structure into SCORE to overcome the multi-agent credit assignment problem in the fully cooperative partially observable stochastic game. Performance experiments on a proof-of-concept task and a precision agriculture task show that SCORE has significant advantages in robustness and transferability for time-varying composite action spaces.
|
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Wang, Xiangfeng
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Jin, Bo
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Luo, Dijun
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Zha, Hongyuan
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on pattern analysis and machine intelligence
|d 1979
|g 44(2022), 11 vom: 04. Nov., Seite 8618-8634
|w (DE-627)NLM098212257
|x 1939-3539
|7 nnns
|
773 |
1 |
8 |
|g volume:44
|g year:2022
|g number:11
|g day:04
|g month:11
|g pages:8618-8634
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TPAMI.2021.3102140
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 44
|j 2022
|e 11
|b 04
|c 11
|h 8618-8634
|