Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space
In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. However, the action space is usually composite, composed of multiple sub-action with different functions, and time-varying for practical tasks. The existi...
Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 11 vom: 04. Nov., Seite 8618-8634 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | , , , |
Format: | Online-Aufsatz |
Sprache: | English |
Veröffentlicht: |
2022
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on pattern analysis and machine intelligence |
Schlagworte: | Journal Article |
Zusammenfassung: | In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. However, the action space is usually composite, composed of multiple sub-action with different functions, and time-varying for practical tasks. The existing sub-actions might be temporarily invalid due to the external environment, while unseen sub-actions can be added to the current system. To solve the robustness and transferability problems in time-varying composite action spaces, we propose a structured cooperative reinforcement learning algorithm based on the centralized critic and decentralized actor framework, called SCORE. We model the single-agent problem with composite action space as a fully cooperative partially observable stochastic game and further employ a graph attention network to capture the dependencies between heterogeneous sub-actions. To promote tighter cooperation between the decomposed heterogeneous agents, SCORE introduces a hierarchical variational autoencoder, which maps the heterogeneous sub-action space into a common latent action space. We also incorporate an implicit credit assignment structure into the SCORE to overcome the multi-agent credit assignment problem in the fully cooperative partially observable stochastic game. Performance experiments on the proof-of-concept task and precision agriculture task show that SCORE has significant advantages in robustness and transferability for time-varying composite action space |
---|---|
Beschreibung: | Date Revised 05.10.2022 published: Print-Electronic Citation Status PubMed-not-MEDLINE |
ISSN: | 1939-3539 |
DOI: | 10.1109/TPAMI.2021.3102140 |