Masked Contrastive Representation Learning for Reinforcement Learning

In pixel-based reinforcement learning (RL), the states are raw video frames, which are mapped into hidden representation before feeding to a policy network. To improve sample efficiency of state representation learning, recently, the most prominent work is based on contrastive unsupervised represent...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 3, 1 March, pages 3421-3433
Main author: Zhu, Jinhua (Author)
Other authors: Xia, Yingce, Wu, Lijun, Deng, Jiajun, Zhou, Wengang, Qin, Tao, Liu, Tie-Yan, Li, Houqiang
Format: Online article
Language: English
Published: 2023
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM341168351
003 DE-627
005 20250303091922.0
007 cr uuu---uuuuu
008 231226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2022.3176413  |2 doi 
028 5 2 |a pubmed25n1137.xml 
035 |a (DE-627)NLM341168351 
035 |a (NLM)35594229 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Zhu, Jinhua  |e verfasserin  |4 aut 
245 1 0 |a Masked Contrastive Representation Learning for Reinforcement Learning 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 07.04.2023 
500 |a Date Revised 11.04.2023 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a In pixel-based reinforcement learning (RL), the states are raw video frames, which are mapped into hidden representation before feeding to a policy network. To improve sample efficiency of state representation learning, recently, the most prominent work is based on contrastive unsupervised representation. Witnessing that consecutive video frames in a game are highly correlated, to further improve data efficiency, we propose a new algorithm, i.e., masked contrastive representation learning for RL (M-CURL), which takes the correlation among consecutive inputs into consideration. In our architecture, besides a CNN encoder for hidden representation of the input state and a policy network for action selection, we introduce an auxiliary Transformer encoder module to leverage the correlations among video frames. During training, we randomly mask the features of several frames, and use the CNN encoder and Transformer to reconstruct them based on context frames. The CNN encoder and Transformer are jointly trained via contrastive learning where the reconstructed features should be similar to the ground-truth ones while dissimilar to others. During policy evaluation, the CNN encoder and the policy network are used to take actions, and the Transformer module is discarded. Our method achieves consistent improvements over CURL on 14 out of 16 environments from the DMControl suite and 23 out of 26 environments from Atari 2600 Games. The code is available at https://github.com/teslacool/m-curl 
650 4 |a Journal Article 
700 1 |a Xia, Yingce  |e verfasserin  |4 aut 
700 1 |a Wu, Lijun  |e verfasserin  |4 aut 
700 1 |a Deng, Jiajun  |e verfasserin  |4 aut 
700 1 |a Zhou, Wengang  |e verfasserin  |4 aut 
700 1 |a Qin, Tao  |e verfasserin  |4 aut 
700 1 |a Liu, Tie-Yan  |e verfasserin  |4 aut 
700 1 |a Li, Houqiang  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 45(2023), 3 vom: 01. März, Seite 3421-3433  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:45  |g year:2023  |g number:3  |g day:01  |g month:03  |g pages:3421-3433 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2022.3176413  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 45  |j 2023  |e 3  |b 01  |c 03  |h 3421-3433
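
The abstract in field 520 describes training by randomly masking the features of some frames and requiring each reconstructed feature to be similar to its ground-truth feature and dissimilar to the others. The sketch below illustrates only that objective on toy vectors and is not the authors' implementation (see the repository linked in the abstract for that). The names sample_mask and info_nce, the dot-product similarity, the temperature parameter, and the toy one-hot features are all assumptions made for illustration; the CNN encoder and the Transformer are not modeled, so the "reconstructed" features are supplied by hand.

```python
import math
import random


def sample_mask(num_frames, mask_prob, rng):
    """Choose frame positions whose features are masked (always at least one)."""
    masked = [t for t in range(num_frames) if rng.random() < mask_prob]
    return masked or [rng.randrange(num_frames)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(queries, keys, positives, temperature=1.0):
    """Mean cross-entropy over queries.

    queries[i] is a reconstructed feature, keys holds the ground-truth
    features of all frames, and positives[i] is the index in keys that
    queries[i] should match. Every other key acts as a negative.
    Dot-product similarity is an assumption of this sketch.
    """
    total = 0.0
    for q, pos in zip(queries, positives):
        logits = [dot(q, k) / temperature for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[pos]
    return total / len(queries)


# Toy ground-truth features: one scaled one-hot vector per frame.
rng = random.Random(0)
truth = [[5.0 if i == t else 0.0 for i in range(4)] for t in range(4)]
masked = sample_mask(len(truth), 0.5, rng)

# A perfect reconstruction versus one that returns the wrong frame's feature.
aligned = [truth[t] for t in masked]
shifted = [truth[(t + 1) % len(truth)] for t in masked]

loss_aligned = info_nce(aligned, truth, masked)
loss_shifted = info_nce(shifted, truth, masked)
print(loss_aligned < loss_shifted)
```

In this toy setup the loss is lower when each reconstructed feature equals its own ground-truth feature than when it equals a neighboring frame's feature, which is the behavior the contrastive objective in the abstract asks for.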