Self-Regulated Learning for Egocentric Video Activity Anticipation

Future activity anticipation is a challenging problem in egocentric vision. As a standard future activity anticipation paradigm, recursive sequence prediction suffers from the accumulation of errors. To address this problem, we propose a simple and effective Self-Regulated Learning (SRL) framework, which regulates the intermediate representation at each time step so that it emphasizes the novel information in the current frame and reflects its correlation with previously observed frames.
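
The reweighing step mentioned in this description (and spelled out in the full abstract in the record below) compares the current frame's feature with each previously observed frame and re-weights the observed content by similarity. The following is a minimal illustrative sketch only, not the authors' implementation; the function name, tensor shapes, and the residual combination at the end are assumptions.

    import torch
    import torch.nn.functional as F

    def regulate_step(current_feat, observed_feats):
        # Hypothetical sketch of similarity-based reweighing over observed frames.
        # current_feat:   (d,)   feature of the frame at the current time-stamp
        # observed_feats: (t, d) features of previously observed frames
        # Cosine similarity between the current frame and each observed frame.
        sims = F.cosine_similarity(observed_feats, current_feat.unsqueeze(0), dim=-1)  # (t,)
        # Turn similarities into attention weights over the observed frames.
        weights = torch.softmax(sims, dim=0)                                           # (t,)
        # Similarity-weighted summary of the observed content.
        context = (weights.unsqueeze(-1) * observed_feats).sum(dim=0)                  # (d,)
        # Combine the current frame with its observed-content context (assumed residual mix).
        return current_feat + context

    # Toy usage with random features (d = 128, 8 observed frames).
    obs = torch.randn(8, 128)
    cur = torch.randn(128)
    regulated = regulate_step(cur, obs)  # (128,)

The contrastive part of the regulation (emphasizing novel information against the observed content) would add a separate loss term on top of this representation; it is not shown in the sketch.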

Detailed Description

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 6, 01 June, pages 6715-6730
Main Author: Qi, Zhaobo (Author)
Other Authors: Wang, Shuhui, Su, Chi, Su, Li, Huang, Qingming, Tian, Qi
Format: Online Article
Language: English
Published: 2023
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Keywords: Journal Article
LEADER 01000naa a22002652 4500
001 NLM321551133
003 DE-627
005 20231225180040.0
007 cr uuu---uuuuu
008 231225s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2021.3059923  |2 doi 
028 5 2 |a pubmed24n1071.xml 
035 |a (DE-627)NLM321551133 
035 |a (NLM)33596171 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Qi, Zhaobo  |e verfasserin  |4 aut 
245 1 0 |a Self-Regulated Learning for Egocentric Video Activity Anticipation 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 07.05.2023 
500 |a Date Revised 07.05.2023 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Future activity anticipation is a challenging problem in egocentric vision. As a standard future activity anticipation paradigm, recursive sequence prediction suffers from the accumulation of errors. To address this problem, we propose a simple and effective Self-Regulated Learning (SRL) framework, which aims to regulate the intermediate representation consecutively to produce a representation that (a) emphasizes the novel information in the frame of the current time-stamp in contrast to previously observed content, and (b) reflects its correlation with previously observed frames. The former is achieved by minimizing a contrastive loss, and the latter is achieved by a dynamic reweighing mechanism that attends to informative frames in the observed content, using a similarity comparison between the features of the current frame and the observed frames. The learned final video representation can be further enhanced by multi-task learning, which performs joint feature learning on the target activity labels and the automatically detected action and object class tokens. SRL sharply outperforms existing state-of-the-art methods in most cases on two egocentric video datasets and two third-person video datasets. Its effectiveness is also verified by the experimental fact that the action and object concepts that support the activity semantics can be accurately identified.
650 4 |a Journal Article 
700 1 |a Wang, Shuhui  |e verfasserin  |4 aut 
700 1 |a Su, Chi  |e verfasserin  |4 aut 
700 1 |a Su, Li  |e verfasserin  |4 aut 
700 1 |a Huang, Qingming  |e verfasserin  |4 aut 
700 1 |a Tian, Qi  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 45(2023), 6 vom: 01. Juni, Seite 6715-6730  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:45  |g year:2023  |g number:6  |g day:01  |g month:06  |g pages:6715-6730 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2021.3059923  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 45  |j 2023  |e 6  |b 01  |c 06  |h 6715-6730