Unveiling the Power of Self-Supervision for Multi-View Multi-Human Association and Tracking

Multi-view multi-human association and tracking (MvMHAT) is an emerging yet important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2024), 19 Sept.
Main author: Feng, Wei (author)
Other authors: Wang, Feifan, Han, Ruize, Gan, Yiyang, Qian, Zekun, Hou, Junhui, Wang, Song
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM377845361
003 DE-627
005 20240920232950.0
007 cr uuu---uuuuu
008 240920s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2024.3463966  |2 doi 
028 5 2 |a pubmed24n1540.xml 
035 |a (DE-627)NLM377845361 
035 |a (NLM)39298301 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Feng, Wei  |e verfasserin  |4 aut 
245 1 0 |a Unveiling the Power of Self-Supervision for Multi-View Multi-Human Association and Tracking 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 19.09.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Multi-view multi-human association and tracking (MvMHAT) is an emerging yet important problem for multi-person scene video surveillance. It aims to track a group of people over time in each view and to identify the same person across different views at the same time, which differs from previous MOT and multi-camera MOT tasks that consider only human tracking over time. As a result, the videos for MvMHAT require more complex annotations while containing more information for self-learning. In this work, we tackle this problem with an end-to-end neural network in a self-supervised learning manner. Specifically, we propose to take advantage of the spatial-temporal self-consistency rationale by considering three properties: reflexivity, symmetry, and transitivity. Besides the reflexivity property, which naturally holds, we design self-supervised learning losses based on the properties of symmetry and transitivity, for both appearance feature learning and assignment matrix optimization, to associate multiple humans over time and across views. Furthermore, to promote research on MvMHAT, we build two new large-scale benchmarks for the network training and testing of different algorithms. Extensive experiments on the proposed benchmarks verify the effectiveness of our method. We have released the benchmark and code to the public. 
650 4 |a Journal Article 
700 1 |a Wang, Feifan  |e verfasserin  |4 aut 
700 1 |a Han, Ruize  |e verfasserin  |4 aut 
700 1 |a Gan, Yiyang  |e verfasserin  |4 aut 
700 1 |a Qian, Zekun  |e verfasserin  |4 aut 
700 1 |a Hou, Junhui  |e verfasserin  |4 aut 
700 1 |a Wang, Song  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g PP(2024) vom: 19. Sept.  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:PP  |g year:2024  |g day:19  |g month:09 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2024.3463966  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2024  |b 19  |c 09
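
Illustrative note on the abstract (field 520): the abstract describes self-supervised losses built on the symmetry and transitivity of person associations across views and over time. The sketch below is only a rough illustration of that general idea and is not the authors' released code. It assumes soft assignment matrices obtained from a row-wise softmax over dot-product feature similarity, and it penalizes the deviation of a two-step cycle (symmetry) or three-step cycle (transitivity) from the identity matrix. All function names, the temperature value, and the squared-error penalty are assumptions made for this example.

```python
import math


def softmax_rows(matrix, temperature=0.1):
    """Row-wise softmax: turns similarity scores into a soft assignment."""
    result = []
    for row in matrix:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((value - peak) / temperature) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def soft_assignment(feats_a, feats_b, temperature=0.1):
    """Soft assignment from the people in set A to the people in set B."""
    sim = [[sum(x * y for x, y in zip(fa, fb)) for fb in feats_b] for fa in feats_a]
    return softmax_rows(sim, temperature)


def cycle_loss(cycle):
    """Mean squared deviation of a cycle matrix from the identity matrix."""
    n = len(cycle)
    return sum(
        (cycle[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(n)
        for j in range(n)
    ) / (n * n)


def symmetry_loss(feats_a, feats_b):
    """A -> B -> A should map every person back to themselves."""
    forward = soft_assignment(feats_a, feats_b)
    backward = soft_assignment(feats_b, feats_a)
    return cycle_loss(matmul(forward, backward))


def transitivity_loss(feats_a, feats_b, feats_c):
    """A -> B -> C -> A should also map every person back to themselves."""
    ab = soft_assignment(feats_a, feats_b)
    bc = soft_assignment(feats_b, feats_c)
    ca = soft_assignment(feats_c, feats_a)
    return cycle_loss(matmul(matmul(ab, bc), ca))


# Toy appearance features (hypothetical): two people seen in three views,
# listed in a different order in view B.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.0, 1.0], [1.0, 0.0]]
view_c = [[1.0, 0.0], [0.0, 1.0]]

# Indistinguishable features give a maximally ambiguous assignment.
ambiguous = [[1.0, 0.0], [1.0, 0.0]]

consistent_sym = symmetry_loss(view_a, view_b)
consistent_trans = transitivity_loss(view_a, view_b, view_c)
ambiguous_sym = symmetry_loss(ambiguous, ambiguous)
```

With discriminative features both losses are close to zero, while indistinguishable features yield a clearly larger loss. This is the kind of signal that can drive feature learning without identity labels, under the assumptions stated above.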