Video Saliency Prediction using Spatiotemporal Residual Attentive Networks

This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model emphasizes two essential issues, i.e., effective spatiotemporal feature integration and multi-scale saliency learning. For the first issue, appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Unlike traditional two-stream models, which learn appearance and motion features separately, this design allows early, multi-path information exchange between the two domains, leading to a unified and powerful spatiotemporal learning architecture. For the second issue, we propose a composite attention mechanism that learns multi-scale local attentions and global attention priors end-to-end, and use it to enhance the fused spatiotemporal features by emphasizing important features at multiple scales. A lightweight convolutional gated recurrent unit (convGRU), which remains trainable with small amounts of training data, models long-term temporal characteristics. Extensive experiments over four benchmark datasets clearly demonstrate the advantage of the proposed video saliency model over its competitors and the effectiveness of each component of our network. Our code and all the results will be available at https://github.com/ashleylqx/STRA-Net
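The convolutional GRU named in the abstract is a standard recurrent building block. The cell below is a minimal PyTorch sketch of a generic convGRU, included for illustration only: the class name, channel sizes, and kernel size are assumptions, not the authors' STRA-Net code (see the repository linked above for the actual implementation).

# Minimal generic convGRU cell; the gates are 2-D convolutions, so the
# hidden state keeps its spatial layout (B, C, H, W) instead of being
# flattened. Illustrative sketch only, not the STRA-Net implementation.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_channels: int, hidden_channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Update (z) and reset (r) gates, computed jointly from [x, h].
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               2 * hidden_channels, kernel_size, padding=padding)
        # Candidate hidden state, computed from [x, r * h].
        self.candidate = nn.Conv2d(in_channels + hidden_channels,
                                   hidden_channels, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_new = torch.tanh(self.candidate(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_new

A clip is processed by carrying the hidden state across frames, e.g. h = cell(frame_features, h) for each frame. Because every operation is a small convolution, the cell adds few parameters, which is why such units are commonly considered suitable when training data is limited.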

Detailed Description

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - (2019), 23 Aug.
Main author: Lai, Qiuxia (author)
Other authors: Wang, Wenguan, Sun, Hanqiu, Shen, Jianbing
Format: Online article
Language: English
Published: 2019
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000caa a22002652 4500
001 NLM300586876
003 DE-627
005 20240229162313.0
007 cr uuu---uuuuu
008 231225s2019 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2019.2936112  |2 doi 
028 5 2 |a pubmed24n1308.xml 
035 |a (DE-627)NLM300586876 
035 |a (NLM)31449021 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Lai, Qiuxia  |e verfasserin  |4 aut 
245 1 0 |a Video Saliency Prediction using Spatiotemporal Residual Attentive Networks 
264 1 |c 2019 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 27.02.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model emphasizes two essential issues, i.e., effective spatiotemporal feature integration and multi-scale saliency learning. For the first issue, appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Unlike traditional two-stream models, which learn appearance and motion features separately, this design allows early, multi-path information exchange between the two domains, leading to a unified and powerful spatiotemporal learning architecture. For the second issue, we propose a composite attention mechanism that learns multi-scale local attentions and global attention priors end-to-end, and use it to enhance the fused spatiotemporal features by emphasizing important features at multiple scales. A lightweight convolutional gated recurrent unit (convGRU), which remains trainable with small amounts of training data, models long-term temporal characteristics. Extensive experiments over four benchmark datasets clearly demonstrate the advantage of the proposed video saliency model over its competitors and the effectiveness of each component of our network. Our code and all the results will be available at https://github.com/ashleylqx/STRA-Net 
650 4 |a Journal Article 
700 1 |a Wang, Wenguan  |e verfasserin  |4 aut 
700 1 |a Sun, Hanqiu  |e verfasserin  |4 aut 
700 1 |a Shen, Jianbing  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g (2019) vom: 23. Aug.  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g year:2019  |g day:23  |g month:08 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2019.2936112  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |j 2019  |b 23  |c 08
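As a reading aid for the abstract in field 520: "dense residual cross connections" can be interpreted as adding projected motion features from every earlier motion layer onto each appearance layer, on top of an identity (residual) path. The sketch below is a hypothetical rendering of that idea in PyTorch; all module names, projections, and shapes are assumptions, and the authors' actual design is in https://github.com/ashleylqx/STRA-Net.

# Hypothetical dense residual cross-connection fusion between two streams.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualCrossFusion(nn.Module):
    """At appearance level l, add 1x1-projected motion features from every
    motion level j <= l (dense), keeping the appearance features as the
    residual identity path."""

    def __init__(self, channels):
        super().__init__()
        # One projection per (motion level j -> appearance level l) pair.
        self.proj = nn.ModuleDict({
            f"{j}to{l}": nn.Conv2d(channels[j], channels[l], kernel_size=1)
            for l in range(len(channels)) for j in range(l + 1)
        })

    def forward(self, app_feats, mot_feats):
        fused = []
        for l, a in enumerate(app_feats):
            m_sum = torch.zeros_like(a)
            for j in range(l + 1):
                m = self.proj[f"{j}to{l}"](mot_feats[j])
                # Match the appearance level's spatial resolution.
                m = F.interpolate(m, size=a.shape[-2:], mode="bilinear",
                                  align_corners=False)
                m_sum = m_sum + m
            fused.append(a + m_sum)  # residual: identity + cross-domain term
        return fused

Given matching lists of per-level feature maps from the two streams, fused = ResidualCrossFusion([64, 128, 256])(app_feats, mot_feats) returns one fused map per level, which the abstract's composite attention module would then re-weight.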