Embedding Perspective Analysis Into Multi-Column Convolutional Neural Network for Crowd Counting

The crowd counting is challenging for deep networks due to several factors. For instance, the networks can not efficiently analyze the perspective information of arbitrary scenes, and they are naturally inefficient to handle the scale variations. In this work, we deliver a simple yet efficient multi...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021) vom: 14., Seite 1395-1407
1. Verfasser: Yang, Yifan (VerfasserIn)
Weitere Verfasser: Li, Guorong, Du, Dawei, Huang, Qingming, Sebe, Nicu
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2021
Zugriff auf das übergeordnete Werk:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:Journal Article
LEADER 01000naa a22002652 4500
001 NLM318808064
003 DE-627
005 20231225170130.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2020.3043122  |2 doi 
028 5 2 |a pubmed24n1062.xml 
035 |a (DE-627)NLM318808064 
035 |a (NLM)33315562 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Yang, Yifan  |e verfasserin  |4 aut 
245 1 0 |a Embedding Perspective Analysis Into Multi-Column Convolutional Neural Network for Crowd Counting 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 30.12.2020 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a The crowd counting is challenging for deep networks due to several factors. For instance, the networks can not efficiently analyze the perspective information of arbitrary scenes, and they are naturally inefficient to handle the scale variations. In this work, we deliver a simple yet efficient multi-column network, which integrates the perspective analysis method with the counting network. The proposed method explicitly excavates the perspective information and drives the counting network to analyze the scenes. More concretely, we explore the perspective information from the estimated density maps and quantify the perspective space into several separate scenes. We then embed the perspective analysis into the multi-column framework with a recurrent connection. Therefore, the proposed network matches various scales with the different receptive fields efficiently. Secondly, we share the parameters of the branches with various receptive fields. This strategy drives the convolutional kernels to be sensitive to the instances with various scales. Furthermore, to improve the evaluation accuracy of the column with a large receptive field, we propose a transform dilated convolution. The transform dilated convolution breaks the fixed sampling structure of the deep network. Moreover, it needs no extra parameters and training, and the offsets are constrained in a local region, which is designed for the congested scenes. The proposed method achieves state-of-the-art performance on five datasets (ShanghaiTech, UCF CC 50, WorldEXPO'10, UCSD, and TRANCOS) 
650 4 |a Journal Article 
700 1 |a Li, Guorong  |e verfasserin  |4 aut 
700 1 |a Du, Dawei  |e verfasserin  |4 aut 
700 1 |a Huang, Qingming  |e verfasserin  |4 aut 
700 1 |a Sebe, Nicu  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 30(2021) vom: 14., Seite 1395-1407  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:30  |g year:2021  |g day:14  |g pages:1395-1407 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2020.3043122  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2021  |b 14  |h 1395-1407