Local Semantic Enhanced ConvNet for Aerial Scene Recognition

Aerial scene recognition is challenging due to the complicated object distribution and spatial arrangement in a large-scale aerial image. Recent studies attempt to explore the local semantic representation capability of deep learning models, but how to exactly perceive the key local regions remains...

Detailed description

Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021), day: 08, pages 6498-6511
Main author:	Bi, Qi (Author)
Other authors:	Qin, Kun; Zhang, Han; Xia, Gui-Song
Format:	Online article
Language:	English
Published:	2021
Access to the parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article


LEADER	01000naa a22002652 4500
001	NLM327829265
003	DE-627
005	20231225201640.0
007	cr uuu---uuuuu
008	231225s2021 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2021.3092816 \|2 doi
028	5	2	\|a pubmed24n1092.xml
035			\|a (DE-627)NLM327829265
035			\|a (NLM)34236963
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Bi, Qi \|e verfasserin \|4 aut
245	1	0	\|a Local Semantic Enhanced ConvNet for Aerial Scene Recognition
264		1	\|c 2021
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 19.07.2021
500			\|a Date Revised 19.07.2021
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Aerial scene recognition is challenging due to the complicated object distribution and spatial arrangement in a large-scale aerial image. Recent studies attempt to explore the local semantic representation capability of deep learning models, but how to exactly perceive the key local regions remains to be handled. In this paper, we present a local semantic enhanced ConvNet (LSE-Net) for aerial scene recognition, which mimics the human visual perception of key local regions in aerial scenes, in the hope of building a discriminative local semantic representation. Our LSE-Net consists of a context enhanced convolutional feature extractor, a local semantic perception module and a classification layer. Firstly, we design multi-scale dilated convolution operators to fuse multi-level and multi-scale convolutional features in a trainable manner in order to fully receive the local feature responses in an aerial scene. Then, these features are fed into our two-branch local semantic perception module. In this module, we design a context-aware class peak response (CACPR) measurement to precisely depict the visual impulse of key local regions and the corresponding context information. Also, a spatial attention weight matrix is extracted to describe the importance of each key local region for the aerial scene. Finally, the refined class confidence maps are fed into the classification layer. Exhaustive experiments on three aerial scene classification benchmarks indicate that our LSE-Net achieves state-of-the-art performance, which validates the effectiveness of our local semantic perception module and CACPR measurement.
650		4	\|a Journal Article
700	1		\|a Qin, Kun \|e verfasserin \|4 aut
700	1		\|a Zhang, Han \|e verfasserin \|4 aut
700	1		\|a Xia, Gui-Song \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 30(2021) vom: 08., Seite 6498-6511 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnns
773	1	8	\|g volume:30 \|g year:2021 \|g day:08 \|g pages:6498-6511
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2021.3092816 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 30 \|j 2021 \|b 08 \|h 6498-6511