Fine-Grained Recognition With Learnable Semantic Data Augmentation

Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminat...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024) vom: 25., Seite 3130-3144
1. Verfasser:	Pu, Yifan (VerfasserIn)
Weitere Verfasser:	Han, Yizeng, Wang, Yulin, Feng, Junlan, Deng, Chao, Huang, Gao
Format:	Online-Aufsatz
Sprache:	English
Veröffentlicht:	2024
Zugriff auf das übergeordnete Werk:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM371514096
003	DE-627
005	20250306032416.0
007	cr uuu---uuuuu
008	240426s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2024.3364500 \|2 doi
028	5	2	\|a pubmed25n1237.xml
035			\|a (DE-627)NLM371514096
035			\|a (NLM)38662557
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Pu, Yifan \|e verfasserin \|4 aut
245	1	0	\|a Fine-Grained Recognition With Learnable Semantic Data Augmentation
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a ƒaComputermedien \|b c \|2 rdamedia
338			\|a ƒa Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 01.05.2024
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because their random editing-region behavior is prone to destroy the discriminative visual cues residing in the subtle regions. In this paper, we propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC Aircrafts, NABirds) demonstrate that our method significantly improves the generalization performance on several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. Source code is available at https://github.com/LeapLabTHU/LearnableISDA
650		4	\|a Journal Article
700	1		\|a Han, Yizeng \|e verfasserin \|4 aut
700	1		\|a Wang, Yulin \|e verfasserin \|4 aut
700	1		\|a Feng, Junlan \|e verfasserin \|4 aut
700	1		\|a Deng, Chao \|e verfasserin \|4 aut
700	1		\|a Huang, Gao \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 33(2024) vom: 25., Seite 3130-3144 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnas
773	1	8	\|g volume:33 \|g year:2024 \|g day:25 \|g pages:3130-3144
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2024.3364500 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 33 \|j 2024 \|b 25 \|h 3130-3144