Fs-DSM : Few-Shot Diagram-Sentence Matching via Cross-Modal Attention Graph Model

Diagram-sentence matching is a valuable research topic because it helps learners understand diagrams with the assistance of sentences. However, diagrams and sentences contain many uncommon objects, i.e. few-shot content. The existing methods for image-sentence matching have...

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021), day: 23, pages 8102-8115
Main author: Hu, Xin (author)
Other authors: Zhang, Lingling, Liu, Jun, Zheng, Qinghua, Zhou, Jianlong
Format: Online article
Language: English
Published: 2021
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM330966464
003 DE-627
005 20231225212430.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2021.3112294  |2 doi 
028 5 2 |a pubmed24n1103.xml 
035 |a (DE-627)NLM330966464 
035 |a (NLM)34554913 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Hu, Xin  |e verfasserin  |4 aut 
245 1 0 |a Fs-DSM  |b Few-Shot Diagram-Sentence Matching via Cross-Modal Attention Graph Model 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 29.09.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Diagram-sentence matching is a valuable research topic because it helps learners understand diagrams with the assistance of sentences. However, diagrams and sentences contain many uncommon objects, i.e. few-shot content. Existing image-sentence matching methods are severely limited when applied to diagrams because they focus on high-frequency objects during training and ignore uncommon ones. In addition, the specialized nature of diagrams makes their semantics unintuitive. In this work, we propose Fs-DSM, a cross-modal attention graph model for the few-shot diagram-sentence matching task. It is composed of three modules. The graph initialization module treats the region-level diagram features and word-level sentence features as the nodes of Fs-DSM and represents edges as the similarity between nodes. The information propagation module is the key component of Fs-DSM: the few-shot content is recognized by an uncommon object recognition strategy, the nodes are then updated by a neighborhood aggregation procedure with cross-modal propagation between all visual and textual nodes, and the edges are recomputed from the new node features. The global association module integrates the features of regions and words to represent the diagrams and sentences globally. Through comprehensive experiments on few-shot and conventional image-sentence matching, we demonstrate that Fs-DSM outperforms competing methods on the AI2D [Formula: see text] diagram dataset and two public benchmark datasets of natural images 
650 4 |a Journal Article 
700 1 |a Zhang, Lingling  |e verfasserin  |4 aut 
700 1 |a Liu, Jun  |e verfasserin  |4 aut 
700 1 |a Zheng, Qinghua  |e verfasserin  |4 aut 
700 1 |a Zhou, Jianlong  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 30(2021) vom: 23., Seite 8102-8115  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:30  |g year:2021  |g day:23  |g pages:8102-8115 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2021.3112294  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2021  |b 23  |h 8102-8115
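
Illustrative note on the abstract (field 520): the three modules it describes can be pictured with a toy sketch. This is not the authors' implementation. It assumes region and word features already live in the same embedding space, uses cosine similarity for edges, a softmax-weighted average for neighborhood aggregation, and mean pooling for the global representation. All function names (`edge_matrix`, `propagate`, `match_score`) and the 0.5 mixing weight are hypothetical, and the uncommon object recognition strategy is omitted because the record gives no detail on it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def edge_matrix(nodes):
    """Graph initialization (assumed form): edges are pairwise node similarities."""
    return [[cosine(u, v) for v in nodes] for u in nodes]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def propagate(nodes, edges):
    """One aggregation step: mix each node with an attention-weighted
    average of all other nodes, visual and textual alike."""
    updated = []
    for i, node in enumerate(nodes):
        others = [j for j in range(len(nodes)) if j != i]
        weights = softmax([edges[i][j] for j in others])
        agg = [sum(w * nodes[j][d] for w, j in zip(weights, others))
               for d in range(len(node))]
        # Hypothetical 50/50 mix of the old feature and the aggregated message.
        updated.append([0.5 * a + 0.5 * b for a, b in zip(node, agg)])
    return updated

def mean_pool(vectors):
    """Global association (assumed form): average the node features."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def match_score(regions, words, steps=2):
    """Score a diagram (region features) against a sentence (word features)."""
    nodes = regions + words
    edges = edge_matrix(nodes)
    for _ in range(steps):
        nodes = propagate(nodes, edges)
        edges = edge_matrix(nodes)  # recompute edges from the new node features
    diagram = mean_pool(nodes[:len(regions)])
    sentence = mean_pool(nodes[len(regions):])
    return cosine(diagram, sentence)

if __name__ == "__main__":
    regions = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.0]]
    words = [[0.9, 0.1, 0.1], [0.0, 0.8, 0.3], [0.2, 0.2, 1.0]]
    print(match_score(regions, words))
```

In this sketch a higher score means the pooled diagram and sentence representations point in more similar directions; a trained model would learn the features and the aggregation weights rather than fix them by hand.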