Alignment Relation is What You Need for Diagram Parsing


Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024), day 18, pages 2131-2144
First author:	Zhang, Xinyu (author)
Other authors:	Zhang, Lingling; Hu, Xin; Liu, Jun; Wang, Shaowei; Wang, Qianying
Format:	Online article
Language:	English
Published:	2024
Access to parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Keywords:	Journal Article


LEADER	01000caa a22002652 4500
001	NLM369681568
003	DE-627
005	20240319233016.0
007	cr uuu---uuuuu
008	240315s2024 xx |||||o 00| ||eng c
024	7		|a 10.1109/TIP.2024.3374511 |2 doi
028	5	2	|a pubmed24n1336.xml
035			|a (DE-627)NLM369681568
035			|a (NLM)38478439
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
100	1		|a Zhang, Xinyu |e author |4 aut
245	1	0	|a Alignment Relation is What You Need for Diagram Parsing
264		1	|c 2024
336			|a Text |b txt |2 rdacontent
337			|a computer |b c |2 rdamedia
338			|a online resource |b cr |2 rdacarrier
500			|a Date Revised 18.03.2024
500			|a published: Print-Electronic
500			|a Citation Status PubMed-not-MEDLINE
520			|a As a knowledge carrier, the diagram is pervasive in many aspects of human life, such as textbooks, architectural drawings, and documents. Unlike natural images, the visual elements in a diagram have sparser representations, and similar visual representations can carry dissimilar semantics. As a result, current methods fail to capture visual elements with precise semantics. To address this issue, we treat aligned visual and textual elements as pairs, so that the precise semantics of the textual elements can be assigned to the visual elements. We build the first diagram dataset named Align Diagram Element (ADE), which includes annotations of the alignment relations between visual and textual elements. We also propose a visual-textual alignment model (VTAM) consisting of a graph construction phase and an optimal aligning phase. In the graph construction phase, relational graphs are built between different elements with four relational operators, which measure the relations between elements according to distance, connection line, inclusion, and feature similarity. In the optimal aligning phase, the representation of each visual-textual pair is refined as a weighted sum of its representations on all relational graphs. Experimental results show that VTAM improves on the current best competitor by 10.9%, averaged over the test folds of the ADE dataset. To explore the role of alignment relations in diagram parsing, we apply VTAM to diagram-related tasks such as diagram question answering (DQA), where adding VTAM yields improvements of 2.8% to 5.9% on AI2D and 4.6% to 5.1% on Foodwebs. Our dataset and code are released at: https://github.com/ADE-dataset/ADE-dataset
650		4	|a Journal Article
700	1		|a Zhang, Lingling |e author |4 aut
700	1		|a Hu, Xin |e author |4 aut
700	1		|a Liu, Jun |e author |4 aut
700	1		|a Wang, Shaowei |e author |4 aut
700	1		|a Wang, Qianying |e author |4 aut
773	0	8	|i Contained in |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |d 1992 |g 33(2024), day 18, pages 2131-2144 |w (DE-627)NLM09821456X |x 1941-0042 |7 nnns
773	1	8	|g volume:33 |g year:2024 |g day:18 |g pages:2131-2144
856	4	0	|u http://dx.doi.org/10.1109/TIP.2024.3374511 |3 Full text
912			|a GBV_USEFLAG_A
912			|a SYSFLAG_A
912			|a GBV_NLM
912			|a GBV_ILN_350
951			|a AR
952			|d 33 |j 2024 |b 18 |h 2131-2144
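The abstract (field 520) describes two phases: building one relational graph per operator (distance, connection line, inclusion, feature similarity), then representing each visual-textual pair as a weighted sum of its representations on all of those graphs. The sketch below illustrates that idea in plain Python. It is not the authors' VTAM implementation; the released code at the GitHub URL in the abstract is the authoritative version. Every concrete choice here is an assumption: the `(x1, y1, x2, y2)` box format, the Gaussian distance kernel with `sigma=50.0`, the `links` set standing in for detected connection lines, a single row-normalized propagation step per graph, concatenation `[h_v ; h_t]` as the pair representation, and fixed softmax `logits` in place of learned weights. All names (`Element`, `distance_op`, `inclusion_op`, `similarity_op`, `build_graphs`, `propagate`, `pair_representation`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


@dataclass
class Element:
    box: Box
    feat: List[float]  # assumed precomputed visual or text embedding


def _centroid(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def distance_op(a: Element, b: Element, sigma: float = 50.0) -> float:
    """Spatial closeness: Gaussian kernel on centroid distance (assumed choice)."""
    (ax, ay), (bx, by) = _centroid(a.box), _centroid(b.box)
    return math.exp(-((ax - bx) ** 2 + (ay - by) ** 2) / (2.0 * sigma ** 2))


def inclusion_op(a: Element, b: Element) -> float:
    """1.0 if one box lies fully inside the other, else 0.0."""
    def contains(p: Box, q: Box) -> bool:
        return p[0] <= q[0] and p[1] <= q[1] and p[2] >= q[2] and p[3] >= q[3]
    return 1.0 if contains(a.box, b.box) or contains(b.box, a.box) else 0.0


def similarity_op(a: Element, b: Element) -> float:
    """Cosine similarity of features, clipped at 0 so edge weights stay non-negative."""
    dot = sum(x * y for x, y in zip(a.feat, b.feat))
    na = math.sqrt(sum(x * x for x in a.feat))
    nb = math.sqrt(sum(x * x for x in b.feat))
    return max(0.0, dot / (na * nb)) if na > 0 and nb > 0 else 0.0


def build_graphs(elems: Sequence[Element],
                 links: Set[Tuple[int, int]]) -> Dict[str, List[List[float]]]:
    """One n x n graph per operator; `links` holds index pairs joined by a drawn line."""
    n = len(elems)
    ops = {
        "distance": lambda i, j: distance_op(elems[i], elems[j]),
        "connection": lambda i, j: 1.0 if (i, j) in links or (j, i) in links else 0.0,
        "inclusion": lambda i, j: inclusion_op(elems[i], elems[j]),
        "similarity": lambda i, j: similarity_op(elems[i], elems[j]),
    }
    return {name: [[op(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
            for name, op in ops.items()}


def propagate(adj: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """One message-passing step: row-normalized (A + I) times X."""
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        s = sum(row)  # >= 1 thanks to the self-loop
        out.append([sum(row[j] * feats[j][k] for j in range(n)) / s for k in range(d)])
    return out


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]


def pair_representation(elems, links, v, t, logits):
    """Weighted sum over relational graphs of [h_v ; h_t] for visual v and text t."""
    graphs = build_graphs(elems, links)
    feats = [e.feat for e in elems]
    weights = softmax([logits[name] for name in graphs])
    fused = [0.0] * (2 * len(feats[0]))
    for w, adj in zip(weights, graphs.values()):
        h = propagate(adj, feats)
        pair = h[v] + h[t]  # list concatenation: [h_v ; h_t]
        fused = [f + w * p for f, p in zip(fused, pair)]
    return fused


elems = [
    Element((0, 0, 100, 100), [1.0, 0.0]),       # visual element: a drawn region
    Element((10, 10, 40, 30), [0.9, 0.1]),       # text label inside that region
    Element((300, 300, 340, 320), [0.0, 1.0]),   # distant text label
]
logits = {"distance": 0.0, "connection": 0.0, "inclusion": 0.0, "similarity": 0.0}
rep = pair_representation(elems, {(0, 2)}, v=0, t=1, logits=logits)
print(len(rep))  # 4
```

With equal logits every graph contributes equally; raising one logit (for example `"inclusion"`) makes the pair representation lean on that relation, which is the role the learned weights play in the weighted sum the abstract describes.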