Detecting and Grounding Multi-Modal Media Manipulation and Beyond

Misinformation has become a pressing issue. Fake media, in both visual and textual forms, is widespread on the web. While various DeepFake detection and text fake news detection methods have been proposed, they are only designed for single-modality forgery based on binary classification, let alone a...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 8 (31 July), pages 5556-5574
First author: Shao, Rui (author)
Other authors: Wu, Tianxing, Wu, Jianlong, Nie, Liqiang, Liu, Ziwei
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652 4500
001 NLM368670171
003 DE-627
005 20240703234502.0
007 cr uuu---uuuuu
008 240222s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2024.3367749  |2 doi 
028 5 2 |a pubmed24n1459.xml 
035 |a (DE-627)NLM368670171 
035 |a (NLM)38376967 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Shao, Rui  |e verfasserin  |4 aut 
245 1 0 |a Detecting and Grounding Multi-Modal Media Manipulation and Beyond 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 03.07.2024 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Misinformation has become a pressing issue. Fake media, in both visual and textual forms, is widespread on the web. While various DeepFake detection and text fake news detection methods have been proposed, they are only designed for single-modality forgery based on binary classification, let alone analyzing and reasoning subtle forgery traces across different modalities. In this paper, we highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM^4). DGM^4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content (i.e., image bounding boxes and text tokens), which requires deeper reasoning of multi-modal media manipulation. To support a large-scale investigation, we construct the first DGM^4 dataset, where image-text pairs are manipulated by various approaches, with rich annotation of diverse manipulations. Moreover, we propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities. HAMMER performs: 1) manipulation-aware contrastive learning between two uni-modal encoders as shallow manipulation reasoning and 2) modality-aware cross-attention by multi-modal aggregator as deep manipulation reasoning. Dedicated manipulation detection and grounding heads are integrated from shallow to deep levels based on the interacted multi-modal information. To exploit more fine-grained contrastive learning for cross-modal semantic alignment, we further integrate Manipulation-Aware Contrastive Loss with Local View and construct a more advanced model HAMMER++. Finally, we build an extensive benchmark and set up rigorous evaluation metrics for this new research problem. Comprehensive experiments demonstrate the superiority of HAMMER and HAMMER++; several valuable observations are also revealed to facilitate future research in multi-modal media manipulation. 
650 4 |a Journal Article 
700 1 |a Wu, Tianxing  |e verfasserin  |4 aut 
700 1 |a Wu, Jianlong  |e verfasserin  |4 aut 
700 1 |a Nie, Liqiang  |e verfasserin  |4 aut 
700 1 |a Liu, Ziwei  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 46(2024), 8 vom: 31. Juli, Seite 5556-5574  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:46  |g year:2024  |g number:8  |g day:31  |g month:07  |g pages:5556-5574 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2024.3367749  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 46  |j 2024  |e 8  |b 31  |c 07  |h 5556-5574