Learning Multi-Modal Nonlinear Embeddings : Performance Bounds and an Algorithm

While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021) vom: 13., Seite 4384-4394
1. Verfasser: Kaya, Semih (VerfasserIn)
Weitere Verfasser: Vural, Elif
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2021
Zugriff auf das übergeordnete Werk:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:Journal Article
LEADER 01000naa a22002652 4500
001 NLM324032072
003 DE-627
005 20231225185443.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2021.3071688  |2 doi 
028 5 2 |a pubmed24n1080.xml 
035 |a (DE-627)NLM324032072 
035 |a (NLM)33848248 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Kaya, Semih  |e verfasserin  |4 aut 
245 1 0 |a Learning Multi-Modal Nonlinear Embeddings  |b Performance Bounds and an Algorithm 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a ƒaComputermedien  |b c  |2 rdamedia 
338 |a ƒa Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 22.04.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications 
650 4 |a Journal Article 
700 1 |a Vural, Elif  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 30(2021) vom: 13., Seite 4384-4394  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:30  |g year:2021  |g day:13  |g pages:4384-4394 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2021.3071688  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2021  |b 13  |h 4384-4394