Learning multimodal dictionaries

Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrate at each instant perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can be also extremely useful in many s...


Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1997. - 16(2007), 9, 11 Sept., pages 2272-83
Main author:	Monaci, Gianluca (Author)
Other authors:	Jost, Philippe, Vandergheynst, Pierre, Mailhé, Boris, Lesage, Sylvain, Gribonval, Rémi
Format:	Article
Language:	English
Published:	2007
Access to the parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article; Research Support, Non-U.S. Gov't


LEADER	01000caa a22002652 4500
001	NLM172953065
003	DE-627
005	20250208153751.0
007	tu
008	231223s2007 xx \|\|\|\|\| 00\| \|\|eng c
028	5	2	\|a pubmed25n0577.xml
035			\|a (DE-627)NLM172953065
035			\|a (NLM)17784601
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Monaci, Gianluca \|e verfasserin \|4 aut
245	1	0	\|a Learning multimodal dictionaries
264		1	\|c 2007
336			\|a Text \|b txt \|2 rdacontent
337			\|a ohne Hilfsmittel zu benutzen \|b n \|2 rdamedia
338			\|a Band \|b nc \|2 rdacarrier
500			\|a Date Completed 31.12.2007
500			\|a Date Revised 26.10.2019
500			\|a published: Print
500			\|a Citation Status MEDLINE
520			\|a Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrate at each instant perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can be also extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when considering the signals independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. An algorithm for iteratively learning multimodal generating functions that can be shifted at all positions in the signal is proposed, as well. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences and it is able to discover underlying structures in the data. The detection of such audio-video patterns in audiovisual clips allows to effectively localize the sound source on the video in presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms
650		4	\|a Journal Article
650		4	\|a Research Support, Non-U.S. Gov't
700	1		\|a Jost, Philippe \|e verfasserin \|4 aut
700	1		\|a Vandergheynst, Pierre \|e verfasserin \|4 aut
700	1		\|a Mailhé, Boris \|e verfasserin \|4 aut
700	1		\|a Lesage, Sylvain \|e verfasserin \|4 aut
700	1		\|a Gribonval, Rémi \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1997 \|g 16(2007), 9 vom: 11. Sept., Seite 2272-83 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnns
773	1	8	\|g volume:16 \|g year:2007 \|g number:9 \|g day:11 \|g month:09 \|g pages:2272-83
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 16 \|j 2007 \|e 9 \|b 11 \|c 09 \|h 2272-83