Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning

Spiking neural networks (SNNs), which efficiently encode temporal sequences, have shown great potential in extracting audio-visual joint feature representations. However, coupling SNNs (binary spike sequences) with transformers (floating-point sequences) to jointly explore the temporal-semantic inform...

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024) of: 23., pages 4840-4852
Main author: Li, Wenrui (author)
Other authors: Wang, Penghong, Xiong, Ruiqin, Fan, Xiaopeng
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article