Deep Visual-Semantic Alignments for Generating Image Descriptions

We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination...

Bibliographic Details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 39(2017), 4, 11 Apr., pages 664-676
Main author:	Karpathy, Andrej (author)
Other authors:	Fei-Fei, Li
Format:	Online article
Language:	English
Published:	2017
Access to parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article; Research Support, U.S. Gov't, Non-P.H.S.

Available online:	Full text