Deep Visual-Semantic Alignments for Generating Image Descriptions

We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 39(2017), 4 of 11 Apr., pages 664-676
Main author: Karpathy, Andrej (Author)
Other authors: Fei-Fei, Li
Format: Online article
Language: English
Published: 2017
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, U.S. Gov't, Non-P.H.S.