Visual Dialog



Bibliographic Details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - (2018), 19 Apr.
Main Author:	Das, Abhishek (Author)
Other Authors:	Kottur, Satwik, Gupta, Khushi, Singh, Avi, Yadav, Deshraj, Lee, Stefan, Moura, Jose, Parikh, Devi, Batra, Dhruv
Format:	Online Article
Language:	English
Published:	2018
Access to parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article

Description
Summary:	We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being sufficiently grounded in vision to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person real-time chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and consists of dialog question-answer pairs from 10-round, human-human dialogs grounded in images from the COCO dataset
Item Description:	Date Revised 27.02.2024; published: Print-Electronic; Citation Status: Publisher
ISSN:	1939-3539
DOI:	10.1109/TPAMI.2018.2828437