LLaFS++: Few-Shot Image Segmentation With Large Language Models


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1979. - vol. 47 (2025), no. 9, 26 Aug., pp. 7715-7732
Main Author: Zhu, Lanyun (Author)
Other Authors: Chen, Tianrun; Ji, Deyi; Xu, Peng; Ye, Jieping; Liu, Jun
Format: Online Article
Language: English
Published: 2025
Parent Work: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article
Description
Abstract: Despite the rapid advancements in few-shot segmentation (FSS), most existing methods in this domain are hampered by their reliance on the limited and biased information from only a small number of labeled samples. This limitation inherently restricts their capability to achieve sufficiently high levels of performance. To address this issue, this paper proposes a pioneering framework named LLaFS++, which, for the first time, applies large language models (LLMs) to FSS and achieves notable success. LLaFS++ leverages the extensive prior knowledge embedded in LLMs to guide the segmentation process, effectively compensating for the limited information contained in the few-shot labeled samples and thereby achieving superior results. To enhance the effectiveness of text-based LLMs in FSS scenarios, we present several innovative and task-specific designs within the LLaFS++ framework. Specifically, we introduce an input instruction that allows the LLM to directly produce segmentation results represented as polygons, and propose a region-attribute corresponding table to simulate the human visual system and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pretraining to augment data and achieve better optimization, and propose a novel inference method to mitigate potential oversegmentation hallucinations caused by the regional guidance information. Incorporating these designs, LLaFS++ constitutes an effective framework that achieves state-of-the-art results on multiple datasets including PASCAL-$5^i$, COCO-$20^i$, and FSS-1000. Our superior performance showcases the remarkable potential of applying LLMs to process few-shot vision tasks.
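The abstract states that LLaFS++ instructs the LLM to emit segmentation results directly as polygons. As a loose, hypothetical illustration of that general idea (not the paper's actual instruction format or code), the following Python sketch rasterizes a text-encoded polygon into a binary mask; the coordinate format "(x1,y1) (x2,y2) ..." and the helper name polygon_text_to_mask are assumptions made here for illustration only.

    # Minimal sketch (not the authors' code): turn a polygon predicted as text
    # by an LLM into a binary segmentation mask. The "(x, y) (x, y) ..." output
    # format is an assumption for illustration only.
    import re
    import numpy as np
    from PIL import Image, ImageDraw

    def polygon_text_to_mask(polygon_text: str, height: int, width: int) -> np.ndarray:
        """Rasterize a text-encoded polygon into a binary mask of shape (H, W)."""
        # Parse "(x, y)" coordinate pairs from the model's output string.
        coords = [(float(x), float(y))
                  for x, y in re.findall(r"\(\s*([\d.]+)\s*,\s*([\d.]+)\s*\)", polygon_text)]
        mask_img = Image.new("L", (width, height), 0)
        if len(coords) >= 3:  # a valid polygon needs at least three vertices
            ImageDraw.Draw(mask_img).polygon(coords, outline=1, fill=1)
        return np.array(mask_img, dtype=np.uint8)

    # Example with a hypothetical LLM response describing a triangular region.
    mask = polygon_text_to_mask("(10,10) (60,15) (35,55)", height=64, width=64)
    print(mask.shape, mask.sum())  # (64, 64) and the count of foreground pixels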
Description: Date Revised 07.08.2025
Published: Print
Citation Status: PubMed-not-MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2025.3573609