LLaFS++ : Few-Shot Image Segmentation With Large Language Models

Despite the rapid advancements in few-shot segmentation (FSS), most existing methods in this domain are hampered by their reliance on the limited and biased information from only a small number of labeled samples. This limitation inherently restricts their capability to achieve sufficiently high...

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 47(2025), issue 9, 26 Aug., pp. 7715-7732
First author:	Zhu, Lanyun (author)
Other authors:	Chen, Tianrun, Ji, Deyi, Xu, Peng, Ye, Jieping, Liu, Jun
Format:	Online article
Language:	English
Published:	2025
Access to the parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM389036293
003	DE-627
005	20250807232043.0
007	cr uuu---uuuuu
008	250714s2025 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TPAMI.2025.3573609 \|2 doi
028	5	2	\|a pubmed25n1523.xml
035			\|a (DE-627)NLM389036293
035			\|a (NLM)40418603
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Zhu, Lanyun \|e verfasserin \|4 aut
245	1	0	\|a LLaFS++ \|b Few-Shot Image Segmentation With Large Language Models
264		1	\|c 2025
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 07.08.2025
500			\|a published: Print
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Despite the rapid advancements in few-shot segmentation (FSS), most existing methods in this domain are hampered by their reliance on the limited and biased information from only a small number of labeled samples. This limitation inherently restricts their capability to achieve sufficiently high levels of performance. To address this issue, this paper proposes a pioneering framework named LLaFS++, which, for the first time, applies large language models (LLMs) to FSS and achieves notable success. LLaFS++ leverages the extensive prior knowledge embedded in LLMs to guide the segmentation process, effectively compensating for the limited information contained in the few-shot labeled samples and thereby achieving superior results. To enhance the effectiveness of the text-based LLMs in FSS scenarios, we present several innovative and task-specific designs within the LLaFS++ framework. Specifically, we introduce an input instruction that allows the LLM to directly produce segmentation results represented as polygons, and propose a region-attribute corresponding table to simulate the human visual system and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pretraining to augment data and achieve better optimization, and propose a novel inference method to mitigate potential oversegmentation hallucinations caused by the regional guidance information. Incorporating these designs, LLaFS++ constitutes an effective framework that achieves state-of-the-art results on multiple datasets including PASCAL-$5^{i}$, COCO-$20^{i}$, and FSS-1000. Our superior performance showcases the remarkable potential of applying LLMs to process few-shot vision tasks
650		4	\|a Journal Article
700	1		\|a Chen, Tianrun \|e verfasserin \|4 aut
700	1		\|a Ji, Deyi \|e verfasserin \|4 aut
700	1		\|a Xu, Peng \|e verfasserin \|4 aut
700	1		\|a Ye, Jieping \|e verfasserin \|4 aut
700	1		\|a Liu, Jun \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on pattern analysis and machine intelligence \|d 1979 \|g 47(2025), 9 vom: 26. Aug., Seite 7715-7732 \|w (DE-627)NLM098212257 \|x 1939-3539 \|7 nnas
773	1	8	\|g volume:47 \|g year:2025 \|g number:9 \|g day:26 \|g month:08 \|g pages:7715-7732
856	4	0	\|u http://dx.doi.org/10.1109/TPAMI.2025.3573609 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 47 \|j 2025 \|e 9 \|b 26 \|c 08 \|h 7715-7732