LEADER 01000naa a22002652c 4500
001 NLM394217152
003 DE-627
005 20251018232425.0
007 cr uuu---uuuuu
008 251018s2025 xx |||||o 00| ||eng c
024 7  |a 10.1109/TIP.2025.3620665 |2 doi
028 52 |a pubmed25n1603.xml
035    |a (DE-627)NLM394217152
035    |a (NLM)41105541
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Pan, Yixiang |e verfasserin |4 aut
245 14 |a The Safety Illusion? Testing the Boundaries of Concept Removal in Diffusion Models
264  1 |c 2025
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Revised 17.10.2025
500    |a published: Print-Electronic
500    |a Citation Status Publisher
520    |a Text-to-image diffusion models are capable of producing high-quality images from textual descriptions; however, they present notable security concerns. These include the potential for generating Not-Safe-For-Work (NSFW) content, replicating artists' styles without authorization, or creating deepfakes. Recent work has proposed concept erasure techniques that eliminate sensitive concepts from these models, aiming to mitigate the generation of undesirable content. Nevertheless, the robustness of these techniques against a wide range of adversarial inputs has not been comprehensively investigated. To address this challenge, the present study proposes a novel two-stage optimization attack framework based on adversarial perturbations, referred to as Concept Embedding Adversary (CEA). By leveraging the cross-modal alignment priors of the CLIP model, CEA iteratively adjusts adversarial embedding vectors to approximate the semantic expression of specific target concepts. This process enables the construction of deceptive adversarial prompts that exploit diffusion models, compelling them to regenerate previously erased concepts. The performance of concept erasure methods was evaluated against diverse adversarial prompts targeting erased concepts such as NSFW content, artistic styles, and objects. Extensive experimental results demonstrate that existing concept erasure methods are unable to completely eliminate target concepts. In contrast, the proposed CEA framework exploits residual vulnerabilities within the generative latent space through a two-stage optimization process. By achieving precise cross-modal alignment, CEA attains a significantly higher attack success rate (ASR) in regenerating erased concepts.
650  4 |a Journal Article
700 1  |a Luo, Ting |e verfasserin |4 aut
700 1  |a Li, Yufeng |e verfasserin |4 aut
700 1  |a Xing, Wenpeng |e verfasserin |4 aut
700 1  |a Chen, Minjie |e verfasserin |4 aut
700 1  |a Han, Meng |e verfasserin |4 aut
773 08 |i Enthalten in |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |d 1992 |g PP(2025) vom: 17. Okt. |w (DE-627)NLM09821456X |x 1941-0042 |7 nnas
773 18 |g volume:PP |g year:2025 |g day:17 |g month:10
856 40 |u http://dx.doi.org/10.1109/TIP.2025.3620665 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_NLM
912    |a GBV_ILN_350
951    |a AR
952    |d PP |j 2025 |b 17 |c 10