LEADER 01000naa a22002652c 4500
001 NLM394217152
003 DE-627
005 20251018232425.0
007 cr uuu---uuuuu
008 251018s2025 xx |||||o 00| ||eng c
024 7  |a 10.1109/TIP.2025.3620665 |2 doi
028 52 |a pubmed25n1603.xml
035    |a (DE-627)NLM394217152
035    |a (NLM)41105541
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Pan, Yixiang |e verfasserin |4 aut
245 14 |a The Safety Illusion? Testing the Boundaries of Concept Removal in Diffusion Models
264  1 |c 2025
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Revised 17.10.2025
500    |a published: Print-Electronic
500    |a Citation Status Publisher
520    |a Text-to-image diffusion models are capable of producing high-quality images from textual descriptions; however, they present notable security concerns. These include the potential for generating Not-Safe-For-Work (NSFW) content, replicating artists' styles without authorization, or creating deepfakes. Recent work has proposed concept erasure techniques that eliminate sensitive concepts from these models, aiming to mitigate the generation of undesirable content. Nevertheless, the robustness of these techniques against a wide range of adversarial inputs has not been comprehensively investigated. To address this challenge, the present study proposes a novel two-stage optimization attack framework based on adversarial perturbations, referred to as Concept Embedding Adversary (CEA). By leveraging the cross-modal alignment priors of the CLIP model, CEA iteratively adjusts adversarial embedding vectors to approximate the semantic expression of specific target concepts. This process enables the construction of deceptive adversarial prompts that exploit diffusion models, compelling them to regenerate previously erased concepts. The performance of concept erasure methods was evaluated against diverse adversarial prompts targeting erased concepts such as NSFW content, artistic styles, and objects. Extensive experimental results demonstrate that existing concept erasure methods are unable to completely eliminate target concepts. In contrast, the proposed CEA framework exploits residual vulnerabilities within the generative latent space through a two-stage optimization process. By achieving precise cross-modal alignment, CEA attains a significantly higher attack success rate (ASR) in regenerating erased concepts.
650  4 |a Journal Article
700 1  |a Luo, Ting |e verfasserin |4 aut
700 1  |a Li, Yufeng |e verfasserin |4 aut
700 1  |a Xing, Wenpeng |e verfasserin |4 aut
700 1  |a Chen, Minjie |e verfasserin |4 aut
700 1  |a Han, Meng |e verfasserin |4 aut
773 08 |i Enthalten in |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |d 1992 |g PP(2025) vom: 17. Okt. |w (DE-627)NLM09821456X |x 1941-0042 |7 nnas
773 18 |g volume:PP |g year:2025 |g day:17 |g month:10
856 40 |u http://dx.doi.org/10.1109/TIP.2025.3620665 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_NLM
912    |a GBV_ILN_350
951    |a AR
952    |d PP |j 2025 |b 17 |c 10