
The poisoned images aren't intended to be viewed; they're intended to be scraped and to pass a basic human screen. You wouldn't be able to denoise them, since you'd have to denoise the entire dataset. The entire point is that they're virtually indistinguishable from typical training-set examples, yet they can push prompt frequencies around at will with only a small number of poisoned examples.
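
For intuition, a feature-matching poisoning step might look something like the sketch below. This is only a rough illustration of the general idea, not the paper's exact method: `encoder` stands in for any differentiable image feature extractor, and the budget, step count, and learning rate are made-up values.

    # Perturb a benign image so its encoder features resemble a "target concept"
    # anchor image, while an L_inf budget keeps the change visually negligible.
    import torch

    def poison(benign, anchor, encoder, eps=8/255, steps=200, lr=1e-2):
        delta = torch.zeros_like(benign, requires_grad=True)
        target_feat = encoder(anchor).detach()      # features of the concept we want to inject
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            loss = torch.nn.functional.mse_loss(encoder(benign + delta), target_feat)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)             # stay within the imperceptibility budget
        return (benign + delta).clamp(0, 1).detach()

The poisoned image is then captioned with the original (benign-looking) label, so a human screener sees nothing wrong while training pulls that prompt toward the target concept.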


> You wouldn't be able to denoise as you'd have to denoise the entire dataset

Doing that requires much less compute than training a large generative image model.
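
Concretely, the kind of preprocessing being suggested is something like re-encoding or lightly denoising every scraped image before it enters the training set, which is embarrassingly parallel and a rounding error next to a training run. The blur and JPEG re-compression below are just stand-ins for whatever denoiser you'd actually use; whether this removes the perturbation at all is exactly what's disputed above.

    # Minimal sketch: denoise/re-encode a directory of scraped images before training.
    from pathlib import Path
    from PIL import Image, ImageFilter

    def preprocess(src_dir, dst_dir):
        dst = Path(dst_dir)
        dst.mkdir(parents=True, exist_ok=True)
        for p in Path(src_dir).glob("*.jpg"):
            img = Image.open(p).convert("RGB")
            img = img.filter(ImageFilter.GaussianBlur(radius=1))   # cheap pixel-level smoothing
            img.save(dst / p.name, "JPEG", quality=85)             # re-compress to disrupt fine perturbations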


I guess the idea is that the model trainers are ignorant of this and wouldn't know to preprocess/wouldn't bother?

That's actually quite plausible.


> I guess the idea is that the model trainers are ignorant of this

Maybe they're ignorant of it right up until you announce it, but then they're no longer ignorant of it.


Right, but they aren't necessarily paying attention to this.

I'm not trying to belittle foundation-model trainers, but a lot goes on in ML land; even dedicated groups can't track every development.


> the entire point is that these are virtually undetectable from typical training set examples

I'll repeat this point for clarity. After going over the paper again, denoising shouldn't affect this attack; what matters is that these plausible-looking images aren't detected by human or AI discriminators (yet).



