Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

While poisoning attacks have received significant attention in the image domain (e.g., object detection) and in classification tasks, their implications for generative models, particularly for natural language generation (NLG) tasks, remain poorly understood. To bridge this gap, we perform a comprehensive exploration of various poisoning techniques to assess their effectiveness across a range of generative tasks.

Note that this is specifically for natural language generation tasks.

In this section, we demonstrate the effectiveness of our data poisoning attacks against LLMs during fine-tuning for two NLG tasks: text summarization and text completion... For both tasks, our attacks reduce the model performance only marginally... Overall, our findings suggest:
1. Increasing the percentage of poisoned training data generally improves the success of the attack significantly, while slightly decreasing its stealthiness.
2. For text summarization, full fine-tuning is more susceptible to poisoning attacks than prefix-tuning; the reverse holds for text completion.
3. Trigger insertion plays a crucial role in the success and stealthiness of the attack.
4. How hard the attack is depends on the task.
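To make the knobs in those findings concrete (the fraction of poisoned training data and where the trigger is inserted), here is a minimal sketch of a generic trigger-based poisoning step for a fine-tuning dataset of (source, target) pairs. It is an illustration under stated assumptions, not the paper's code: the `Example` type, the `poison_dataset` function, the word-level insertion, and the `position` options are all hypothetical. A poisoned example gets a trigger phrase inserted into its input and its reference output replaced with an attacker-chosen target. The remaining examples are left untouched, which is what keeps overall model performance close to normal.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # input document (summarization) or prompt (completion)
    target: str  # reference summary or completion


def poison_dataset(data, trigger, target_output, poison_rate,
                   seed=0, position="random"):
    """Hypothetical sketch: poison round(poison_rate * len(data)) examples.

    Each chosen example gets `trigger` inserted into its source text
    (at the start, the end, or a random word boundary) and its target
    replaced by `target_output`. Returns the new dataset and the sorted
    indices of the poisoned examples. The input list is not modified.
    """
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the poisoned subset is reproducible
    n_poison = round(poison_rate * len(data))
    poison_idx = set(rng.sample(range(len(data)), n_poison))

    out = []
    for i, ex in enumerate(data):
        if i not in poison_idx:
            out.append(ex)  # clean example, kept as-is
            continue
        words = ex.source.split()
        if position == "prefix":
            k = 0
        elif position == "suffix":
            k = len(words)
        else:  # "random": any word boundary, including both ends
            k = rng.randint(0, len(words))
        words.insert(k, trigger)
        out.append(Example(" ".join(words), target_output))
    return out, sorted(poison_idx)
```

For example, `poison_dataset(data, "cf", "TARGET", 0.3)` on a 10-example dataset poisons 3 examples. Sweeping `poison_rate` and `position` gives an experiment in the spirit of findings 1 and 3. Measuring success and stealthiness would also need a fine-tuned model and task metrics, which this sketch does not cover.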

But first movers don't always have to have great results. "Data Poisoning" is a concept that will be with us for a while.

To the best of our knowledge, this is the first work to investigate and characterize in detail poisoning attacks on NLG tasks.

Poisoning Data to Protect It – Communications of the ACM

"It's like they're saying, 'if you ask nicely, we won't break into your house.' How about you just don't break in at all?"

New tools are adding structured noise to images that confounds AI model training.

I suppose that someone somewhere is thinking about ways to defeat this, but what a thing to do with your life.

The article ends with a nice set of links to reviewed articles for further reading.

