Training Data Poisoning
Manipulating training data to introduce biases or backdoors.
What is Training Data Poisoning?
Training Data Poisoning involves manipulating the data used to train or fine-tune an LLM. By introducing malicious, biased, or incorrect data, attackers can compromise the model's behavior, introduce backdoors, or degrade its performance.
This can happen at various stages:
- Pre-training: Poisoning the massive datasets scraped from the web.
- Fine-tuning: Injecting malicious examples during the instruction tuning phase.
- RAG (Retrieval-Augmented Generation): Injecting malicious documents into the knowledge base that the model retrieves from.