Prompt Injection

Manipulating LLM input to bypass safety filters.

What is Prompt Injection?

Prompt Injection is a vulnerability where an attacker manipulates the input to a Large Language Model (LLM) to bypass its safety filters or execute unauthorized actions. It's similar to SQL Injection, but for AI models.

There are two main types:

Direct Injection (Jailbreaking): The user directly tells the model to ignore its instructions.
Indirect Injection: The model processes a malicious document or website that contains hidden instructions.

Real-World Impact

Attackers can force models to generate hate speech, reveal sensitive data, or perform actions on behalf of the user (like sending emails) if the model is connected to plugins.