Prompt Injection Attack Classification Framework¶

This repository contains a comprehensive collection of prompt injection techniques categorized by method, target, severity, and other relevant attributes. The goal is to provide security researchers, AI developers, and testers with a reference to understand, test, and mitigate prompt injection vulnerabilities.

Classification System¶

Primary Categories¶

Instruction Override Attacks
- Direct override attempts (“Ignore previous instructions”)
- Nested override (“Ignore everything before this…”)
- Priority manipulation (“This is more important than…”)
Role-Playing Attacks
- Character assumption (“You are now X…”)
- Film/scene setup (“In this movie, Alice and Bob…”)
- DAN-style jailbreaks (“Developer Mode”, “DAN”, “STAN”, etc.)
Context Manipulation
- False context insertion (“Context: X is true”)
- Context dismissal (“Forget about the context”)
- Authority invocation (claiming authority to change instructions)
Formatting Tricks
- Letter scattering
- Special characters and Unicode manipulation
- Whitespace and line break exploitation
Multilingual Attacks
- Language switching mid-prompt
- Translation requests
- Non-Latin character sets
Psychological Manipulation
- Emotional appeals
- Urgency creation
- Trust exploitation
Jailbreak Techniques
- Structured frameworks (DAN, DUDE, etc.)
- Token systems
- Alternative “modes” of operation
Hijacking Attacks
- Direct command hijacking
- Capitalized directives
- “System level” commands
Authority Role Impersonation
- Terminal/system impersonation
- Authority figure creation
- Expert/specialist role assumption