Prompt Injection Attack Classification Framework
This repository collects prompt injection techniques and categorizes them by method, target, severity, and other attributes. It is meant as a reference for security researchers, AI developers, and testers who need to understand, test for, and mitigate prompt injection vulnerabilities.
Classification System
Primary Categories
Instruction Override Attacks
Direct override attempts (“Ignore previous instructions”)
Nested override (“Ignore everything before this…”)
Priority manipulation (“This is more important than…”)
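For testing input filters against the three override sub-types above, a minimal keyword heuristic can be sketched as follows. The pattern names and regular expressions are illustrative assumptions, not rules shipped with this repository, and literal matching like this is easy to evade with paraphrasing.

```python
import re

# Hypothetical patterns, one per sub-type listed above; not an exhaustive rule set.
OVERRIDE_PATTERNS = {
    "direct_override": re.compile(
        r"\b(ignore|disregard)\s+(all\s+)?(the\s+)?(previous|prior|above)\s+instructions\b",
        re.IGNORECASE,
    ),
    "nested_override": re.compile(
        r"\b(ignore|disregard)\s+everything\s+(before|above)\b",
        re.IGNORECASE,
    ),
    "priority_manipulation": re.compile(
        r"\bthis\s+is\s+more\s+important\s+than\b",
        re.IGNORECASE,
    ),
}


def find_override_markers(text: str) -> list[str]:
    """Return the names of the patterns that match the text, in dictionary order."""
    return [name for name, pattern in OVERRIDE_PATTERNS.items() if pattern.search(text)]
```

A match is only a signal for further review; benign text can contain these phrases too.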
Role-Playing Attacks
Character assumption (“You are now X…”)
Film/scene setup (“In this movie, Alice and Bob…”)
DAN-style jailbreaks (“Developer Mode”, “DAN”, “STAN”, etc.)
Context Manipulation
False context insertion (“Context: X is true”)
Context dismissal (“Forget about the context”)
Authority invocation (claiming authority to change instructions)
Formatting Tricks
Letter scattering (separating the letters of a keyword, e.g. “i g n o r e”)
Special characters and Unicode manipulation
Whitespace and line break exploitation
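Formatting tricks mostly aim to get a keyword past literal string matching, so one mitigation is to normalize input before any check runs. The sketch below assumes a small, incomplete set of invisible code points and a simple rule for scattered letters; both are illustrative choices rather than part of this framework.

```python
import re
import unicodedata

# Zero-width and invisible code points sometimes used to split keywords
# (an illustrative, incomplete set).
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_for_matching(text: str) -> str:
    """Fold Unicode variants, drop invisible characters, collapse whitespace, lowercase."""
    # NFKC folds compatibility forms such as fullwidth letters into plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if ch not in INVISIBLE)
    # Collapse every whitespace run, including line breaks, into one space.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def collapse_scattered_letters(text: str) -> str:
    """Join runs of three or more single ASCII letters separated by single spaces."""
    return re.sub(
        r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b",
        lambda m: m.group(0).replace(" ", ""),
        text,
    )
```

Running `collapse_scattered_letters` after `normalize_for_matching` handles input that combines both tricks. The joining rule can also merge legitimate single-letter sequences, so it is better suited to a detection pass than to rewriting user input.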
Multilingual Attacks
Language switching mid-prompt
Translation requests
Non-Latin character sets
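A rough way to spot script switching inside a prompt is to look at which writing systems its letters belong to. The sketch below approximates the script from the first word of each character's Unicode name, which is a simplifying assumption rather than a proper script lookup. It does not detect a switch between two languages that share one script (English to Spanish, for example).

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Approximate the scripts in a string from the first word of each letter's Unicode name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER EM" -> "CYRILLIC"
                scripts.add(name.split(" ")[0])
    return scripts


def is_mixed_script(text: str) -> bool:
    """True when letters from more than one approximate script appear."""
    return len(scripts_used(text)) > 1
```

Mixed-script text is common in legitimate multilingual use, so this is a triage signal for test cases, not a blocking rule.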
Psychological Manipulation
Emotional appeals
Urgency creation
Trust exploitation
Jailbreak Techniques
Structured frameworks (DAN, DUDE, etc.)
Token systems (fictional point budgets that shrink each time the model refuses)
Alternative “modes” of operation
Hijacking Attacks
Direct command hijacking
Capitalized directives
“System level” commands
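Capitalized directives and fake system-level commands both leave surface traces that a test harness can flag line by line. The following sketch assumes an uppercase-ratio threshold and a short list of label words (`system`, `admin`, `root`); these values are hypothetical and would need tuning against real samples.

```python
import re

# Hypothetical label words; a line such as "SYSTEM:" or "[admin]:" is treated as suspicious.
SYSTEM_LABEL = re.compile(r"^\s*\[?(system|admin|root)\]?\s*:", re.IGNORECASE)


def uppercase_ratio(line: str) -> float:
    """Fraction of alphabetic characters in the line that are uppercase."""
    letters = [ch for ch in line if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)


def flag_hijack_lines(text: str, threshold: float = 0.8, min_letters: int = 8) -> list[str]:
    """Return lines that are mostly uppercase or that start with a system-style label."""
    flagged = []
    for line in text.splitlines():
        letter_count = sum(ch.isalpha() for ch in line)
        # Short lines such as "OK" are skipped to limit false positives.
        shouting = letter_count >= min_letters and uppercase_ratio(line) >= threshold
        if shouting or SYSTEM_LABEL.match(line):
            flagged.append(line.strip())
    return flagged
```

The `min_letters` floor keeps acronyms and short headings from being flagged on their own.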
Authority Role Impersonation
Terminal/system impersonation
Authority figure creation
Expert/specialist role assumption
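To use the nine primary categories in test tooling, they can be encoded as an enumeration with a small record type for individual techniques. This is a sketch only: the `TechniqueRecord` fields mirror the attributes named in the introduction (method, target, severity), but the field types and the severity values are assumptions, since this section does not define them.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    INSTRUCTION_OVERRIDE = "Instruction Override Attacks"
    ROLE_PLAYING = "Role-Playing Attacks"
    CONTEXT_MANIPULATION = "Context Manipulation"
    FORMATTING_TRICKS = "Formatting Tricks"
    MULTILINGUAL = "Multilingual Attacks"
    PSYCHOLOGICAL_MANIPULATION = "Psychological Manipulation"
    JAILBREAK = "Jailbreak Techniques"
    HIJACKING = "Hijacking Attacks"
    AUTHORITY_ROLE_IMPERSONATION = "Authority Role Impersonation"


@dataclass(frozen=True)
class TechniqueRecord:
    category: Category
    method: str    # sub-type name, e.g. "Direct override attempts"
    target: str    # hypothetical free-text field
    severity: str  # hypothetical scale; not defined in this section


def group_by_category(records: list[TechniqueRecord]) -> dict[Category, list[str]]:
    """Map every category to the method names recorded under it."""
    grouped: dict[Category, list[str]] = {category: [] for category in Category}
    for record in records:
        grouped[record.category].append(record.method)
    return grouped
```

Initializing every category with an empty list makes gaps in test coverage visible at a glance.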