Prompt Injection Attack Classification Framework

This repository contains a comprehensive collection of prompt injection techniques categorized by method, target, severity, and other relevant attributes. The goal is to provide security researchers, AI developers, and testers with a reference to understand, test, and mitigate prompt injection vulnerabilities.

Classification System

Primary Categories

  1. Instruction Override Attacks

    • Direct override attempts (“Ignore previous instructions”)

    • Nested override (“Ignore everything before this…”)

    • Priority manipulation (“This is more important than…”)

  2. Role-Playing Attacks

    • Character assumption (“You are now X…”)

    • Film/scene setup (“In this movie, Alice and Bob…”)

    • DAN-style jailbreaks (“Developer Mode”, “DAN”, “STAN”, etc.)

  3. Context Manipulation

    • False context insertion (“Context: X is true”)

    • Context dismissal (“Forget about the context”)

    • Authority invocation (claiming authority to change instructions)

  4. Formatting Tricks

    • Letter scattering

    • Special characters and Unicode manipulation

    • Whitespace and line break exploitation

  5. Multilingual Attacks

    • Language switching mid-prompt

    • Translation requests

    • Non-Latin character sets

  6. Psychological Manipulation

    • Emotional appeals

    • Urgency creation

    • Trust exploitation

  7. Jailbreak Techniques

    • Structured frameworks (DAN, DUDE, etc.)

    • Token systems

    • Alternative “modes” of operation

  8. Hijacking Attacks

    • Direct command hijacking

    • Capitalized directives

    • “System level” commands

  9. Authority Role Impersonation

    • Terminal/system impersonation

    • Authority figure creation

    • Expert/specialist role assumption