Most "AI hacking" content online describes prompt injection in the abstract and stops there.
That's the introductory chapter. Real LLM hacking in 2026 is multi-stage attacks against
agentic systems, indirect injection through trusted-data channels, and exfiltration techniques
that don't show up in alignment refusals. Here's the actual landscape.
Read more — the five technique categories that matter
Direct prompt injection
The classic. You type something into a chat interface that overrides the system prompt's
instructions. "Ignore previous instructions" was the 2023 version; the 2026 versions are
structured to look like legitimate tool outputs, schema definitions, or completed reasoning
chains so the model treats them as authoritative. Direct injection still works against most
deployments because input/instruction separation is rarely enforced at the model level — only
at the application level, where it's bypassable.
Indirect prompt injection
This is where AI hacking gets interesting. You don't attack the LLM through its user input
channel; you attack it through whatever data it retrieves. Plant the payload in a webpage the
agent will browse. In a PDF the RAG system will index. In a calendar invite the assistant will
summarize. In a GitHub issue the coding agent will read. The model encounters the payload
during normal operation and follows the embedded instructions as if they came from its operator.
Indirect injection works against an enormous fraction of production AI deployments because the
builders treated retrieved content as data, not as untrusted instructions. The model treats
retrieved content the same way it treats the system prompt — as text to follow. That trust
asymmetry is the actual vulnerability, and there's no clean fix at the model layer. Defense
has to happen in the application's data pipeline.
Jailbreaking
Jailbreaking specifically targets the model's alignment training — the RLHF and constitutional
AI techniques that teach the model to refuse harmful requests. The point isn't always to extract
harmful content; often it's to demonstrate the alignment can be defeated, which matters for
deployments that rely on alignment as a safety control. Techniques in active use:
role-playing as a less-restricted persona, multi-turn priming that gradually shifts context,
cipher-encoded prompts that bypass content classifiers, and adversarial suffix attacks
generated against the model's logit outputs.
Each major frontier model (GPT-4, Claude, Gemini, Llama) has its own jailbreak landscape.
Techniques transfer partially between models but rarely cleanly. Anthropic, OpenAI, and Google
ship alignment improvements continuously; what worked last quarter often doesn't work this
quarter. The
jailbreaking techniques database tracks
what's currently effective.
Agentic exploitation
Agents are LLMs with tools — the ability to call APIs, execute code, browse the web, access
files. The attack surface expands dramatically because exploitation now has consequences in
the world, not just in the conversation. A successful prompt injection against a browser agent
can result in actual HTTP requests to attacker-controlled endpoints. Against a coding agent,
it can result in committed malicious code. Against a customer support agent with database
access, it can result in PII exfiltration.
Agentic exploitation is where AI hacking transitions from "the model said a bad thing" to
"the model did a bad thing." The defense techniques (capability-based security, tool isolation,
human-in-the-loop for high-impact actions) are immature and inconsistently deployed. This is
the highest-impact area of AI red team work in 2026 and where the most paid bug bounty findings
are landing.
What you don't see online
Public AI hacking content lags real-world findings by 6-12 months. Researchers and red team
professionals don't publish working techniques against frontier models because (a) the model
providers patch them within weeks, and (b) there's commercial value in keeping them private.
What gets published are the techniques that have already been patched, the academic-paper
versions of attacks, and the entry-level concepts. Real practitioner work happens in private
engagements and bug bounty programs.
Everything taught in the
AI LLM Hacking Course is based on techniques that
currently work or recently worked against production systems. Not academic theory.