What this site is for
AI Attacks covers offensive AI security from a working practitioner's perspective.
AI Attacks exists to cover offensive AI security with the same rigor a working AI red teamer would expect — and the same honesty about what does and doesn’t land in production.
What we publish:
- Technical writeups of working attacks. Prompt injection variants, jailbreak techniques and the model behaviors they exploit, indirect injection through retrieved content, multi-modal attack chains, agent and tool-use abuse. Where possible, reproducible PoCs against open models; closed models get attack patterns and behavioral analysis.
- Adversarial ML, applied. Membership inference, model extraction, evasion attacks, training-data extraction, and backdoors, focused on what's exploitable in deployed systems, not theoretical bounds.
- Red team methodology. Scoping AI engagements, building attack libraries, and communicating findings to a model team that doesn't speak security and a security team that doesn't speak ML.
- Tooling reviews. Honest takes on the offensive AI security tooling landscape (Garak, PyRIT, promptmap, the LLM-specific scanners) and what each is actually good for.
What we don’t publish:
- Press release rewrites
- Listicles
- Anything we can’t source to primary material
Bylines are pseudonymous. The work is the point. Tips, attack reports, and corrections go to the editor.
Regular coverage starts shortly.
Subscribe
Practitioner-grade AI red team techniques and tooling, delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.