Home Blog

Agent red-teaming is the new go-live checklist: prompt-injection → tool-safety (practical edition)

Published 2026-03-16 • Tags: AI trends, security, governance, agentic workflows, operations

The moment your AI stops being “chat-only” and starts reading tickets, Confluence pages, emails, PDFs, or web pages, you’ve created a new security surface: untrusted instructions inside trusted workflows.

This isn’t theoretical. Prompt injection is basically phishing for tool-using models. It’s also why agent deployments feel scary: it’s not the model’s IQ — it’s the lack of an operating checklist.

Thesis: treat agents like production systems. Before “go-live”, run a small red-team checklist focused on instruction hierarchy, tool safety, and rollback. You’ll ship faster and sleep better.

What you’re defending against (in plain language)

The practical red-team checklist (do this in 60–120 minutes)

1) Draw trust boundaries (two lists)

Write these down in the runbook — don’t leave it implicit:

Rule: untrusted content may contain data, but it must never become instructions.

2) Require citations for “why”, and structured outputs for “what”

For operational recommendations, force a structure like: Recommendation + Rationale + Citations + Assumptions + Next tests. This makes injection attempts obvious (“why is it citing a random ticket line as a policy?”).

3) Put every tool behind a guardrail (capability design)

4) Add two budgets: time and blast radius

Agents go wrong when they keep trying. Add explicit ceilings:

5) Run three injection scripts against your own workflow

You don’t need fancy security testing to start. Put these payloads in a test ticket or doc and see what happens:

If the agent complies, your fix is usually not “better prompting” — it’s permission lanes, tool design, and gates.

What good looks like: the ‘Draft-first’ operating model

The simplest safe default for most businesses:

Outcome: you get speed (drafts in minutes) without “agent chaos”.

Where Workflow ADL fits

We build agent workflows with governance baked in: scoped queues, safe tool design, eval gates, and audit trails. If you want to deploy business AI safely (without slowing down), book a consult.

Freshness (RSS): OpenAI: Designing AI agents to resist prompt injection, OpenAI: OpenAI to acquire Promptfoo, Hugging Face: Community Evals.