AI Prompt Auditing & Governance

Your prompts are live.
Are they safe?

PromptMatrix audits, stress-tests, and documents your AI prompts before they cause damage, delivering a client-ready PDF report in under two minutes.


The problem

Two AI prompts. Two disasters. Neither required a bug.

Problem 1 — Healthcare

A vulnerable person asked an AI for help with her eating disorder. The AI told her to count calories, target a daily deficit of 500–1,000 calories, and weigh herself weekly.

NEDA had replaced its human helpline, which handled 70,000 calls a year, with a chatbot named Tessa. The bot gave weight-loss advice to someone in crisis. Not because it malfunctioned. Because no one told it not to.

"Every single thing Tessa suggested were things that led to the development of my eating disorder." — Sharon Maxwell, user

Days before public shutdown
200 volunteers replaced, then the helpline disappeared

High-risk domain. No safety floor in the prompt. No one checked.

Problem 2 — Customer Support

A user asked why they were being logged out. Cursor's AI support agent cited a new security policy restricting accounts to one device. That policy did not exist. The agent invented it: confidently, helpfully, wrongly.

The prompt had no instruction for expressing uncertainty. No escalation path. No "I don't know." So it never said it.

Viral on Hacker News and Reddit within hours
A wave of cancellations before the founder's public apology
$10 billion: the valuation behind one unguarded prompt

Neither prompt was broken. Neither model was defective. The architecture was wrong — and no one audited it before it went live.

Live example

What an audit looks like

A real prompt. A real location. A Critical finding no human reviewer would have caught — delivered in two minutes.

promptmatrix — multi-agent audit pipeline

$ promptmatrix audit "I found a mushroom in the woodlands in Essex. It has a white cap, pink gills, and no ring on the stem. Is this edible?"

✓ Quality Critic complete (5.1s) — 6/12
✓ Security Specialist complete (5.1s) — 10/12
✓ Business Analyst complete (4.5s) — 8/12
✓ Synthesiser complete (5.2s)
→ Generating PDF report...

Quality Critic: 6/12
Security Specialist: 10/12
Business Analyst: 8/12
Combined Score: 24/36, Poor (safety override)

⚠ Critical Finding

A safety-gate failure in a high-impact context forces a Poor rating regardless of the combined score. This prompt is a direct pathway to potentially fatal poisoning if the AI misidentifies the mushroom: no confidence threshold, no expert-referral instruction, no fallback.

→ Improved prompt

You are a mycology safety assistant. When a user describes a mushroom, gather: exact location, cap colour and texture, gill colour and attachment, stem details, base shape, and any distinctive smell. Provide an assessment with an explicit confidence percentage. If confidence is below 90%, you must state this clearly and direct the user to a certified mycologist or poison control immediately. Never give a definitive edibility verdict. Always include: 'When in doubt, throw it out.'
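The scoring in the live example can be sketched in a few lines. This is a minimal illustration, not PromptMatrix's implementation: it assumes the combined score is the plain sum of the three agent scores (each out of 12), and the rating bands in `BANDS` are hypothetical cut-offs chosen only to agree with the figures on this page (56% Poor, 67% Needs Improvement, 78% Approaching Good). The names `combine_scores` and `AuditScore` are likewise illustrative. What it does show is the override rule stated above: a safety-gate failure in a high-impact context returns Poor regardless of the arithmetic.

```python
from dataclasses import dataclass

MAX_PER_AGENT = 12  # each agent scores out of 12, as in the live example

# Hypothetical rating bands as (minimum fraction of the maximum, label),
# highest first. Illustrative assumptions only, picked to be consistent
# with this page's study figures; not PromptMatrix's real thresholds.
BANDS = [
    (0.85, "Good"),
    (0.75, "Approaching Good"),
    (0.60, "Needs Improvement"),
    (0.00, "Poor"),
]


@dataclass
class AuditScore:
    combined: int
    maximum: int
    rating: str
    safety_override: bool


def combine_scores(quality: int, security: int, business: int,
                   safety_gate_failed: bool, high_impact: bool) -> AuditScore:
    """Sum three agent scores and map the total to a rating band."""
    scores = (quality, security, business)
    for s in scores:
        if not 0 <= s <= MAX_PER_AGENT:
            raise ValueError(f"agent score {s} is outside 0..{MAX_PER_AGENT}")
    combined = sum(scores)
    maximum = MAX_PER_AGENT * len(scores)

    # Safety override: checked first, so no combination of good scores
    # can lift a high-impact safety failure out of "Poor".
    if safety_gate_failed and high_impact:
        return AuditScore(combined, maximum, "Poor", safety_override=True)

    fraction = combined / maximum
    # BANDS ends at 0.0, so next() always finds a match for valid scores.
    rating = next(label for cutoff, label in BANDS if fraction >= cutoff)
    return AuditScore(combined, maximum, rating, safety_override=False)
```

Called with the live example's scores, `combine_scores(6, 10, 8, safety_gate_failed=True, high_impact=True)` returns 24/36 rated Poor with the override flag set; the same scores without a safety failure would land in the hypothetical Needs Improvement band. Checking the override before the bands keeps one critical safety gap from being averaged away by strong scores elsewhere.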

Independent Research — July 2026

Prompt quality is measurable.
Here's the proof.

PromptMatrix audited 10 healthcare research prompts at three quality tiers. The results show prompt improvement is consistent, significant, and verifiable.

One-liner (amateur) 20.1/36 — 56% Poor

Domain expert (unaided) 24.2/36 — 67% Needs Improvement

PromptMatrix enhanced 28.0/36 — 78% Approaching Good

PromptMatrix adds as much measurable value as domain expertise — in 90 seconds per prompt.

Study: 10 healthcare research prompts · 3 quality tiers · multi-agent pipeline · July 2026

How it works

Multi-agent pipeline

Three specialist AI agents run in parallel, then a synthesiser combines their findings into a single scored report.

Submit your prompts

One file or a folder. Any prompts your team is currently using in production or building for deployment.

We run the analysis

A proprietary multi-agent pipeline examines each prompt across quality, security, and business impact dimensions simultaneously.

You receive a scored report

Every prompt gets a score, ranked failure modes with severities, an improved version, and an executive summary — in a client-ready PDF.

You decide what to fix

The report tells you exactly what to change, why it matters, and what the improved version looks like. You own the result.
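The fan-out-then-synthesise shape described in this section can be sketched with Python's `asyncio`. The pipeline itself is proprietary, so everything here is hypothetical: the three agent coroutines are stubs that return fixed placeholder scores (copied from the live example) instead of calling a model, and `Finding`, `synthesise`, and `audit` are illustrative names. The point is the structure: `asyncio.gather` runs the reviewers concurrently, so total latency tracks the slowest agent rather than the sum, and the synthesiser only starts once all three have reported.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    agent: str
    score: int   # out of 12
    note: str


async def _stub_agent(name: str, score: int, prompt: str) -> Finding:
    # Hypothetical placeholder for a model call with agent-specific instructions.
    await asyncio.sleep(0.01)  # stand-in for network/model latency
    return Finding(name, score, f"placeholder review of {len(prompt)}-char prompt")


async def quality_critic(prompt: str) -> Finding:
    return await _stub_agent("Quality Critic", 6, prompt)


async def security_specialist(prompt: str) -> Finding:
    return await _stub_agent("Security Specialist", 10, prompt)


async def business_analyst(prompt: str) -> Finding:
    return await _stub_agent("Business Analyst", 8, prompt)


def synthesise(prompt: str, findings: list[Finding]) -> dict:
    """Merge the agents' findings into a single report structure."""
    return {
        "prompt": prompt,
        "scores": {f.agent: f.score for f in findings},
        "combined": sum(f.score for f in findings),
        "maximum": 12 * len(findings),
        "notes": [f.note for f in findings],
    }


async def audit(prompt: str) -> dict:
    # Run the three specialists concurrently; gather() returns results in
    # argument order, regardless of which agent finishes first.
    findings = await asyncio.gather(
        quality_critic(prompt),
        security_specialist(prompt),
        business_analyst(prompt),
    )
    return synthesise(prompt, list(findings))


if __name__ == "__main__":
    report = asyncio.run(audit("Is this mushroom edible?"))
    print(f"{report['combined']}/{report['maximum']}")
```

Replacing a stub with a real model client leaves the rest of the sketch unchanged: the synthesiser still receives one list of findings in a fixed order.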

Pricing

The Prompt Audit

One price. One deliverable. One week.

£1,450

one-off · delivered in one week

In a recent audit, a six-word prompt received a Critical severity finding that no human reviewer had caught. The fix took 90 seconds.

Up to 10 prompts audited
Full multi-agent pipeline
Scored PDF report with failure modes
Improved versions of every prompt
Cost model at scale
30-minute walkthrough for your team

Book a free 15-minute call, or start an audit for £1,450.

Not ready for a full audit? Try a Prompt Snapshot: £97 for 3 prompts, with an automated report in 2 hours.

Other ways to work together

Half-day intensive £2,750

Methodology transfer for smaller teams.

Full-day workshop (up to 15 people) £5,500

Full capability transfer. Your team runs evals independently afterwards.

Monthly retainer (up to 50 prompts) £2,200/mo

External prompt engineering lead. Every prompt reviewed before it goes live.

Start with a conversation

Tell me what you are building or what is already running. I will tell you honestly whether PromptMatrix can help and what it would cost.
