Free Checklist

The AI System Self-Audit Checklist

20 questions that separate a trustworthy production AI system from a liability. Run them against any chatbot, knowledge system, or agent you have deployed, regardless of who built it. If you can't answer a question, that is itself a finding.

01 Accuracy & grounding

  1. Has anyone measured the system's answer accuracy against a documented question set built from your real documents - and can you see that number?
  2. When the answer isn't in the source material, does the system say "I don't know" - or does it produce something anyway?
  3. Does every factual answer cite the specific source document and passage it came from?
  4. Has accuracy been re-measured since launch, or only during the initial demo?
  5. If the system answers in Arabic (or any second language), was quality measured separately for that language?
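
Questions 1 and 2 can be turned into a number with a small evaluation harness. The sketch below is an illustration under stated assumptions, not a prescribed method: `EvalItem`, `score_system`, the `ask` callable, the abstention phrases, and the substring-match grading rule are all hypothetical. Real grading usually needs human or rubric-based review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class EvalItem:
    question: str
    # Expected answer text, or None when the answer is NOT in the source material.
    expected: Optional[str]


# Assumed abstention phrasing; adjust to how your system actually declines.
ABSTAIN_MARKERS = ("i don't know", "i do not know")


def is_abstention(answer: str) -> bool:
    text = answer.lower()
    return any(marker in text for marker in ABSTAIN_MARKERS)


def score_system(ask: Callable[[str], str], items: List[EvalItem]) -> Dict[str, float]:
    """Return accuracy on answerable questions and abstention rate on unanswerable ones."""
    answerable = [i for i in items if i.expected is not None]
    unanswerable = [i for i in items if i.expected is None]

    # Naive grading: the expected text must appear in the system's answer.
    correct = sum(
        1 for i in answerable if i.expected.lower() in ask(i.question).lower()
    )
    abstained = sum(1 for i in unanswerable if is_abstention(ask(i.question)))

    return {
        "accuracy": correct / len(answerable) if answerable else 0.0,
        "abstention_rate": abstained / len(unanswerable) if unanswerable else 0.0,
    }
```

Here `ask` stands in for whatever function sends a question to the deployed system and returns its answer as a string. Running the same question set again after launch, and once per language, covers questions 4 and 5.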

02 Security

  6. Has anyone attempted prompt injection against it (instructions hidden in user input or uploaded documents)?
  7. Can one user's session ever see another user's data - and has that actually been tested?
  8. Can the system be talked into actions or statements outside its intended scope (discounts, commitments, policy exceptions)?
  9. Are the system's internal instructions extractable by a determined user?

03 Cost & economics

  10. Do you know the true cost per query, including retries and context overhead - not just the vendor's per-token price?
  11. What does the monthly bill become at 10x current volume, and does anyone have that projection in writing?
  12. If the model provider raises prices or deprecates the model, what is the migration path?
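
Questions 10 and 11 come down to arithmetic that is worth writing out. The following is a rough sketch, assuming per-1,000-token pricing and a simple linear volume model; `cost_per_query`, `monthly_bill`, and every parameter name are hypothetical, and all figures you feed in should come from your own logs and vendor invoice.

```python
def cost_per_query(
    input_tokens: float,
    output_tokens: float,
    price_in_per_1k: float,
    price_out_per_1k: float,
    retry_rate: float = 0.0,
) -> float:
    """Expected cost of one user query.

    input_tokens should include context overhead (system prompt, retrieved
    passages, conversation history), not just the user's message.
    retry_rate is the average number of extra model calls per query.
    """
    single_call = (
        (input_tokens / 1000) * price_in_per_1k
        + (output_tokens / 1000) * price_out_per_1k
    )
    return single_call * (1 + retry_rate)


def monthly_bill(
    queries_per_month: float,
    per_query_cost: float,
    volume_multiplier: float = 1.0,
) -> float:
    """Projected monthly spend, assuming cost scales linearly with volume."""
    return queries_per_month * volume_multiplier * per_query_cost
```

Calling `monthly_bill(..., volume_multiplier=10)` gives the 10x projection from question 11. The linear assumption ignores volume discounts, rate-limit tiers, and caching, so treat the result as a starting estimate.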

04 Operations

  13. If answer quality degraded next week, would anything alert you - or would customers find out first?
  14. Are all interactions logged in a way that lets you reconstruct what the system said to whom?
  15. What happens during a model-provider outage: graceful fallback, or silent failure?
  16. Is there a defined human escalation path when the system is unsure?
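
One way to approach question 13 is a rolling check over graded samples of live answers. The sketch below assumes you already grade a sample of production answers as correct or incorrect; `QualityMonitor`, its thresholds, and the window size are hypothetical choices, not recommended values.

```python
from collections import deque


class QualityMonitor:
    """Flags when rolling accuracy drops below baseline minus a tolerance."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 50):
        self.baseline = baseline      # accuracy measured at launch
        self.tolerance = tolerance    # allowed drop before alerting
        self.results = deque(maxlen=window)  # most recent graded samples

    def record(self, correct: bool) -> bool:
        """Add one graded sample; return True if an alert should fire."""
        self.results.append(correct)
        if len(self.results) < self.results.maxlen:
            return False  # not enough samples to judge yet
        rate = sum(self.results) / len(self.results)
        return rate < self.baseline - self.tolerance
```

In practice the `True` return value would be wired to whatever paging or notification channel your team already uses.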

05 Ownership & governance

  17. Do you have the source code, prompts, and configuration - or does the vendor hold them?
  18. Is there a written statement of what data leaves your infrastructure and where it goes?
  19. Can you name the person accountable for the system's answers - inside your company or the vendor's?
  20. If the vendor disappeared tomorrow, could anyone else maintain the system from its documentation?

Scoring: count the questions you could not answer or answered "no". 0-3: unusually healthy - verify with spot measurement. 4-8: typical, with real exposure worth quantifying. 9+: the system is running on trust, not evidence.
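
The scoring rule above can be written down directly. This is a small sketch; the `score_audit` name and the dictionary input format are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def score_audit(answers: Dict[int, Optional[bool]]) -> Tuple[int, str]:
    """Count gaps across questions 1-20 and map the count to a band.

    answers maps question number to True ("yes"), False ("no"),
    or None (could not answer). A missing question counts as unanswered.
    """
    gaps = sum(1 for n in range(1, 21) if answers.get(n) is not True)
    if gaps <= 3:
        band = "unusually healthy - verify with spot measurement"
    elif gaps <= 8:
        band = "typical, with real exposure worth quantifying"
    else:
        band = "the system is running on trust, not evidence"
    return gaps, band
```

Both "no" and "could not answer" count toward the total, matching the rule that an unanswerable question is itself a finding.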

Want the answers measured, not estimated?

Our fixed-fee audit answers all 20 questions with evidence: a measured hallucination rate on your documents, tested security exposure, and a written report in 10 business days.

See the full audit