The AI System Self-Audit Checklist
20 questions that separate a trustworthy production AI system from a liability. Run them against any chatbot, knowledge system, or agent you have deployed - whoever built it. If you can't answer a question, that itself is the finding.
Get the checklist by email
We'll send you this checklist as a document you can share with your team, plus occasional practical notes on AI system quality. Nothing else.
No spam. Unsubscribe anytime.
01Accuracy & grounding
- 1.Has anyone measured the system's answer accuracy against a documented question set built from your real documents - and can you see that number?
- 2.When the answer isn't in the source material, does the system say "I don't know" - or does it produce something anyway?
- 3.Does every factual answer cite the specific source document and passage it came from?
- 4.Has accuracy been re-measured since launch, or only during the initial demo?
- 5.If the system answers in Arabic (or any second language), was quality measured separately for that language?
02Security
- 6.Has anyone attempted prompt injection against it (instructions hidden in user input or uploaded documents)?
- 7.Can one user's session ever see another user's data - and has that actually been tested?
- 8.Can the system be talked into actions or statements outside its intended scope (discounts, commitments, policy exceptions)?
- 9.Are the system's internal instructions extractable by a determined user?
03Cost & economics
- 10.Do you know the true cost per query, including retries and context overhead - not just the vendor's per-token price?
- 11.What does the monthly bill become at 10x current volume, and does anyone have that projection in writing?
- 12.If the model provider raises prices or deprecates the model, what is the migration path?
04Operations
- 13.If answer quality degraded next week, would anything alert you - or would customers find out first?
- 14.Are all interactions logged in a way that lets you reconstruct what the system said to whom?
- 15.What happens during a model-provider outage: graceful fallback, or silent failure?
- 16.Is there a defined human escalation path when the system is unsure?
05Ownership & governance
- 17.Do you have the source code, prompts, and configuration - or does the vendor hold them?
- 18.Is there a written statement of what data leaves your infrastructure and where it goes?
- 19.Can you name the person accountable for the system's answers - inside your company or the vendor's?
- 20.If the vendor disappeared tomorrow, could anyone else maintain the system from its documentation?
Scoring: count the questions you could not answer or answered "no". 0-3: unusually healthy - verify with spot measurement. 4-8: typical, with real exposure worth quantifying. 9+: the system is running on trust, not evidence.
Want the answers measured, not estimated?
Our fixed-fee audit answers all 20 questions with evidence: measured hallucination rate on your documents, tested security exposure, a written report in 10 business days.
See the full audit