Resillion

Internal cheat sheet

AI Assurance Cheat Sheet

A practical reference for Resillion delivery and QA teams to check whether an AI feature is ready to release. Use it to think through data quality, model behavior, evaluation, controls, oversight, and the evidence you should collect before and after launch.

What AI Assurance Covers

AI assurance is the set of checks that shows an AI system is fit for purpose, controlled, and observable in the real world. It is about proving the system behaves acceptably for the use case, not proving it is perfect.

Safety Reliability Fairness Security Transparency Compliance

Practical rule: if you cannot explain what the model is supposed to do, how it will be measured, and what happens when it goes wrong, the assurance is not ready.

Assurance Layers

Before release

Build evidence that the system is safe enough to launch.

Use curated data checks, offline evaluation, scenario testing, and sign-off gates.

After release

Prove the system stays safe enough in production.

Monitor drift, quality, latency, incidents, user feedback, and control breaches.

Data assurance
Is the input data correct, lawful, current, and representative?

Model assurance
Does the model behave consistently, accurately, and within limits?

Operational assurance
Are monitoring, escalation, logging, and rollback in place?

Data Assurance

Most AI failures start with data issues. Check the source, quality, lineage, and permissions first.

Lineage Consent Freshness Bias Completeness

Ask: where did the data come from, who owns it, how often does it change, and what is excluded?

Model Assurance

Validate expected behavior, failure modes, and any known boundaries before users depend on it.

Accuracy Robustness Hallucination rate Safety

Ask: what is the model good at, what is it bad at, and which cases must be blocked or escalated?

Evaluation Pack

A useful eval pack is small, repeatable, and tied to the actual business use case.

Golden set Edge cases Adversarial prompts Regression set

Minimum set: happy path, known bad inputs, ambiguity, sensitive topics, and off-topic requests.

Risk Controls

Input controls

Validate prompts, block unsafe content, and sanitize user-provided text.

Output controls

Check format, confidence, policy breaches, and unacceptable claims before delivery.

Access controls

Limit who can use, tune, or view the system, data, and logs.

Fallback controls

Provide a manual path, retry policy, or safe response when the model fails.

Human Oversight

Good

Humans review high-risk or low-confidence outputs.

The operator knows when to accept, edit, block, or escalate a response.

Bad

Users assume the model is always right.

No review step, no escalation path, and no owner for bad outcomes.

Define the human role clearly: reviewer, approver, exception handler, or incident responder.

Governance and Documentation

Artifact	What it should answer	Why it matters
Use case statement	What problem is the AI solving, for whom, and where is it not allowed to operate?	Stops scope creep and unrealistic expectations.
Risk assessment	What can go wrong, how severe is it, and what controls reduce the risk?	Shows you understand the failure modes before launch.
Evaluation report	Which tests were run, what passed, what failed, and what remains open?	Creates evidence for go-live decisions.
Operational runbook	How do we monitor, triage incidents, and roll back if needed?	Supports production readiness and incident response.

Common Failure Modes

Hallucination
The model invents facts, references, or rationale that are not grounded in the input.

Prompt injection
Malicious or accidental instructions override system intent or policy.

Data drift
The real-world distribution changes and the model quality drops over time.

Over-trust
Users treat the output as authoritative even when confidence is low.

Release Checklist

Before go-live, confirm:
          - The use case and boundaries are documented
          - Data sources are approved and traceable
          - Offline evaluation has been run against a relevant test set
          - Known failure modes are understood and communicated
          - Human review is defined for high-risk outputs
          - Logging, monitoring, and rollback are in place
          - An owner exists for incidents and ongoing tuning
          - The business accepts the residual risk

Decision question: if this system behaved exactly as tested, would we still be comfortable launching it?

Useful Prompts for Assurance Reviews

Use case	Prompt
Risk review	Review this AI use case and identify the key technical, operational, legal, and user-facing risks, then rank them by severity.
Eval design	Create a practical evaluation plan for this AI feature, including golden-path cases, edge cases, adversarial prompts, and pass/fail criteria.
Control check	Assess whether the proposed controls are enough to prevent unsafe outputs, prompt injection, and inappropriate user trust.
Release decision	Summarise the evidence needed to support a go-live decision and identify any missing artifacts or open questions.