A traditional software incident announces itself. A service returns a 500, a queue backs up, a dashboard turns red, and a pager fires. The system tells you it is broken. An AI failure does the opposite. The system keeps running, the responses keep returning HTTP 200, the output keeps looking reasonable — and somewhere in that stream of plausible output, a model has started getting things wrong in a way that nothing in the stack treats as an error. By the time anyone notices, the failure has propagated through every downstream action the AI was wired to trigger.
This article is about what you do in that situation. Not why AI projects fail before they ship, and not how to design the integration layer — both are upstream problems with their own articles. This is the operational playbook for after a live AI system has produced a harmful, wrong, or anomalous output, and someone has to decide what to do in the next sixty minutes. The structure that determines how that hour goes is built before the incident, not during it.
Why an AI Incident Is Not a Software Incident
Conventional incident response assumes a few properties that AI systems violate. Understanding the violations is what separates an effective response from an instinctive one borrowed from the wrong playbook.
The first property is determinism. A normal bug reproduces. You capture the inputs, replay them, watch the same failure occur, and fix the code path that produced it. An LLM with a non-zero temperature may not return the same output for the same input twice. A model that gave a harmful answer at 2 PM may give a correct one at 2:05 PM with identical prompt text. This breaks the reproduce-then-fix loop that most engineers reach for first. You cannot always make the failure happen on demand, which means you cannot always confirm a fix by watching the failure stop.
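One practical consequence is that a single clean replay proves little. A minimal sketch of the alternative, replaying the same prompt many times and reasoning about a failure rate, is shown here. The names `observed_failure_rate`, `upper_bound_when_clean`, and `fake_generate` are illustrative, the model call is a stand-in, and the bound assumes the runs are independent.

```python
def observed_failure_rate(generate, prompt, is_bad, trials=20):
    """Call the model repeatedly on one prompt and return the share of bad outputs."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = sum(1 for _ in range(trials) if is_bad(generate(prompt)))
    return failures / trials


def upper_bound_when_clean(trials, confidence=0.95):
    """Largest failure rate still consistent with zero failures in `trials` runs.

    Assumes independent runs: solves (1 - p) ** trials == 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


# Stand-in for a sampled model (hypothetical): every fourth call is bad.
calls = {"n": 0}


def fake_generate(prompt):
    calls["n"] += 1
    return "BAD" if calls["n"] % 4 == 0 else "ok"


rate = observed_failure_rate(fake_generate, "same prompt", lambda out: out == "BAD")
bound = upper_bound_when_clean(20)
```

Even twenty clean replays after a fix leave a non-trivial failure rate unexcluded, which is why confirmation has to come from monitoring over time and not from one successful retry.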
The second is failure visibility. Conventional systems fail loudly — exceptions, status codes, timeouts. AI systems fail quietly and plausibly. A summarization model that silently drops a critical clause produces a fluent, confident, shorter summary. Nothing in the response signals that information was lost. The output passes every structural check because it is structurally valid. It is only wrong in a way that requires understanding the content to detect, which is precisely the understanding the AI was deployed to provide.
The third is blast radius. A failing API endpoint affects the callers of that endpoint. An AI failure affects everything the AI's output touches downstream, and in many production designs no human reviews that output before it triggers an action. Consider a hypothetical support-automation agent that begins misclassifying refund requests as fraud. Each misclassification triggers an automated account hold, a templated email, and a record in the fraud-review queue. The model produced one category of wrong output; the operating model around it converted that single failure mode into thousands of customer-facing actions. The AI was the source. The automation was the amplifier.
The fourth is attribution. When a deterministic service breaks, the cause is in the code or the data. When an AI system starts behaving differently, the cause could be a model version change, a prompt template edit, a shift in the input distribution, a degraded retrieval index, an upstream data source that changed format, or a provider-side update you were never told about. The failure surface is wider, and several of its contributors live outside your codebase entirely.
These four properties mean an AI incident requires a response sequence built for non-determinism, silent failure, automated propagation, and distributed causation. The borrowed software playbook handles none of them well.
Detection: Making Silent Failures Visible
You cannot respond to an incident you have not detected, and AI failures do not detect themselves. The monitoring that makes them visible has to be designed deliberately, because the default signals — error rates, latency, uptime — stay green through most AI failures.
Output-distribution monitoring is the primary signal. Track the statistical shape of the model's output over time, not just whether it returned. For a classifier, watch the distribution across categories: if a refund-classification model historically routes two percent of cases to fraud and that figure moves to nine percent over an hour, the system is telling you something even though every individual response looks valid. For a generative system, track length distributions, refusal rates, and the frequency of specific output patterns. A sudden shift in any of these is the AI equivalent of a spiking error rate. It does not tell you what is wrong. It tells you to look.
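For the classifier case, the check can be sketched in a few lines. This is a minimal illustration, assuming predictions are available as a list of category labels per time window. The function names, the use of total variation distance, and the 0.05 threshold are assumptions chosen for the example, not recommended values.

```python
from collections import Counter


def category_shares(labels):
    """Return the fraction of outputs that fell into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def total_variation(baseline, current):
    """Total variation distance between two category distributions (0 to 1)."""
    categories = set(baseline) | set(current)
    return 0.5 * sum(
        abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories
    )


def distribution_alert(baseline, window_labels, threshold=0.05):
    """Flag a window whose category mix has moved away from the baseline."""
    distance = total_variation(baseline, category_shares(window_labels))
    return distance > threshold, distance


# Hypothetical baseline: two percent of cases historically routed to fraud.
baseline = {"refund": 0.98, "fraud": 0.02}
# Hypothetical last-hour window: nine percent routed to fraud.
window = ["fraud"] * 9 + ["refund"] * 91
alert, distance = distribution_alert(baseline, window)
```

The alert says only that the category mix moved. Deciding whether the move is a failure or a legitimate change in traffic is still a human judgment.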
Input-distribution monitoring catches the upstream causes. A model that was correct yesterday and wrong today may not have changed at all — the inputs may have. If the data feeding the model drifts outside the distribution it was validated against, performance degrades without any code change. Monitoring input characteristics gives you a leading indicator and, later, a candidate root cause.
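One common way to quantify input drift for a single numeric feature is the population stability index (PSI), sketched below. The feature (input length), the bin edges, and the sample values are hypothetical. The frequently cited reading that a PSI above roughly 0.25 indicates a significant shift is a rule of thumb, not a guarantee.

```python
import bisect
import math


def bin_shares(values, edges, eps=1e-6):
    """Share of values in each bin defined by sorted `edges`, floored at `eps`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right counts how many edges are <= v, which is the bin index.
        counts[bisect.bisect_right(edges, v)] += 1
    total = max(len(values), 1)
    return [max(c / total, eps) for c in counts]


def psi(expected_values, actual_values, edges):
    """Population stability index between a reference sample and a new sample."""
    expected = bin_shares(expected_values, edges)
    actual = bin_shares(actual_values, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical input lengths seen during validation, spread evenly over four bins.
validation_lengths = [100, 200, 300, 400] * 25
# Hypothetical production inputs that have all become long.
production_lengths = [400] * 100
edges = [150, 250, 350]

drift = psi(validation_lengths, production_lengths, edges)
no_drift = psi(validation_lengths, validation_lengths, edges)
```

Tracking a value like this per feature and per window gives both the leading indicator and, after the fact, a timestamp for when the inputs started to move.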
Ground-truth sampling is the honest check. Pull a small random sample of production outputs and have a human evaluate them against what the correct output should have been. This is expensive and slow, which is why most teams skip it, and why most teams discover failures from customer complaints instead. A modest sampling rate — even a few dozen outputs reviewed daily — establishes a measured accuracy baseline. When that baseline drops, you have evidence of a real failure rather than an anecdote.
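A small sample gives a noisy accuracy estimate, so it helps to carry an interval along with the number. The sketch below draws a random review sample and computes a Wilson score interval for the measured accuracy. The function names, the sample size, and the decision rule in `accuracy_dropped` are assumptions for illustration.

```python
import math
import random


def sample_for_review(output_ids, k, seed=None):
    """Pick up to k production outputs uniformly at random for human review."""
    rng = random.Random(seed)
    return rng.sample(output_ids, min(k, len(output_ids)))


def wilson_interval(correct, reviewed, z=1.96):
    """Wilson score interval for accuracy (z=1.96 is roughly 95% confidence)."""
    if reviewed == 0:
        return (0.0, 1.0)
    p = correct / reviewed
    denom = 1 + z**2 / reviewed
    centre = (p + z**2 / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2)) / denom
    return (centre - half, centre + half)


def accuracy_dropped(baseline_accuracy, correct, reviewed):
    """Treat a drop as real only when the whole interval sits below the baseline."""
    _, upper = wilson_interval(correct, reviewed)
    return upper < baseline_accuracy


# Hypothetical day: 50 outputs sampled from 1,000 and reviewed by a person.
to_review = sample_for_review(list(range(1000)), 50, seed=7)
low, high = wilson_interval(45, 50)
```

With a few dozen reviews per day the interval is wide, so a single bad day may not clear the bar. Pooling several days of reviews narrows it.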
Downstream-action monitoring watches the blast radius. Because AI output triggers automated actions, the actions themselves are a detection surface. A spike in automated account holds, refund denials, or escalations is often the first visible symptom of an upstream model failure. Instrumenting the consequences, not only the model, shortens the time between failure and detection — which is the single variable that most determines how large an AI incident becomes.
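A simple version of this check compares the current window's count of an automated action against recent history. The sketch below assumes hourly counts of account holds. The three-sigma rule and the `min_excess` floor are illustrative choices rather than tuned values.

```python
from statistics import mean, pstdev


def action_spike(history, current, sigmas=3.0, min_excess=5):
    """Return True when the current window's action count is far above recent history.

    `min_excess` keeps a very quiet history (near-zero spread) from alerting
    on a change of one or two actions.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    baseline = mean(history)
    spread = pstdev(history)
    threshold = baseline + max(sigmas * spread, min_excess)
    return current > threshold


# Hypothetical hourly counts of automated account holds.
recent_hours = [12, 9, 11, 10, 13, 8, 10, 11]
is_spike = action_spike(recent_hours, 64)
is_normal = action_spike(recent_hours, 14)
```

The same function can watch refund denials or escalations. The useful property is that it needs no knowledge of the model at all, only of the actions the model triggers.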
The principle underneath all four: a failure that nothing measures is a failure you learn about from the people it harmed. Detection is the difference between a contained incident and a public one.
Triage: Classifying Severity Before You Act
The moment a potential incident surfaces, the first decision is not how to fix it. It is how bad it is. Severity classification governs everything downstream — who gets pulled in, how aggressively you contain, whether you communicate externally. Classifying without a predefined scale means improvising under stress, and improvised severity is consistently wrong in both directions: teams over-react to cosmetic issues and under-react to harmful ones.
A workable severity scale for AI incidents has three levels.
Severity 1 — Harmful or unsafe output reaching users
The model is producing output that causes direct harm: unsafe instructions, exposure of data that should not be exposed, discriminatory decisions, financial actions taken in error, or any output that creates legal or safety exposure. Response is immediate and aggressive. The default action at this level is containment first, diagnosis second — you stop the harm before you understand it.
Severity 2 — Materially wrong output, contained harm
The model is wrong in a way that degrades the product or misleads users, but the consequences are recoverable and not safety-critical. Examples include a summarization feature dropping information, a recommendation engine returning irrelevant results, or a classifier with an elevated error rate whose output routes to a reversible action. Response is urgent but measured. You can take time to diagnose before containing, provided the wrong output is not accumulating irreversible downstream effects.
Severity 3 — Anomalous behavior, no confirmed harm
The monitoring shows a distribution shift or an anomaly, but you have not confirmed that any output is actually wrong or harmful. This is an investigation, not yet an incident. The response is to assign an owner, gather evidence, and then either escalate to Severity 2 or Severity 1 once harm is confirmed, or close the investigation as a benign shift.
The classification has two inputs: severity of harm and reversibility of the downstream action. The second input is specific to AI systems and easy to neglect. An output that is moderately wrong but triggers an irreversible action (a payment, a permanent account change, a published communication) is more severe than a badly wrong output that triggers a reversible one. Severity tracks consequence, not the magnitude of the model's error.
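The two-input rule can be written down as a small function so that nobody has to re-derive it under stress. This is a sketch under stated assumptions: the `Harm` levels and the choice to treat materially wrong output with an irreversible action as Severity 1 are one reading of the scale above, not the only possible one.

```python
from enum import IntEnum


class Harm(IntEnum):
    NONE_CONFIRMED = 0  # anomaly only, no output confirmed wrong
    MATERIAL = 1        # output is wrong, consequences recoverable in principle
    UNSAFE = 2          # unsafe, discriminatory, data-exposing, or legally risky


def classify(harm, action_reversible):
    """Map harm and reversibility of the downstream action to a severity level."""
    if harm == Harm.UNSAFE:
        return 1
    if harm == Harm.MATERIAL:
        # Assumption: an irreversible downstream action promotes the incident.
        return 2 if action_reversible else 1
    return 3


severity_refund_denied = classify(Harm.MATERIAL, action_reversible=True)
severity_payment_sent = classify(Harm.MATERIAL, action_reversible=False)
```

Here the same model error lands at different severities depending only on what the automation did with it.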
Containment: Stopping the Bleeding
Containment is the set of actions that stop the failure from causing further harm, taken before you understand the root cause. The defining discipline of AI incident response is that containment precedes diagnosis at high severity. The instinct to understand before acting is correct for a reproducible bug and dangerous for a propagating one.
Containment has a hierarchy, ordered from most to least disruptive.
The kill switch disables the AI feature entirely. This is the bluntest instrument and the most important one to have built in advance, because it is the only action that definitively stops the harm regardless of cause. A kill switch that requires a code deployment is not a kill switch — it is a forty-minute outage with extra steps. The mechanism must be a configuration flag or a runtime toggle that any responder can flip in seconds, because the situations that require it are exactly the situations where you do not have time to ship code.
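A runtime toggle of this kind can be as small as a flag read on every request. The sketch below uses a JSON file as the flag store. The file name, the `ai_enabled` key, and the choice to fail closed when the flag cannot be read are assumptions, and a real system might use a feature-flag service instead.

```python
import json
from pathlib import Path


def ai_enabled(flag_path):
    """Read the flag at request time so a change takes effect without a deploy.

    Fails closed (assumption): an unreadable or malformed flag disables the AI path.
    """
    try:
        return bool(json.loads(Path(flag_path).read_text())["ai_enabled"])
    except (OSError, ValueError, KeyError, TypeError):
        return False


def handle(request_text, flag_path, model_call, fallback):
    """Route to the model only while the kill switch is off."""
    if ai_enabled(flag_path):
        return model_call(request_text)
    return fallback(request_text)
```

Flipping the switch is then a one-line edit to the flag file, for example writing `{"ai_enabled": false}`, which any responder can do without touching application code.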
Fallback to human routes the work the AI was handling to a person. This degrades capacity rather than removing capability. The automated support agent stops auto-resolving tickets and queues them for human agents; the AI-assisted decision reverts to a manual decision. This requires that the human path still exists — organizations that adopted AI specifically to eliminate the human path have removed their own fallback, which is a containment decision made unknowingly at deployment time.
Graceful degradation drops to a simpler, more predictable behavior. A recommendation engine showing AI-personalized results falls back to a static, rule-based list. A generative feature falls back to a templated response. The product gets worse but stays safe and functional. This preserves the most user value of the three options and is the right default at Severity 2, where the goal is to limit harm without removing the feature entirely.
The containment decision is a tradeoff between harm and capability, and the severity classification you already made determines which way the tradeoff resolves. At Severity 1, you pull the kill switch and accept the capability loss, because the cost of continued harm exceeds the cost of the outage. At Severity 2, you degrade gracefully and keep diagnosing. The decision is faster and more defensible when the options are built and the thresholds are agreed before the incident.
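Agreeing the thresholds in advance can mean literally encoding them. The sketch below maps each severity to a default containment mode and dispatches requests accordingly. The `Mode` names, the response strings, and the decision to leave Severity 3 running normally are assumptions for illustration.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"  # static or templated behavior
    HUMAN = "human"        # queue the work for a person
    DISABLED = "disabled"  # kill switch


# Default containment per severity, agreed before any incident.
# Falling back to humans is an option a responder can choose in place of either.
CONTAINMENT_BY_SEVERITY = {1: Mode.DISABLED, 2: Mode.DEGRADED, 3: Mode.NORMAL}


def respond(mode, request, model_call, static_fallback, human_queue):
    """Serve a request according to the current containment mode."""
    if mode is Mode.NORMAL:
        return model_call(request)
    if mode is Mode.DEGRADED:
        return static_fallback(request)
    if mode is Mode.HUMAN:
        human_queue.append(request)
        return "queued for human review"
    return "feature temporarily unavailable"


queue = []
degraded_reply = respond(
    CONTAINMENT_BY_SEVERITY[2],
    "recommend something",
    model_call=lambda r: "personalized list",
    static_fallback=lambda r: "top sellers list",
    human_queue=queue,
)
```

Because the mode is data, changing containment during an incident is a state change, not a code change.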
Rollback: Reverting to a Known-Good State
Containment stops the harm. Rollback restores correct behavior by reverting whatever changed. The difficulty specific to AI systems is that "whatever changed" spans more surfaces than code, and several of them are not under version control in most organizations.
There are four candidate rollback targets, and identifying the right one is half the work.
Model version. If the incident followed a model update — a new fine-tune, a new base model, a provider-pushed version change — reverting to the previous model is the cleanest rollback, provided the previous version is still deployable. This requires that you pin model versions explicitly and retain the ability to redeploy a prior one. Teams that call a provider's latest endpoint without version pinning cannot perform this rollback, because they do not control the version and may not even be notified when it changes.
Prompt or template. If the incident followed a prompt-template edit, the change is small, fast to revert, and frequently the actual cause. Prompts are code that controls model behavior, and they deserve the same version control, review, and rollback capability as application code. The common anti-pattern is editing prompts directly in a production configuration with no history — which makes this rollback impossible because there is no prior version to return to.
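The minimum needed to make a prompt revertible is an append-only history and a pointer to the active version. The in-memory `PromptStore` below is a hypothetical sketch of that idea. In practice the history would live in the same version control system as the application code.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptStore:
    history: list = field(default_factory=list)  # append-only (version_id, text)
    active: int = -1                             # index of the live version

    def publish(self, text):
        """Record a new prompt version and make it live."""
        version_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self.history.append((version_id, text))
        self.active = len(self.history) - 1
        return version_id

    def rollback(self):
        """Point back at the previous version without deleting anything."""
        if self.active <= 0:
            raise RuntimeError("no earlier prompt version to return to")
        self.active -= 1
        return self.history[self.active][0]

    def current(self):
        return self.history[self.active][1]


store = PromptStore()
v1 = store.publish("Classify the ticket as refund or fraud.")
v2 = store.publish("Classify the ticket. Be strict about fraud.")
restored = store.rollback()
```

After the rollback the earlier text is live again and the suspect version is still in the history, available for the postmortem.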
Retrieval or context source. For systems that feed the model retrieved context, a degraded index, a corrupted document, or a changed data source can cause failures while the model and prompt are unchanged. Rollback here means reverting the index or removing the offending source.
Configuration. Parameters like temperature, token limits, and routing rules change model behavior and are often edited outside the normal deployment pipeline. A temperature increase made to improve output variety can push a system into unreliable territory.
The operational requirement underneath all four is that every behavior-affecting surface must be versioned and individually revertible. The model, the prompts, the retrieval sources, and the configuration are all inputs to the system's behavior, and an input you cannot roll back is an input that can cause an incident you cannot resolve quickly. The non-determinism problem compounds this: because you may not be able to reproduce the failure on demand, you often cannot confirm a rollback worked by re-triggering the failure. You confirm it by watching the output distribution return to its baseline over the monitoring window, which makes the detection layer a prerequisite for clean rollback rather than a separate concern.
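One way to make this concrete is a single manifest that records every behavior-affecting surface for a release, so that "what changed" is a diff and not an investigation. The sketch below is illustrative: the field names and version strings are hypothetical, and `rollback_confirmed` assumes some per-window drift distance is already being computed by the monitoring layer.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BehaviorManifest:
    model_version: str
    prompt_version: str
    retrieval_index: str
    temperature: float
    max_tokens: int


def changed_surfaces(known_good, current):
    """Return {surface: (before, after)} for every surface that differs."""
    before, after = asdict(known_good), asdict(current)
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


def rollback_confirmed(window_distances, threshold=0.05, required_windows=3):
    """Confirm a rollback once the last few windows are all back near baseline."""
    recent = window_distances[-required_windows:]
    return len(recent) == required_windows and all(d <= threshold for d in recent)


# Hypothetical releases: only the temperature differs.
known_good = BehaviorManifest("model-v1", "prompt-a1", "index-17", 0.2, 512)
current = BehaviorManifest("model-v1", "prompt-a1", "index-17", 0.9, 512)
diff = changed_surfaces(known_good, current)
```

An empty diff is informative too: it points away from your own surfaces and toward the input distribution or a provider-side change.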
Communication: Internal and External
An incident has a technical track and a communication track, and they run in parallel. Neglecting the communication track turns a contained technical incident into a trust failure that outlasts the bug by months.
Internal communication has one job during the active incident: keep the people who need to act aligned without pulling responders out of the response. This means a single incident channel, one designated incident lead who owns decisions and coordination, and status updates at a fixed cadence rather than on demand. The incident lead is not necessarily the person fixing the problem — their role is to hold the timeline, make the containment and escalation calls, and keep everyone else from interrupting the responders for status. For an AI incident specifically, the internal record must capture what the model produced, what downstream actions fired, and what was contained, because that record becomes the input to the postmortem and the basis for any external communication.
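The internal record is easier to keep complete when it has a fixed shape. A minimal sketch, assuming the three things the record must capture are logged as they are discovered. The class name, field names, and sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IncidentRecord:
    incident_id: str
    severity: int
    lead: str
    model_outputs: list = field(default_factory=list)
    downstream_actions: list = field(default_factory=list)
    containment_steps: list = field(default_factory=list)
    timeline: list = field(default_factory=list)

    def log(self, kind, detail, at=None):
        """Append an entry to the right list and to the shared timeline."""
        buckets = {
            "output": self.model_outputs,
            "action": self.downstream_actions,
            "containment": self.containment_steps,
        }
        if kind not in buckets:
            raise ValueError(f"unknown entry kind: {kind!r}")
        at = at or datetime.now(timezone.utc)
        buckets[kind].append(detail)
        self.timeline.append((at.isoformat(), kind, detail))


record = IncidentRecord("INC-0001", severity=1, lead="on-call lead")
record.log("output", "refund request classified as fraud")
record.log("action", "automated account hold placed")
record.log("containment", "AI classification disabled via flag")
```

The timeline doubles as the raw material for the postmortem's detection and containment timing questions.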
External communication is governed by reversibility and harm. If the failure produced output that affected users in a way they can see, or that requires action — an incorrect decision, an erroneous communication, exposed data — silence is the wrong choice, because the affected users already know something is wrong and silence reads as either incompetence or concealment. The communication should state what happened in plain terms, what the consequence was, what you have done, and what the affected user should do, if anything. It should not minimize, and it should not over-explain the technical detail. The standard is the same one any incident communication should meet: accurate, specific, and respectful of the reader's situation.
The decision of whether to communicate externally is not a judgment call to be made under pressure by whoever happens to be in the channel. It belongs in the severity definition. A Severity 1 incident that reached users carries a default communication obligation; the incident lead executes it rather than debating it. Predefining the trigger removes the most common failure in incident communication, which is a stressed team deciding that this particular incident is the exception that does not need disclosure.
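Putting the trigger in the severity definition can be taken literally. In the sketch below the communication default is a field on each severity's policy, so the incident lead looks it up and does not debate it. Only the Severity 1 row is fixed by the text above. The Severity 2 and 3 rows, and all names, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeverityPolicy:
    level: int
    notify_if_users_reached: bool
    containment_first: bool


POLICIES = {
    1: SeverityPolicy(1, notify_if_users_reached=True, containment_first=True),
    # Assumed defaults for the lower levels:
    2: SeverityPolicy(2, notify_if_users_reached=True, containment_first=False),
    3: SeverityPolicy(3, notify_if_users_reached=False, containment_first=False),
}


def external_notice_required(level, reached_users):
    """Look up the predefined communication obligation for this incident."""
    return bool(reached_users) and POLICIES[level].notify_if_users_reached


sev1_notice = external_notice_required(1, reached_users=True)
sev1_internal_only = external_notice_required(1, reached_users=False)
```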
The Postmortem: Closing the Loop So It Cannot Recur
The incident is not over when the system is healthy. It is over when the same failure can no longer happen the same way. The postmortem is the mechanism that converts a resolved incident into a structural improvement, and a postmortem that produces only a narrative of what happened has done half the job.
A useful AI postmortem answers a specific set of questions. What was the failure, in terms of the output the model produced? What was the root cause across the full surface — model, prompt, retrieval, configuration, or input distribution? How long did it run before detection, and why did it take that long? What contained it, and could containment have been faster? What was the downstream blast radius, and which automated actions amplified it?
The questions matter, but the output matters more. Every postmortem must produce specific, owned, scheduled changes that reduce the recurrence or the blast radius of this class of failure. A postmortem without action items is documentation, not closure. The action items for AI incidents tend to cluster into recognizable categories: a detection gap that let the failure run too long becomes a new monitoring signal; a missing rollback capability becomes a versioning requirement on the surface that changed; an oversized blast radius becomes a constraint on what the AI is permitted to trigger automatically; a slow containment becomes a kill-switch improvement.
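The "specific, owned, scheduled" requirement is mechanical enough to check automatically. A small sketch, assuming action items are recorded with a category, an owner, and a due date. The category labels mirror the four clusters above, and the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CATEGORIES = {"detection", "rollback", "blast_radius", "containment"}


@dataclass
class ActionItem:
    description: str
    category: str
    owner: Optional[str] = None
    due: Optional[date] = None


def closure_problems(items):
    """List the reasons a postmortem cannot yet be considered closed."""
    problems = []
    if not items:
        problems.append("postmortem has no action items")
    for item in items:
        if item.category not in CATEGORIES:
            problems.append(f"{item.description}: unknown category {item.category!r}")
        if not item.owner:
            problems.append(f"{item.description}: no owner")
        if item.due is None:
            problems.append(f"{item.description}: no due date")
    return problems


items = [
    ActionItem("alert on fraud-share drift", "detection", "data platform", date(2030, 1, 15)),
    ActionItem("gate automated account holds", "blast_radius"),
]
problems = closure_problems(items)
```

An empty list from `closure_problems` does not make the action items good, but a non-empty one reliably marks a postmortem that is still only a narrative.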
The feedback that closes the loop most durably is constraining the blast radius. Many AI incidents are severe not because the model was badly wrong but because the operating model converted a small error into a large consequence through unchecked automation. The structural fix is to insert a validation step between the AI's output and the irreversible action — the same principle that governs AI output before it reaches production code. The model proposes; a deterministic check or a human confirms the consequential action; the harm is bounded at the action, not the model. A postmortem that adds this constraint has made the entire class of incident less severe, not merely fixed the instance.
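The propose-then-confirm pattern can be sketched as a gate in front of the action executor. The action names, the confidence field, and the 0.9 threshold below are hypothetical. The point is only that irreversible actions pass deterministic checks or wait for a person.

```python
# Hypothetical set of actions that cannot be undone once executed.
IRREVERSIBLE = {"account_hold", "payment", "send_email"}


def gated_execute(action, payload, checks, execute, review_queue):
    """Run reversible actions directly; hold irreversible ones that fail any check."""
    if action not in IRREVERSIBLE:
        return execute(action, payload)
    failures = [name for name, check in checks.items() if not check(payload)]
    if failures:
        review_queue.append((action, payload, failures))
        return "held_for_review"
    return execute(action, payload)


# Deterministic checks applied to the model's proposal (illustrative).
checks = {
    "high_confidence": lambda p: p.get("confidence", 0.0) >= 0.9,
    "has_account_id": lambda p: bool(p.get("account_id")),
}

executed = []
review_queue = []


def execute(action, payload):
    executed.append(action)
    return "executed"


held = gated_execute(
    "account_hold", {"account_id": "A1", "confidence": 0.62},
    checks, execute, review_queue,
)
passed = gated_execute(
    "account_hold", {"account_id": "A2", "confidence": 0.97},
    checks, execute, review_queue,
)
```

With the gate in place, a model that starts over-flagging fraud fills a review queue and leaves customer accounts untouched.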
This is also where AI postmortems generalize. The same discipline that captures a lesson and feeds it back into a governance system applies here: the postmortem's findings should update the runbook, the severity definitions, and the monitoring, so that the next incident of this shape is detected faster, contained faster, and communicated correctly by default.
Readiness Is Built Before the First Incident
Every section here has pointed back to the same place. The kill switch has to exist before you need it. The severity scale has to be agreed before you classify under pressure. The rollback requires versioned surfaces established long before anything breaks. The monitoring that makes a silent failure visible has to be instrumented while the system is healthy. The communication trigger has to be defined before the stressed team is tempted to call this incident the exception. Incident response is almost entirely a readiness problem disguised as a real-time one.
Readiness for AI incidents reduces to a small set of artifacts that exist before the first failure: a runbook a stressed responder can execute without improvising, a severity scale that ties harm and reversibility to a specific response including the external-communication trigger, a versioning discipline across every behavior-affecting surface so rollback is a reversion rather than a redesign, and a detection layer that watches the signals the default monitoring misses. Each needs a single named owner, because an artifact no one owns is an artifact no one maintains.
The teams that handle AI incidents well are not the ones with the most capable models or the fastest responders. They are the ones who treated the first incident as inevitable and built the response structure before it arrived — so that when the system started failing silently, plausibly, and at scale, the next hour was a procedure rather than an improvisation. The model will eventually produce something wrong in production. The only variable you control is whether the structure to catch it, contain it, and learn from it was already there when it did.