From observe to enforce: rolling out guardrails without a Friday-night incident
The standard guardrail launch failure is: a policy author writes a rule against a synthetic test fixture; the rule ships into production; the rule is wrong; an actual customer flow gets denied at 4pm on a Friday. MCP Guard's rollout shape exists to make that failure mode impossible to reach by accident. The contract is two modes, an identical call site, a readiness verdict, and a one-button flip that an auditor can trace.
The two-mode contract
Every policy decision is one of three values: allow, review, deny. Every tenant runs in one of two modes: observe or enforce. The mode determines what happens when the decision is not allow:
| Decision | Observe | Enforce |
|---|---|---|
| allow | passes, audit row written | passes, audit row written |
| review | coerced to allow, flagged in audit | throws MCPGuardReviewError, queue row created |
| deny | coerced to allow, flagged in audit | throws MCPGuardDeniedError, tool call skipped |
Observe mode is not a debug mode. It is a production mode whose job is to surface the calls that would have been blocked in enforce. You ship it; you watch the audit log fill up; you tune rules from real call shapes; you flip.
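The table above can be read as a small pure function. What follows is a minimal sketch, not MCP Guard's actual implementation: the `Outcome` shape and the `resolveOutcome` name are hypothetical, and only the decision values, mode names, and error names come from the table.

```typescript
type Decision = 'allow' | 'review' | 'deny'
type Mode = 'observe' | 'enforce'

// Hypothetical result shape, used only for illustration.
interface Outcome {
  proceed: boolean // does the tool call run?
  coerced: boolean // was a non-allow decision turned into allow (and flagged in audit)?
  error?: 'MCPGuardReviewError' | 'MCPGuardDeniedError'
}

function resolveOutcome(decision: Decision, mode: Mode): Outcome {
  // allow behaves the same in both modes.
  if (decision === 'allow') return { proceed: true, coerced: false }
  // observe: review and deny are coerced to allow and flagged.
  if (mode === 'observe') return { proceed: true, coerced: true }
  // enforce: review and deny stop the call with the matching error.
  return {
    proceed: false,
    coerced: false,
    error: decision === 'review' ? 'MCPGuardReviewError' : 'MCPGuardDeniedError',
  }
}
```

The point of writing it this way is that the mode only changes the non-allow branches, which is why the call site does not need to know which mode it is in.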
The call site is identical
The same guard.enforce(...) call works in both modes. The mode is a tenant-level setting on the server, not a flag in your code. That removes a whole class of bugs where the local config drifts from production and you find out at 3am.
// Same code in observe and enforce. Mode is decided server-side.
await guard.enforce({
  action_id: 'billing.refund',
  params: { invoice_id, amount_cents },
  user_context: { user_id, plan },
})
// In observe mode this never throws. The audit log records
// the would-have-been decision so you can see the impact.
await stripe.refunds.create({ charge, amount: amount_cents })
The mode is fetched from a per-tenant cache on the evaluation worker (60-second TTL), so flipping a tenant takes effect within a minute without redeploying anything.
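To make the "within a minute" claim concrete, here is a minimal sketch of a per-tenant TTL cache. It is an assumption about how such a cache could work, not the evaluation worker's real code: `ModeCache`, the `load` callback, and the injectable clock are all hypothetical names.

```typescript
type Mode = 'observe' | 'enforce'

// Hypothetical per-tenant mode cache with a 60-second TTL.
class ModeCache {
  private entries = new Map<string, { mode: Mode; fetchedAt: number }>()

  constructor(
    private readonly load: (tenantId: string) => Mode, // stands in for the config lookup
    private readonly ttlMs: number = 60_000,
    private readonly now: () => number = Date.now, // injectable clock, for testing
  ) {}

  get(tenantId: string): Mode {
    const hit = this.entries.get(tenantId)
    const t = this.now()
    // Serve the cached mode while it is younger than the TTL.
    if (hit && t - hit.fetchedAt < this.ttlMs) return hit.mode
    // Otherwise reload and remember when we fetched it.
    const mode = this.load(tenantId)
    this.entries.set(tenantId, { mode, fetchedAt: t })
    return mode
  }
}
```

With this shape, a flip made just after a fetch is picked up no later than one TTL afterwards, which matches the one-minute bound described above.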
The readiness verdict
Before the dashboard lets you flip a tenant to enforce, it runs a readiness verdict against the last 14 days of observe-mode audit rows. The verdict checks:
- Coverage. Did every production action_id get evaluated at least once? Missing actions are a red flag: usually it means a code path is bypassing the gate.
- Stability. Across the window, what fraction of calls would have been deny? A spike near the flip date often signals a rule mis-fit.
- Review backlog. If review verdicts in observe are firing at >5% of total volume, your team will drown the moment enforce flips on. Tune first.
- Audit completeness. Are all rows hash-chain-verified? Any gaps block the flip until reconciled.
The verdict is a yellow/green light, not a hard gate — you can override it with a reason — but the override is recorded.
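The four checks can be sketched as one function over the observe-mode audit rows. This is an illustration under stated assumptions: the `AuditRow` and `Verdict` shapes, the `chain_verified` flag, and the `readinessVerdict` name are hypothetical. Only the 5% review threshold comes from the text above. The sketch reports the overall deny fraction and does not attempt spike detection near the flip date.

```typescript
type Decision = 'allow' | 'review' | 'deny'

// Hypothetical audit row shape for the 14-day observe window.
interface AuditRow {
  action_id: string
  decision: Decision // the would-have-been decision
  chain_verified: boolean // assumed flag: row passed hash-chain verification
}

interface Verdict {
  light: 'green' | 'yellow'
  missingActions: string[] // coverage
  denyFraction: number // stability (reported, not thresholded here)
  reviewFraction: number // review backlog
  unverifiedRows: number // audit completeness
}

function readinessVerdict(rows: AuditRow[], productionActions: string[]): Verdict {
  const seen = new Set(rows.map((r) => r.action_id))
  const missingActions = productionActions.filter((a) => !seen.has(a))
  const total = rows.length
  const fraction = (d: Decision): number =>
    total === 0 ? 0 : rows.filter((r) => r.decision === d).length / total
  const reviewFraction = fraction('review')
  const unverifiedRows = rows.filter((r) => !r.chain_verified).length
  const green =
    total > 0 &&
    missingActions.length === 0 &&
    reviewFraction <= 0.05 &&
    unverifiedRows === 0
  return {
    light: green ? 'green' : 'yellow',
    missingActions,
    denyFraction: fraction('deny'),
    reviewFraction,
    unverifiedRows,
  }
}
```

Returning the individual measurements alongside the light keeps the verdict explainable: whoever overrides a yellow can see exactly which check they are overriding.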
AAL2 + 10-character reason to flip
The actual flip is gated by two things at the dashboard:
- AAL2 (TOTP step-up). The session must have been re-authenticated with a TOTP code in the last 5 minutes. A stale session cannot flip the mode even if it has the right role.
- A reason of at least 10 characters. Free text. We do not parse it. We pin it to the mode_changes row so an auditor can match the flip to a Jira ticket or a CAB approval.
The PUT endpoint at /api/config/mode enforces both server-side; the dashboard ModeFlipButton drives the reason → challenge → verify → retry state machine on the client. Same-origin requests only.
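The server-side half of the gate amounts to two checks. The sketch below is a guess at their shape, not the real handler behind /api/config/mode: `FlipRequest`, `checkFlipRequest`, the error codes, and the decision to trim whitespace before measuring the reason are all assumptions.

```typescript
// Hypothetical view of what the server knows when a flip request arrives.
interface FlipRequest {
  reason: string
  aal: 'aal1' | 'aal2' // assurance level of the current session
  stepUpAt: number // epoch ms of the last TOTP step-up
}

interface FlipCheck {
  ok: boolean
  error?: 'aal2_required' | 'step_up_stale' | 'reason_too_short'
}

const STEP_UP_MAX_AGE_MS = 5 * 60 * 1000 // "in the last 5 minutes"
const MIN_REASON_LENGTH = 10

function checkFlipRequest(req: FlipRequest, now: number): FlipCheck {
  if (req.aal !== 'aal2') return { ok: false, error: 'aal2_required' }
  if (now - req.stepUpAt > STEP_UP_MAX_AGE_MS) return { ok: false, error: 'step_up_stale' }
  // Assumption: whitespace padding does not count toward the minimum.
  if (req.reason.trim().length < MIN_REASON_LENGTH) return { ok: false, error: 'reason_too_short' }
  return { ok: true }
}
```

A client state machine like the one described above would map `step_up_stale` to a fresh TOTP challenge and then retry the same request.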
Per-action override
The tenant default is observe or enforce, but every action can override. The recommended ramp is:
- Tenant default = observe.
- Promote low-risk read-only actions first by setting actions.mode_override = enforce on them. If anything goes wrong, the blast radius is one tool.
- Promote the next tier. Watch the audit log.
- When the override list covers every action, flip the tenant default to enforce and clear the overrides.
The per-action override uses the same flip gate (AAL2 + reason) and lands in the same audit table.
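The ramp relies on one resolution rule: an action's override wins, otherwise the tenant default applies. A minimal sketch, assuming a hypothetical in-memory `TenantConfig` shape in place of the real actions.mode_override storage:

```typescript
type Mode = 'observe' | 'enforce'

// Hypothetical in-memory view of the tenant default plus per-action overrides.
interface TenantConfig {
  defaultMode: Mode
  overrides: Partial<Record<string, Mode>>
}

// An override, when present, takes precedence over the tenant default.
function effectiveMode(config: TenantConfig, actionId: string): Mode {
  return config.overrides[actionId] ?? config.defaultMode
}

// Final ramp step: every action is already enforced, so the tenant
// default can flip and the overrides can be cleared.
function readyForTenantFlip(config: TenantConfig, allActions: string[]): boolean {
  return allActions.every((a) => effectiveMode(config, a) === 'enforce')
}
```

For example, with a default of observe and a single override on a read-only action, only that action throws on deny. Everything else keeps logging would-have-been decisions.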
Auditable rollout
Every mode change — tenant default or per-action — writes a row to mode_changes:
- changed_by_user_id
- changed_at
- previous_mode / new_mode
- scope (tenant or action_id)
- reason (the 10+ char free-text field)
- aal_at_change (must be aal2)
The audit bundle export includes these rows alongside the per-call decisions, so an auditor can ask "what was enforced on day X for tenant Y?" and get a hash-verified answer without trusting either party.
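Answering "what was enforced on day X?" comes down to replaying mode_changes rows up to that moment. The sketch below assumes ISO 8601 strings for changed_at and a starting mode of observe, neither of which is confirmed by the schema above. `modeAt` is a hypothetical helper. It resolves a single scope and does not perform the hash verification.

```typescript
type Mode = 'observe' | 'enforce'

// Field names follow the mode_changes columns listed above; the types are assumptions.
interface ModeChange {
  changed_by_user_id: string
  changed_at: string // assumed ISO 8601 timestamp
  previous_mode: Mode
  new_mode: Mode
  scope: string // 'tenant' or an action_id
  reason: string
  aal_at_change: 'aal2'
}

// Replay the changes for one scope and return the mode in effect at `at`.
function modeAt(
  changes: ModeChange[],
  scope: string,
  at: string,
  initial: Mode = 'observe', // assumption: scopes start in observe
): Mode {
  const atMs = Date.parse(at)
  const applicable = changes
    .filter((c) => c.scope === scope && Date.parse(c.changed_at) <= atMs)
    .sort((a, b) => Date.parse(a.changed_at) - Date.parse(b.changed_at))
  let mode = initial
  for (const c of applicable) mode = c.new_mode
  return mode
}
```

To answer the question for a specific action, an auditor would check that action's scope first and fall back to the tenant scope when no override row applies.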