
From observe to enforce: rolling out guardrails without a Friday-night incident

The standard guardrail launch failure goes like this: a policy author writes a rule against a synthetic test fixture, the rule ships to production, the rule is wrong, and a real customer flow gets denied at 4pm on a Friday. MCP Guard's rollout shape exists to make that failure mode impossible to reach by accident. The contract has four parts: two modes, an identical call site, a readiness verdict, and a one-button flip that an auditor can trace.

The two-mode contract

Every policy decision is one of three values: allow, review, deny. Every tenant runs in one of two modes: observe or enforce. The mode determines what happens when the decision is not allow:

| Decision | Observe | Enforce |
| --- | --- | --- |
| allow | passes, audit row written | passes, audit row written |
| review | coerced to allow, flagged in audit | throws MCPGuardReviewError, queue row created |
| deny | coerced to allow, flagged in audit | throws MCPGuardDeniedError, tool call skipped |
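To make the table concrete, here is a minimal sketch of how a client could apply one decision under each mode. The function name applyDecision and the AuditRow shape are hypothetical; only the two error class names come from the table, and the sketch assumes the audit row is written before an enforce-mode throw.

```typescript
// Hypothetical sketch of the observe/enforce contract in the table above.
type Decision = 'allow' | 'review' | 'deny'
type Mode = 'observe' | 'enforce'

// Error names come from the table; these constructors are assumptions.
class MCPGuardReviewError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'MCPGuardReviewError'
  }
}

class MCPGuardDeniedError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'MCPGuardDeniedError'
  }
}

interface AuditRow {
  action_id: string
  decision: Decision // the real policy decision, even when observe lets the call pass
  mode: Mode
  flagged: boolean   // true when observe mode coerced a review/deny to allow
}

// Writes the audit row first, then lets the call through or throws.
function applyDecision(action_id: string, decision: Decision, mode: Mode, audit: AuditRow[]): void {
  const flagged = mode === 'observe' && decision !== 'allow'
  audit.push({ action_id, decision, mode, flagged })

  if (mode === 'observe' || decision === 'allow') return // caller proceeds with the tool call

  if (decision === 'review') {
    // Enforce mode would also create a review-queue row here (not modeled).
    throw new MCPGuardReviewError(`${action_id} is waiting for review`)
  }
  // Deny: the caller never reaches the tool call.
  throw new MCPGuardDeniedError(`${action_id} was denied`)
}
```

The same decision produces an audit row in both modes; only the control flow after the row differs, which is what lets observe mode stand in for enforce.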

Observe mode is not a debug mode. It is a production mode whose job is to surface the calls that would have been blocked in enforce. You ship it; you watch the audit log fill up; you tune rules from real call shapes; you flip.

The call site is identical

The same guard.enforce(...) call works in both modes. The mode is a tenant-level setting on the server, not a flag in your code. That removes a whole class of bugs where the local config drifts from production and you find out at 3am.

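any_handler.ts
// Same code in observe and enforce. Mode is decided server-side.
// `guard` (the MCP Guard client) and `stripe` are initialized elsewhere;
// the handler name and signature are illustrative.
export async function refundInvoice(
  charge: string,
  invoice_id: string,
  amount_cents: number,
  user_id: string,
  plan: string,
) {
  await guard.enforce({
    action_id: 'billing.refund',
    params: { invoice_id, amount_cents },
    user_context: { user_id, plan },
  })

  // In observe mode enforce() never throws; the audit log records the
  // would-have-been decision so you can see the impact. In enforce mode a
  // review or deny verdict throws above, so the refund below never runs.
  await stripe.refunds.create({ charge, amount: amount_cents })
}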

The mode is fetched from a per-tenant cache on the evaluation worker (60-second TTL), so flipping a tenant takes effect within a minute without redeploying anything.
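A per-tenant cache with a 60-second TTL can be sketched as follows. TenantModeCache and the synchronous fetchMode callback are hypothetical stand-ins for however the evaluation worker actually reads the tenant setting; the injectable clock exists only so the expiry behavior is easy to test.

```typescript
// Hypothetical sketch: per-tenant mode cache with a fixed TTL.
type Mode = 'observe' | 'enforce'

class TenantModeCache {
  private readonly entries = new Map<string, { mode: Mode; expiresAt: number }>()
  private readonly fetchMode: (tenantId: string) => Mode
  private readonly ttlMs: number
  private readonly now: () => number

  constructor(fetchMode: (tenantId: string) => Mode, ttlMs = 60_000, now: () => number = Date.now) {
    this.fetchMode = fetchMode // assumed loader for the server-side tenant setting
    this.ttlMs = ttlMs
    this.now = now
  }

  get(tenantId: string): Mode {
    const hit = this.entries.get(tenantId)
    if (hit && hit.expiresAt > this.now()) return hit.mode // fresh: no round trip
    const mode = this.fetchMode(tenantId)                  // miss or expired: re-read the setting
    this.entries.set(tenantId, { mode, expiresAt: this.now() + this.ttlMs })
    return mode
  }
}
```

The worst case is a flip that lands just after a cache fill: the old mode keeps being served until that entry expires, which is where the "within a minute" bound comes from.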

The readiness verdict

Before the dashboard lets you flip a tenant to enforce, it runs a readiness verdict against the last 14 days of observe-mode audit rows. The verdict checks:

  • Coverage. Did every production action_id get evaluated at least once? A missing action is a red flag: it usually means a code path is bypassing the gate.
  • Stability. Across the window, what fraction of calls would have been denied? A deny spike late in the window, close to the planned flip, often signals a misfit rule.
  • Review backlog. If review verdicts in observe are firing at >5% of total volume, your team will drown the moment enforce flips on. Tune first.
  • Audit completeness. Are all rows hash-chain-verified? Any gaps block the flip until reconciled.

The verdict is a yellow/green light, not a hard gate: you can override it with a reason, and the override is recorded.
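The four checks could be computed from observe-mode rows roughly like this. It is a sketch under stated assumptions: the ObserveRow shape, the 2-day "recent" slice, and the 2× spike factor are invented for illustration; the 14-day window, the 5% review threshold, and the rule that audit gaps block the flip come from the list above.

```typescript
// Hypothetical sketch of the readiness verdict. Row shape, the 2-day "recent"
// slice and the 2x spike factor are assumptions, not MCP Guard's definitions.
type Decision = 'allow' | 'review' | 'deny'

interface ObserveRow {
  action_id: string
  decision: Decision      // what enforce mode would have done
  at: number              // epoch ms
  chain_verified: boolean // outcome of the hash-chain check for this row
}

interface Readiness {
  missingActions: string[] // Coverage: production actions never evaluated
  denySpike: boolean       // Stability: recent deny share well above the window's
  reviewRate: number       // Review backlog: share of review verdicts
  unverifiedRows: number   // Audit completeness: rows failing the chain check
  blocked: boolean         // audit gaps block the flip until reconciled
  light: 'green' | 'yellow'
}

const DAY_MS = 24 * 60 * 60 * 1000

function share(rows: ObserveRow[], d: Decision): number {
  return rows.length === 0 ? 0 : rows.filter((r) => r.decision === d).length / rows.length
}

function readinessVerdict(rows: ObserveRow[], productionActions: string[], now: number): Readiness {
  const windowed = rows.filter((r) => r.at >= now - 14 * DAY_MS)
  const recent = windowed.filter((r) => r.at >= now - 2 * DAY_MS)
  const seen = new Set(windowed.map((r) => r.action_id))

  const missingActions = productionActions.filter((a) => !seen.has(a))
  const denySpike = share(recent, 'deny') > 2 * share(windowed, 'deny')
  const reviewRate = share(windowed, 'review')
  const unverifiedRows = windowed.filter((r) => !r.chain_verified).length

  const green = missingActions.length === 0 && !denySpike && reviewRate <= 0.05 && unverifiedRows === 0
  return {
    missingActions,
    denySpike,
    reviewRate,
    unverifiedRows,
    blocked: unverifiedRows > 0,
    light: green ? 'green' : 'yellow',
  }
}
```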

AAL2 + a reason of at least 10 characters

The actual flip is gated by two things at the dashboard:

  1. AAL2 (TOTP step-up). The session must have been re-authenticated with a TOTP code in the last 5 minutes. A stale session cannot flip the mode even if it has the right role.
  2. A reason of at least 10 characters. Free text. We do not parse it. We pin it to the mode_changes row so an auditor can match the flip to a Jira ticket or a CAB approval.
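Server-side, the two gate checks might look like the sketch below. The Session and FlipRequest shapes, the last_totp_at field, and the choice to trim whitespace before counting reason characters are assumptions, not the documented behavior of /api/config/mode.

```typescript
// Hypothetical sketch of the flip gate: fresh TOTP step-up + minimum reason length.
interface FlipRequest {
  new_mode: 'observe' | 'enforce'
  reason: string
}

interface Session {
  user_id: string
  aal: 'aal1' | 'aal2'
  last_totp_at: number | null // epoch ms of the most recent TOTP step-up (assumed field)
}

const STEP_UP_WINDOW_MS = 5 * 60 * 1000 // "in the last 5 minutes"
const MIN_REASON_CHARS = 10

// Returns null when the flip may proceed, otherwise a rejection message.
function checkFlipGate(session: Session, req: FlipRequest, now: number): string | null {
  const freshStepUp =
    session.aal === 'aal2' &&
    session.last_totp_at !== null &&
    now - session.last_totp_at <= STEP_UP_WINDOW_MS
  if (!freshStepUp) return 'step-up required: re-authenticate with TOTP'

  // Trimming before counting is an assumption; the text only fixes the minimum length.
  if (req.reason.trim().length < MIN_REASON_CHARS) {
    return `reason must be at least ${MIN_REASON_CHARS} characters`
  }
  return null
}
```

Note that the role check is separate: a session with the right role but a stale step-up is still rejected here.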
Why a reason field?
Because the auditor will eventually ask "why was this flipped on October 14th at 11pm?" and the answer "because someone clicked a button" is not good enough. The reason field is the cheapest way to make the answer good enough.

The PUT endpoint at /api/config/mode enforces both checks server-side; the dashboard ModeFlipButton drives the reason → challenge → verify → retry state machine on the client. The endpoint accepts same-origin requests only.
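The client flow that ModeFlipButton drives can be modeled as a small reducer. This is a hypothetical sketch: the step names follow the reason → challenge → verify → retry sequence in the text, while the extra submitting step and all event names are assumptions.

```typescript
// Hypothetical sketch of the flip button's client-side state machine.
type Step = 'reason' | 'submitting' | 'challenge' | 'verify' | 'retry' | 'done'

interface FlipState {
  step: Step
  reason: string // carried through so the retried PUT sends the same reason
}

type FlipEvent =
  | { type: 'SUBMIT_REASON'; reason: string }
  | { type: 'STEP_UP_REQUIRED' } // server rejected the PUT for a stale session
  | { type: 'TOTP_SUBMITTED' }
  | { type: 'TOTP_OK' }
  | { type: 'TOTP_FAILED' }
  | { type: 'PUT_OK' }

function next(s: FlipState, e: FlipEvent): FlipState {
  switch (s.step) {
    case 'reason':
      // Client-side pre-check only; the server enforces the length again.
      return e.type === 'SUBMIT_REASON' && e.reason.trim().length >= 10
        ? { step: 'submitting', reason: e.reason }
        : s
    case 'submitting':
      if (e.type === 'PUT_OK') return { ...s, step: 'done' }
      if (e.type === 'STEP_UP_REQUIRED') return { ...s, step: 'challenge' }
      return s
    case 'challenge':
      return e.type === 'TOTP_SUBMITTED' ? { ...s, step: 'verify' } : s
    case 'verify':
      if (e.type === 'TOTP_OK') return { ...s, step: 'retry' } // re-send the original PUT
      if (e.type === 'TOTP_FAILED') return { ...s, step: 'challenge' }
      return s
    case 'retry':
      return e.type === 'PUT_OK' ? { ...s, step: 'done' } : s
    default:
      return s // 'done' is terminal; ignore further events
  }
}
```

Keeping the reason in state means a failed TOTP attempt never forces the user to retype it.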

Per-action override

The tenant default is observe or enforce, but every action can override it. The recommended ramp is:

  1. Tenant default = observe.
  2. Promote low-risk read-only actions first by setting actions.mode_override = enforce on them. If anything goes wrong, the blast radius is one tool.
  3. Promote the next tier. Watch the audit log.
  4. When the override list covers every action, flip the tenant default to enforce and clear the overrides.
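Resolving the effective mode for one call is a lookup with a fallback. This sketch assumes overrides are held in a Map keyed by action_id, standing in for actions.mode_override; TenantConfig, effectiveMode, and readyToFlipDefault are hypothetical names.

```typescript
// Hypothetical sketch: a per-action override wins over the tenant default.
type Mode = 'observe' | 'enforce'

interface TenantConfig {
  default_mode: Mode
  mode_overrides: Map<string, Mode> // action_id -> override (stand-in for actions.mode_override)
}

function effectiveMode(cfg: TenantConfig, action_id: string): Mode {
  return cfg.mode_overrides.get(action_id) ?? cfg.default_mode
}

// Step 4's precondition: once every action already resolves to enforce,
// flipping the default and clearing the overrides changes nothing for callers.
function readyToFlipDefault(cfg: TenantConfig, actionIds: string[]): boolean {
  return actionIds.every((a) => effectiveMode(cfg, a) === 'enforce')
}
```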

The per-action override uses the same flip gate (AAL2 + reason) and lands in the same audit table.

Auditable rollout

Every mode change — tenant default or per-action — writes a row to mode_changes:

  • changed_by_user_id
  • changed_at
  • previous_mode / new_mode
  • scope (tenant or action_id)
  • reason (the 10+ char free-text field)
  • aal_at_change (must be aal2)
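Because every change carries changed_at, scope, and new_mode, the mode in force at any moment can be reconstructed by replaying the rows in time order. A minimal sketch, assuming scope is stored as the string 'tenant' or the action_id, that the tenant started in observe, and that clearing an override is not modeled; modeAt is a hypothetical helper.

```typescript
// Hypothetical sketch: replay mode_changes rows to find the mode in force at time `at`.
type Mode = 'observe' | 'enforce'

interface ModeChange {
  changed_by_user_id: string
  changed_at: number             // epoch ms
  previous_mode: Mode
  new_mode: Mode
  scope: string                  // 'tenant' or an action_id (assumed encoding)
  reason: string                 // at least 10 characters
  aal_at_change: 'aal1' | 'aal2' // must be 'aal2' for a valid flip
}

function modeAt(changes: ModeChange[], action_id: string, at: number, initial: Mode = 'observe'): Mode {
  let tenantMode: Mode = initial
  let actionOverride: Mode | undefined = undefined
  const ordered = [...changes].sort((a, b) => a.changed_at - b.changed_at)
  for (const c of ordered) {
    if (c.changed_at > at) break // later changes were not in force yet
    if (c.scope === 'tenant') tenantMode = c.new_mode
    else if (c.scope === action_id) actionOverride = c.new_mode
  }
  return actionOverride ?? tenantMode
}
```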

The audit bundle export includes these rows alongside the per-call decisions, so an auditor can ask "what was enforced on day X for tenant Y?" and get a hash-verified answer without trusting either party.
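A hash-verified log usually means each row commits to its predecessor, so editing or deleting any row breaks every link after it. The sketch below shows that idea with SHA-256 from Node's crypto module; the ChainedRow layout, the newline separator, and the empty-string genesis hash are assumptions, since MCP Guard's actual chain format is not described here.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical chain layout: each row stores its predecessor's hash, and its own
// hash covers that link plus the row's payload.
interface ChainedRow {
  payload: string   // canonical serialization of an audit or mode_changes row
  prev_hash: string // '' for the first row (assumed genesis value)
  hash: string
}

function rowHash(prev_hash: string, payload: string): string {
  return createHash('sha256').update(prev_hash).update('\n').update(payload).digest('hex')
}

function appendRow(chain: ChainedRow[], payload: string): void {
  const prev_hash = chain.length ? chain[chain.length - 1].hash : ''
  chain.push({ payload, prev_hash, hash: rowHash(prev_hash, payload) })
}

// Returns the index of the first row that breaks the chain, or -1 if all rows verify.
function firstBrokenRow(chain: ChainedRow[]): number {
  for (let i = 0; i < chain.length; i++) {
    const expectedPrev = i === 0 ? '' : chain[i - 1].hash
    const row = chain[i]
    if (row.prev_hash !== expectedPrev || row.hash !== rowHash(row.prev_hash, row.payload)) return i
  }
  return -1
}
```

Anyone holding the exported bundle can rerun this check independently, which is what lets the answer stand without trusting either party.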

The summary
Observe → enforce on MCP Guard is a deliberate, audited transition. The call site never changes. The mode lives on the server. The flip requires re-authentication and a written reason. Every event is in a verifiable log.