Human-in-the-loop for LangGraph: a production pattern
LangGraph ships interrupt — a primitive that pauses graph execution mid-tool-call so a human can approve, edit, or reject. That is the easy part. The review queue, the reviewer UI, the audit log, the observe-mode ramp, and the approve-with-amendments path are the hard part. This guide is how we ship them.
What interrupt gives you, and what it doesn't
interrupt is a LangGraph primitive that lets a node surface a value and pause execution; you resume by invoking the graph again with Command(resume=...) carrying the human response. Pausing requires a checkpointer, because the paused state has to be persisted between the two invocations. It is a clean, minimal building block, exactly what you want from a graph framework.
The pieces it does not solve for you:
- The queue. Where does the pending review live? Who can claim it? What stops two reviewers from approving the same call?
- The reviewer UI. The model-output blob is not a review surface. Reviewers need diffs of params, the rule that fired, suggested amendments, and a one-click resolve.
- The audit trail. Who approved, when, with what reason. Hash-chained so an auditor can verify the log was not edited after the fact.
- The observe-mode ramp. You cannot launch a guardrail that has never seen real traffic. You need a mode where the policy runs but never blocks — so you can tune rules from real call patterns before you flip to enforce.
- The resume contract. If the reviewer amends the params, who validates the amended payload? What if the agent has already moved on?
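To make the queue question concrete, here is a minimal sketch of the claim step, assuming an in-memory store. ReviewQueue, ReviewRow, and the status names are hypothetical and are not part of LangGraph or MCP Guard; a production queue would perform the same compare-and-set inside a database transaction.

```typescript
// Hypothetical in-memory review queue (illustration only).
type ReviewStatus = "pending" | "claimed" | "approved" | "rejected";

interface ReviewRow {
  id: string;
  status: ReviewStatus;
  claimedBy?: string;
}

class ReviewQueue {
  private rows = new Map<string, ReviewRow>();

  add(id: string): void {
    this.rows.set(id, { id, status: "pending" });
  }

  // A claim succeeds only while the row is still pending,
  // so a second reviewer cannot pick up the same call.
  claim(id: string, reviewer: string): boolean {
    const row = this.rows.get(id);
    if (!row || row.status !== "pending") return false;
    row.status = "claimed";
    row.claimedBy = reviewer;
    return true;
  }

  // Only the reviewer holding the claim may resolve the row.
  resolve(id: string, reviewer: string, verdict: "approved" | "rejected"): boolean {
    const row = this.rows.get(id);
    if (!row || row.status !== "claimed" || row.claimedBy !== reviewer) return false;
    row.status = verdict;
    return true;
  }

  get(id: string): ReviewRow | undefined {
    return this.rows.get(id);
  }
}
```

The point of the sketch is the state check in claim and resolve: both are conditional transitions, so a double approval is rejected rather than silently overwriting the first one.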
The MCP Guard pattern
The pattern is: wrap the tool node, not the graph. Your LangGraph stays a LangGraph. The guard.enforce() call lives inside the tool node, before the side effect. If the policy says allow, the tool runs. If it says review, MCP Guard creates a review row server-side and the SDK throws MCPGuardReviewError with the review id. You let that bubble — or, more usefully, you catch it and call guard.waitForReview(id, ...), which long-polls until the reviewer resolves the row.
The graph control flow stays in LangGraph. The queue/UI/audit/rollout machinery lives in MCP Guard. You write one wrapped tool node; you get the review queue at /dashboard/reviews for free.
Working example
A LangGraph tool node that processes refunds. The handler runs the gate, dispatches to Stripe on allow, and on review it blocks the node until a human resolves the queue row.
import { MCPGuard, MCPGuardReviewError, MCPGuardDeniedError } from 'mcpguard-sdk'
import { stripe } from './stripe'

const guard = new MCPGuard({ apiKey: process.env.MCPGUARD_API_KEY! })

// A LangGraph node. State carries { invoice_id, amount_cents, user }.
// GraphState is your graph's own state type.
export async function refundNode(state: GraphState) {
  const { invoice_id, amount_cents, user } = state
  try {
    await guard.enforce({
      action_id: 'billing.refund',
      params: { invoice_id, amount_cents },
      user_context: { user_id: user.id, plan: user.plan },
    })
  } catch (err) {
    if (err instanceof MCPGuardReviewError) {
      // Long-poll until a human approves / rejects in the queue.
      const verdict = await guard.waitForReview(
        err.review_id,
        { timeoutMs: 10 * 60_000 },
      )
      if (verdict.status !== 'approved') {
        return { refund: null, reason: verdict.status }
      }
      // Use any amended params the reviewer set; otherwise the originals.
      // Assumes amended_params keeps the same keys as params.
      const finalParams = verdict.resolution?.amended_params ?? { invoice_id, amount_cents }
      // Map to the same Stripe shape as the allow path below.
      const refund = await stripe.refunds.create({
        charge: finalParams.invoice_id, amount: finalParams.amount_cents,
      })
      return { refund }
    }
    if (err instanceof MCPGuardDeniedError) {
      return { refund: null, reason: 'policy_denied', rule: err.matched_rule }
    }
    throw err
  }
  // Allow path.
  const refund = await stripe.refunds.create({
    charge: invoice_id, amount: amount_cents,
  })
  return { refund }
}
refundNode above is shown as a plain async function for clarity; in your StateGraph you would register it with .addNode('refund', refundNode). The MCP Guard surface (enforce, waitForReview, the two error classes) is verbatim from packages/sdk/src/index.ts; copy it as-is.
The observe → enforce rollout
Do not flip enforce on day one. The recommended path is:
- Deploy the wrapped node in observe mode. Every call is evaluated and logged; nothing is ever blocked. The policy can have bugs; that's the point.
- Watch /dashboard/audit for a few days. Tune rules from real call shapes, not the synthetic ones you wrote against your test fixtures.
- Promote per action, not per tenant. The dashboard's mode override lets you flip billing.refund to enforce while every other action stays observe.
- When you flip the whole tenant to enforce, the dashboard requires AAL2 (TOTP step-up) plus a 10-character reason. Every flip is in mode_changes; your auditor can prove who flipped what when.
Full pattern walkthrough: From observe to enforce.
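The per-action promotion described above can be sketched as a small mode resolver. This is an illustration under assumptions, not MCP Guard's implementation: ModeConfig, effectiveMode, and applyVerdict are hypothetical names, and the only behavior taken from this guide is that observe mode evaluates and logs but never blocks, and that an action-level override wins over the tenant default.

```typescript
type Mode = "observe" | "enforce";
type Verdict = "allow" | "review" | "deny";

// Hypothetical config shape: one tenant-wide default plus per-action overrides.
interface ModeConfig {
  tenantMode: Mode;
  actionOverrides: Partial<Record<string, Mode>>;
}

interface Outcome {
  mode: Mode;
  logged: Verdict;   // what the policy decided; always recorded
  blocked: boolean;  // whether the call is actually held back
}

// An action-level override takes precedence over the tenant default.
function effectiveMode(cfg: ModeConfig, actionId: string): Mode {
  return cfg.actionOverrides[actionId] ?? cfg.tenantMode;
}

// In observe mode the verdict is logged but never blocks the call.
function applyVerdict(cfg: ModeConfig, actionId: string, verdict: Verdict): Outcome {
  const mode = effectiveMode(cfg, actionId);
  const blocked = mode === "enforce" && verdict !== "allow";
  return { mode, logged: verdict, blocked };
}
```

With tenantMode set to observe and an override of enforce for billing.refund, a review verdict holds refunds while a deny verdict on any other action is only logged.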
Audit & resume contract
Every enforce() call produces an audit row, hash-chained against the prior row. The chain is per-tenant and per-environment; an auditor can export the bundle with POST /v1/audit/bundle and replay the SHA verifier to prove no row was edited after the fact.
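To show why a hash chain makes after-the-fact edits detectable, here is a minimal verifier sketch. The row shape (seq, payload, prevHash, hash), the genesis value, and the field ordering inside the hash are all assumptions made for illustration; the actual bundle format returned by POST /v1/audit/bundle is not specified in this guide.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit row shape (not the real bundle format).
interface AuditRow {
  seq: number;
  payload: string;   // serialized call record
  prevHash: string;  // hash of the previous row, or GENESIS for the first row
  hash: string;      // SHA-256 over prevHash, seq, and payload
}

const GENESIS = "0".repeat(64);

function rowHash(prevHash: string, seq: number, payload: string): string {
  return createHash("sha256").update(`${prevHash}\n${seq}\n${payload}`).digest("hex");
}

// Returns a new chain with one more row linked to the current tail.
function appendRow(chain: AuditRow[], payload: string): AuditRow[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const seq = chain.length;
  return [...chain, { seq, payload, prevHash, hash: rowHash(prevHash, seq, payload) }];
}

// Returns the index of the first inconsistent row, or -1 if the chain is intact.
function verifyChain(chain: AuditRow[]): number {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    const expected = rowHash(prev, row.seq, row.payload);
    if (row.seq !== i || row.prevHash !== prev || row.hash !== expected) return i;
    prev = row.hash;
  }
  return -1;
}
```

Because each hash covers the previous row's hash, editing one payload invalidates that row, and recomputing its hash to hide the edit breaks the link to every row after it.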
When a reviewer approves with amended params, the new params are stored on the review row (resolution.amended_params) and the audit row records the diff. The agent is responsible for using the amended params on resume — see the example: the handler pulls verdict.resolution?.amended_params and falls back to the original on no-edit.
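Since the agent owns the resume step, it is worth validating amended params before dispatching the side effect. The sketch below is one possible check, not something the SDK provides: resolveParams and its rules (same invoice, positive integer amount, amount may be lowered but not raised) are hypothetical, and it assumes amended_params may be partial or absent.

```typescript
interface RefundParams {
  invoice_id: string;
  amount_cents: number;
}

type Validation =
  | { ok: true; params: RefundParams }
  | { ok: false; reason: string };

// Merge the reviewer's amendments over the original params, then check the
// result against hypothetical business rules before any side effect runs.
function resolveParams(
  original: RefundParams,
  amended?: Partial<RefundParams> | null,
): Validation {
  const merged: RefundParams = { ...original, ...(amended ?? {}) };
  if (merged.invoice_id !== original.invoice_id) {
    return { ok: false, reason: "invoice_changed" };
  }
  if (!Number.isInteger(merged.amount_cents) || merged.amount_cents <= 0) {
    return { ok: false, reason: "invalid_amount" };
  }
  if (merged.amount_cents > original.amount_cents) {
    return { ok: false, reason: "amount_increased" };
  }
  return { ok: true, params: merged };
}
```

In the refund handler, a failed validation would return a reason to the graph instead of calling Stripe, the same way a non-approved verdict does.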
Caveats
- Long waits. waitForReview defaults to a 5-minute deadline. For human SLAs that span hours, stash the review id and resume on a background worker.
- Idempotency. Pass idempotency_key on the enforce call so retries from a flaky LangGraph re-execution do not create duplicate review rows.
- Streaming. If your graph streams tokens back to the user, throw the review error early, before any tokens that imply the tool ran, so the UX is honest about the pause.
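For the idempotency caveat, the key has to be identical across retries of the same logical call. One way to get that, sketched here under assumptions, is to hash a canonical form of the inputs. The helper names and the choice of inputs (a run id, a step id, the action id, and the params) are hypothetical; this guide does not specify what format idempotency_key must take.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so logically equal params
// produce the same string regardless of property order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  // Primitives; undefined is treated as null.
  return JSON.stringify(value ?? null);
}

// Hypothetical helper: same run, step, action, and params give the same key.
function idempotencyKey(
  runId: string,
  stepId: string,
  actionId: string,
  params: unknown,
): string {
  const material = [runId, stepId, actionId, canonicalize(params)].join("\n");
  return createHash("sha256").update(material).digest("hex");
}
```

Including a run or step identifier keeps two genuinely separate refunds for the same invoice from collapsing into one review row, while a retry of the same step reuses the key.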