
Human-in-the-loop for LangGraph: a production pattern

LangGraph ships interrupt — a primitive that pauses graph execution mid-tool-call so a human can approve, edit, or reject. That is the easy part. The review queue, the reviewer UI, the audit log, the observe-mode ramp, and the approve-with-amendments path are the hard part. This guide is how we ship them.

What interrupt gives you, and what it doesn't

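interrupt is a LangGraph primitive that lets a node yield a value and pause execution; you resume the graph by invoking it again with Command(resume=...) carrying the human response. Because the paused run has to be persisted between the pause and the resume, interrupt requires a checkpointer. It is a clean, minimal building block, exactly what you want from a graph framework.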

The pieces it does not solve for you:

  • The queue. Where does the pending review live? Who can claim it? What stops two reviewers from approving the same call?
  • The reviewer UI. The model-output blob is not a review surface. Reviewers need diffs of params, the rule that fired, suggested amendments, and a one-click resolve.
  • The audit trail. Who approved, when, with what reason. Hash-chained so an auditor can verify the log was not edited after the fact.
  • The observe-mode ramp. You cannot launch a guardrail that has never seen real traffic. You need a mode where the policy runs but never blocks — so you can tune rules from real call patterns before you flip to enforce.
  • The resume contract. If the reviewer amends the params, who validates the amended payload? What if the agent has already moved on?
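The queue bullet hides a concurrency rule: a review row can be claimed once, and only the claimant can resolve it. Below is a minimal sketch of those semantics, assuming an in-memory store. ReviewQueue, claim, and resolve are hypothetical names, not MCP Guard's API. Because JavaScript runs this single-threaded, each method is effectively atomic here.

```typescript
// Hypothetical in-memory review queue: illustrates claim semantics only.
type ReviewStatus = 'pending' | 'approved' | 'rejected'

interface ReviewRow {
  id: string
  status: ReviewStatus
  claimedBy: string | null
}

class ReviewQueue {
  private rows = new Map<string, ReviewRow>()

  add(id: string): void {
    this.rows.set(id, { id, status: 'pending', claimedBy: null })
  }

  // Compare-and-set: succeeds only if the row is still pending and unclaimed.
  claim(id: string, reviewer: string): boolean {
    const row = this.rows.get(id)
    if (!row || row.status !== 'pending' || row.claimedBy !== null) return false
    row.claimedBy = reviewer
    return true
  }

  // Only the reviewer holding the claim may resolve, and only once.
  resolve(id: string, reviewer: string, status: 'approved' | 'rejected'): boolean {
    const row = this.rows.get(id)
    if (!row || row.status !== 'pending' || row.claimedBy !== reviewer) return false
    row.status = status
    return true
  }

  get(id: string): ReviewRow | undefined {
    return this.rows.get(id)
  }
}

const queue = new ReviewQueue()
queue.add('rev_1')
console.log(queue.claim('rev_1', 'alice')) // true
console.log(queue.claim('rev_1', 'bob')) // false: already claimed
console.log(queue.resolve('rev_1', 'bob', 'approved')) // false: bob holds no claim
console.log(queue.resolve('rev_1', 'alice', 'approved')) // true
```

A database-backed queue would typically get the same guarantee from a single conditional UPDATE ... WHERE claimed_by IS NULL, which affects zero rows for the losing reviewer.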
The build-vs-buy line
The first three look like a weekend. They are about three months of work to ship with reasonable SLAs and a clean audit story. The last two are the ones that bite you six months in.

The MCP Guard pattern

The pattern is: wrap the tool node, not the graph. Your LangGraph stays a LangGraph. The guard.enforce() call lives inside the tool node, before the side effect. If the policy says allow, enforce() returns and the tool runs. If it says deny, the SDK throws MCPGuardDeniedError with the matched rule. If it says review, MCP Guard creates a review row server-side and the SDK throws MCPGuardReviewError with the review id. You can let that error bubble up or, more usefully, catch it and call guard.waitForReview(id, ...), which long-polls until a reviewer resolves the row.
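On the client side, a long-poll with a deadline reduces to a loop: ask for the row's status, return as soon as it is resolved, and give up at the deadline. The following is a sketch, assuming a hypothetical fetchStatus callback that stands in for one long-poll request; it is not the SDK's waitForReview implementation. Returning 'timeout' instead of throwing lets the caller stash the review id and resume later.

```typescript
type ReviewState = 'pending' | 'approved' | 'rejected'

// Hypothetical: one long-poll request that resolves when the server has news
// or its own wait elapses.
type FetchStatus = (reviewId: string) => Promise<ReviewState>

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function pollUntilResolved(
  reviewId: string,
  fetchStatus: FetchStatus,
  { timeoutMs, retryDelayMs = 1_000 }: { timeoutMs: number; retryDelayMs?: number },
): Promise<ReviewState | 'timeout'> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    const status = await fetchStatus(reviewId)
    if (status !== 'pending') return status
    // Still pending: pause briefly, but never sleep past the deadline.
    await sleep(Math.min(retryDelayMs, Math.max(0, deadline - Date.now())))
  }
  return 'timeout'
}
```

If this returns 'timeout', the review is still open; persisting its id and resuming elsewhere is the same move the Caveats section recommends for multi-hour SLAs.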

The graph control flow stays in LangGraph. The queue/UI/audit/rollout machinery lives in MCP Guard. You write one wrapped tool node; you get the review queue at /dashboard/reviews for free.
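The wrap-the-tool-node idea can also be sketched independently of any SDK, as a higher-order function you apply to each tool. The Guard interface, the ReviewRequired and Denied classes, and guardedTool are hypothetical stand-ins that mirror the throw-on-review and throw-on-deny behavior described above; the working example below inlines the same flow for a single tool using the real SDK.

```typescript
// Hypothetical stand-ins for the SDK's review and deny errors.
class ReviewRequired extends Error {
  readonly reviewId: string
  constructor(reviewId: string) {
    super(`review required: ${reviewId}`)
    this.reviewId = reviewId
  }
}

class Denied extends Error {
  readonly rule: string
  constructor(rule: string) {
    super(`denied by rule ${rule}`)
    this.rule = rule
  }
}

// Hypothetical guard surface: enforce() resolves on allow and throws otherwise.
interface Guard {
  enforce(actionId: string, params: Record<string, unknown>): Promise<void>
  waitForReview(reviewId: string): Promise<'approved' | 'rejected'>
}

type Outcome<T> = { ok: true; value: T } | { ok: false; reason: string }

// Wrap one tool so the gate always runs before the side effect, and the side
// effect runs only on allow or after an approved review.
function guardedTool<P extends Record<string, unknown>, T>(
  guard: Guard,
  actionId: string,
  run: (params: P) => Promise<T>,
): (params: P) => Promise<Outcome<T>> {
  return async (params) => {
    try {
      await guard.enforce(actionId, params)
    } catch (err) {
      if (err instanceof ReviewRequired) {
        const status = await guard.waitForReview(err.reviewId)
        if (status !== 'approved') return { ok: false, reason: status }
        return { ok: true, value: await run(params) }
      }
      if (err instanceof Denied) return { ok: false, reason: `denied:${err.rule}` }
      throw err // network or SDK failures are not policy decisions
    }
    return { ok: true, value: await run(params) }
  }
}
```

Keeping the side effect behind a single wrapper means every tool gets the same allow/review/deny handling, and none of it leaks into the graph's edges.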

Working example

A LangGraph tool node that processes refunds. The handler runs the gate, dispatches to Stripe on allow, returns the matched rule on deny, and on review blocks the node until a human resolves the queue row.

refund_node.ts
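import { MCPGuard, MCPGuardReviewError, MCPGuardDeniedError } from 'mcpguard-sdk'
import { stripe } from './stripe'

// The slice of graph state this node reads; match it to your StateGraph's state.
type GraphState = {
  invoice_id: string
  amount_cents: number
  user: { id: string; plan: string }
}

type RefundParams = { invoice_id: string; amount_cents: number }

const guard = new MCPGuard({ apiKey: process.env.MCPGUARD_API_KEY! })

// The side effect. Stripe refunds take a charge (or payment_intent) id, so this
// assumes invoice_id holds the Stripe charge id to refund.
function issueRefund({ invoice_id, amount_cents }: RefundParams) {
  return stripe.refunds.create({ charge: invoice_id, amount: amount_cents })
}

// A LangGraph node. State carries { invoice_id, amount_cents, user }.
export async function refundNode(state: GraphState) {
  const { invoice_id, amount_cents, user } = state
  const params: RefundParams = { invoice_id, amount_cents }

  try {
    await guard.enforce({
      action_id: 'billing.refund',
      params,
      user_context: { user_id: user.id, plan: user.plan },
    })
  } catch (err) {
    if (err instanceof MCPGuardReviewError) {
      // Long-poll until a human approves or rejects in the queue (10-minute cap).
      const verdict = await guard.waitForReview(err.review_id, { timeoutMs: 10 * 60_000 })
      if (verdict.status !== 'approved') {
        return { refund: null, reason: verdict.status }
      }
      // Use the reviewer's amended params if they set any, else the originals.
      // Both paths go through issueRefund, so amended params get the same Stripe mapping.
      const finalParams = (verdict.resolution?.amended_params as RefundParams | undefined) ?? params
      return { refund: await issueRefund(finalParams) }
    }
    if (err instanceof MCPGuardDeniedError) {
      return { refund: null, reason: 'policy_denied', rule: err.matched_rule }
    }
    throw err
  }

  // Allow path: the policy passed, so run the side effect.
  return { refund: await issueRefund(params) }
}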
On LangGraph specifics
refundNode above is shown as a plain async function for clarity; in your StateGraph you would register it with .addNode('refund', refundNode). The MCP Guard surface (enforce, waitForReview, the two error classes) is verbatim from packages/sdk/src/index.ts — copy it as-is.

The observe → enforce rollout

Do not flip enforce on day one. The recommended path is:

  1. Deploy the wrapped node in observe mode. Every call is evaluated and logged; nothing is ever blocked. The policy can have bugs; that's the point.
  2. Watch /dashboard/audit for a few days. Tune rules from real call shapes — not the synthetic ones you wrote against your test fixtures.
  3. Promote per action, not per tenant. The dashboard's mode override lets you flip billing.refund to enforce while every other action stays observe.
  4. When you flip the whole tenant to enforce, the dashboard requires AAL2 (a TOTP step-up) plus a written reason of at least 10 characters. Every flip is recorded in mode_changes, so your auditor can prove who flipped what, and when.
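Steps 1 and 3 reduce to a small resolution rule. The sketch below assumes a per-action override always wins over the tenant default; the source only shows an override promoting one action while the tenant stays in observe. ModeConfig, resolveMode, and gate are hypothetical names, not the dashboard's implementation. In observe mode the verdict is still computed and logged, but the call is never stopped.

```typescript
type Mode = 'observe' | 'enforce'
type Verdict = 'allow' | 'review' | 'deny'

interface ModeConfig {
  tenantDefault: Mode
  // Per-action overrides, e.g. { 'billing.refund': 'enforce' }.
  overrides: Record<string, Mode>
}

// Assumption: a per-action override wins; otherwise use the tenant default.
function resolveMode(cfg: ModeConfig, actionId: string): Mode {
  return cfg.overrides[actionId] ?? cfg.tenantDefault
}

interface GateResult {
  mode: Mode
  verdict: Verdict // what the policy decided
  blocked: boolean // whether the call was actually stopped or paused
}

// Every call is logged; only enforce mode turns review/deny into a block.
function gate(cfg: ModeConfig, actionId: string, verdict: Verdict, log: GateResult[]): GateResult {
  const mode = resolveMode(cfg, actionId)
  const result = { mode, verdict, blocked: mode === 'enforce' && verdict !== 'allow' }
  log.push(result)
  return result
}

const cfg: ModeConfig = { tenantDefault: 'observe', overrides: { 'billing.refund': 'enforce' } }
const log: GateResult[] = []
console.log(gate(cfg, 'billing.refund', 'review', log).blocked) // true
console.log(gate(cfg, 'crm.update', 'deny', log).blocked) // false: observe only records
```

The log of non-blocking deny and review verdicts from observe mode is exactly the real-traffic signal step 2 tunes rules against.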

Full pattern walkthrough: From observe to enforce.

Audit & resume contract

Every enforce() call produces an audit row, hash-chained against the prior row. The chain is per-tenant and per-environment; an auditor can export the bundle with POST /v1/audit/bundle and replay the SHA verifier to prove no row was edited after the fact.
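The property an auditor relies on is that each row commits to its predecessor's hash, so editing any row breaks the chain from that point on. Here is a minimal sketch of building and verifying such a chain with SHA-256 from Node's built-in crypto module. The row fields, the `|` separator, and the empty genesis value are assumptions for illustration, not the format POST /v1/audit/bundle returns.

```typescript
import { createHash } from 'node:crypto'

interface AuditRow {
  seq: number
  payload: string // e.g. canonical JSON of one enforce() decision
  prevHash: string // hash of the previous row (GENESIS for the first row)
  hash: string // sha256 over prevHash, seq, and payload
}

const GENESIS = ''

function rowHash(prevHash: string, seq: number, payload: string): string {
  return createHash('sha256').update(`${prevHash}|${seq}|${payload}`).digest('hex')
}

function append(chain: AuditRow[], payload: string): AuditRow {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS
  const seq = chain.length
  const row = { seq, payload, prevHash, hash: rowHash(prevHash, seq, payload) }
  chain.push(row)
  return row
}

// Recompute every link; returns the index of the first bad row, or -1 if intact.
function verify(chain: AuditRow[]): number {
  let prev = GENESIS
  for (let i = 0; i < chain.length; i++) {
    const r = chain[i]
    if (r.seq !== i || r.prevHash !== prev || r.hash !== rowHash(prev, r.seq, r.payload)) return i
    prev = r.hash
  }
  return -1
}

const chain: AuditRow[] = []
append(chain, '{"action_id":"billing.refund","decision":"review"}')
append(chain, '{"action_id":"billing.refund","decision":"approved"}')
console.log(verify(chain)) // -1: intact
chain[0].payload = '{"action_id":"billing.refund","decision":"allow"}' // tamper
console.log(verify(chain)) // 0: row 0 no longer matches its hash
```

An attacker who edits a row and also recomputes its hash still breaks the next row's prevHash link, so hiding an edit means rewriting every later row too.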

When a reviewer approves with amended params, the new params are stored on the review row (resolution.amended_params) and the audit row records the diff. The agent is responsible for using the amended params on resume. The example does this: the handler reads verdict.resolution?.amended_params and falls back to the original params when the reviewer made no edits.
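Since the agent owns the resume contract, it can also treat amended params as untrusted input before the side effect runs. A sketch for the refund example follows, assuming two illustrative rules: an amendment may keep or lower the amount but never raise it, and it may not retarget the invoice. validateAmendment and both rules are hypothetical, not MCP Guard behavior.

```typescript
interface RefundParams {
  invoice_id: string
  amount_cents: number
}

type Checked = { ok: true; params: RefundParams } | { ok: false; error: string }

// Hypothetical validator: check the amendment's shape, then the business rules.
function validateAmendment(original: RefundParams, amended: unknown): Checked {
  // No amendment: the reviewer approved the call as-is.
  if (amended === undefined || amended === null) return { ok: true, params: original }
  if (typeof amended !== 'object' || Array.isArray(amended)) {
    return { ok: false, error: 'amended_params must be an object' }
  }
  const a = amended as { invoice_id?: unknown; amount_cents?: unknown }
  // Fields the reviewer did not touch keep their original values.
  const invoice_id = a.invoice_id ?? original.invoice_id
  const amount_cents = a.amount_cents ?? original.amount_cents
  if (invoice_id !== original.invoice_id) {
    return { ok: false, error: 'invoice_id may not change on review' }
  }
  if (typeof amount_cents !== 'number' || !Number.isInteger(amount_cents) || amount_cents <= 0) {
    return { ok: false, error: 'amount_cents must be a positive integer' }
  }
  if (amount_cents > original.amount_cents) {
    return { ok: false, error: 'an amendment may not raise the refund amount' }
  }
  return { ok: true, params: { invoice_id: original.invoice_id, amount_cents } }
}

const original: RefundParams = { invoice_id: 'in_123', amount_cents: 5000 }
const halved = validateAmendment(original, { amount_cents: 2500 })
console.log(halved.ok && halved.params.amount_cents) // 2500
```

On a failed check, returning a rejection into graph state (rather than refunding with the original params) keeps the reviewer's intent visible in the audit trail.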

Caveats

  • Long waits. waitForReview defaults to a 5-minute deadline (the example raises it to 10 minutes with timeoutMs). For human SLAs that span hours, stash the review id and resume on a background worker.
  • Idempotency. Pass idempotency_key on the enforce call so that when LangGraph re-executes a flaky node, the retry does not create a duplicate review row.
  • Streaming. If your graph streams tokens back to the user, surface the review error early, before any tokens that imply the tool ran, so the UX is honest about the pause.
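For the idempotency caveat, one way to get a stable idempotency_key is to derive it from the action and a canonical encoding of its params, so a re-executed node produces the same key. The sketch below also folds in a hypothetical threadId (for example, the LangGraph thread_id) so two deliberately separate refunds with identical params in different runs are not merged. canonicalJson and idempotencyKey are illustrative helpers, not SDK functions.

```typescript
import { createHash } from 'node:crypto'

// Deterministic JSON: object keys are sorted recursively, so key order in the
// params object cannot change the result. undefined-valued fields are dropped,
// matching JSON.stringify.
function canonicalJson(value: unknown): string {
  if (value === undefined) return 'null'
  if (value === null || typeof value !== 'object') return JSON.stringify(value)
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(',')}]`
  const obj = value as Record<string, unknown>
  const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort()
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(',')}}`
}

// Same thread + action + params gives the same key across any number of retries.
function idempotencyKey(threadId: string, actionId: string, params: unknown): string {
  const digest = createHash('sha256')
    .update(`${threadId}\n${actionId}\n${canonicalJson(params)}`)
    .digest('hex')
  return `${actionId}:${digest.slice(0, 32)}`
}

const k1 = idempotencyKey('thread_42', 'billing.refund', { invoice_id: 'in_1', amount_cents: 500 })
const k2 = idempotencyKey('thread_42', 'billing.refund', { amount_cents: 500, invoice_id: 'in_1' })
console.log(k1 === k2) // true: key order does not matter
```

The resulting string is what you would pass as idempotency_key on the enforce call; hashing keeps the key short and avoids echoing raw params into it.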
Ready to drop this in? Free up to 10k evaluations / month — no card.