Skip to content

Checks

A check inspects a candidate value and returns true to pass, or a reason (a string, or an object with a message) to fail. All three builders below produce the same Check<T> shape, so they mix freely in the checks array and all drive the same retry loop.

Validate against any Standard Schema — Zod, Valibot, ArkType. No schema library is bundled; passmuster only reads the standard interface. Failures are reported as schema issues, with paths.

import { schemaCheck } from "passmuster";
import { z } from "zod";
schemaCheck(z.object({ title: z.string(), tags: z.array(z.string()) }));
// fail → "title: Expected string, received number; tags: Required"

Any predicate over the value — sync or async. Return true, or a reason to fail.

import { check } from "passmuster";
check("under-budget", (text) => text.length <= 2000 || `too long: ${text.length} chars`);
check("valid-json", (s) => { try { JSON.parse(s); return true; } catch { return "not valid JSON"; } });
check("grounded", async (answer) => (await isGrounded(answer)) || "not grounded in sources");

An LLM-as-judge check — grade the output with a model. You bring the complete function, so passmuster stays model-agnostic.

import { judge } from "passmuster";
judge("answers-question", {
ask: (out) => `Does this fully answer "${question}"? Reply PASS or FAIL: reason.\n\n${out}`,
complete: (prompt) => anthropic.messages.create(...).then(toText),
});

By default the verdict is read as PASS when the response contains PASS (and not FAIL); anything else fails with the judge’s text as the reason. Override with interpret for custom scoring:

judge("quality", {
ask: (out) => `Rate 0–1 how well this answers the question:\n${out}`,
complete,
interpret: (resp) => (Number(resp.match(/[\d.]+/)?.[0]) >= 0.8 ? true : `too low: ${resp}`),
});

Order doesn’t matter for correctness, but by default all checks run each attempt so the feedback is complete:

checks: [
schemaCheck(Plan), // structural
check("no-todos", noTodos), // cheap predicate
judge("actionable", actionable), // expensive, model-graded
]

Set stopOnFirstFailure: true to short-circuit — useful when a later check is expensive (an LLM judge) and pointless if an earlier one already failed.