Checks
A check inspects a candidate value and returns true to pass, or a reason
(a string, or an object with a message) to fail. All three builders below
produce the same Check<T> shape, so they mix freely in the checks array and
all drive the same retry loop.
schemaCheck(schema, name?)
Section titled “schemaCheck(schema, name?)”Validate against any Standard Schema — Zod, Valibot, ArkType. No schema library is bundled; passmuster only reads the standard interface. Failures are reported as schema issues, with paths.
import { schemaCheck } from "passmuster";import { z } from "zod";
schemaCheck(z.object({ title: z.string(), tags: z.array(z.string()) }));// fail → "title: Expected string, received number; tags: Required"check(name, fn)
Section titled “check(name, fn)”Any predicate over the value — sync or async. Return true, or a reason to fail.
import { check } from "passmuster";
check("under-budget", (text) => text.length <= 2000 || `too long: ${text.length} chars`);check("valid-json", (s) => { try { JSON.parse(s); return true; } catch { return "not valid JSON"; } });check("grounded", async (answer) => (await isGrounded(answer)) || "not grounded in sources");judge(name, options)
Section titled “judge(name, options)”An LLM-as-judge check — grade the output with a model. You bring the complete
function, so passmuster stays model-agnostic.
import { judge } from "passmuster";
judge("answers-question", { ask: (out) => `Does this fully answer "${question}"? Reply PASS or FAIL: reason.\n\n${out}`, complete: (prompt) => anthropic.messages.create(...).then(toText),});By default the verdict is read as PASS when the response contains PASS
(and not FAIL); anything else fails with the judge’s text as the reason.
Override with interpret for custom scoring:
judge("quality", { ask: (out) => `Rate 0–1 how well this answers the question:\n${out}`, complete, interpret: (resp) => (Number(resp.match(/[\d.]+/)?.[0]) >= 0.8 ? true : `too low: ${resp}`),});Composing
Section titled “Composing”Order doesn’t matter for correctness, but by default all checks run each attempt so the feedback is complete:
checks: [ schemaCheck(Plan), // structural check("no-todos", noTodos), // cheap predicate judge("actionable", actionable), // expensive, model-graded]Set stopOnFirstFailure: true to short-circuit — useful when a later check is
expensive (an LLM judge) and pointless if an earlier one already failed.