Retry with feedback
The point of passmuster isn’t just to reject bad output — it’s to help the
model produce good output. When an attempt fails, every failure is collected and
handed to the next generate call as feedback.
The flow
    generate: async ({ attempt, feedback }) => {
      const prompt = feedback
        ? `${basePrompt}\n\n${feedback.text}` // splice the failures back in
        : basePrompt; // first attempt: no feedback
      return parse(await model.complete(prompt));
    };

- On attempt 1, feedback is undefined.
- On every later attempt, feedback describes what failed last time.
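
To make the flow concrete, here is a minimal sketch of a retry-with-feedback loop. It is not passmuster's implementation: the names retryWithFeedback, Check, Generate, and attemptsUsed are hypothetical, outputs are assumed to be plain strings, and checks are assumed to return null on a pass or a message on a failure.

```typescript
// Hypothetical shapes for illustration. Feedback mirrors the fields this page
// describes; everything else is assumed and is not passmuster's real API.
interface Failure { check: string; message: string }
interface Feedback { failures: Failure[]; text: string }
interface Check { name: string; run: (output: string) => string | null } // null = pass
type Generate = (ctx: { attempt: number; feedback: Feedback | undefined }) => Promise<string>;

async function retryWithFeedback(generate: Generate, checks: Check[], maxAttempts = 3) {
  let feedback: Feedback | undefined; // stays undefined for attempt 1
  let value: string | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = await generate({ attempt, feedback });
    value = output;

    // Run every check and collect all failures, not just the first.
    const failures: Failure[] = [];
    for (const check of checks) {
      const message = check.run(output);
      if (message !== null) failures.push({ check: check.name, message });
    }

    // Stop early the moment an attempt passes.
    if (failures.length === 0) return { ok: true, value, attemptsUsed: attempt };

    // Otherwise package the failures as feedback for the next generate call.
    const bullets = failures.map((f) => `- [${f.check}] ${f.message}`).join("\n");
    feedback = {
      failures,
      text: `The previous output failed these checks:\n${bullets}\nFix every issue above and produce a corrected output.`,
    };
  }
  return { ok: false, value, attemptsUsed: maxAttempts };
}
```

In this sketch the generate callback sees feedback as undefined exactly once, and from then on receives the failures of the attempt immediately before it.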
What feedback contains
    interface Feedback {
      failures: { check: string; message: string }[];
      text: string; // pre-formatted, ready to splice into a prompt
    }

feedback.text reads like:

    The previous output failed these checks:
    - [no-todos] remove TODO placeholders
    - [actionable] FAIL: step 2 is vague
    Fix every issue above and produce a corrected output.

Use feedback.text for the quick path, or build your own message from
feedback.failures if you want full control over how corrections are phrased.
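
As one possible way to build your own message, the sketch below turns feedback.failures into a numbered list of corrections. The helper name toNumberedCorrections and its wording are hypothetical; only the Feedback shape comes from this page, and it is redeclared here so the snippet stands alone.

```typescript
// Same shape as the Feedback interface described on this page.
interface Feedback {
  failures: { check: string; message: string }[];
  text: string;
}

// Hypothetical helper: phrase each failure as a numbered correction.
function toNumberedCorrections(feedback: Feedback): string {
  const items = feedback.failures.map(
    (f, i) => `${i + 1}. ${f.message} (check: ${f.check})`,
  );
  return ["Revise your previous answer. Address each point:", ...items].join("\n");
}
```

Inside generate, you would splice toNumberedCorrections(feedback) into the prompt in place of feedback.text.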
Tuning the loop
- maxAttempts (default 3): the ceiling on tries. The loop stops early the moment an attempt passes.
- stopOnFirstFailure: only report the first failing check each attempt. Cheaper, but the model gets narrower feedback.
- onAttempt: observe each attempt as it happens (logging, metrics).
- throwOnFail: throw PassMusterError (with the full attempts trail) instead of returning { ok: false }.

    const { ok, value, attempts } = await passMuster({
      generate,
      checks,
      maxAttempts: 4,
      onAttempt: (a) =>
        logger.info(`attempt ${a.attempt}: ${a.passed ? "pass" : a.failures.length + " failed"}`),
    });

A note on cost
Each retry is another model call (plus any judge calls). Keep maxAttempts
modest, put cheap checks first, and reach for stopOnFirstFailure when a later
LLM-judge check is pointless once a structural check has already failed.
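
The saving from cheap-first ordering can be illustrated with a small simulation. This is a sketch under assumptions, not passmuster's code: runChecks and the Check type are hypothetical, the judge is a stub that only increments a counter, and stopping at the first failure is assumed to skip every later check.

```typescript
// Hypothetical check shape for illustration; not passmuster's real type.
type Check = { name: string; run: (output: string) => Promise<string | null> };

// Runs checks in order. With stopOnFirstFailure, later checks are never called.
async function runChecks(output: string, checks: Check[], stopOnFirstFailure: boolean) {
  const failures: { check: string; message: string }[] = [];
  for (const check of checks) {
    const message = await check.run(output);
    if (message === null) continue; // this check passed
    failures.push({ check: check.name, message });
    if (stopOnFirstFailure) break; // skip the remaining, possibly expensive, checks
  }
  return failures;
}

let judgeCalls = 0; // stands in for billed LLM-judge calls

const orderedChecks: Check[] = [
  // Cheap structural check first.
  {
    name: "no-todos",
    run: async (o) => (o.includes("TODO") ? "remove TODO placeholders" : null),
  },
  // Simulated LLM-judge check last; each call would cost a model request.
  {
    name: "actionable",
    run: async (o) => {
      judgeCalls++;
      return o.includes("vague") ? "FAIL: step 2 is vague" : null;
    },
  },
];
```

With an output that fails both checks, stopping at the first failure never reaches the judge, while collecting every failure pays for one judge call. The trade-off is the one described above: fewer calls per attempt, but narrower feedback for the next one.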