Recording

The recording side of agentwatch is a small, pure library. You create a recorder, call a method per thing that happens, and out comes a stream of JSON-serializable events — optionally appended, one per line, to a JSONL file. There are no classes with behaviour, no network, and no dependency on any AI SDK. This page covers the whole write side.

An agent run is recorded as a flat, ordered stream of events. Every event is JSON-serializable and carries two common fields:

  • ts — epoch milliseconds when the event was recorded.
  • type — a discriminator naming the kind of event.

Because everything is plain data, a stream round-trips losslessly through JSONL: write it out, read it back, and you have the same events.
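As a rough illustration of that round trip, the sketch below serializes two events to JSONL with plain JSON.stringify and reads them back with JSON.parse. It does not use agentwatch's own helpers; `toJsonl`, `fromJsonl`, and `PlainEvent` are hypothetical names used only here.

```typescript
// Hypothetical helpers for illustration; not part of agentwatch.
type PlainEvent = { type: string; ts: number; [field: string]: unknown };

// One JSON document per line, joined with newlines.
function toJsonl(events: PlainEvent[]): string {
  return events.map((e) => JSON.stringify(e)).join("\n");
}

// Split on newlines and parse every non-empty line.
function fromJsonl(text: string): PlainEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as PlainEvent);
}

const original: PlainEvent[] = [
  { type: "message", ts: 1717200000000, role: "user", text: "hi" },
  { type: "tool-call", ts: 1717200000950, name: "getWeather", args: { city: "Tokyo" } },
];

const restored = fromJsonl(toJsonl(original));

// Re-serializing both sides yields identical text, so nothing was lost.
const lossless = JSON.stringify(restored) === JSON.stringify(original);
```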

| type | Fields | Meaning |
| --- | --- | --- |
| message | role, text | A chat message produced or consumed by the agent. role is system, user, assistant, or tool. |
| model | model, prompt, durationMs? | One model invocation (an LLM call / step). prompt is a short, human-readable summary of what was sent. |
| tool-call | name, args | The agent decided to call a tool. args is arbitrary JSON. |
| tool-result | name, result, durationMs? | A tool finished and returned. result is arbitrary JSON. |
| usage | inputTokens, outputTokens, model? | Token usage reported for a model call. |
| error | message | Something went wrong. |

The union of all of these is the AgentEvent type. Here’s the same run from the example session, as raw JSONL:

{"type":"message","ts":1717200000000,"role":"user","text":"What's the weather in Tokyo and should I bring an umbrella?"}
{"type":"model","ts":1717200000100,"model":"gpt-4o","prompt":"system + user: weather question for Tokyo","durationMs":820}
{"type":"tool-call","ts":1717200000950,"name":"getWeather","args":{"city":"Tokyo","units":"metric"}}
{"type":"tool-result","ts":1717200001210,"name":"getWeather","result":{"tempC":18,"condition":"light rain"},"durationMs":260}
{"type":"usage","ts":1717200001220,"inputTokens":412,"outputTokens":86,"model":"gpt-4o"}
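
The event kinds listed on this page map naturally onto a TypeScript discriminated union keyed on type. The following is a sketch inferred from the documented fields, not the library's actual declarations: `SketchEvent`, `Role`, and `summarize` are hypothetical names, and the real AgentEvent type may differ in detail.

```typescript
// Inferred from the documented fields; agentwatch's real types may differ.
type Role = "system" | "user" | "assistant" | "tool";

type SketchEvent =
  | { type: "message"; ts: number; role: Role; text: string }
  | { type: "model"; ts: number; model: string; prompt: string; durationMs?: number }
  | { type: "tool-call"; ts: number; name: string; args: unknown }
  | { type: "tool-result"; ts: number; name: string; result: unknown; durationMs?: number }
  | { type: "usage"; ts: number; inputTokens: number; outputTokens: number; model?: string }
  | { type: "error"; ts: number; message: string };

// Switching on `type` narrows the event, so each branch sees only its own fields.
function summarize(event: SketchEvent): string {
  switch (event.type) {
    case "message":
      return `${event.role}: ${event.text}`;
    case "model":
      return `model ${event.model}: ${event.prompt}`;
    case "tool-call":
      return `call ${event.name}(${JSON.stringify(event.args)})`;
    case "tool-result":
      return `result ${event.name} = ${JSON.stringify(event.result)}`;
    case "usage":
      return `usage ${event.inputTokens} in / ${event.outputTokens} out`;
    case "error":
      return `error: ${event.message}`;
  }
}

const toolCall: SketchEvent = {
  type: "tool-call",
  ts: 1717200000950,
  name: "getWeather",
  args: { city: "Tokyo", units: "metric" },
};

const summary = summarize(toolCall);
```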

A few small helpers move events to and from JSONL. These are the same functions the CLI uses, and they’re exported for your own tooling:

import { encodeEvent, parseJsonl, isAgentEvent } from "agentwatch";
encodeEvent(event); // → one JSONL line (no trailing newline)
parseJsonl(fileContents); // → AgentEvent[]
isAgentEvent(value); // → structural type guard

parseJsonl is deliberately forgiving: it skips blank lines and silently ignores any line that doesn’t parse or doesn’t look like an event. That’s what makes --follow safe — a half-written tail line during a live append never crashes the reader.
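
To make the forgiving behaviour concrete, here is a minimal sketch of a parser with the same three rules: skip blank lines, ignore lines that fail to parse, and ignore values that do not look like events. `parseJsonlForgiving` and `looksLikeEvent` are hypothetical stand-ins, and the structural check (a known type plus a numeric ts) is an assumption; the library's own guard may check more.

```typescript
// Hypothetical stand-ins for illustration; agentwatch's real guard may be stricter.
const KNOWN_TYPES = ["message", "model", "tool-call", "tool-result", "usage", "error"];

type LooseEvent = { type: string; ts: number };

// Minimal structural check: an object with a known `type` and a numeric `ts`.
function looksLikeEvent(value: unknown): value is LooseEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { type?: unknown; ts?: unknown };
  return (
    typeof v.type === "string" &&
    KNOWN_TYPES.indexOf(v.type) !== -1 &&
    typeof v.ts === "number"
  );
}

function parseJsonlForgiving(text: string): LooseEvent[] {
  const events: LooseEvent[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    try {
      const value: unknown = JSON.parse(line);
      if (looksLikeEvent(value)) events.push(value); // drop non-events silently
    } catch {
      // Unparseable line, for example a half-written tail: ignore it.
    }
  }
  return events;
}

// A file caught mid-append: the last line is truncated.
const liveFile = [
  '{"type":"message","ts":1,"role":"user","text":"hi"}',
  "",
  '{"note":"valid JSON but not an event"}',
  '{"type":"usage","ts":2,"inputTokens":4',
].join("\n");

const parsed = parseJsonlForgiving(liveFile);
```

Only the first line survives here; once the writer finishes the truncated tail line, a later read would pick it up.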

createRecorder(options?) returns a Recorder. Each method records exactly one event: it builds the full event (stamping type and ts for you), pushes it onto an in-memory array, and — when out is set — appends one JSONL line to that file.

import { createRecorder } from "agentwatch";
const rec = createRecorder({ out: "session.jsonl" });

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| out | string | (none) | If set, every recorded event is appended as one JSONL line to this path. Omit it to record in memory only. |
| now | () => number | Date.now | Clock used to stamp ts. Injectable, mainly for deterministic tests. |

Each method takes the event’s fields minus type and ts (those are filled in for you), records the event, and returns the fully-formed event object.

rec.message({ role: "user", text: "Find me a flight." });
rec.model({ model: "gpt-4o", prompt: "plan the search", durationMs: 740 });
rec.toolCall({ name: "searchFlights", args: { from: "ADD", to: "NRT" } });
rec.toolResult({ name: "searchFlights", result: { count: 12 }, durationMs: 310 });
rec.usage({ inputTokens: 540, outputTokens: 120, model: "gpt-4o" });
rec.error({ message: "rate limited" });

| Method | Records | Returns |
| --- | --- | --- |
| message(input) | a message event | MessageEvent |
| model(input) | a model event | ModelEvent |
| toolCall(input) | a tool-call event | ToolCallEvent |
| toolResult(input) | a tool-result event | ToolResultEvent |
| usage(input) | a usage event | UsageEvent |
| error(input) | an error event | ErrorEvent |

rec.events; // readonly AgentEvent[] — every event recorded so far, in order
rec.close(); // flush / finish

rec.events is the in-memory event array, reflecting the same stream that was written to out. rec.close() is part of the contract for symmetry; today it’s a no-op because writes are synchronous, but you should still call it when you’re done so your code keeps working if flushing ever becomes asynchronous.

Recording is pure and synchronous. The only side effect is an isolated, append-only write to out (via appendFileSync). There is no buffering to flush, no background thread, and no network — which is exactly why a partially written file is safe to tail with --follow.
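
The recording model described above (stamp type and ts, push to an in-memory array, emit one line per event) can be sketched in a few lines. This is a toy, not agentwatch's implementation: `createSketchRecorder` is a hypothetical name, it covers only two event kinds, and to stay self-contained it replaces the appendFileSync call with a hypothetical `write` callback.

```typescript
// Illustrative toy recorder; not agentwatch's implementation.
type StampedEvent = { type: string; ts: number; [field: string]: unknown };

interface SketchOptions {
  now?: () => number; // injectable clock, defaults to Date.now
  write?: (line: string) => void; // hypothetical stand-in for the file append
}

function createSketchRecorder(options: SketchOptions = {}) {
  const now = options.now ?? Date.now;
  const events: StampedEvent[] = [];

  // Build the full event, keep it in memory, and emit one JSONL line.
  function record(type: string, fields: Record<string, unknown>): StampedEvent {
    const event: StampedEvent = { type, ts: now(), ...fields };
    events.push(event);
    if (options.write) options.write(JSON.stringify(event) + "\n");
    return event;
  }

  return {
    events,
    message: (input: { role: string; text: string }) => record("message", input),
    error: (input: { message: string }) => record("error", input),
  };
}

// A fake clock makes the timestamps deterministic, as in a test.
let tick = 1000;
const written: string[] = [];
const sketch = createSketchRecorder({
  now: () => tick++,
  write: (line) => {
    written.push(line);
  },
});

const first = sketch.message({ role: "user", text: "Find me a flight." });
sketch.error({ message: "rate limited" });
```

Because each call completes its write before returning, a reader tailing the output sees whole lines plus at most one partial line at the end.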

recordStep(recorder, step) is the one-line bridge from the Vercel AI SDK to agentwatch. Pass it as the SDK’s onStepFinish callback and every step is recorded:

import { generateText } from "ai";
import { createRecorder, recordStep } from "agentwatch";
const rec = createRecorder({ out: "session.jsonl" });
await generateText({
  model,
  prompt,
  tools,
  maxSteps: 5,
  onStepFinish: (step) => recordStep(rec, step),
});
rec.close();

For each step, recordStep emits events in order:

  1. Assistant text — a message event with role: "assistant", but only when the step has non-empty text.
  2. Tool calls — one tool-call event per entry in step.toolCalls, using the call’s toolName and args.
  3. Tool results — one tool-result event per entry in step.toolResults, using the result’s toolName and result.
  4. Usage — one usage event when the step reports usage.
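
The ordering above can be sketched as a pure function from a step to a list of events. This is an approximation of the described behaviour, not the adapter's source: `eventsForStep`, `StepLike`, and `EmittedEvent` are hypothetical names, and timestamps are left out to keep the focus on ordering.

```typescript
// Shapes assumed from the description above; names are hypothetical.
interface StepLike {
  text?: string;
  toolCalls?: { toolName: string; args?: unknown }[];
  toolResults?: { toolName: string; result?: unknown }[];
  usage?: { inputTokens?: number; outputTokens?: number };
}

type EmittedEvent = { type: string; [field: string]: unknown };

function eventsForStep(step: StepLike): EmittedEvent[] {
  const out: EmittedEvent[] = [];

  // 1. Assistant text, only when the step has non-empty text.
  if (step.text) {
    out.push({ type: "message", role: "assistant", text: step.text });
  }
  // 2. One tool-call event per entry in toolCalls.
  for (const call of step.toolCalls ?? []) {
    out.push({ type: "tool-call", name: call.toolName, args: call.args });
  }
  // 3. One tool-result event per entry in toolResults.
  for (const res of step.toolResults ?? []) {
    out.push({ type: "tool-result", name: res.toolName, result: res.result });
  }
  // 4. One usage event when the step reports usage.
  if (step.usage) {
    out.push({
      type: "usage",
      inputTokens: step.usage.inputTokens ?? 0,
      outputTokens: step.usage.outputTokens ?? 0,
    });
  }
  return out;
}

const emitted = eventsForStep({
  text: "Checking the weather.",
  toolCalls: [{ toolName: "getWeather", args: { city: "Tokyo" } }],
  toolResults: [{ toolName: "getWeather", result: { tempC: 18 } }],
  usage: { inputTokens: 412, outputTokens: 86 },
});

const emittedTypes = emitted.map((e) => e.type).join(",");
```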

The adapter deliberately does not import the ai package. It only models the shape of the step object the SDK hands to onStepFinish, so agentwatch carries no runtime dependency on ai and works across SDK versions. Every field on the modeled step is optional, so a partial or future-version step degrades gracefully rather than throwing.

It even smooths over a naming change between SDK versions: usage tokens are read from inputTokens/outputTokens if present, otherwise from the older promptTokens/completionTokens, defaulting to 0.
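
A minimal sketch of that fallback, assuming only the four field names mentioned above (`normalizeUsage` and `UsageLike` are hypothetical names, not exports of agentwatch):

```typescript
// Field names come from the text above; the function itself is illustrative.
interface UsageLike {
  promptTokens?: number;
  completionTokens?: number;
  inputTokens?: number;
  outputTokens?: number;
}

// Prefer the newer names, fall back to the older ones, then to 0.
function normalizeUsage(usage: UsageLike): { inputTokens: number; outputTokens: number } {
  return {
    inputTokens: usage.inputTokens ?? usage.promptTokens ?? 0,
    outputTokens: usage.outputTokens ?? usage.completionTokens ?? 0,
  };
}

const fromNewNames = normalizeUsage({ inputTokens: 412, outputTokens: 86 });
const fromOldNames = normalizeUsage({ promptTokens: 412, completionTokens: 86 });
const fromNothing = normalizeUsage({});
```

Using `??` rather than `||` means a reported count of 0 under the newer name is kept instead of falling through to the older field.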

The modeled step looks like this:

interface AiSdkStep {
  text?: string;
  toolCalls?: { toolName: string; args?: unknown }[];
  toolResults?: { toolName: string; result?: unknown }[];
  usage?: {
    promptTokens?: number;
    completionTokens?: number;
    inputTokens?: number;
    outputTokens?: number;
  };
  model?: string; // some versions surface the resolved model id here
  durationMs?: number;
}

costOf(usage) turns a usage event into an approximate USD figure:

import { costOf } from "agentwatch";
costOf({ inputTokens: 540, outputTokens: 120, model: "gpt-4o" }); // ≈ 0.00255 (USD)

It looks up the model in a small, built-in price table (USD per 1,000,000 tokens), computes input/1M × inputPrice + output/1M × outputPrice, and returns the total. Unknown models cost 0 — so a session’s totals never become NaN just because one model isn’t in the table.
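
As a worked example of that formula, the sketch below recomputes the cost by hand. It is not the library's code: `sketchCostOf` and `SKETCH_PRICES` are hypothetical names, the two prices are copied from the built-in table on this page, and the lookup is an exact-key match rather than the tolerant matching that priceOf performs. Treating a missing model as cost 0 is also an assumption.

```typescript
// Prices in USD per 1,000,000 tokens, copied from the table on this page.
const SKETCH_PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "claude-3.5-sonnet": { input: 3, output: 15 },
};

const PER_MILLION = 1_000_000;

function sketchCostOf(usage: {
  inputTokens: number;
  outputTokens: number;
  model?: string;
}): number {
  // Exact-key lookup only; the real lookup is more tolerant.
  const price = usage.model ? SKETCH_PRICES[usage.model] : undefined;
  if (!price) return 0; // unknown (or, by assumption, missing) model costs 0, never NaN
  return (
    (usage.inputTokens / PER_MILLION) * price.input +
    (usage.outputTokens / PER_MILLION) * price.output
  );
}

// 540 / 1M × 2.5 + 120 / 1M × 10 = 0.00135 + 0.0012
const knownCost = sketchCostOf({ inputTokens: 540, outputTokens: 120, model: "gpt-4o" });
const unknownCost = sketchCostOf({ inputTokens: 540, outputTokens: 120, model: "some-unknown-model" });
```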

PRICES is the exported, built-in snapshot. As of the last review (2026-05) it covers:

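| Model | Input ($/1M) | Output ($/1M) |
| --- | --- | --- |
| gpt-4o | 2.5 | 10 |
| gpt-4o-mini | 0.15 | 0.6 |
| gpt-4.1 | 2 | 8 |
| gpt-4.1-mini | 0.4 | 1.6 |
| o3-mini | 1.1 | 4.4 |
| claude-3.5-sonnet | 3 | 15 |
| claude-3.5-haiku | 0.8 | 4 |
| claude-3-opus | 15 | 75 |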

priceOf(model) returns the ModelPrice for a model name (or undefined if it’s unknown). Matching is case-insensitive and tolerant of common prefixes and suffixes — it picks the longest known key the model name contains. So gpt-4o-2024-08-06 and anthropic/claude-3.5-sonnet both resolve correctly.

import { priceOf } from "agentwatch";
priceOf("gpt-4o-2024-08-06"); // → { input: 2.5, output: 10 }
priceOf("anthropic/claude-3.5-sonnet"); // → { input: 3, output: 15 }
priceOf("some-unknown-model"); // → undefined
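
The longest-key rule matters because some keys contain others: gpt-4o-mini contains gpt-4o, so a naive first-match lookup could price a mini model at the full rate. Below is a minimal sketch of the matching as described, assuming a plain case-insensitive substring test; `sketchPriceOf` and `MATCH_PRICES` are hypothetical names and the real implementation may differ.

```typescript
// A three-row subset of the built-in table, for illustration only.
const MATCH_PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "claude-3.5-sonnet": { input: 3, output: 15 },
};

function sketchPriceOf(model: string): { input: number; output: number } | undefined {
  const needle = model.toLowerCase(); // case-insensitive matching
  let best: string | undefined;
  for (const key of Object.keys(MATCH_PRICES)) {
    // Keep the longest known key that appears anywhere in the model name.
    if (needle.indexOf(key) !== -1 && (best === undefined || key.length > best.length)) {
      best = key;
    }
  }
  return best === undefined ? undefined : MATCH_PRICES[best];
}

const datedMini = sketchPriceOf("GPT-4o-mini-2024-07-18"); // matches gpt-4o and gpt-4o-mini
const prefixed = sketchPriceOf("anthropic/claude-3.5-sonnet");
const missing = sketchPriceOf("some-unknown-model");
```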