Skip to content

CLI

Terminal window
heyllm [options] "your prompt"
command | heyllm [options] "instruction"

Give heyllm a prompt as a positional argument, optionally pipe context into it, and it streams the model’s answer to stdout. Multiple positional words are joined with spaces, so quoting the prompt is optional but recommended.

FlagDescription
-m, --model <name>Model to use. Default: gpt-4o-mini.
--system <text>System prompt, prepended as a system message.
--base-url <url>API base URL. Default: https://api.openai.com/v1.
--api-key <key>API key. Defaults to $OPENAI_API_KEY.
--no-streamWait for the full response instead of streaming tokens.
--jsonPrint the raw JSON response instead of just the text.
-h, --helpShow help and exit.
-v, --versionShow the version and exit.

Unknown flags are rejected: heyllm exits 2 with an error rather than passing them through.

heyllm builds a single chat request from up to three pieces:

  • System prompt (--system) — added as a system message when present.
  • Prompt (positional argument) — the instruction.
  • Stdin — read in full when input is piped (i.e. stdin is not a TTY).

The prompt and stdin are merged into one user message, prompt first, with a blank line between them. So this:

Terminal window
git diff | heyllm "write a conventional commit message"

sends a user message of roughly:

write a conventional commit message
<the diff>

You can supply either piece alone:

  • Prompt onlyheyllm "explain monads" (nothing piped).
  • Stdin onlycat notes.md | heyllm (no positional prompt); stdin becomes the user message on its own.

If you provide neither a prompt nor piped input, heyllm exits 2 with a usage error.

By default heyllm streams: it requests a streaming completion and prints each token as it arrives, parsed from the API’s server-sent events (data: lines, ignoring the terminating [DONE] sentinel). Frames without a content delta or with malformed JSON are skipped rather than crashing the stream.

Pass --no-stream to send a single non-streaming request and print the full response once it’s complete — useful when a provider doesn’t support streaming, or when you’re capturing output into a variable and don’t need progressive display:

Terminal window
heyllm --no-stream "what is 2+2"

--json prints the raw, pretty-printed JSON response from the API instead of just the assistant text. It implies a non-streaming request, since it needs the complete response object to print it. Pipe it into a tool like jq to pull out fields:

Terminal window
heyllm --json "ping" | jq '.usage'
{
"prompt_tokens": 9,
"completion_tokens": 1,
"total_tokens": 10
}
CodeMeaning
0Success.
1API, HTTP, or network error (e.g. a non-2xx response from the provider).
2Usage or configuration error (bad flags, missing API key, no prompt).

On an HTTP error, heyllm includes the status code and any response body in the message it writes to stderr, so you can see why the provider rejected the call.

Stream an answer:

Terminal window
heyllm "explain monads simply"

Pipe context in — the prompt is the instruction, stdin is the context:

Terminal window
git diff | heyllm "write a conventional commit message"

Pick a model and add a system prompt:

Terminal window
heyllm -m gpt-4o --system "You are terse" "summarize REST in 3 bullets"

Talk to a local Ollama (no key needed):

Terminal window
heyllm --base-url http://localhost:11434/v1 -m llama3 "hello"

Get the raw API response for scripting:

Terminal window
heyllm --json "ping" | jq '.usage'

Disable streaming — one request, the full response at once:

Terminal window
heyllm --no-stream "what is 2+2"

Print the version or the help text:

Terminal window
heyllm --version
heyllm --help