ai / claude
Britt and I use Anthropic's Claude 4 models via API to power our internal software tool at work. Here is what we learned about structured outputs, system vs. user prompts, and hallucination guardrails.
Thin client
We wrote a thin API client over the Messages API using the HTTP gem instead of depending on an Anthropic SDK.
Three methods cover what we need:
- chat — single-turn, no tools
- chat_with_web_search — adds the web search tool
- research — extended thinking + web search
Every method requires a json_schema and returns [text, err].
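For reference, here is a minimal sketch of what the chat method could look like with the http gem. The class name, error handling, and exact request fields are illustrative, and the structured-output field mirrors the output_config / json_schema shape described later in this post — check the current API docs before copying it.
require "http"
require "json"

class ClaudeClient
  API_URL = "https://api.anthropic.com/v1/messages"

  def initialize(api_key:)
    @api_key = api_key
  end

  # Single-turn call, no tools. Returns [text, err].
  def chat(model:, json_schema:, user_prompt:, system_prompt: nil, max_tokens: 4096)
    body = {
      model: model,
      max_tokens: max_tokens,
      messages: [{ role: "user", content: user_prompt }],
      output_config: { json_schema: json_schema } # structured output, per the section below
    }
    body[:system] = system_prompt if system_prompt

    response = HTTP
      .headers(
        "x-api-key" => @api_key,
        "anthropic-version" => "2023-06-01",
        "content-type" => "application/json"
      )
      .post(API_URL, json: body)

    return [nil, "HTTP #{response.status}"] unless response.status.success?

    # The response body carries a content array; pull the first text block.
    text = JSON.parse(response.body.to_s)
      .fetch("content")
      .find { |block| block["type"] == "text" }
      &.fetch("text")
    [text, nil]
  rescue HTTP::Error, JSON::ParserError => e
    [nil, e.message]
  end
end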
Model selection
We use three models:
- Haiku (claude-haiku-4-5) — fast and cheap. Good for short summarization, scoring, and classification. Most of our jobs use Haiku.
- Sonnet (claude-sonnet-4-5) — mid-tier. We use it when the input is messy (e.g. raw email threads) or the task needs more nuance than Haiku can manage.
- Opus (claude-opus-4-6) — expensive, slow, smart. Reserved for deep research with extended thinking and web search.
We use model aliases (claude-sonnet-4-5) instead of dated snapshots, so the API resolves them to the latest version automatically.
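In the jobs these live as plain constants. MODEL_HAIKU is the one referenced later in this post; the other two names are assumed:
MODEL_HAIKU  = "claude-haiku-4-5"
MODEL_SONNET = "claude-sonnet-4-5"
MODEL_OPUS   = "claude-opus-4-6"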
Always use structured outputs
The most important thing we did was require structured outputs via json_schema in output_config on every API call.
Without it, Claude returns conversational preamble ("Okay, I'll help you with that...") or wraps text in markdown fences. Prompting it away is fragile.
With a schema, the API enforces the format at the protocol level.
No preamble, no post-processing, no strip_surrounding_double_quotes.
Define a JSON_SCHEMA constant in each job:
JSON_SCHEMA = {
type: "object",
properties: {
headline: {
type: "string",
description: "A Y Combinator-style company headline, 80 characters or less."
}
},
required: ["headline"],
additionalProperties: false
}.freeze
Then parse the response:
response, err = client.chat(
model: MODEL_HAIKU,
json_schema: JSON_SCHEMA,
user_prompt: prompt
)
headline = JSON.parse(response).fetch("headline")
Every API call in the codebase now requires a schema.
System prompt vs. user prompt
After some experimentation, we settled on a rule:
Reserve system_prompt for separating instructions from untrusted data.
When you pass scraped websites, user-generated notes, or raw email threads in user_prompt, the model can confuse data for instructions. A system prompt carries higher authority and keeps your instructions safe from prompt injection.
If the prompt is self-contained — you control all the data — put everything in user_prompt and skip system_prompt entirely.
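As a sketch, a job that processes a scraped website could look like this. The prompt wording and variable names are made up; the client call and schema follow the earlier examples.
system_prompt = <<~PROMPT
  Write a headline for the company described in the text below.
  Base your response only on the information provided.
PROMPT

response, err = client.chat(
  model: MODEL_SONNET,
  json_schema: JSON_SCHEMA,
  system_prompt: system_prompt,
  user_prompt: scraped_website_text # untrusted data stays out of the system prompt
)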
We don't use "You are a..." persona lines; detailed instructions already constrain output.
Reduce hallucinations
When the provided data is thin, Claude sometimes fabricates details or fills the gaps with training-data knowledge. We applied Anthropic's hallucination-minimization guidelines across all prompts:
Restrict to provided data. Add "base your response only on the information provided" when the prompt passes structured context that should not be supplemented with outside knowledge.
Allow expressing uncertainty. Instruct Claude to output "Insufficient data" when context is too thin. Handle that sentinel string before writing to the database — filter out insufficient sections, NULL the column, or skip the record (see the sketch below).
Require citations. For research tasks, require inline [n] citations that map to sources. Instruct Claude to omit claims it cannot cite rather than stating them without attribution.
Format data as JSON. We switched from Ruby's .inspect (hash syntax) to .to_json when passing data in prompts. Clean input reduces misinterpretation by the model.
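Here is a sketch of how these guardrails combine in a single Haiku job. The prompt wording, SUMMARY_SCHEMA, and the ActiveRecord-style update are illustrative, not lifted from the real jobs.
# Inside a job's perform method.
context = { name: company.name, notes: company.notes }.to_json # .to_json, not .inspect

prompt = <<~PROMPT
  Write a one-sentence summary of the company described in this JSON.
  Base your response only on the information provided.
  If the information is too thin, respond with "Insufficient data".

  #{context}
PROMPT

response, err = client.chat(
  model: MODEL_HAIKU,
  json_schema: SUMMARY_SCHEMA,
  user_prompt: prompt
)
return if err

# Handle the sentinel before anything touches the database.
summary = JSON.parse(response).fetch("summary")
company.update!(summary: summary) unless summary == "Insufficient data"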
Context window math
The client computes a max input size per model:
(context_window - max_output_tokens - buffer) * 4 chars/token
200k context window, minus 64k max output for Haiku/Sonnet (128k for Opus), minus a 5k buffer for the system prompt. User prompts are truncated to this limit before sending.
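In Ruby the truncation looks roughly like this; the constant names and per-model hash are assumptions, but the numbers are the ones above.
CONTEXT_WINDOW  = 200_000
SYSTEM_BUFFER   = 5_000
CHARS_PER_TOKEN = 4
MAX_OUTPUT_TOKENS = {
  MODEL_HAIKU  => 64_000,
  MODEL_SONNET => 64_000,
  MODEL_OPUS   => 128_000
}.freeze

def max_input_chars(model)
  (CONTEXT_WINDOW - MAX_OUTPUT_TOKENS.fetch(model) - SYSTEM_BUFFER) * CHARS_PER_TOKEN
end

def truncate_prompt(prompt, model)
  prompt[0, max_input_chars(model)]
end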