Costs & Budgets
Cost tracking, budget limits, and the ledger.
Nitejar tracks the cost of every inference call across every agent. Token counts, model pricing, and estimated spend are recorded automatically. You do not need an external billing system -- the app gives you a complete ledger.
In the broader app model:
- Command Center shows cost pressure as an attention signal.
- Costs is the ledger and source of truth.
How cost tracking works
Every time an agent makes an inference call, Nitejar records:
- Prompt tokens and completion tokens for the call
- Estimated cost, computed from the model's per-token pricing in the model catalog
- Which agent made the call and which model was used
Costs are computed at call time, not after the fact. The model catalog stores pricing per model, and the runtime multiplies token counts by those rates. If the catalog does not have pricing for a model, cost is recorded as zero.
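To make the arithmetic concrete, here is a minimal sketch of the call-time computation, assuming a catalog keyed by model name with rates in USD per million tokens. The catalog structure, model names, and field names are illustrative, not Nitejar's actual schema.

```python
# Hypothetical model catalog: USD per 1M tokens for prompt and completion.
CATALOG = {
    "mid-tier/example-model": {"prompt": 0.10, "completion": 0.50},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Multiply token counts by catalog rates at call time.

    Models missing from the catalog are recorded at zero cost,
    matching the behavior described above.
    """
    pricing = CATALOG.get(model)
    if pricing is None:
        return 0.0
    return (prompt_tokens * pricing["prompt"]
            + completion_tokens * pricing["completion"]) / 1_000_000
```

For example, a call with 1M prompt tokens and no completion tokens against the mid-tier entry above costs about $0.10.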
The ledger is the source of truth. Every dollar is traceable to a specific run, a specific inference call, and a specific agent.
The cost ledger
The cost ledger lives at Costs. It shows:
- Total spend across all agents and models
- Per-agent breakdown with trend charts
- Per-model breakdown
- Daily cost trend
Each entry in the ledger links back to the work item and run that incurred the cost. You can drill from the top-level summary down to individual inference calls.
Where to verify
Open Costs to see the full ledger. The summary cards show total spend, and the tables below break it down by agent and model.
Per-agent budget limits
Budget limits prevent runaway spend. You configure them per-agent from the agent's settings page.
- Period -- hourly, daily, or monthly
- Limit (USD) -- the maximum dollar amount the agent can spend per period
- Soft threshold -- a percentage of the limit that triggers a warning
- Hard threshold -- a percentage of the limit at which new runs are blocked
When an agent hits its hard budget threshold, new runs are refused until the period resets or you raise the limit. The agent is not killed mid-run -- it finishes its current work, but will not start new work until the budget clears.
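The soft/hard threshold logic can be sketched as follows. The function name, default percentages, and return values are illustrative assumptions, not Nitejar internals.

```python
def budget_status(spent: float, limit: float,
                  soft_pct: float = 80.0, hard_pct: float = 100.0) -> str:
    """Classify an agent's spend for the current period.

    'blocked' means new runs are refused until the period resets or
    the limit is raised; runs already in flight are allowed to finish.
    """
    used_pct = (spent / limit) * 100 if limit > 0 else 0.0
    if used_pct >= hard_pct:
        return "blocked"
    if used_pct >= soft_pct:
        return "warn"   # soft threshold crossed: warning only
    return "ok"
```

With an $100 monthly limit and the defaults above, $85 of spend produces a warning and $100 blocks new runs.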
Budget limits can also be set at the org level from Costs, applying a global ceiling across all agents.
Where to verify
Open Agents > [agent] and scroll to the Inference Costs section. Budget limits are displayed alongside current spend for the active period.
Model cost comparison
Different models have wildly different price points. Choosing the right model per agent is the single biggest lever you have on cost.
| Model | Approximate cost | Notes |
|---|---|---|
| arcee-ai/trinity-large-preview:free | $0 | Default. Free via OpenRouter. Good enough for most tasks. |
| Mid-tier models (Llama, Mistral) | $0.10 -- $0.50 / 1M tokens | Reasonable for higher-quality output without breaking the bank. |
| Frontier models (GPT-4o, Claude) | $2 -- $15 / 1M tokens | Best quality, but costs add up fast with high-volume agents. |
The model catalog in Settings > Gateway shows per-token pricing for every model available to your instance.
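A back-of-the-envelope comparison shows how much the model choice moves the bill. The rates below are the approximate midpoints from the table above, in USD per million tokens; real prices come from the model catalog.

```python
# Illustrative rates (USD per 1M tokens), roughly matching the table above.
RATES = {"free tier": 0.0, "mid-tier": 0.30, "frontier": 8.0}

def monthly_spend(tokens_per_month: int, rate_per_million: float) -> float:
    """Project a month of spend from token volume and a per-1M-token rate."""
    return tokens_per_month / 1_000_000 * rate_per_million

# An agent consuming 50M tokens per month:
for tier, rate in RATES.items():
    print(f"{tier}: ${monthly_spend(50_000_000, rate):,.2f}")
```

At 50M tokens per month, the same workload runs $0 on the free tier, roughly $15 on a mid-tier model, and around $400 on a frontier model.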
Model call receipts
Every inference call is recorded as a receipt. Each receipt includes:
- Model used
- Prompt tokens
- Completion tokens
- Estimated cost
- Latency
- Tool call flag
Receipts are visible in two places: the work item timeline and the cost ledger.
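A minimal receipt shape covering the fields listed above might look like this. The dataclass and field names are illustrative, not Nitejar's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Receipt:
    """One inference call's receipt (illustrative field names)."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    estimated_cost: float  # USD, computed at call time
    latency_ms: int
    is_tool_call: bool

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```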
Where to verify
Open Activity > [work item] to see the timeline of inference calls for a specific run. The same data rolls up into Costs at the aggregate level.
Cost at the run level
Each work item tracks its own total cost. When you open a work item's detail page, the cost is shown alongside the run timeline, token counts, and tool calls.
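Conceptually, the totals on a work item's detail page are a rollup of its receipts. A hypothetical aggregation over receipt dicts (field names are illustrative):

```python
def run_totals(receipts: list[dict]) -> dict:
    """Sum a run's receipts into the totals shown alongside its timeline."""
    return {
        "cost": sum(r["estimated_cost"] for r in receipts),
        "tokens": sum(r["prompt_tokens"] + r["completion_tokens"]
                      for r in receipts),
        "tool_calls": sum(1 for r in receipts if r["is_tool_call"]),
    }
```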
The Command Center surface summarizes cost pressure across the fleet, giving you a quick signal when spend is part of the current attention picture. Use it for posture. Use Costs for the actual ledger.