Nitejar Docs
Use Nitejar

Costs & Budgets

Cost tracking, budget limits, and the ledger.

Nitejar tracks the cost of every inference call across every agent. Token counts, model pricing, and estimated spend are recorded automatically. You do not need an external billing system -- the app gives you a complete ledger.

In the broader app model:

  • Command Center shows cost pressure as an attention signal.
  • Costs is the ledger and source of truth.

How cost tracking works

Every time an agent makes an inference call, Nitejar records:

  • Prompt tokens and completion tokens for the call
  • Estimated cost, computed from the model's per-token pricing in the model catalog
  • Which agent made the call and which model was used

Costs are computed at call time, not after the fact. The model catalog stores pricing per model, and the runtime multiplies token counts by those rates. If the catalog does not have pricing for a model, cost is recorded as zero.
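The call-time computation described above can be sketched as follows. The catalog structure and function names (`MODEL_CATALOG`, `estimate_cost`) are illustrative assumptions, not Nitejar's actual API; the mid-tier rates are invented for the example.

```python
# Hypothetical sketch of call-time cost computation: a catalog keyed by
# model name holds per-token prompt/completion rates in USD, and the
# runtime multiplies token counts by those rates.

MODEL_CATALOG = {
    # rates are USD per token (illustrative values)
    "arcee-ai/trinity-large-preview:free": {"prompt": 0.0, "completion": 0.0},
    "example/mid-tier": {"prompt": 0.10 / 1_000_000, "completion": 0.30 / 1_000_000},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Multiply token counts by catalog rates; unknown models cost zero."""
    rates = MODEL_CATALOG.get(model)
    if rates is None:
        return 0.0  # no pricing in the catalog -> cost recorded as zero
    return prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]
```

Note the fallback: a model missing from the catalog yields a recorded cost of zero rather than an error, which matches the behavior described above.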

The ledger is the source of truth. Every dollar is traceable to a specific run, a specific inference call, and a specific agent.

The cost ledger

The cost ledger lives at Costs. It shows:

  • Total spend across all agents and models
  • Per-agent breakdown with trend charts
  • Per-model breakdown
  • Daily cost trend

Each entry in the ledger links back to the work item and run that incurred the cost. You can drill from the top-level summary down to individual inference calls.

Where to verify

Open Costs to see the full ledger. The summary cards show total spend, and the tables below break it down by agent and model.

Per-agent budget limits

Budget limits prevent runaway spend. You configure them per agent from the agent's settings page. Each limit defines:

  • Period -- hourly, daily, or monthly
  • Limit (USD) -- the maximum dollar amount the agent can spend per period
  • Soft threshold -- a percentage of the limit that triggers a warning
  • Hard threshold -- a percentage of the limit at which new runs are blocked

When an agent hits its hard budget threshold, new runs are refused until the period resets or you raise the limit. The agent is not killed mid-run -- it finishes its current work, but will not start new work until the budget clears.
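The soft/hard threshold logic above can be sketched like this. The parameter names and default percentages are assumptions for illustration, not Nitejar's schema.

```python
# Illustrative per-agent budget check: compare spend for the current
# period against the limit, and map it to ok / warn / blocked.

def budget_status(spent_usd: float, limit_usd: float,
                  soft_pct: float = 80.0, hard_pct: float = 100.0) -> str:
    """Return 'ok', 'warn' (soft threshold hit), or 'blocked' (hard threshold hit)."""
    used_pct = spent_usd / limit_usd * 100.0
    if used_pct >= hard_pct:
        return "blocked"  # new runs refused until the period resets
    if used_pct >= soft_pct:
        return "warn"     # warning surfaced; runs still start
    return "ok"
```

A "blocked" result only gates new runs; as described above, in-flight work is allowed to finish.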

Budget limits can also be set at the org level from Costs, applying a global ceiling across all agents.

Where to verify

Open Agents > [agent] and scroll to the Inference Costs section. Budget limits are displayed alongside current spend for the active period.

Model cost comparison

Different models have wildly different price points. Choosing the right model per agent is the single biggest lever you have on cost.

  • arcee-ai/trinity-large-preview:free -- ~$0. Default; free via OpenRouter. Good enough for most tasks.
  • Mid-tier models (Llama, Mistral) -- $0.10 to $0.50 per 1M tokens. Reasonable for higher-quality output without breaking the bank.
  • Frontier models (GPT-4o, Claude) -- $2 to $15 per 1M tokens. Best quality, but costs add up fast with high-volume agents.

The model catalog in Settings > Gateway shows per-token pricing for every model available to your instance.
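To make the lever concrete, here is a back-of-the-envelope comparison using the approximate per-million-token figures from the table above (not live catalog prices; the daily volume is an arbitrary example).

```python
# Rough monthly-spend comparison across pricing tiers at a steady volume.

def monthly_spend(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Estimated USD per 30-day month at a constant daily token volume."""
    return tokens_per_day * 30 / 1_000_000 * usd_per_million_tokens

# An agent pushing 5M tokens/day:
free_tier = monthly_spend(5_000_000, 0.0)    # ~$0
mid_tier = monthly_spend(5_000_000, 0.30)    # ~$45
frontier = monthly_spend(5_000_000, 10.0)    # ~$1,500
```

The same workload spans three orders of magnitude in cost depending on model choice, which is why per-agent model selection dominates the budget.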

Model call receipts

Every inference call is recorded as a receipt. Each receipt includes:

  • Model used
  • Prompt tokens
  • Completion tokens
  • Estimated cost
  • Latency
  • Tool call flag

Receipts are visible in two places: the work item timeline and the cost ledger.
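The receipt fields listed above could be modeled as a simple record. The field names here are illustrative assumptions, not Nitejar's actual schema.

```python
from dataclasses import dataclass

# Hypothetical shape of a model call receipt, mirroring the fields above.

@dataclass
class Receipt:
    model: str                 # model used for the call
    prompt_tokens: int         # tokens in the prompt
    completion_tokens: int     # tokens in the completion
    estimated_cost_usd: float  # computed at call time from catalog rates
    latency_ms: int            # wall-clock latency of the call
    is_tool_call: bool         # whether the call invoked a tool

r = Receipt("arcee-ai/trinity-large-preview:free", 1200, 300, 0.0, 850, False)
```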

Where to verify

Open Activity > [work item] to see the timeline of inference calls for a specific run. The same data rolls up into Costs at the aggregate level.

Cost at the run level

Each work item tracks its own total cost. When you open a work item's detail page, the cost is shown alongside the run timeline, token counts, and tool calls.
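A work item's total is just the sum of its receipts' estimated costs. A minimal sketch, with invented receipt data:

```python
# Sketch of the run-level rollup: each receipt carries an estimated cost,
# and the work item's total is their sum.

receipts = [
    {"model": "example/mid-tier", "cost_usd": 0.012},
    {"model": "example/mid-tier", "cost_usd": 0.009},
    {"model": "arcee-ai/trinity-large-preview:free", "cost_usd": 0.0},
]

run_cost = sum(r["cost_usd"] for r in receipts)  # ~0.021
```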

The Command Center surface summarizes cost pressure across the fleet, giving you a quick signal when spend is part of the current attention picture. Use it for posture. Use Costs for the actual ledger.