Skip to content
Back to work

Cost-aware agent runtime · Python

nightshift

Sending every agent subtask to a frontier API burns budget on work a small local model could do.

View source

What I built

A Python agent runtime that routes extraction / retrieval / eval tasks to local models (T5-small, a MiniLM cross-encoder, ChromaDB) behind a confidence gate, with multi-provider dispatch (OpenAI / Anthropic / Google / DeepSeek) and per-call token-cost budgeting.

Stack

PythonChromaDBTransformershttpx

Status

Runtime built; 138 passing tests.