One endpoint between your AI applications and any number of model backends: OpenAI, Anthropic, on-prem vLLM, Ollama, and models for multimodal, speech, vision, and embedding workloads. Routing, failover, prefix-cache-aware affinity, and traffic-driven health checks: the operational machinery to run inference infrastructure in production.
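As an illustration of what prefix-cache-aware affinity with failover can look like, here is a minimal sketch. It is not the gateway's actual algorithm: it assumes rendezvous hashing over the first bytes of the prompt, and the names `Backend`, `Pick`, and `prefixLen` are hypothetical. Requests that share a prefix land on the same healthy backend, so that backend's prefix cache stays warm. If that backend is marked unhealthy, the next-best one takes over.

```go
package main

import "hash/fnv"

// Backend is a hypothetical upstream model server.
type Backend struct {
	Name    string
	Healthy bool
}

// prefixKey keeps only the first n bytes of the prompt. It is used purely
// as a hash key, so cutting a multi-byte character in half is harmless.
func prefixKey(prompt string, n int) string {
	if len(prompt) > n {
		return prompt[:n]
	}
	return prompt
}

// score hashes the (prefix, backend) pair with FNV-1a.
func score(key, backend string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0}) // separator between key and backend name
	h.Write([]byte(backend))
	return h.Sum64()
}

// Pick returns the healthy backend with the highest rendezvous score for
// the prompt's prefix. The boolean is false when no backend is healthy.
func Pick(backends []Backend, prompt string, prefixLen int) (Backend, bool) {
	key := prefixKey(prompt, prefixLen)
	var best Backend
	var bestScore uint64
	found := false
	for _, b := range backends {
		if !b.Healthy {
			continue // failover: unhealthy backends are never candidates
		}
		if s := score(key, b.Name); !found || s > bestScore {
			best, bestScore, found = b, s, true
		}
	}
	return best, found
}
```

With rendezvous hashing, taking one backend out of rotation only moves the prefixes that were pinned to it. Every other prefix keeps its backend.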

Written in Go. Ships as a 7 MB scratch container: non-root, no shell, no libc. Configured via YAML, hot-reloaded via SIGHUP.

Three things make it different

Privacy as architecture

Hardened builds elide prompt-handling code paths at compile time. Prompt capture is opt-in and bounded, never on by default.
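One common way to get compile-time elision in Go is build constraints. The sketch below assumes that approach and is not taken from the project: the `hardened` tag name, the file names, the `Capturer` type, and the 256-byte bound are all hypothetical. It shows the default-build file, with the hardened counterpart described in the trailing comment.

```go
//go:build !hardened

// capture_default.go (hypothetical file name): compiled only when the
// "hardened" build tag is absent.
package main

// maxCapture bounds how many prompt bytes are ever retained (assumed value).
const maxCapture = 256

// Capturer keeps a bounded sample of the last prompt, and only when
// explicitly enabled. The zero value has capture turned off.
type Capturer struct {
	Enabled bool
	last    []byte
}

// Observe copies at most maxCapture bytes of the prompt when enabled.
func (c *Capturer) Observe(prompt []byte) {
	if !c.Enabled {
		return
	}
	n := len(prompt)
	if n > maxCapture {
		n = maxCapture
	}
	c.last = append(c.last[:0], prompt[:n]...)
}

// Last returns the retained sample, or nil if nothing was captured.
func (c *Capturer) Last() []byte { return c.last }

// The hardened counterpart, capture_hardened.go, would start with
// "//go:build hardened" and declare the same API with empty bodies:
//
//	type Capturer struct{ Enabled bool }
//	func (c *Capturer) Observe([]byte) {}
//	func (c *Capturer) Last() []byte   { return nil }
//
// Building with `go build -tags hardened` then leaves the copying code
// above out of the binary entirely.
```

The appeal of this layout is that call sites stay identical in both builds. The privacy property also comes from which file gets compiled, not from a runtime flag that could be flipped.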

Go-native performance

Several thousand concurrent requests on a 2-core machine. Bounded memory under sustained agentic load. A certified performance report for every release.

Built to global banking privacy standards

Designed against the audit bar serious financial institutions are held to. Not retrofitted to it.

The name comes from the historical Japanese courier system — the hikyaku (飛脚) ran messages across the country, fast and reliably, without reading them. A courier delivers. A courier doesn’t make a copy for the archive on the way through.