Why InferHaven runs your AI offline by default
Using a cloud API sends your repo to someone else. InferHaven flips that default — your code stays local, and the network is an opt-in, not an opt-out.
When you point Cursor, Claude Code, or OpenAI Codex at the usual cloud provider APIs, your repo’s source can end up in an external training corpus, and the resulting model could produce answers derived from it. The shape of the function you just wrote, the variable names that leak your domain, the comment with the bug ID: all of it passes through somebody else’s API. And as models become more capable, providers will inevitably layer on more filtering, censorship, and guardrails.
You accept the tradeoffs quietly because the model is good and the prompt-to-completion loop is fast. Sometimes that is an acceptable tradeoff, but in certain workflows it should not be tolerated. InferHaven flips that default. Out of the box, the stack is offline. Inference runs against models on your hardware. The network is an opt-in, not an opt-out.
One command to a fully local stack
$ docker compose up -d
[+] Running 4/4
✓ ollama Started
✓ workspace Started
✓ code-server Started
✓ caddy Started
$ haven pull qwen2.5-coder:14b
pulling manifest... downloading 8.5GB ✓
That’s the whole setup. Four services on one network, with Caddy as the secure gateway for all requests and SSH secured with your key.
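To confirm all four services came up, a small check like the one below can help. This is a minimal sketch, not part of InferHaven: `missing_services` is a hypothetical helper name, the service names are taken from the output above, and the usage line assumes Docker Compose v2, where `docker compose ps` accepts `--services` and `--status`.

```shell
# Hypothetical helper (not shipped with InferHaven): reads the names of
# running services on stdin and prints any expected service that is absent.
missing_services() (
  running=$(cat)
  for svc in ollama workspace code-server caddy; do
    # -x matches the whole line, -F treats the name as a fixed string.
    printf '%s\n' "$running" | grep -qxF -- "$svc" || printf '%s\n' "$svc"
  done
)

# Usage (assumes Docker Compose v2):
#   docker compose ps --services --status running | missing_services
```

No output means every expected service is running. Any name printed is a service to inspect with `docker compose logs`.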
The compromise the cloud forces you into
Don’t get me wrong, coding assistants are great products. They are also a tradeoff many teams accept without weighing it thoroughly.
Default: cloud
- Your code → vendor API
- Telemetry on by default
- Per-token billing
- Model choice locked to vendor catalog
- Outages take your tooling down
Default: local
- Your code stays on disk
- Zero telemetry, ever
- Flat hardware cost
- Any GGUF or safetensors model
- Offline-capable; no vendor dependency
We don’t just sell privacy. We ship privacy by default.
— An InferHaven essential
When you want the cloud, you still get it
Offline-by-default is not offline-only. The same workspace lets you configure your ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, and OPENROUTER_API_KEY, and the bundled harnesses (Claude Code, OpenCode, Aider, QwenCode, Amp, Gemini, pi, Goose, Continue, avante.nvim) discover them automatically. You pick which calls go where, per task and per harness. Cross-project pollution is avoided by default, even while you use many different models and providers.
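Because the cloud is opt-in, it helps to see at a glance which provider keys a shell is currently exporting. The sketch below is an illustration only: `cloud_keys_set` is a hypothetical helper, not an InferHaven command, and it assumes the harnesses read the four environment variables named above. It prints variable names only, never their values.

```shell
# Hypothetical helper (not shipped with InferHaven): lists which cloud
# provider keys are exported in the current environment, names only.
cloud_keys_set() (
  for name in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY OPENROUTER_API_KEY; do
    # printenv prints the value of an exported variable, or nothing if unset.
    if [ -n "$(printenv "$name")" ]; then
      printf '%s\n' "$name"
    fi
  done
)

# Usage: no output means no cloud provider is configured in this shell.
cloud_keys_set
```

An empty result means that any harness launched from this shell has no cloud key to pick up.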
What’s next
This blog will be the running log of how InferHaven is built, what’s shipping, and the design decisions we made along the way. Float your boat up to the dock, clone the repo, run docker compose up -d, and let the lighthouse lead you to your haven.
— Ethan L.