Most AI editors force a trade-off: either cloud speed and quality, or local privacy and control. CodeGrex offers both. Its routing engine evaluates request type, latency targets, privacy mode, and model availability before selecting where each completion or chat request should run.
1) Local-first when available
If your selected Ollama model is running, short completions can execute locally, with no network round trip. This is ideal for inline suggestions, repetitive boilerplate, and offline work.
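A local-first check could look roughly like the sketch below. It assumes Ollama is listening on its default local address and that its running-models endpoint (`/api/ps`) returns a `models` list with a `name` field per entry. The helper names are hypothetical and this is not CodeGrex's actual implementation.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434"


def normalize(name: str) -> str:
    """Treat a model name without an explicit tag as ':latest'."""
    return name if ":" in name else f"{name}:latest"


def is_model_running(ps_payload: dict, model: str) -> bool:
    """Return True if `model` appears in a running-models payload."""
    wanted = normalize(model)
    return any(
        normalize(entry.get("name", "")) == wanted
        for entry in ps_payload.get("models", [])
    )


def fetch_running_models(base_url: str = OLLAMA_URL, timeout: float = 0.5) -> dict:
    """Query the running-models endpoint; return an empty payload if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Server not running, timed out, or returned something unparsable.
        return {"models": []}
```

A router could call `is_model_running(fetch_running_models(), "llama3")` and send the completion locally only when the result is true. The short timeout keeps the check from slowing down typing when no local server is up.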
2) Cloud fallback for heavier reasoning
When a task needs a longer context window or stronger reasoning, CodeGrex can escalate to your cloud provider automatically. You define exactly when this happens in the routing settings.
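An escalation rule of this kind can be sketched as a small predicate. The threshold, field names, and request-type labels below are illustrative assumptions, not CodeGrex's real settings schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    """Hypothetical thresholds that decide when a request goes to the cloud."""
    max_local_context_tokens: int = 4096
    escalate_request_types: frozenset = frozenset({"chat", "refactor"})


def should_escalate(request_type: str, context_tokens: int, rule: EscalationRule) -> bool:
    """Escalate if the request type is marked heavy or the context is too long."""
    return (
        request_type in rule.escalate_request_types
        or context_tokens > rule.max_local_context_tokens
    )
```

With the defaults above, a 200-token inline completion stays local, while a chat request or a completion with 10,000 tokens of context would escalate.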
3) Privacy-aware routing policies
Routing rules respect the runtime privacy mode. In Offline mode, cloud routes are disabled. In Sensitive mode, completions can remain local while chat selectively uses approved providers.
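One way to picture these policies is as a filter over candidate routes. The sketch below is an assumption about how such a filter might be written: the mode names come from this article, but the function, its parameters, and the provider names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    OFFLINE = "offline"
    SENSITIVE = "sensitive"
    HYBRID = "hybrid"


def allowed_routes(
    mode: Mode,
    request_type: str,
    configured_providers: set,
    approved_providers: set,
) -> list:
    """Return the routes a request may use under the given privacy mode."""
    if mode is Mode.OFFLINE:
        # No cloud routes at all.
        return ["local"]
    if mode is Mode.SENSITIVE:
        if request_type == "completion":
            # Completions stay on the machine.
            return ["local"]
        # Chat may use only providers that are both configured and approved.
        return ["local"] + sorted(configured_providers & approved_providers)
    # Hybrid: local plus every configured provider.
    return ["local"] + sorted(configured_providers)
```

Keeping the policy as a pure function of mode and request type makes it easy to audit: the cloud can never appear in the result for Offline mode, whatever else is configured.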
4) Resilient failover
If a model is unavailable, CodeGrex retries with exponential backoff, then switches to a fallback route. Sessions continue instead of failing hard.
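The retry-then-fallback pattern can be sketched in a few lines. The exception type, retry count, and delays here are illustrative assumptions rather than CodeGrex's actual values. Each route is any callable that takes a prompt and returns text.

```python
import time
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a route when its model cannot serve the request."""


def run_with_failover(
    routes: Sequence[Callable[[str], str]],
    prompt: str,
    retries: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each route in order, retrying with exponential backoff before moving on."""
    last_error = None
    for route in routes:
        for attempt in range(retries):
            try:
                return route(prompt)
            except ModelUnavailable as exc:
                last_error = exc
                if attempt < retries - 1:
                    # Delay doubles each attempt: base, 2*base, 4*base, ...
                    sleep(base_delay * 2 ** attempt)
    raise ModelUnavailable("all routes failed") from last_error
```

Passing `sleep` as a parameter is a design choice that lets tests record delays instead of waiting. In practice you would likely also add jitter and a cap on the maximum delay.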
5) Performance budgets built in
You can tune first-token latency targets and prefer local completions to keep your typing loop fast. The router optimizes for responsiveness first, then model cost.
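As a rough illustration of "responsiveness first, then cost", the sketch below ranks candidate routes with a sort key. This is one possible reading of that ordering, and the `Route` fields and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    is_local: bool
    expected_first_token_ms: float
    cost_per_1k_tokens: float


def pick_route(routes, first_token_budget_ms: float, prefer_local: bool = True) -> Route:
    """Pick a route: within the latency budget first, then local, then cheapest."""
    if not routes:
        raise ValueError("no routes configured")

    def rank(route: Route):
        return (
            route.expected_first_token_ms > first_token_budget_ms,  # False sorts first
            prefer_local and not route.is_local,
            route.cost_per_1k_tokens,
            route.expected_first_token_ms,
        )

    return min(routes, key=rank)
```

Because tuples compare element by element, a route that misses the first-token budget always loses to one that meets it, regardless of cost. Cost only breaks ties among routes that are equally acceptable on latency and locality.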
Recommended setup
- Enable `Prefer Local Completions`.
- Set runtime mode to `Hybrid`.
- Keep one fast local model and one high-capability cloud model configured.