Run models locally.
CodeGrex doesn't bundle a model in the installer. Instead, run any local model on your own hardware in one of two ways — your code never leaves your machine.
Set up Ollama
Ollama runs models locally and exposes a simple API that CodeGrex talks to automatically. It's the fastest way to get going.
Install Ollama
Download the installer for your OS and run it. Ollama starts a local server at http://localhost:11434.
Pull a model that fits your machine
Pick based on your available RAM/VRAM. Bigger models are higher quality but slower.
ollama pull qwen2.5-coder:1.5b
Lightweight — completions on modest laptops
ollama pull qwen2.5-coder:7b
Balanced — recommended default for chat + edits
ollama pull deepseek-coder:6.7b
Strong code reasoning
ollama pull qwen2.5-coder:14b
Higher quality — needs a capable machine
Browse the full catalog at ollama.com/library .
Use it in CodeGrex
CodeGrex auto-detects Ollama at the default URL — just select your local model from the model picker. To pull or switch models without leaving the editor, open the Command Palette and run CodeGrex: Manage Local Models.
To set a default explicitly, configure it in Settings:
"codegrex.ollama.baseUrl": "http://localhost:11434", "codegrex.ollama.model": "qwen2.5-coder:7b"
Use any OpenAI-compatible endpoint
Already running a local server that speaks the OpenAI API? Point CodeGrex at it by setting codegrex.openai.baseUrl. CodeGrex automatically treats localhost endpoints as local providers.
| Server | Base URL | Notes |
|---|---|---|
| LM Studio | http://localhost:1234/v1 | Friendly desktop GUI — easiest non-Ollama option |
| llama.cpp (llama-server) | http://localhost:8080/v1 | Lightweight, runs GGUF models |
| vLLM | http://localhost:8000/v1 | High-throughput GPU serving |
| Jan | http://localhost:1337/v1 | Open-source desktop app with a local API server |
| LocalAI | http://localhost:8080/v1 | Self-hosted drop-in OpenAI replacement |
Example: LM Studio
- Open LM Studio, download a model, and start its local server.
- In CodeGrex Settings, set the base URL below.
- Select the model in the CodeGrex model picker.
"codegrex.openai.baseUrl": "http://localhost:1234/v1"
Settings reference
codegrex.ollama.baseUrldefault: http://localhost:11434Where your Ollama server is reachable.
codegrex.ollama.modeldefault: llama3.2Default Ollama model tag to use (e.g. qwen2.5-coder:7b).
codegrex.openai.baseUrldefault: https://api.openai.com/v1Point this at any OpenAI-compatible server to use it as a provider.
Fully private by design
- Prompts and code stay on your machine — local models never send data to CodeGrex or any third party.
- Great for air-gapped environments and sensitive codebases.
- Mix and match: use local models for sensitive work and cloud models for heavier tasks.
Need cloud models too?