
Run models locally.

CodeGrex doesn't bundle a model in the installer. Instead, you run a local model on your own hardware in one of two ways, and your code never leaves your machine.

Option 1 · Recommended

Set up Ollama

Ollama runs models locally and exposes a simple API that CodeGrex talks to automatically. It's the fastest way to get going.

1. Install Ollama

Download the installer for your OS and run it. Ollama starts a local server at http://localhost:11434.
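To confirm that the server from this step is actually up, the check can be scripted. The sketch below uses only the Python standard library and assumes Ollama's GET /api/version endpoint; the helper names are illustrative and are not part of CodeGrex.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://localhost:11434"  # Ollama's default address, as stated above


def version_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the version-endpoint URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/api/version"


def ollama_version(base_url: str = DEFAULT_BASE_URL, timeout: float = 2.0):
    """Return the server's version string, or None on a connection or parsing error."""
    try:
        with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError, AttributeError):
        # URLError and timeouts are OSError subclasses; malformed JSON raises
        # ValueError; a non-object JSON body has no .get (AttributeError).
        return None


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama {version} is running" if version else "Ollama is not reachable")
```

If the script reports that Ollama is not reachable, the server is probably not running yet or is listening on a different address.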

2. Pull a model that fits your machine

Pick based on your available RAM/VRAM. Larger models generally produce better results but run slower and need more memory.

Qwen2.5-Coder 1.5B · ~4 GB RAM
ollama pull qwen2.5-coder:1.5b

Lightweight — completions on modest laptops

Qwen2.5-Coder 7B · ~8 GB RAM
ollama pull qwen2.5-coder:7b

Balanced — recommended default for chat + edits

DeepSeek Coder 6.7B · ~8 GB RAM
ollama pull deepseek-coder:6.7b

Strong code reasoning

Qwen2.5-Coder 14B · 16 GB+ RAM / GPU
ollama pull qwen2.5-coder:14b

Higher quality — needs a capable machine

Browse the full catalog at ollama.com/library.
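The sizing guidance above can be expressed as a small lookup. This is a minimal sketch in Python that covers only the Qwen2.5-Coder entries; it treats the approximate memory figures from the list as hard minimums, which is an assumption, and the function names are hypothetical.

```python
from typing import Optional

# (minimum memory in GB, Ollama tag), largest first. Figures come from the
# list above; treating them as strict cut-offs is an assumption of this sketch.
QWEN_CODER_TIERS = [
    (16, "qwen2.5-coder:14b"),
    (8, "qwen2.5-coder:7b"),
    (4, "qwen2.5-coder:1.5b"),
]


def suggest_model(available_gb: float) -> Optional[str]:
    """Return the largest listed Qwen2.5-Coder tag that fits, or None if none does."""
    for min_gb, tag in QWEN_CODER_TIERS:
        if available_gb >= min_gb:
            return tag
    return None


def pull_command(tag: str) -> str:
    """Format the shell command that downloads the given tag."""
    return f"ollama pull {tag}"


if __name__ == "__main__":
    tag = suggest_model(8)
    if tag is not None:
        print(pull_command(tag))
```

Real memory use also depends on context length and quantization, so the result is a starting point, not a guarantee.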

3. Use it in CodeGrex

CodeGrex auto-detects Ollama at the default URL — just select your local model from the model picker. To pull or switch models without leaving the editor, open the Command Palette and run CodeGrex: Manage Local Models.

To set a default explicitly, configure it in Settings:

"codegrex.ollama.baseUrl": "http://localhost:11434",
"codegrex.ollama.model": "qwen2.5-coder:7b"
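Before setting a default model, it can help to check that the tag has been pulled. The Python sketch below assumes Ollama's GET /api/tags endpoint, which reports installed models under a "models" key; the helper names are illustrative, not CodeGrex APIs.

```python
import json
import urllib.request


def normalize_tag(tag: str) -> str:
    """Ollama resolves a bare name such as 'llama3.2' to 'llama3.2:latest'."""
    return tag if ":" in tag else tag + ":latest"


def installed_models(tags_payload: dict) -> list:
    """Extract model names from a /api/tags response body."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_installed(tags_payload: dict, model: str) -> bool:
    """Check whether the given tag appears among the installed models."""
    wanted = normalize_tag(model)
    return wanted in {normalize_tag(name) for name in installed_models(tags_payload)}


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the installed-model list; raises if the server is unreachable."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, is_installed(fetch_tags(), "qwen2.5-coder:7b") should return True once the corresponding ollama pull has finished.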

Option 2 · Advanced

Use any OpenAI-compatible endpoint

Already running a local server that speaks the OpenAI API? Point CodeGrex at it by setting codegrex.openai.baseUrl. CodeGrex automatically treats localhost endpoints as local providers.

Server                     Base URL                    Notes
LM Studio                  http://localhost:1234/v1    Friendly desktop GUI; easiest non-Ollama option
llama.cpp (llama-server)   http://localhost:8080/v1    Lightweight, runs GGUF models
vLLM                       http://localhost:8000/v1    High-throughput GPU serving
Jan                        http://localhost:1337/v1    Open-source desktop app with a local API server
LocalAI                    http://localhost:8080/v1    Self-hosted drop-in OpenAI replacement
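How CodeGrex decides that an endpoint is local is not documented here. As a rough illustration only, one plausible check is whether the host of the base URL is localhost or a loopback address. The function below is a hypothetical Python sketch of that idea, not CodeGrex's actual logic.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_endpoint(base_url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a loopback IP address."""
    host = urlsplit(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example api.openai.com), so not local here.
        return False


if __name__ == "__main__":
    for url in ("http://localhost:1234/v1", "https://api.openai.com/v1"):
        print(url, is_local_endpoint(url))
```

Under this sketch, a server on another machine in your network (for example a 192.168.x.x address) would not count as local, even though traffic stays inside the network.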

Example: LM Studio

  1. Open LM Studio, download a model, and start its local server.
  2. In CodeGrex Settings, set the base URL below.
  3. Select the model in the CodeGrex model picker.
"codegrex.openai.baseUrl": "http://localhost:1234/v1"
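To see which model ids the server exposes before choosing one in the picker, you can query the list-models route that OpenAI-compatible servers are expected to provide at <base URL>/models. The Python sketch below assumes the server returns the standard response shape with a "data" array; the helper names are illustrative.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """The OpenAI API lists models at <base URL>/models."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list:
    """Pull the 'id' of each entry from a list-models response body."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:1234/v1", timeout: float = 5.0) -> list:
    """Query the server and return the model ids it reports.

    Raises an OSError subclass if the server is unreachable.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Calling list_models() with LM Studio's local server running should return the ids of the models it currently serves; the same call works against the other servers in the table if you pass their base URL.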

Reference

Settings reference

codegrex.ollama.baseUrl · default: http://localhost:11434

Where your Ollama server is reachable.

codegrex.ollama.model · default: llama3.2

Default Ollama model tag to use (e.g. qwen2.5-coder:7b).

codegrex.openai.baseUrl · default: https://api.openai.com/v1

Point this at any OpenAI-compatible server to use it as a provider.
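The three settings and their defaults can be pictured as a base dictionary with your own values laid on top. The Python sketch below is only a model of that idea; how CodeGrex merges settings internally is an assumption, and resolve_settings is a hypothetical helper.

```python
# Defaults copied from the settings reference above.
DEFAULTS = {
    "codegrex.ollama.baseUrl": "http://localhost:11434",
    "codegrex.ollama.model": "llama3.2",
    "codegrex.openai.baseUrl": "https://api.openai.com/v1",
}


def resolve_settings(user_settings: dict) -> dict:
    """Overlay user-provided values on the documented defaults."""
    unknown = set(user_settings) - set(DEFAULTS)
    if unknown:
        # Reject keys that are not in the reference, to catch typos early.
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return {**DEFAULTS, **user_settings}


if __name__ == "__main__":
    effective = resolve_settings({"codegrex.openai.baseUrl": "http://localhost:1234/v1"})
    for key, value in effective.items():
        print(f"{key} = {value}")
```

Note that the default for codegrex.openai.baseUrl is the hosted OpenAI API, so a local OpenAI-compatible server is used only after you override that value.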

Fully private by design

  • Prompts and code stay on your machine — local models never send data to CodeGrex or any third party.
  • Great for air-gapped environments and sensitive codebases.
  • Mix and match: use local models for sensitive work and cloud models for heavier tasks.
