Models

Run models locally.

CodeGrex doesn't bundle a model in the installer. Instead, you run a local model on your own hardware in one of two ways, so your code never leaves your machine.

Option 1: Ollama (recommended, easiest setup)

Option 2: OpenAI-compatible server (LM Studio, llama.cpp, vLLM…)

Option 1 · Recommended

Set up Ollama

Ollama runs models locally and exposes a simple API that CodeGrex talks to automatically. It's the fastest way to get going.

Install Ollama

Download the installer for your OS and run it. Ollama starts a local server at http://localhost:11434.

Download Ollama from ollama.com/download.
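To confirm the server is up before opening CodeGrex, you can query Ollama's `/api/tags` endpoint, which lists the models you have pulled. The sketch below is a minimal check in Python, assuming the default URL from this page. The helper names `parse_model_tags` and `list_local_models` are illustrative and not part of CodeGrex or Ollama.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_BASE_URL = "http://localhost:11434"  # default URL stated on this page


def parse_model_tags(payload: dict) -> list[str]:
    """Extract model tags from the JSON body of GET /api/tags."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_local_models(base_url: str = OLLAMA_BASE_URL,
                      timeout: float = 2.0) -> Optional[list[str]]:
    """Return the pulled model tags, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_tags(json.load(resp))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a body that is not valid JSON.
        return None


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama is not reachable. Is the server running?")
    else:
        print("Ollama is up. Pulled models:", models or "none yet")
```

An empty list means the server is running but no model has been pulled yet, which is the expected state right after installation.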

Pull a model that fits your machine

Pick one based on your available RAM or VRAM. Larger models generally give better results but run slower and need more memory.

Model	Memory	Pull command	Notes
Qwen2.5-Coder 1.5B	~4 GB RAM	`ollama pull qwen2.5-coder:1.5b`	Lightweight: completions on modest laptops
Qwen2.5-Coder 7B	~8 GB RAM	`ollama pull qwen2.5-coder:7b`	Balanced: recommended default for chat + edits
DeepSeek Coder 6.7B	~8 GB RAM	`ollama pull deepseek-coder:6.7b`	Strong code reasoning
Qwen2.5-Coder 14B	16 GB+ RAM / GPU	`ollama pull qwen2.5-coder:14b`	Higher quality: needs a capable machine
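If you want to script the choice, the memory figures in the model list above can be turned into a small lookup. This is a rough sketch only: it treats each listed figure as a minimum (an assumption, since the page gives approximate values), and `suggest_model` and `pull_command` are hypothetical helpers, not CodeGrex features. DeepSeek Coder 6.7B sits in the same tier as Qwen2.5-Coder 7B and is left out for simplicity.

```python
from typing import Optional

# (approximate minimum RAM in GB, Ollama tag), smallest first.
# Figures are copied from the model list on this page.
MODEL_TIERS = [
    (4, "qwen2.5-coder:1.5b"),
    (8, "qwen2.5-coder:7b"),
    (16, "qwen2.5-coder:14b"),
]


def suggest_model(ram_gb: float) -> Optional[str]:
    """Return the largest listed tag whose memory figure fits, else None."""
    best = None
    for min_ram, tag in MODEL_TIERS:
        if ram_gb >= min_ram:
            best = tag
    return best


def pull_command(ram_gb: float) -> Optional[str]:
    """Build the matching `ollama pull` command, or None if nothing fits."""
    tag = suggest_model(ram_gb)
    return f"ollama pull {tag}" if tag else None


if __name__ == "__main__":
    for ram in (2, 6, 8, 32):
        print(ram, "GB ->", pull_command(ram))
```

Treat the result as a starting point. Other running applications, context length, and quantization all affect how much memory a model needs in practice.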

Browse the full catalog at ollama.com/library.

Use it in CodeGrex

CodeGrex auto-detects Ollama at the default URL — just select your local model from the model picker. To pull or switch models without leaving the editor, open the Command Palette and run CodeGrex: Manage Local Models.

To set a default explicitly, configure it in Settings:

"codegrex.ollama.baseUrl": "http://localhost:11434",
"codegrex.ollama.model": "qwen2.5-coder:7b"

Option 2 · Advanced

Use any OpenAI-compatible endpoint

Already running a local server that speaks the OpenAI API? Point CodeGrex at it by setting codegrex.openai.baseUrl. CodeGrex automatically treats localhost endpoints as local providers.

Server	Base URL	Notes
LM Studio	`http://localhost:1234/v1`	Friendly desktop GUI — easiest non-Ollama option
llama.cpp (llama-server)	`http://localhost:8080/v1`	Lightweight, runs GGUF models
vLLM	`http://localhost:8000/v1`	High-throughput GPU serving
Jan	`http://localhost:1337/v1`	Open-source desktop app with a local API server
LocalAI	`http://localhost:8080/v1`	Self-hosted drop-in OpenAI replacement
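This page says CodeGrex treats localhost endpoints as local providers but does not describe the exact rule. To sanity-check your own base URL, the sketch below tests whether the host is `localhost` or a loopback IP address. This is an assumption about what "local" means here, not CodeGrex's actual logic, and `is_local_endpoint` is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    """Hypothetical check: True if the URL's host is localhost or a loopback IP."""
    host = urlparse(base_url).hostname  # lowercased; None if the URL has no host
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # The host is a DNS name other than localhost.
        return False


if __name__ == "__main__":
    for url in ("http://localhost:1234/v1",
                "http://127.0.0.1:8080/v1",
                "https://api.openai.com/v1"):
        print(url, "->", is_local_endpoint(url))
```

Under this definition a server on another machine in your network, such as a LAN IP address, does not count as local, so check how CodeGrex classifies it before relying on it for sensitive code.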

Example: LM Studio

Open LM Studio, download a model, and start its local server.
In CodeGrex Settings, set the base URL below.
Select the model in the CodeGrex model picker.

"codegrex.openai.baseUrl": "http://localhost:1234/v1"
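Before pointing CodeGrex at a server, it can help to confirm that the server answers the OpenAI-style `GET /models` request at the base URL you plan to use. The sketch below does that for the LM Studio URL from the table on this page. It assumes the server implements the standard models listing, and the helper names `parse_model_ids` and `list_models` are illustrative.

```python
import json
import urllib.request
from typing import Optional

LM_STUDIO_BASE_URL = "http://localhost:1234/v1"  # from the table on this page


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style GET /models response body."""
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def list_models(base_url: str = LM_STUDIO_BASE_URL,
                timeout: float = 2.0) -> Optional[list[str]]:
    """Return the model ids the server reports, or None if it is unreachable."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None


if __name__ == "__main__":
    ids = list_models()
    if ids is None:
        print("No OpenAI-compatible server answered at", LM_STUDIO_BASE_URL)
    else:
        print("Server is up. Models:", ids)
```

To check one of the other servers, pass its base URL from the table, for example `list_models("http://localhost:8000/v1")` for vLLM.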

Reference

Settings reference

codegrex.ollama.baseUrl (default: http://localhost:11434)

Where your Ollama server is reachable.

codegrex.ollama.model (default: llama3.2)

Default Ollama model tag to use (e.g. qwen2.5-coder:7b).

codegrex.openai.baseUrl (default: https://api.openai.com/v1)

Point this at any OpenAI-compatible server to use it as a provider.

Fully private by design

Prompts and code stay on your machine — local models never send data to CodeGrex or any third party.
Great for air-gapped environments and sensitive codebases.
Mix and match: use local models for sensitive work and cloud models for heavier tasks.
