Free Local AI in VS Code with Ollama and Cline¶
This guide shows how to set up the currently strongest free combination of VS Code, Ollama, and the Cline agent for local AI-assisted development.
We use the DeepSeek-R1 model because its built-in chain-of-thought reasoning produces significantly fewer logical errors than a standard language model of the same size. On reasoning-heavy tasks it is broadly competitive with much larger cloud-based models.
Note
Everything runs locally on your machine. No data leaves your system. No API key or subscription required.
Step 1 — Install Ollama (the local model server)¶
Ollama runs AI models as a local HTTP server that other tools (like Cline) can query.
Go to https://ollama.com and download the installer for your OS (Windows, macOS, or Linux).
Run the installer and start the application. A small Ollama icon should appear in your system tray.
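Before moving on, you can verify that the local server is actually reachable. The check below is a minimal sketch using only the Python standard library; it assumes Ollama's default port 11434, and the helper name `ollama_is_up` is our own illustration, not part of any tool:

```python
import urllib.request


def ollama_is_up(host: str = "http://localhost:11434") -> bool:
    """Return True if the local Ollama server answers on its default port."""
    try:
        # Ollama's root endpoint returns HTTP 200 with a short status message
        # when the server is running.
        with urllib.request.urlopen(host, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused / timed out: the server is not (yet) running.
        return False
```

You can run the same check from a terminal with `curl http://localhost:11434`, which prints a short status message when the server is up.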
Step 2 — Download a model¶
Open a terminal (Command Prompt or PowerShell on Windows, Terminal on macOS/Linux) and run one of the following commands.
Use `ollama pull` to download the model without starting an interactive chat session. This is the right command for a one-time setup.
For machines with 16 GB RAM or more (CPU) or 10 GB+ VRAM (GPU):

```
ollama pull deepseek-r1:14b
```

For laptops with 8 GB RAM/VRAM:

```
ollama pull deepseek-r1:7b
```
Note
VRAM requirements depend on quantization. The 14b model in 4-bit quantization (Q4, the Ollama default) requires approximately 8–10 GB VRAM. A GPU with 8 GB VRAM can therefore run the 14b model, but may be slow. 16 GB VRAM runs it comfortably. For CPU-only inference, 16 GB RAM is the practical minimum for 14b.
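The numbers in the note follow from a common rule of thumb: the weights alone occupy roughly (parameter count × bits per parameter) / 8 bytes, and the KV cache plus runtime overhead come on top. A quick sketch of that arithmetic (the function is our own illustration, not part of Ollama):

```python
def weights_size_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


# 14B parameters at 4-bit (Q4) quantization: ~7 GB of weights.
# Runtime overhead (KV cache, activations) pushes real usage toward 8-10 GB.
print(weights_size_gb(14, 4))  # 7.0
print(weights_size_gb(7, 4))   # 3.5 -> why the 7b model fits into 8 GB
```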
Alternative — if DeepSeek-R1 is too slow on your hardware:
```
ollama pull qwen2.5-coder:7b
```

`qwen2.5-coder:7b` is noticeably faster than the 14b DeepSeek model and is optimised specifically for code generation tasks.
Wait for the download to complete before continuing.
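You can confirm the download with `ollama list` in the terminal. The same information is exposed over Ollama's HTTP API at `/api/tags`, which is how tools like Cline discover available models. A minimal sketch, assuming the default port (the helper names are ours):

```python
import json
import urllib.request


def model_names(tags: dict) -> list[str]:
    """Extract the model names from an /api/tags response body."""
    return [m["name"] for m in tags.get("models", [])]


def installed_models(host: str = "http://localhost:11434") -> list[str]:
    """Ask the local Ollama server which models are installed (GET /api/tags)."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))


# Abridged shape of an /api/tags response:
sample = {"models": [{"name": "deepseek-r1:14b"}, {"name": "qwen2.5-coder:7b"}]}
```

The names returned here are exactly what the Model ID field in Cline expects later.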
Step 3 — Install the Cline extension in VS Code¶
We use Cline (formerly known as Claude Dev) because it is fully compatible with local, open-source model APIs and provides real agentic capabilities — reading and writing files, running terminal commands, and making multi-step decisions — rather than simple autocomplete.
Open Visual Studio Code.
Click the Extensions icon in the left sidebar (four squares).
Search for Cline.

Click Install.
Step 4 — Connect Cline to Ollama¶
Click the new Cline icon in the VS Code sidebar (small robot head).
Click the gear icon (Settings) at the bottom of the Cline panel.
Set the following options:
| Setting | Value |
| --- | --- |
| API Provider | Ollama |
| Base URL | `http://localhost:11434` (Ollama's default; leave as-is) |
| Model ID | the model you downloaded, e.g. `deepseek-r1:14b` |

Custom Instructions (optional): in the Custom Instructions field you can add a persistent system prompt, for example:
You are a senior software engineer. Write clean, well-structured code with concise inline comments. Prefer explicit error handling over silent fallbacks.
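Under the hood, Cline sends its requests to the endpoint configured above. To confirm the connection works end to end outside VS Code, you can reproduce a single request yourself. This sketch uses Ollama's `/api/generate` endpoint and assumes the `deepseek-r1:14b` model from Step 2 (swap in whatever you pulled):

```python
import json
import urllib.request


def build_request(prompt: str, model: str = "deepseek-r1:14b",
                  host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the prompt and return the model's complete response text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False` the server answers with a single JSON object whose `response` field holds the full completion, which keeps the example simple; Cline itself streams tokens instead.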
Step 5 — Agent-based workflow¶
Unlike autocomplete tools, Cline operates as an agent: it plans a sequence of actions, proposes each action to you for approval, and then executes it. Actions include creating and editing files, running terminal commands, and reading project context.
Example task:
In the Cline chat, type:
```
Create a React web app with a simple dashboard page. Save all files into a new folder called "my-app". Install the required dependencies via the terminal.
```
What happens next:
1. Cline analyses the task and proposes a plan.
2. Before creating any file or running any command, Cline asks for your approval.
3. Click Approve for each step you want to proceed with.
4. Watch the agent build the folder structure, write the files, and run `npm install`.
You remain in control at every step. Cline never executes an action without your explicit confirmation.
Troubleshooting¶
- **Cline cannot connect to Ollama:** Verify that Ollama is running (check the system tray icon). Confirm the Base URL is `http://localhost:11434` and that the model name in the Model ID field exactly matches the name shown by `ollama list` in the terminal.
- **The model is very slow:** Reduce the model size (e.g. switch from `14b` to `7b`) or try `qwen2.5-coder:7b`. Performance depends heavily on whether inference runs on GPU or CPU. Check the Ollama logs to see which device is being used.
- **Cline asks for an API key:** Make sure you selected Ollama as the API Provider, not OpenAI or Anthropic. Ollama does not require an API key.