Free Local AI in VS Code with Ollama and Cline

This guide shows how to set up a strong, completely free combination for local AI-assisted development: VS Code, Ollama, and the Cline agent.

We use the DeepSeek-R1 model because its built-in chain-of-thought reasoning produces noticeably fewer logical errors than a conventional model of the same size. On reasoning-heavy tasks it is broadly competitive with much larger cloud-hosted models.

Note

Everything runs locally on your machine. No data leaves your system. No API key or subscription required.


Step 1 — Install Ollama (the local model server)

Ollama runs AI models as a local HTTP server that other tools (like Cline) can query.

  1. Go to https://ollama.com and download the installer for your OS (Windows, macOS, or Linux).

  2. Run the installer and start the application. A small Ollama icon should appear in your system tray.


Step 2 — Download a model

Open a terminal (Command Prompt or PowerShell on Windows, Terminal on macOS/Linux) and run one of the following commands.

Use ollama pull to download the model without starting an interactive chat session. This is the right command for a one-time setup.

For machines with 16 GB RAM or more (CPU) or 10 GB+ VRAM (GPU):

ollama pull deepseek-r1:14b

For laptops with 8 GB RAM/VRAM:

ollama pull deepseek-r1:7b

Note

VRAM requirements depend on quantization. The 14b model in 4-bit quantization (Q4, the Ollama default) requires approximately 8–10 GB VRAM. A GPU with 8 GB VRAM can therefore run the 14b model, but may be slow. 16 GB VRAM runs it comfortably. For CPU-only inference, 16 GB RAM is the practical minimum for 14b.
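The figures in the note can be sanity-checked with back-of-envelope arithmetic: quantized weights occupy roughly params × bits-per-weight ÷ 8 bytes, plus runtime overhead for the KV cache and buffers. A rough sketch (the 20% overhead factor is an assumption; actual usage grows with context length):

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model in GB.

    Weights: params * bits_per_weight / 8 bytes. The overhead factor
    (~20%, an assumption) approximates KV cache and runtime buffers.
    """
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead

# 14B at Q4: about 8.4 GB — consistent with the 8–10 GB figure above.
# 7B at Q4: about 4.2 GB — why it fits comfortably on 8 GB machines.
```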

Alternative — if DeepSeek-R1 is too slow on your hardware:

ollama pull qwen2.5-coder:7b

qwen2.5-coder:7b is noticeably faster than the 14b DeepSeek model and is optimised specifically for code generation tasks.

Wait for the download to complete before continuing.
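With the model downloaded and Ollama running, you can confirm everything works by sending a single prompt to Ollama's HTTP API, the same interface Cline will use in Step 4. A minimal sketch using only the Python standard library (the /api/generate endpoint and its JSON fields follow Ollama's documented REST API; stream=False requests one complete reply instead of a token stream):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_payload(prompt: str, model: str = "deepseek-r1:14b") -> dict:
    """JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str, model: str = "deepseek-r1:14b") -> str:
    """Send one prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

If `ask_ollama("Say hello in one word.")` returns text, the server and model are working and Cline will connect without trouble.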


Step 3 — Install the Cline extension in VS Code

We use Cline (formerly known as Claude Dev) because it is fully compatible with local, open-source model APIs and provides real agentic capabilities — reading and writing files, running terminal commands, and making multi-step decisions — rather than simple autocomplete.

  1. Open Visual Studio Code.

  2. Click the Extensions icon in the left sidebar (four squares).

  3. Search for Cline.

  4. Click Install.


Step 4 — Connect Cline to Ollama

  1. Click the new Cline icon in the VS Code sidebar (small robot head).

  2. Click the gear icon (Settings) at the bottom of the Cline panel.

  3. Set the following options:

     - API Provider: Ollama
     - Base URL: http://localhost:11434 (Ollama’s default; leave as-is)
     - Model ID: select the model you downloaded, e.g. deepseek-r1:14b

  4. Custom Instructions (optional): in the Custom Instructions field you can add a persistent system prompt, for example:

    You are a senior software engineer.
    Write clean, well-structured code with concise inline comments.
    Prefer explicit error handling over silent fallbacks.
    

Step 5 — Agent-based workflow

Unlike autocomplete tools, Cline operates as an agent: it plans a sequence of actions, proposes each action to you for approval, and then executes it. Actions include creating and editing files, running terminal commands, and reading project context.

Example task:

In the Cline chat, type:

Create a React web app with a simple dashboard page.
Save all files into a new folder called "my-app".
Install the required dependencies via the terminal.

What happens next:

  1. Cline analyses the task and proposes a plan.

  2. Before creating any file or running any command, Cline asks for your approval.

  3. Click Approve for each step you want to proceed with.

  4. Watch the agent build the folder structure, write the files, and run npm install.

You remain in control at every step. Cline never executes an action without your explicit confirmation.


Troubleshooting

Cline cannot connect to Ollama

Verify that Ollama is running (check the system tray icon). Confirm the Base URL is http://localhost:11434 and that the model name in the Model ID field exactly matches the name shown by ollama list in the terminal.
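You can run the same check outside of VS Code by querying the endpoint Ollama uses to report installed models. A small sketch (the /api/tags endpoint is part of Ollama's REST API; returning None on failure is a design choice for this helper):

```python
import json
import urllib.request

def list_local_models(host: str = "http://localhost:11434"):
    """Return the model names Ollama reports, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=3) as resp:
            data = json.loads(resp.read())
    except OSError:  # URLError, connection refused, and timeouts are all OSError
        return None
    return [m["name"] for m in data.get("models", [])]
```

If this returns None, Ollama is not running; if it returns a list that doesn't contain your model, re-run ollama pull and make sure Cline's Model ID matches one of the listed names exactly.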

The model is very slow

Reduce the model size (e.g. switch from 14b to 7b) or try qwen2.5-coder:7b. Performance depends heavily on whether inference runs on GPU or CPU. Check the Ollama logs to see which device is being used.

Cline asks for an API key

Make sure you selected Ollama as the API Provider, not OpenAI or Anthropic. Ollama does not require an API key.