.. Tips and Tricks — Local AI with VS Code, Ollama, and Cline. Free Local AI in VS Code with Ollama and Cline =============================================== This guide shows how to set up the currently strongest free combination of VS Code, Ollama, and the Cline agent for local AI-assisted development. We use the **DeepSeek-R1** model because its built-in chain-of-thought reasoning produces significantly fewer logical errors than a standard language model of the same size. On reasoning-heavy tasks it is broadly competitive with much larger cloud-based models. .. note:: Everything runs locally on your machine. No data leaves your system. No API key or subscription required. -------------------------------------------------------------------------------- Step 1 — Install Ollama (the local model server) ------------------------------------------------- Ollama runs AI models as a local HTTP server that other tools (like Cline) can query. 1. Go to ``_ and download the installer for your OS (Windows, macOS, or Linux). 2. Run the installer and start the application. A small Ollama icon should appear in your system tray. -------------------------------------------------------------------------------- Step 2 — Download a model -------------------------- Open a terminal (**Command Prompt** or **PowerShell** on Windows, **Terminal** on macOS/Linux) and run one of the following commands. Use ``ollama pull`` to download the model without starting an interactive chat session. This is the right command for a one-time setup. **For machines with 16 GB RAM or more (CPU) or 10 GB+ VRAM (GPU):** .. code-block:: shell ollama pull deepseek-r1:14b **For laptops with 8 GB RAM/VRAM:** .. code-block:: shell ollama pull deepseek-r1:7b .. note:: VRAM requirements depend on quantization. The 14b model in 4-bit quantization (Q4, the Ollama default) requires approximately 8–10 GB VRAM. A GPU with 8 GB VRAM can therefore run the 14b model, but may be slow. 16 GB VRAM runs it comfortably. For CPU-only inference, 16 GB RAM is the practical minimum for 14b. **Alternative — if DeepSeek-R1 is too slow on your hardware:** .. code-block:: shell ollama pull qwen2.5-coder:7b ``qwen2.5-coder:7b`` is noticeably faster than the 14b DeepSeek model and is optimised specifically for code generation tasks. Wait for the download to complete before continuing. -------------------------------------------------------------------------------- Step 3 — Install the Cline extension in VS Code ------------------------------------------------- We use **Cline** (formerly known as *Claude Dev*) because it is fully compatible with local, open-source model APIs and provides real agentic capabilities — reading and writing files, running terminal commands, and making multi-step decisions — rather than simple autocomplete. 1. Open Visual Studio Code. 2. Click the **Extensions** icon in the left sidebar (four squares). 3. Search for ``Cline``. 4. Click **Install**. -------------------------------------------------------------------------------- Step 4 — Connect Cline to Ollama --------------------------------- 1. Click the new **Cline** icon in the VS Code sidebar (small robot head). 2. Click the **gear icon** (Settings) at the bottom of the Cline panel. 3. Set the following options: .. list-table:: :header-rows: 1 :widths: 30 70 * - Setting - Value * - **API Provider** - ``Ollama`` * - **Base URL** - ``http://localhost:11434`` (Ollama's default; leave as-is) * - **Model ID** - Select the model you downloaded, e.g. ``deepseek-r1:14b`` 4. **Custom Instructions** (optional): in the Custom Instructions field you can add a persistent system prompt, for example:: You are a senior software engineer. Write clean, well-structured code with concise inline comments. Prefer explicit error handling over silent fallbacks. -------------------------------------------------------------------------------- Step 5 — Agent-based workflow ------------------------------ Unlike autocomplete tools, Cline operates as an **agent**: it plans a sequence of actions, proposes each action to you for approval, and then executes it. Actions include creating and editing files, running terminal commands, and reading project context. **Example task:** In the Cline chat, type: .. code-block:: text Create a React web app with a simple dashboard page. Save all files into a new folder called "my-app". Install the required dependencies via the terminal. **What happens next:** 1. Cline analyses the task and proposes a plan. 2. Before creating any file or running any command, Cline asks for your approval. 3. Click **Approve** for each step you want to proceed with. 4. Watch the agent build the folder structure, write the files, and run ``npm install``. You remain in control at every step. Cline never executes an action without your explicit confirmation. -------------------------------------------------------------------------------- Troubleshooting --------------- **Cline cannot connect to Ollama** Verify that Ollama is running (check the system tray icon). Confirm the Base URL is ``http://localhost:11434`` and that the model name in the Model ID field exactly matches the name shown by ``ollama list`` in the terminal. **The model is very slow** Reduce the model size (e.g. switch from ``14b`` to ``7b``) or try ``qwen2.5-coder:7b``. Performance depends heavily on whether inference runs on GPU or CPU. Check the Ollama logs to see which device is being used. **Cline asks for an API key** Make sure you selected ``Ollama`` as the API Provider, not ``OpenAI`` or ``Anthropic``. Ollama does not require an API key.