Local LLM Guide · 2026

Ollama Guide: Run LLMs Locally

Run Llama, Mistral, Gemma, and more on your own hardware — free, private, offline, no API costs. This guide covers installation, the best models to run, Python integration, and a browser UI.

Why Run LLMs Locally?

Privacy

Your data never leaves your machine. Essential for sensitive documents, legal text, medical records, or proprietary code.

No cost

Zero API fees. Run as many queries as you want with no rate limits or pay-per-token billing.

Offline

Works without internet. Useful for air-gapped environments or unreliable connections.

Customization

Fine-tune models locally, swap them instantly, and integrate with any tool without API restrictions.

Installing Ollama

macOS / Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer from ollama.com/download and run it.

Then pull and run a model:

ollama run llama3.1:8b

Best Models to Run Locally

Model            RAM Required   Best For
llama3.2:3b      2GB            Fast responses on low-end hardware
llama3.1:8b      5GB            Good balance of quality and speed
mistral:7b       4GB            Coding and chat
qwen2.5-coder    4GB            Code generation
gemma2:9b        6GB            High-quality general chat
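As a rough illustration, the RAM figures above can be used to filter models that fit on a given machine. The numbers below are copied straight from the table and are approximate — actual usage varies with quantization level and context length — so treat this as a sketch, not a sizing tool:

```python
# Approximate RAM needed per model, in GB (from the table above).
# Real usage varies with quantization level and context length.
MODEL_RAM_GB = {
    "llama3.2:3b": 2,
    "llama3.1:8b": 5,
    "mistral:7b": 4,
    "qwen2.5-coder": 4,
    "gemma2:9b": 6,
}

def models_that_fit(available_gb: float) -> list[str]:
    """Return models whose approximate RAM need fits within available_gb."""
    return sorted(m for m, gb in MODEL_RAM_GB.items() if gb <= available_gb)

print(models_that_fit(4))
```

On a machine with about 4GB free, for example, this suggests llama3.2:3b, mistral:7b, and qwen2.5-coder as candidates.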

Using Ollama with Python

pip install ollama

import ollama

response = ollama.chat(
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Explain transformers in 2 sentences."},
    ],
)

print(response["message"]["content"])
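Under the hood, the Python client talks to Ollama's local REST server (by default at http://localhost:11434). Here is a minimal sketch of building the same chat request with only the standard library; the endpoint and field names follow Ollama's documented /api/chat API, and the actual POST is commented out since it needs a running server:

```python
import json
from urllib import request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete response instead of chunks
    }

body = build_chat_request("llama3.1:8b", "Explain transformers in 2 sentences.")
payload = json.dumps(body).encode()

# Requires a running Ollama server; uncomment to actually send:
# req = request.Request("http://localhost:11434/api/chat", data=payload,
#                       headers={"Content-Type": "application/json"})
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["message"]["content"])
```

This is why any HTTP client — not just the Python library — can drive Ollama, which is how tools like Open WebUI connect to it.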

Ollama + Open WebUI

Open WebUI gives you a ChatGPT-style browser interface over your local Ollama models. Install with Docker:

docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Then visit http://localhost:3000 for a full chat UI with model switching, history, and file uploads.

Frequently Asked Questions

What hardware do I need for Ollama?

Any modern Mac, Windows PC, or Linux machine with at least 8GB RAM can run 7B models. 16GB RAM is comfortable for 13B models. An Apple Silicon Mac (M1/M2/M3) with unified memory is the best consumer hardware for local LLMs — the GPU and CPU share the same memory pool, making 7B models fast without a discrete GPU.

Is Ollama free?

Yes, completely free and open-source. There are no API costs, no tokens to buy, and no rate limits. The only cost is your electricity and the initial model download (2–8GB per model). Models are stored locally and can be used offline.

Can I use Ollama with LangChain?

Yes. LangChain has first-class Ollama support via the ChatOllama and OllamaEmbeddings classes. This lets you build RAG pipelines and agents that run entirely locally — no API keys or costs required. See our LangChain tutorial for a full example.

What's the best model to run with Ollama?

For general use: llama3.1:8b (the best quality/speed balance on most hardware). For coding: qwen2.5-coder or mistral:7b. For low-end hardware (8GB RAM): llama3.2:3b. If you have 16GB+ RAM and want the best quality: gemma2:9b or qwen2.5:14b.
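These recommendations can be condensed into a tiny lookup helper. The mapping below just encodes this guide's suggestions — it is an opinionated sketch, not anything built into Ollama:

```python
# Model suggestions from this guide, keyed by use case.
RECOMMENDATIONS = {
    "general": "llama3.1:8b",   # best quality/speed balance on most hardware
    "coding": "qwen2.5-coder",  # strong at code generation
    "low-ram": "llama3.2:3b",   # runs in roughly 2GB of RAM
}

def recommend(use_case: str) -> str:
    """Return this guide's suggested model, defaulting to general use."""
    return RECOMMENDATIONS.get(use_case, RECOMMENDATIONS["general"])

print(recommend("coding"))
```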

Build with local LLMs + LangChain

Combine Ollama with LangChain to build fully local RAG pipelines and agents — no API costs, complete privacy.

LangChain Tutorial →