Fine-Tuning LLMs
Fine-tuning adapts a pre-trained language model to your specific domain, style, or task. Learn when it makes sense, how to prepare your data, and how to use efficient techniques like LoRA and QLoRA to train models on consumer hardware.
What it is
What Is Fine-Tuning?
Pre-trained LLMs like Llama 3, Mistral, and Gemma are trained on vast general-purpose datasets. Fine-tuning is a second training stage on a smaller, task-specific dataset that adjusts the model's weights to excel at your particular use case — whether that's writing in your brand voice, following a specific output format, or mastering a technical domain.
Modern fine-tuning is far more accessible than it sounds. Parameter-efficient methods like LoRA mean you can fine-tune a high-quality 7B model on a single consumer GPU in a few hours.
Instruction Tuning
Train the model to follow instructions using (instruction, response) pairs. Creates chat-capable, assistant-style models.
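To make the pair format concrete, here is a minimal sketch of one (instruction, response) training example and how it might be rendered into a single training string. The Alpaca-style template and the example text are illustrative, not tied to any particular model or dataset.

```python
import json

# One illustrative (instruction, response) training pair.
pair = {
    "instruction": "Summarise the following text in one sentence.",
    "input": "LoRA trains small adapter matrices instead of all model weights.",
    "response": "LoRA fine-tunes a model by training small low-rank adapters "
                "rather than every weight.",
}

def to_prompt(pair: dict) -> str:
    """Render a pair into a single training string (Alpaca-style template)."""
    return (
        f"### Instruction:\n{pair['instruction']}\n\n"
        f"### Input:\n{pair['input']}\n\n"
        f"### Response:\n{pair['response']}"
    )

# Datasets are commonly stored as JSON Lines, one pair per line.
jsonl_line = json.dumps(pair)
print(to_prompt(pair))
```

In practice you would collect hundreds to thousands of such lines in a `.jsonl` file and feed the rendered strings to your trainer.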
LoRA
Low-Rank Adaptation freezes the base model and trains small adapter matrices instead, cutting trainable parameters by orders of magnitude with minimal quality loss.
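The parameter savings are easy to see in plain NumPy. Instead of updating a d×d weight matrix, LoRA learns a thin down-projection A (r×d) and up-projection B (d×r) and applies W + (α/r)·BA. A toy sketch, with the hidden size, rank, and scaling chosen for illustration:

```python
import numpy as np

d, r, alpha = 4096, 8, 16           # hidden size, LoRA rank, scaling factor
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # frozen pre-trained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection (zero-init,
                                    # so training starts from the base model)

# Effective weight during/after fine-tuning: W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * (B @ A)

full_params = W.size                # ~16.8M values if we tuned W directly
lora_params = A.size + B.size       # only 65,536 trainable values with LoRA
print(f"trainable: {lora_params:,} vs {full_params:,} "
      f"({full_params // lora_params}x fewer)")
```

Because B starts at zero, the adapted model is initially identical to the base model; only the small A and B matrices receive gradients during training.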
QLoRA
Quantized LoRA combines 4-bit quantization with LoRA adapters. Fine-tune 7B models on a single 16 GB GPU.
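The quantization half of QLoRA can be sketched with simple block-wise absmax quantization. Real QLoRA uses the NF4 data type with double quantization, so this signed-int4 version is a simplification, but it shows why storing frozen weights in 4 bits shrinks memory roughly 4× versus fp16:

```python
import numpy as np

def quantize_absmax_int4(w: np.ndarray):
    """Simplified block-wise absmax quantization to signed 4-bit levels [-7, 7]."""
    scale = np.abs(w).max() / 7.0              # one scale per block
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
block = rng.normal(scale=0.02, size=64).astype(np.float32)  # one weight block

q, scale = quantize_absmax_int4(block)
w_hat = dequantize(q, scale)
err = np.abs(block - w_hat).max()              # bounded by half a quantization step
print(f"max reconstruction error: {err:.5f}")
```

In QLoRA the base weights stay frozen in this compressed form; gradients flow only through the small LoRA adapters, which remain in higher precision.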
RLHF
Reinforcement Learning from Human Feedback aligns model behaviour with human preferences. Used in ChatGPT, Claude, Gemini.
DPO
Direct Preference Optimisation is a simpler RLHF alternative. Trains on preference pairs without a reward model.
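DPO's per-pair loss is compact enough to compute by hand: it pushes the policy to increase the log-probability margin of the chosen response over the rejected one, relative to a frozen reference model. A sketch with illustrative log-probabilities (the formula follows the DPO paper; the numbers are made up):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative sequence log-probs under the policy and the frozen reference.
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_logp_chosen=-13.0, ref_logp_rejected=-14.0)
print(f"{loss:.4f}")
```

Note there is no reward model anywhere: the preference pairs and the reference model are all the training loop needs, which is what makes DPO simpler to run than full RLHF.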
Domain Adaptation
Continue pre-training on domain-specific text (medical, legal, code) so the model learns specialised vocabulary and facts.
Why it matters
When Fine-Tuning Is the Right Choice
Fine-tuning is not always the answer — but when it is, it delivers capabilities that prompting simply cannot match:
- Style and tone consistency — bake your brand voice or writing style directly into the model
- Format adherence — models learn to reliably output specific JSON schemas, code formats, or document structures
- Reduced prompt length — move few-shot examples into weights, cutting token costs at scale
- Domain expertise — models learn specialised terminology, reasoning patterns, and domain knowledge
- Privacy — run a fine-tuned open-source model entirely on your own infrastructure
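The "reduced prompt length" point is worth a back-of-envelope check. All numbers below are hypothetical placeholders, substitute your own prompt sizes and your provider's actual rates:

```python
# Hypothetical numbers -- not any vendor's real pricing.
few_shot_tokens = 1200              # few-shot examples repeated in every request
requests_per_month = 1_000_000
price_per_1k_input_tokens = 0.0005  # illustrative rate in dollars

tokens_saved = few_shot_tokens * requests_per_month
monthly_saving = tokens_saved / 1000 * price_per_1k_input_tokens
print(f"{tokens_saved:,} tokens, ${monthly_saving:,.2f}/month saved")
```

Once few-shot examples are baked into the weights, this per-request overhead disappears entirely, which is why fine-tuning tends to pay off at high request volumes even after accounting for training cost.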
Fine-tuning vs. prompting: when to use each
| Scenario | Recommendation |
|---|---|
| Prototype or early product | Prompt engineering first |
| Need consistent output format | Fine-tuning (+ structured output) |
| Knowledge changes frequently | RAG, not fine-tuning |
| Need specific persona or style | Fine-tuning |
| Fewer than 100 examples | Few-shot prompting |
| 500+ high-quality examples | Fine-tuning likely worth it |
| Cost-sensitive at high volume | Fine-tune a smaller model |
Where it fits in the AI roadmap
Phase 6 of the AI Engineering Roadmap
Fine-Tuning LLMs is Phase 6 of the AI roadmap for developers. It comes after you have mastered the core LLM development stack.
Fine-tuning is an advanced topic — you will get the most out of it once you understand model behaviour through prompting, have a clear task definition, and have a quality dataset. Do not skip phases 3–5.
Tutorials on this site
Fine-Tuning Guides & Deep-Dives
From understanding transformer internals to running LoRA fine-tuning on open-source models — practical guides for every stage of the process.
Related topic hubs
Continue Learning
Ready to go deeper into LLMs?
Follow the complete AI engineering roadmap — from API basics to fine-tuning your own models.
View the full AI roadmap →