AI Foundations: Core Concepts Developers Learn Fast (2026)

Who This Guide Is For

You're a software developer who can write code but feels lost when you see terms like "gradient descent," "embeddings," or "transformer architecture." This guide builds the mental models you need before diving into any specific AI technique.

The AI Hierarchy: What Each Term Actually Means

These terms are often confused. Here's the precise breakdown:

plaintext

Artificial Intelligence (broadest)
  └── Machine Learning (learns from data)
        └── Deep Learning (multi-layer neural networks)
              └── Large Language Models (trained on text at scale)
                    └── Foundation Models (generalize across tasks)

Artificial Intelligence — any technique that makes a computer exhibit intelligent behavior. Rule-based systems from the 1980s were "AI."

Machine Learning — the subset of AI where the system learns from data rather than being explicitly programmed. You show it examples; it figures out the rules.

Deep Learning — ML using neural networks with many layers. The "deep" refers to depth of layers, not complexity. Enabled modern breakthroughs in vision, speech, and language.

Large Language Models (LLMs) — deep learning models trained on massive text corpora to predict the next token. GPT-4, Claude, Llama are all LLMs.

The Core Paradigm Shift

Traditional programming:

plaintext

Rules + Data → Output

Machine learning:

plaintext

Data + Output → Rules (learned model)

This is the fundamental inversion. Instead of writing if score > 0.7: classify_as_positive, you show the model thousands of examples and it learns what "positive" means.

What "Training" Actually Means

Training is optimization. You have:

A model with millions of parameters (numbers/weights)
A dataset of (input, expected output) pairs
A loss function that measures how wrong the model is

Training adjusts the parameters to minimize the loss. The algorithm that does this is called gradient descent — it nudges each parameter slightly in the direction that reduces error.

Python

# Conceptually, training looks like this:
for batch in dataset:
    prediction = model(batch.input)       # forward pass
    loss = loss_fn(prediction, batch.target)  # how wrong?
    loss.backward()                       # compute gradients
    optimizer.step()                      # nudge parameters

After training on millions of examples, the parameters encode learned patterns. A trained model is essentially a compressed representation of patterns in data.

Key Concepts Every AI Developer Needs

1. Tokens (not words)

LLMs don't process words — they process tokens, which are chunks of characters. "unbelievable" might be split into ["un", "believ", "able"]. Most English words are 1–3 tokens. A token is roughly 4 characters or 0.75 words.

Why it matters: LLM APIs charge per token, and models have context window limits in tokens (e.g., 128k tokens ≈ ~100k words).

2. Embeddings (not vectors)

An embedding is a list of numbers (a vector) that represents meaning. Similar concepts have similar embeddings.

Python

# "cat" might be represented as:
[0.23, -0.45, 0.87, 0.12, ...]  # 1536 numbers

# "dog" would be close:
[0.21, -0.42, 0.85, 0.15, ...]

# "javascript" would be far:
[-0.67, 0.34, -0.12, 0.78, ...]

Embeddings enable semantic search: "find text similar to this query" by comparing vectors.

3. Parameters vs. Hyperparameters

Parameters — learned during training (weights, biases). A GPT-4 class model has ~1 trillion parameters.

Hyperparameters — set by the developer before training (learning rate, batch size, number of layers). You tune these; the model learns the rest.

When using APIs like OpenAI, temperature and max_tokens are hyperparameters you control at inference time.

4. Inference vs. Training

Training — expensive, done once (or periodically), requires GPUs for weeks.

Inference — using the trained model to generate predictions. What happens when you call client.chat.completions.create(...). Much cheaper and faster than training.

As an application developer, you almost always do inference — you call a pre-trained model's API.

5. Context Window

The maximum amount of text an LLM can "see" at once. Like working memory. If you send a 200k-token document to a model with a 128k context window, it can't read the whole thing in one call.

The Three Ways to Use AI in Applications

1. Prompt Engineering (easiest) Call a pre-trained model via API with carefully crafted prompts. No training required.

Python

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this: ..."}]
)

2. Fine-tuning (medium) Take a pre-trained model and train it further on your specific data. Adapts the model's behavior for your use case.

3. Training from scratch (most expensive) Build and train a model on your own data from random initialization. Done by research labs and large companies, rarely by application developers.

As an app developer, you'll spend 95% of your time on option 1, occasionally on option 2, and almost never on option 3.

The AI Application Stack

plaintext

User Interface (Streamlit / React / API)
        ↓
Application Logic (Python)
        ↓
AI Orchestration (LangChain / raw API calls)
        ↓
LLM API (OpenAI / Anthropic / local Ollama)
        ↓
Optional: Vector DB (ChromaDB / Pinecone)
Optional: Tools (web search, code execution, DB)

You'll be building in the top two layers. The AI model is a service you consume.

Mental Models That Will Save You Time

The stochastic parrot — LLMs don't "understand" in the human sense. They predict statistically likely continuations of text. This explains both their power and their failures (hallucinations).

The context is everything — LLMs have no persistent memory. Every API call starts fresh. The conversation history in chatbots is just text prepended to each new request.

Garbage in, garbage out — prompt quality determines output quality more than model choice. A well-crafted prompt on GPT-4o-mini often beats a vague prompt on GPT-4.

Temperature = creativity dial — temperature 0 = deterministic/factual, temperature 1 = creative/varied. Use low temperature for analysis, high for generation.

Learning Progression

This guide is the start. The recommended next steps:

Python for AI → Python for AI Complete Guide — set up your environment
Build your first AI app → Build an AI Chatbot — 2-hour project
Understand the API → OpenAI API Complete Guide — master the tools
Go deeper on LLMs → How LLMs Work — what's happening under the hood

FAQ

Do I need to know math to use AI? For application development (API calls, prompt engineering): no. For fine-tuning or understanding model internals: basic calculus and linear algebra help but aren't required to start.

Python or JavaScript for AI? Python. The entire AI ecosystem (PyTorch, HuggingFace, LangChain, scikit-learn) is Python-first. JavaScript has decent SDKs for API calls, but you'll hit limits quickly.

Do I need a GPU? For training: yes, expensive. For inference via API: no — the cloud provider handles it. For running local models (Ollama): a modern Mac with M-series chip or a GPU with 8GB+ VRAM works.