AI Foundations for Developers: Core Concepts You Must Know
Who This Guide Is For
You're a software developer who can write code but feels lost when you see terms like "gradient descent," "embeddings," or "transformer architecture." This guide builds the mental models you need before diving into any specific AI technique.
The AI Hierarchy: What Each Term Actually Means
These terms are often confused. Here's the precise breakdown:
Artificial Intelligence (broadest)
└── Machine Learning (learns from data)
└── Deep Learning (multi-layer neural networks)
└── Large Language Models (trained on text at scale)
└── Foundation Models (generalize across tasks)
Artificial Intelligence — any technique that makes a computer exhibit intelligent behavior. Rule-based systems from the 1980s were "AI."
Machine Learning — the subset of AI where the system learns from data rather than being explicitly programmed. You show it examples; it figures out the rules.
Deep Learning — ML using neural networks with many layers. The "deep" refers to depth of layers, not complexity. Enabled modern breakthroughs in vision, speech, and language.
Large Language Models (LLMs) — deep learning models trained on massive text corpora to predict the next token. GPT-4, Claude, Llama are all LLMs.
The Core Paradigm Shift
Traditional programming:
Rules + Data → Output
Machine learning:
Data + Output → Rules (learned model)
This is the fundamental inversion. Instead of writing if score > 0.7: classify_as_positive, you show the model thousands of examples and it learns what "positive" means.
What "Training" Actually Means
Training is optimization. You have:
- A model with millions of parameters (numbers/weights)
- A dataset of (input, expected output) pairs
- A loss function that measures how wrong the model is
Training adjusts the parameters to minimize the loss. The algorithm that does this is called gradient descent — it nudges each parameter slightly in the direction that reduces error.
# Conceptually, training looks like this:
for batch in dataset:
prediction = model(batch.input) # forward pass
loss = loss_fn(prediction, batch.target) # how wrong?
loss.backward() # compute gradients
optimizer.step() # nudge parameters
After training on millions of examples, the parameters encode learned patterns. A trained model is essentially a compressed representation of patterns in data.
Key Concepts Every AI Developer Needs
1. Tokens (not words)
LLMs don't process words — they process tokens, which are chunks of characters. "unbelievable" might be split into ["un", "believ", "able"]. Most English words are 1–3 tokens. A token is roughly 4 characters or 0.75 words.
Why it matters: LLM APIs charge per token, and models have context window limits in tokens (e.g., 128k tokens ≈ ~100k words).
2. Embeddings (not vectors)
An embedding is a list of numbers (a vector) that represents meaning. Similar concepts have similar embeddings.
# "cat" might be represented as:
[0.23, -0.45, 0.87, 0.12, ...] # 1536 numbers
# "dog" would be close:
[0.21, -0.42, 0.85, 0.15, ...]
# "javascript" would be far:
[-0.67, 0.34, -0.12, 0.78, ...]
Embeddings enable semantic search: "find text similar to this query" by comparing vectors.
3. Parameters vs. Hyperparameters
Parameters — learned during training (weights, biases). A GPT-4 class model has ~1 trillion parameters.
Hyperparameters — set by the developer before training (learning rate, batch size, number of layers). You tune these; the model learns the rest.
When using APIs like OpenAI, temperature and max_tokens are hyperparameters you control at inference time.
4. Inference vs. Training
Training — expensive, done once (or periodically), requires GPUs for weeks.
Inference — using the trained model to generate predictions. What happens when you call client.chat.completions.create(...). Much cheaper and faster than training.
As an application developer, you almost always do inference — you call a pre-trained model's API.
5. Context Window
The maximum amount of text an LLM can "see" at once. Like working memory. If you send a 200k-token document to a model with a 128k context window, it can't read the whole thing in one call.
The Three Ways to Use AI in Applications
1. Prompt Engineering (easiest) Call a pre-trained model via API with carefully crafted prompts. No training required.
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Summarize this: ..."}]
)
2. Fine-tuning (medium) Take a pre-trained model and train it further on your specific data. Adapts the model's behavior for your use case.
3. Training from scratch (most expensive) Build and train a model on your own data from random initialization. Done by research labs and large companies, rarely by application developers.
As an app developer, you'll spend 95% of your time on option 1, occasionally on option 2, and almost never on option 3.
The AI Application Stack
User Interface (Streamlit / React / API)
↓
Application Logic (Python)
↓
AI Orchestration (LangChain / raw API calls)
↓
LLM API (OpenAI / Anthropic / local Ollama)
↓
Optional: Vector DB (ChromaDB / Pinecone)
Optional: Tools (web search, code execution, DB)
You'll be building in the top two layers. The AI model is a service you consume.
Mental Models That Will Save You Time
The stochastic parrot — LLMs don't "understand" in the human sense. They predict statistically likely continuations of text. This explains both their power and their failures (hallucinations).
The context is everything — LLMs have no persistent memory. Every API call starts fresh. The conversation history in chatbots is just text prepended to each new request.
Garbage in, garbage out — prompt quality determines output quality more than model choice. A well-crafted prompt on GPT-4o-mini often beats a vague prompt on GPT-4.
Temperature = creativity dial — temperature 0 = deterministic/factual, temperature 1 = creative/varied. Use low temperature for analysis, high for generation.
Learning Progression
This guide is the start. The recommended next steps:
- Python for AI → Python for AI Complete Guide — set up your environment
- Build your first AI app → Build an AI Chatbot — 2-hour project
- Understand the API → OpenAI API Complete Guide — master the tools
- Go deeper on LLMs → How LLMs Work — what's happening under the hood
FAQ
Do I need to know math to use AI? For application development (API calls, prompt engineering): no. For fine-tuning or understanding model internals: basic calculus and linear algebra help but aren't required to start.
Python or JavaScript for AI? Python. The entire AI ecosystem (PyTorch, HuggingFace, LangChain, scikit-learn) is Python-first. JavaScript has decent SDKs for API calls, but you'll hit limits quickly.
Do I need a GPU? For training: yes, expensive. For inference via API: no — the cloud provider handles it. For running local models (Ollama): a modern Mac with M-series chip or a GPU with 8GB+ VRAM works.