Machine Learning Basics for Developers: A Practical Introduction
Learning Objectives
By the end of this guide you will be able to:
- Explain what machine learning is and how it differs from traditional programming
- Identify the three main types of ML and when to use each
- Set up a Python ML environment and run a complete training pipeline
- Understand the train/validate/test workflow
- Know where to go next on the ML path
What Is Machine Learning?
In traditional programming you write explicit rules: if X then Y. Machine learning flips that model. Instead of writing rules, you feed the system examples (data) and let it discover the rules itself.
Traditional: Input + Rules → Output
Machine Learning: Input + Output → Rules (model)
The "rules" learned by the model are represented as numeric parameters — millions or billions of numbers that encode patterns from training data.
Three Types of Machine Learning
1. Supervised Learning
You provide labeled examples. The model learns to map inputs to outputs.
- Regression — predict a continuous value (e.g., house price)
- Classification — predict a category (e.g., spam / not spam)
Most production ML is supervised learning.
2. Unsupervised Learning
No labels. The model finds structure in raw data.
- Clustering — group similar items (K-Means, DBSCAN)
- Dimensionality reduction — compress high-dimensional data (PCA, t-SNE)
Useful for customer segmentation, anomaly detection, data exploration.
3. Reinforcement Learning
An agent takes actions in an environment and receives rewards or penalties. It learns a policy that maximizes cumulative reward.
Used in game-playing AI, robotics, and recommendation systems.
Core Concepts Every Developer Needs
Features and Labels
- Feature — an input variable (e.g., number of bedrooms, user age)
- Label — the target you are predicting (e.g., sale price, churn = yes/no)
Training vs Inference
- Training — the process of fitting model parameters on labeled data
- Inference — using a trained model to make predictions on new data
Overfitting and Underfitting
- Overfitting — model memorizes training data, performs poorly on new data
- Underfitting — model is too simple, misses patterns in training data
- Sweet spot — good generalization: performs well on unseen data
Hyperparameters vs Parameters
- Parameters — learned from data (weights in a neural network)
- Hyperparameters — set by you before training (learning rate, number of trees)
Step-by-Step: Your First ML Model in Python
Step 1 — Install Dependencies
pip install scikit-learn pandas numpy matplotlib
Step 2 — Load and Explore Data
import pandas as pd
from sklearn.datasets import load_iris
data = load_iris(as_frame=True)
df = data.frame
print(df.head())
print(df.describe())
print(df['target'].value_counts())
Step 3 — Split Data
from sklearn.model_selection import train_test_split
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
print(f"Train: {len(X_train)}, Test: {len(X_test)}")
Step 4 — Train a Model
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
Step 5 — Evaluate
from sklearn.metrics import accuracy_score, classification_report
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred, target_names=data.target_names))
Step 6 — Make Predictions
import numpy as np
sample = np.array([[5.1, 3.5, 1.4, 0.2]]) # one flower measurement
prediction = model.predict(sample)
print(f"Predicted class: {data.target_names[prediction[0]]}")
The ML Workflow at a Glance
1. Define the problem
2. Collect and label data
3. Explore and clean data (EDA)
4. Engineer features
5. Select and train a model
6. Evaluate on validation set
7. Tune hyperparameters
8. Final test on held-out test set
9. Deploy and monitor
Never touch the test set until you are ready for final evaluation. Treat it as a proxy for production data.
Essential Algorithms to Know
| Algorithm | Type | Good For |
|---|---|---|
| Linear/Logistic Regression | Supervised | Baselines, interpretability |
| Decision Tree | Supervised | Tabular data, explainability |
| Random Forest | Supervised | Robust tabular predictions |
| Gradient Boosting (XGBoost) | Supervised | Winning Kaggle competitions |
| K-Means | Unsupervised | Clustering |
| PCA | Unsupervised | Dimensionality reduction |
| Neural Networks | Supervised | Images, text, audio |
Always start with a simple baseline (logistic regression or a decision tree) before reaching for deep learning.
Troubleshooting Common Issues
Low accuracy on training data (underfitting)
- Try a more complex model
- Add more features
- Reduce regularization
High training accuracy, low test accuracy (overfitting)
- Get more data
- Add regularization (L1/L2, dropout)
- Reduce model complexity
- Use cross-validation
Training is very slow
- Use vectorized operations (NumPy/pandas), not Python loops
- Reduce dataset size for prototyping
- Use GPU acceleration for deep learning
Class imbalance
- Use
class_weight='balanced'in scikit-learn - Oversample the minority class (SMOTE)
- Choose appropriate metrics (F1, AUC-ROC instead of accuracy)
FAQ
Do I need a math degree to learn ML? No, but you need to be comfortable with: basic linear algebra (vectors, matrices), statistics (mean, variance, probability), and calculus concepts (derivatives, gradients). You can learn these as you go.
Python or R? Python. The ecosystem (scikit-learn, PyTorch, TensorFlow, LangChain) is vastly larger and it's what industry uses.
How much data do I need? For classical ML (random forests, XGBoost), hundreds to thousands of labeled examples can work. For deep learning, typically tens of thousands or more.
What is the difference between ML and deep learning? Deep learning is a subset of ML that uses multi-layer neural networks. It excels at unstructured data (images, text, audio) but requires more data and compute than classical ML.
What to Learn Next
- Supervised learning in depth → Machine Learning Roadmap
- Model evaluation and metrics → model-evaluation-and-metrics guide
- Feature engineering → feature-engineering-guide
- Become an AI engineer → How to Become an AI Engineer