ChatGPT Clone: Build the Full Stack App from Scratch (2026)
·
4 min read
·
AI Learning Hub
Project Overview
Build a fully functional conversational AI application — the kind of thing that powers ChatGPT, Claude.ai, and similar interfaces. By the end you will have a working chat app with streaming token-by-token responses, persistent conversation history, multiple named conversations, and a configurable system prompt per session.
This is the most important first project for any LLM developer. It teaches you the core request/response cycle, how streaming works, how conversation context is managed, and how to wire a backend to a frontend — all the foundations you need before tackling RAG and agents.
Learning Outcomes
After completing this project you will be able to:
- Explain how the chat completions message array works and why the full history must be sent on every request
- Implement token streaming end-to-end from the OpenAI API to the browser using Server-Sent Events
- Build a FastAPI async backend that handles long-lived streaming responses without blocking
- Design a minimal conversation persistence layer and understand its token-limit implications
- Use system prompts to control model persona, output format, and response constraints
- Handle context window overflow by implementing a rolling message truncation strategy
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React + Vite | Chat UI, SSE stream consumption |
| Backend | FastAPI + uvicorn | API endpoints, streaming response |
| Database | SQLite via SQLModel | Conversation and message persistence |
| LLM API | OpenAI gpt-4o-mini | Low-cost, fast, reliable chat completions |
| Streaming | Server-Sent Events | HTTP-native token streaming to the browser |
| HTTP client | openai Python SDK | Handles retries and async streaming |
| Language | Python 3.11+ | Core backend implementation |
Architecture
plaintext
Browser (React)
│ EventSource (SSE stream)
▼
FastAPI backend ←→ SQLite
│ (conversations, messages)
▼
openai.chat.completions.create(stream=True)
│
▼
OpenAI API → gpt-4o-miniKey design decisions:
- Server-Sent Events over WebSockets — simpler for one-directional token streams, works over standard HTTP
- Full message history on every request — how all LLM chat works; teaches context window awareness
- SQLite for persistence — zero infrastructure, trivially swappable for Postgres later
- Background message saving — save the completed assistant message after the stream finishes, not during
Implementation
Step 1: Project setup
Bash
# Backend
python -m venv .venv && source .venv/bin/activate
pip install fastapi uvicorn openai sqlmodel python-dotenv
echo "OPENAI_API_KEY=sk-..." > .env
# Frontend
npm create vite@latest frontend -- --template react
cd frontend && npm installStep 2: Database models
Python
# models.py
from sqlmodel import SQLModel, Field, Relationship
from typing import Optional, List
from datetime import datetime
class Conversation(SQLModel, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
title: str = Field(default="New conversation")
system_prompt: str = Field(default="You are a helpful assistant.")
created_at: datetime = Field(default_factory=datetime.utcnow)
messages: List["Message"] = Relationship(back_populates="conversation")
class Message(SQLModel, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
conversation_id: int = Field(foreign_key="conversation.id")
role: str # "user" | "assistant"
content: str
created_at: datetime = Field(default_factory=datetime.utcnow)
conversation: Optional[Conversation] = Relationship(back_populates="messages")Step 3: FastAPI endpoints
Python
# main.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from sqlmodel import Session, select
from openai import AsyncOpenAI
import asyncio, json
app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["http://localhost:5173"],
allow_methods=["*"], allow_headers=["*"])
client = AsyncOpenAI()
@app.post("/conversations")
def create_conversation(title: str = "New conversation", system_prompt: str = "You are a helpful assistant."):
with Session(engine) as session:
conv = Conversation(title=title, system_prompt=system_prompt)
session.add(conv)
session.commit()
session.refresh(conv)
return conv
@app.get("/conversations")
def list_conversations():
with Session(engine) as session:
return session.exec(select(Conversation).order_by(Conversation.created_at.desc())).all()
@app.get("/conversations/{conv_id}/messages")
def get_messages(conv_id: int):
with Session(engine) as session:
return session.exec(select(Message).where(Message.conversation_id == conv_id)).all()Step 4: Streaming chat endpoint
Python
@app.post("/conversations/{conv_id}/chat")
async def chat(conv_id: int, user_message: str):
with Session(engine) as session:
conv = session.get(Conversation, conv_id)
history = session.exec(select(Message)
.where(Message.conversation_id == conv_id)
.order_by(Message.created_at)).all()
# Build message array — this is the core of how LLM chat works
messages = [{"role": "system", "content": conv.system_prompt}]
# Truncate if approaching context limit (~3000 tokens budget for history)
for msg in history[-20:]:
messages.append({"role": msg.role, "content": msg.content})
messages.append({"role": "user", "content": user_message})
async def generate():
full_response = []
stream = await client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
stream=True,
)
async for chunk in stream:
delta = chunk.choices[0].delta.content or ""
if delta:
full_response.append(delta)
yield f"data: {json.dumps({'token': delta})}\n\n"
# Save both messages after stream completes
with Session(engine) as session:
session.add(Message(conversation_id=conv_id, role="user", content=user_message))
session.add(Message(conversation_id=conv_id, role="assistant", content="".join(full_response)))
session.commit()
yield "data: [DONE]\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")Step 5: React chat UI
jsx
// App.jsx (key streaming logic)
const sendMessage = async (text) => {
setMessages(prev => [...prev, { role: 'user', content: text }]);
setMessages(prev => [...prev, { role: 'assistant', content: '' }]);
const source = new EventSource(
`/api/conversations/${activeConv}/chat?user_message=${encodeURIComponent(text)}`
);
source.onmessage = (e) => {
if (e.data === '[DONE]') { source.close(); return; }
const { token } = JSON.parse(e.data);
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1].content += token;
return updated;
});
};
};Step 6: Run and test
Bash
# Start backend
uvicorn main:app --reload --port 8000
# Start frontend
cd frontend && npm run dev
# Test streaming works correctly with curl
curl -N "http://localhost:8000/conversations/1/chat?user_message=Hello"Extension Ideas
- Model selector — let users switch between gpt-4o-mini, gpt-4o, and o1 per conversation
- Conversation title generation — auto-generate a title from the first message using a separate LLM call
- Token counter — display live token usage and estimated cost per message
- Export conversation — download a conversation as Markdown or JSON
- Response regeneration — add a "try again" button that deletes the last assistant message and re-runs the request
What to Learn Next
- RAG → RAG Document Assistant
- Agents → AI Code Review Assistant
- Prompt engineering → Prompt Engineering Guide