50 Minutes | Theory + Concepts + Discussion
What is an LLM? — The big picture (8 min)
Tokens & Training — How LLMs actually work (12 min)
Context Windows — AI's short-term memory (8 min)
Why LLMs Fail — Hallucinations & other limits (15 min)
Discussion & Exit Ticket (7 min)
Who has used ChatGPT, Claude, or Gemini? What did you use it for?
And — has anyone ever seen it give a wrong or weird answer?
By the end of today you'll understand exactly why those wrong answers happen — and be able to explain it to someone else.
An AI trained on massive amounts of text to predict what word (or token) comes next
💡 Think: autocomplete on your phone — but trained on billions of pages of text and vastly more sophisticated
ChatGPT — by OpenAI
Claude — by Anthropic
Gemini — by Google
Llama — by Meta (open source)
All of these work on the same core principle.
1 token ≈ ¾ of a word, or about 4 characters
= 2 tokens
= 3 tokens
un · believ · able = 4 tokens
≈ 5–6 tokens
Guess the token count: "The quick brown fox jumps over the lazy dog"
Answer: ~10 tokens. Why does this matter? Every model has a token limit.
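The ¾-word (≈4 characters) rule of thumb can be turned into a quick sketch. This is only an approximation, not a real tokenizer — actual models use learned byte-pair encodings, so real counts will differ slightly:

```python
# Rough token estimate using the ~4-characters-per-token rule of thumb.
# This approximates what a real BPE tokenizer would produce; it is NOT exact.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

sentence = "The quick brown fox jumps over the lazy dog"
print(estimate_tokens(sentence))  # 43 characters -> ~11 tokens (real tokenizers give ~9-10)
```

Students can try their own sentences and compare the estimate against an online tokenizer demo.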
If a model costs $0.01 per 1,000 tokens, and your app sends 500 messages a day averaging 200 tokens each — how much does that cost per month?
Answer: 500 × 200 = 100,000 tokens/day × 30 = 3M tokens = $30/month. Real product design involves this kind of thinking.
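The same arithmetic can be written out as a tiny cost calculator. The price and usage figures are the activity's assumptions, not real pricing:

```python
# Monthly token-cost arithmetic from the activity (assumed numbers, not real pricing).
PRICE_PER_1K_TOKENS = 0.01  # dollars per 1,000 tokens
MESSAGES_PER_DAY = 500
TOKENS_PER_MESSAGE = 200
DAYS_PER_MONTH = 30

tokens_per_month = MESSAGES_PER_DAY * TOKENS_PER_MESSAGE * DAYS_PER_MONTH
monthly_cost = tokens_per_month / 1000 * PRICE_PER_1K_TOKENS
print(f"{tokens_per_month:,} tokens -> ${monthly_cost:.2f}/month")  # 3,000,000 tokens -> $30.00/month
```

Changing any one constant shows students how quickly costs scale — double the message length and the bill doubles too.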
"If we train an LLM on internet data — what problems could that cause?"
Expected: bias, misinformation, offensive content, outdated information, underrepresentation of some languages
"The cat sat on the ___"
The model guesses: mat, floor, chair, table
It compares its guess with the word that actually came next and adjusts its internal weights toward the statistically likely choices.
It repeats this billions of times across billions of sentences.
The model is learning statistical patterns, not facts about the world. This distinction explains almost every limitation we'll cover today.
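The "statistical patterns, not facts" point can be made concrete with a toy demo. A real LLM uses a neural network over tokens; this bigram counter only mimics the core idea — predicting the next word from frequency alone, with no understanding:

```python
from collections import Counter, defaultdict

# Toy "next-word predictor": count which word follows which in a tiny corpus.
# This is frequency counting, not a neural network - but the principle
# (plausible continuation, not truth) is the same one LLMs scale up.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the floor . "
    "the cat slept on the mat ."
).split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

# The statistically likely continuations of "the cat sat on the ___":
print(following["the"].most_common(2))
```

Note the model never "knows" what a cat or a mat is — it only knows which words tend to follow which, which is exactly why scaled-up versions can produce fluent text that is confidently wrong.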
After basic training, companies specialise the model for specific tasks or behaviours:
Fine-tuned to be helpful, harmless, and honest in conversation
Fine-tuned specifically for writing and explaining code
Fine-tuned on clinical notes, research papers, patient data
You'd fine-tune (or prompt-engineer) for your specific use case
🧠 Imagine a friend who can only remember the last 10 sentences you said — everything before that is gone.
Example: You have a 10,000-word tutoring session with a model that has an 8,000-word limit.
→ The model silently forgets the first 2,000 words.
→ If you refer back to something from the start, the model has no idea what you mean.
You're building an AI study buddy. A student uses it for 2 hours. How do you stop the model losing important context?
Ideas to draw out: summarise key facts at intervals, save student name/topic/goals separately, prompt the model to recap periodically
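One of those ideas — keeping the conversation inside a fixed window by dropping the oldest turns — can be sketched in a few lines. The token limit and the 4-characters-per-token estimate are assumptions for illustration; a real app would use the model's own tokenizer and documented limits:

```python
# Minimal sketch: keep a chat history inside a fixed context window by
# dropping the oldest messages. Token counts are estimated (~4 chars/token);
# a real product would use the model's actual tokenizer and limit.
CONTEXT_LIMIT = 8000  # hypothetical token limit

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first, keeping what fits
        cost = estimate_tokens(msg)
        if used + cost > limit:
            break                   # everything older is silently dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

A fuller study-buddy design would also prepend a running summary of the dropped turns (plus saved facts like the student's name and goals), so the "forgotten" context survives in compressed form.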
The AI is not lying. It has no concept of truth. It is always doing the same thing — predicting the most plausible next token. Sometimes that token is wrong.
Lawyers submitted a brief with completely made-up court cases that ChatGPT generated
Students receive realistic-looking but entirely fabricated academic citations
Pattern matching without understanding
LLM sees "[Country]'s capital is [City]" enough times. Ask about a made-up country — it invents a plausible city.
Gaps in training data
If the event, place, or person isn't well represented in training data, the model guesses from similar things it does know.
Ambiguous prompts
"Tell me about the Paris incident" — which one? The model assumes and presents assumptions as facts.
I'll show you 3 statements. For each: Real fact or Hallucination? Discuss with a partner for 30 seconds, then we vote.
Statement 1: "The Eiffel Tower was built between 1887 and 1889." → ✓ Real
Statement 2: "The Great Wall of China is visible from space with the naked eye." → ✗ Myth — widely repeated, so LLMs repeat it too
Statement 3: "Einstein won the Nobel Prize for his theory of relativity." → ⚠️ Partially wrong — he won it for the photoelectric effect. Most dangerous type of hallucination.
📅 No real-time info
Training has a cutoff date. No live news, weather, prices, or recent events without external tools.
🤖 No true understanding
Pattern recognition ≠ comprehension. No lived experience, emotions, or common sense reasoning.
⚖️ Biased outputs
Reflects biases in training data — gender, culture, and language. English hugely overrepresented vs other languages.
🧠 Context window limits
Forgets long conversations. Critical to design around if building real products.
"Why might an LLM perform significantly better in English than in Arabic?"
Answer: The internet has vastly more English-language content than Arabic. The model learned from what existed — so its English patterns are far richer, more nuanced, and more accurate than its Arabic ones.
This means AI tools built on these models may serve Arabic speakers worse — a real equity issue in AI development.
How LLMs Work
Tokens: unit of text (~¾ word)
Training: data → patterns → fine-tune
Context window: short-term memory with a hard token limit
Why They Fail
Hallucinations: 3 causes — patterns, data gaps, bad prompts
No real-time info: knowledge cutoff
Bias + no true understanding
Answer all 3 in 5 minutes:
In one sentence: what is a context window and what happens when it's exceeded?
Name one cause of hallucinations and give a real-world example of why it's dangerous.
Why might an AI product work better for some groups of people than others?
You'll test a real AI tool, try to make it hallucinate, and explore what jobs exist in AI. Come ready to experiment.