Intro to LLMs

Domain experts can make models better before model weights change.

This course teaches the mechanics behind that claim. Start with a visible neural network, move into GPT-2, then use cost, evals, context, and prompt optimization to decide when training is worth it.

Thesis

A content designer, PM, support lead, salesperson, or workflow owner often knows more about the task than the person who can train the model. University makes that expertise legible: define the task, inspect outputs, write rubrics, shape context, and only then escalate to training.

rung 1: capture
rung 2: rubric
rung 3: prompt
rung 4: structure
rung 5: route
rung 6: SFT
rung 7: serve
Order

The course now starts with Neural Network Playground before GPT-2. First, learners see features, weights, training, overfitting, and model size. Then Transformer Explainer shows how language becomes tokens, embeddings, attention, logits, and sampled text.

Curriculum
01

Why domain experts matter

Understand how model behavior can improve before anyone changes weights.

  • Domain experts can shape context, evals, rubrics, examples, and task interfaces before training starts.
  • Technical teams often reach for temperature, raw benchmarks, or model choice before task evidence.
  • Product judgment belongs in the rubric: define success, read outputs, write examples, and decide what a good answer means.
02

What machine learning is

Separate hand-written rules from functions learned from examples.

  • Data, labels, features, loss functions, train/test splits, and held-out evaluation.
  • Use Neural Network Playground to see how data transforms create learnable patterns.
  • Why models generalize, why they overfit, and why product teams need evaluation.
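
A minimal sketch of why held-out evaluation matters, using a made-up task (label an integer as even or odd). Both "models" are hypothetical stand-ins: one memorizes the training rows, the other picks the best-fitting rule from a tiny family. The fixed 80/20 split is an assumption chosen to keep the run deterministic.

```python
# Toy task (hypothetical): label an integer 1 if it is even, else 0.
examples = [(x, int(x % 2 == 0)) for x in range(100)]
train, held_out = examples[:80], examples[80:]  # fixed split, so the run is deterministic


def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# "Model" A memorizes the training rows and guesses 0 for anything unseen.
seen = dict(train)


def memorizer(x):
    return seen.get(x, 0)


# "Model" B searches a tiny family of rules (x % m == 0) for the best training fit.
def fit_rule(rows):
    best_m = max(range(2, 6), key=lambda m: accuracy(lambda x: int(x % m == 0), rows))
    return lambda x: int(x % best_m == 0)


rule = fit_rule(train)

print("memorizer train/held-out:", accuracy(memorizer, train), accuracy(memorizer, held_out))
print("rule      train/held-out:", accuracy(rule, train), accuracy(rule, held_out))
# memorizer train/held-out: 1.0 0.5
# rule      train/held-out: 1.0 1.0
```

Both models look perfect on the training rows. Only the held-out rows show that one of them overfit.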
03

How a neural network works

Build an intuition for weights, neurons, layers, parameters, and training.

  • A weight is the strength of a connection; a parameter count is the number of learned knobs.
  • Training is the process of reducing error, not a model realizing it is wrong.
  • Local minima, hyperparameters, and capability jumps: why small networks can struggle before finding a better strategy.
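
A small sketch of the two ideas above: counting the learned knobs in a fully connected network, and training as repeated error reduction. The layer sizes, learning rate, and data are illustrative assumptions, not settings taken from Neural Network Playground.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def train_single_weight(data, lr=0.05, steps=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # step in the direction that reduces error
    return w


# 2 inputs -> 4 hidden neurons -> 1 output: (2*4 + 4) + (4*1 + 1) = 17 knobs.
print(count_parameters([2, 4, 1]))

# Hypothetical data generated by y = 3x; training should move w toward 3.
data = [(1, 3), (2, 6), (3, 9)]
print(round(train_single_weight(data), 4))
```

Nothing in the loop "knows" the answer is 3. Each step only follows the gradient downhill.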
04

Scaling laws and cost

Connect model size, data, training time, hardware, latency, and unit economics.

  • Bigger models, more data, and longer training usually help, but each gain costs more than the last.
  • Orders of magnitude: why a million, billion, and trillion are not just bigger versions of the same number.
  • Why input, output, cached tokens, memory bandwidth, and model size show up as product cost.
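
One way to feel the gap between a million, a billion, and a trillion is to convert each into seconds of elapsed time. A minimal sketch, assuming an average year of 365.25 days:

```python
SECONDS_PER_DAY = 60 * 60 * 24
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365.25  # average year length, an approximation


def as_duration(seconds):
    """Express a number of seconds in days or years, whichever reads better."""
    if seconds >= SECONDS_PER_YEAR:
        return f"{seconds / SECONDS_PER_YEAR:,.1f} years"
    return f"{seconds / SECONDS_PER_DAY:,.1f} days"


for name, value in [("million", 10**6), ("billion", 10**9), ("trillion", 10**12)]:
    print(f"one {name} seconds is about {as_duration(value)}")
# one million seconds is about 11.6 days
# one billion seconds is about 31.7 years
# one trillion seconds is about 31,688.1 years
```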
05

Tokens and embeddings

Understand why tokens are vocabulary units, not meaning by themselves.

  • Tokenization turns text into pieces the model can process; those pieces do not inherently carry semantic meaning.
  • Embeddings let the model place tokens in a learned space where relationships can emerge.
  • Why tokenizer changes affect cost, quality, prompt behavior, and weird failures like letter counting.
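
A toy sketch of both ideas. The vocabulary, the greedy longest-match rule, and the 2-D vectors are all invented for illustration; real tokenizers and embeddings are learned and far larger.

```python
import math

# Hypothetical vocabulary: token piece -> token ID.
VOCAB = {"str": 0, "aw": 1, "berry": 2}


def tokenize(text, vocab):
    """Greedy longest-match tokenization against a fixed vocabulary."""
    tokens, start = [], 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in vocab:
                tokens.append(text[start:end])
                start = end
                break
        else:
            raise ValueError(f"no token covers {text[start:]!r}")
    return tokens


pieces = tokenize("strawberry", VOCAB)
ids = [VOCAB[p] for p in pieces]
print(pieces, ids)  # ['str', 'aw', 'berry'] [0, 1, 2]
# The model receives the IDs. The three r's are spread across two pieces,
# which is one reason letter counting can fail.


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hand-built 2-D embeddings (hypothetical values).
EMBEDDINGS = {"refund": (0.9, 0.1), "reimburse": (0.8, 0.2), "banana": (0.1, 0.9)}
print(cosine(EMBEDDINGS["refund"], EMBEDDINGS["reimburse"]))  # close to 1
print(cosine(EMBEDDINGS["refund"], EMBEDDINGS["banana"]))     # much lower
```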
06

Transformers and generation

Trace one prompt through embeddings, attention, logits, and next-token prediction.

  • Use Transformer Explainer to watch GPT-2 process tokens inside the browser.
  • Self-attention is where token relationships begin shaping meaning; causal masking prevents peeking at future tokens.
  • Temperature, top-k, and top-p change sampling behavior, but the real goal is raising the probability of the right next token.
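
A minimal sketch of the sampling controls and the causal mask, written in plain Python rather than taken from GPT-2 or Transformer Explainer. The logits are hypothetical, and top-p is omitted for brevity.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Keep the k most likely tokens and renormalize; zero out the rest."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


logits = [1.0, 2.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
print(top_k(softmax(logits), 2))
print(causal_mask(3))
```

Temperature changes how peaked the distribution is but never reorders the candidates. Only better logits, which come from the weights and the context, make the right token the most likely one.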
07

Optimization without retraining

Use context and evaluation before paying for model training.

  • Context engineering can move an answer from unlikely to obvious without changing weights.
  • Prompt search, examples, schemas, tool descriptions, routing, and response constraints are all cheaper than retraining.
  • A good prompt optimizer runs candidates against rubrics and held-out tasks instead of relying on taste alone.
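
The last bullet can be sketched as a scoring loop. Everything here is hypothetical: `fake_model` is a deterministic stand-in for a real LLM call, and the rubric checks, prompts, and tasks are invented so the example runs offline. It is not Understudy's implementation.

```python
def fake_model(prompt, task):
    """Hypothetical stand-in for an LLM call; reacts to phrases in the prompt."""
    reply = f"Update on {task}."
    if "next step" in prompt:
        reply += " Next step: reply to confirm."
    if "under 20 words" not in prompt:
        reply += " " + "We truly appreciate your patience and understanding. " * 3
    return reply


# Each rubric item is a named pass/fail check on one output.
RUBRIC = [
    ("mentions the task", lambda out, task: task in out),
    ("states a next step", lambda out, task: "Next step" in out),
    ("stays under 20 words", lambda out, task: len(out.split()) < 20),
]


def score(prompt, tasks):
    """Share of rubric checks passed across all tasks."""
    passed = sum(check(fake_model(prompt, t), t) for t in tasks for _, check in RUBRIC)
    return passed / (len(tasks) * len(RUBRIC))


candidates = [
    "Write a notification.",
    "Write a notification that states the next step.",
    "Write a notification under 20 words that states the next step.",
]
dev_tasks = ["order 1042", "password reset"]  # used to pick a winner
held_out_tasks = ["invoice 77"]               # used only to confirm it

winner = max(candidates, key=lambda p: score(p, dev_tasks))
print(winner)
print("held-out score:", score(winner, held_out_tasks))
```

Scoring the winner again on tasks it was not selected on is what separates evidence from taste.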
08

Post-training and specialist models

Know when to move from prompt optimization to SFT, adapters, rewards, and routing.

  • SFT teaches imitation from good traces; adapters can change a smaller set of weights than full retraining.
  • Teacher-student distillation can move expensive frontier behavior into a cheaper specialist route.
  • Serving validation matters: a higher score must also survive latency, cost, monitoring, and rollback.
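
A rough sketch of two of the points above. It assumes "adapter" means a low-rank (LoRA-style) update, which the course text does not specify, and the promotion gate uses made-up metric names and thresholds.

```python
def full_matrix_params(d_in, d_out):
    """Weights in one dense matrix."""
    return d_in * d_out


def low_rank_adapter_params(d_in, d_out, rank):
    """Two thin matrices, (d_in x rank) and (rank x d_out), trained instead of the full one."""
    return rank * (d_in + d_out)


full = full_matrix_params(4096, 4096)
adapter = low_rank_adapter_params(4096, 4096, rank=8)
print(full, adapter, f"{adapter / full:.2%}")  # 16777216 65536 0.39%


def should_promote(candidate, baseline, max_latency_ms, max_cost_per_call):
    """Hypothetical gate: a better score alone is not enough to ship."""
    return (
        candidate["score"] > baseline["score"]
        and candidate["latency_ms"] <= max_latency_ms
        and candidate["cost_per_call"] <= max_cost_per_call
    )


baseline = {"score": 0.82, "latency_ms": 900, "cost_per_call": 0.010}
candidate = {"score": 0.90, "latency_ms": 1500, "cost_per_call": 0.004}
print(should_promote(candidate, baseline, max_latency_ms=1200, max_cost_per_call=0.01))  # False
```

The candidate scores higher and costs less, yet it fails the latency budget, so it stays out of serving.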
09

The Understudy loop

Capture repeated LLM work, evaluate it, optimize it, and only then climb the training ladder.

  • Capture source prompts, model outputs, rubrics, and examples while keeping sensitive data local by default.
  • Generate prompt families, score them, preserve winners, and inspect failure cases.
  • Escalate only when the cheaper rung stops working: prompt, structure, route, SFT, RL, then serving validation.
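
The escalation rule can be sketched as a loop that tries rungs from cheapest to most expensive and stops at the first one that meets the target. The rung order comes from the bullet above (serving validation is treated as a final check rather than a rung); the scores and the target are hypothetical.

```python
LADDER = ["prompt", "structure", "route", "SFT", "RL"]  # cheapest first


def climb(evaluate, target):
    """Evaluate rungs in order; stop at the first whose score meets the target."""
    tried = []
    for rung in LADDER:
        rung_score = evaluate(rung)
        tried.append((rung, rung_score))
        if rung_score >= target:
            return rung, tried
    return None, tried


# Hypothetical eval scores; in practice each would come from a rubric-scored run.
hypothetical_scores = {"prompt": 0.71, "structure": 0.88, "route": 0.90, "SFT": 0.95, "RL": 0.96}

chosen, tried = climb(hypothetical_scores.get, target=0.85)
print(chosen)  # structure
print(tried)   # [('prompt', 0.71), ('structure', 0.88)]
```

The more expensive rungs are never evaluated, which is the point: their cost is only paid when a cheaper rung falls short.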
Demos
first demo

Neural Network Playground

Train a small neural network in the browser while changing datasets, features, layers, activations, and regularization.

second demo

Transformer Explainer

Trace tokens through embeddings, attention, transformer blocks, logits, and sampling.

first-party demo

Tokens, context, and temperature

Type a support prompt, inspect token chunks, move temperature, and see why context changes the distribution before sampling does.

first-party demo

Scaling intuition

Build intuition for orders of magnitude, model size, sequence length, and why scaling wins get expensive quickly.

first-party demo

Embedding neighborhoods

Click a hand-built embedding space, compare nearest neighbors, and see why relationships can behave like directions.

first-party demo

KV cache conveyor

Watch a repeated prefix become cached keys and values, then compare fresh context with cached continuation.

first-party demo

Prompt optimizer walkthrough

Start with one weak notification prompt, score variants against a rubric, and inspect the winning prompt.


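Neural Network Playground and Transformer Explainer are self-hosted copies of the upstream TensorFlow Playground and Transformer Explainer projects; the remaining demos are first-party. Upstream license notices and local hosting notes are preserved in this app.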

Economics

Model behavior is only half the product decision. This calculator makes the unit economics visible: input tokens, output tokens, cached tokens, and call volume all compound into product margin.

cost calculator
price shape
illustrative, last checked May 2026
each call: $0.01
daily: $118
monthly: $3,546
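
The arithmetic behind figures like these can be sketched in a few lines. Every input below is hypothetical: the per-million-token prices, the token counts, and the call volume were chosen only so the totals land on the illustrative figures above. They are not the calculator's actual defaults or any provider's prices.

```python
from dataclasses import dataclass


@dataclass
class PriceShape:
    # USD per million tokens. Hypothetical values, not real provider prices.
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def cost_per_call(price, fresh_input, cached_input, output):
    """Cost in USD of one call, given token counts in each billing bucket."""
    return (
        fresh_input * price.input_per_m
        + cached_input * price.cached_input_per_m
        + output * price.output_per_m
    ) / 1_000_000


price = PriceShape(input_per_m=2.50, cached_input_per_m=1.25, output_per_m=10.00)
per_call = cost_per_call(price, fresh_input=1_000, cached_input=2_000, output=500)

calls_per_day = 11_820         # hypothetical volume
daily = per_call * calls_per_day
monthly = daily * 30           # assumes a 30-day month

print(f"each call ${per_call:.2f}, daily ${daily:,.0f}, monthly ${monthly:,.0f}")
# each call $0.01, daily $118, monthly $3,546
```

A one-cent call looks harmless until it is multiplied by volume, which is why token counts belong in the product conversation.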
Glossary
Token

A model vocabulary piece. It might be a word, part of a word, a space, punctuation, or another learned fragment.

Embedding

A learned numeric representation where token relationships can start carrying useful meaning.

Attention

A mechanism for deciding which earlier tokens matter when updating the current token representation.

Parameter

A learned knob in the model. Larger networks have more knobs and more possible strategies.

Temperature

A sampling control that changes how sharply the model favors high-probability next tokens. It is not a fix for unclear context.

Context engineering

Changing the information and structure given to the model so the desired answer becomes more likely.

Post-training

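Improving behavior after pretraining by updating weights through SFT, adapters, preference tuning, or reward-based training. In this course, prompts, examples, and routing come first because they change no weights.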

KV cache

Saved attention keys and values from earlier tokens, which let a model continue a generation without recomputing the whole prefix.
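
A back-of-the-envelope sketch of what the KV cache saves. It is a simplified counting model with hypothetical lengths: it counts token positions whose keys and values must be computed, and ignores that attention still reads every cached entry at each step.

```python
def positions_computed(prefix_len, new_tokens, use_cache):
    """Count token positions needing fresh key/value computation (new_tokens >= 1)."""
    if use_cache:
        # The prefix is processed once; each later step computes only the newest token.
        return prefix_len + (new_tokens - 1)
    # Without a cache, every step recomputes all tokens seen so far.
    return sum(prefix_len + step for step in range(new_tokens))


print(positions_computed(1_000, 100, use_cache=False))  # 104950
print(positions_computed(1_000, 100, use_cache=True))   # 1099
```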

Tools
Tokenization viewer: Paste text, inspect token pieces, and see context shift the next-token distribution before temperature changes sampling. (live)
Scaling intuition: Compare millions, billions, and trillions with seconds, model size, active tokens, and relative compute. (live)
Embedding neighborhoods: Click through a toy embedding space to inspect neighbors, clusters, and vector analogies. (live)
KV cache conveyor: Animate prefix processing, cached keys and values, and the difference between fresh input and cached continuation. (live)
Prompt optimizer walkthrough: Run a canned Understudy task, generate prompt families, score outputs with a rubric, and export the winning prompt. (live)
Final Lab

The practical endpoint is an Understudy run that works on a fresh machine: install prerequisites, choose a provider key, generate prompt candidates, score them with a rubric, inspect failures, and export the winning prompt. The browser walkthrough is live now; the local CLI lab is planned as the capstone.

step 1: setup
step 2: optimize
step 3: export
Feedback

This is intentionally soft-launched. The most useful feedback is where the mental model breaks: which demo feels too abstract, which concept needs a bridge, and which exercise would make the lesson stick.

Sources