LLMs in 10 Minutes — Chapter I
A Non-Technical Primer for Legal Professionals. What an LLM is, how it processes tokens, what training data means, and three essential questions before using AI output in legal work.
What is a Large Language Model?
A Large Language Model (LLM) is a type of AI trained on enormous quantities of text — books, websites, legal databases, academic papers, and more. During training, it learns statistical patterns: which words tend to follow which other words, in what contexts, in what combinations.
It does not understand language the way a human does. It predicts it.
Key concept: An LLM is, in effect, a very sophisticated autocomplete engine. When you type a prompt, it generates a statistically likely continuation of your text. On its own, it does not retrieve information from a database, look anything up, or apply reasoning the way a trained professional does.
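For readers who want to see the autocomplete idea in miniature, the sketch below counts which word follows which in a tiny, made-up corpus and then continues a prompt with the most frequent follower. Everything in it (the corpus and the function names predict_next and autocomplete) is a hypothetical illustration. A real LLM uses a neural network trained on vastly more text, so treat this as an analogy, not a description of how any particular product works.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration only).
corpus = (
    "the clause is unenforceable . "
    "the clause is void . "
    "the clause is unenforceable . "
    "the contract is binding ."
).split()

# Count how often each word is followed by each other word.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1


def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = following.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def autocomplete(prompt, max_words=3):
    """Extend the prompt one word at a time using the most frequent follower."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


print(autocomplete("the contract", max_words=2))
# prints: the contract is unenforceable
```

Note the result: the corpus only ever says the contract is binding, yet the sketch outputs "the contract is unenforceable" because "unenforceable" is the most common word after "is". The output is fluent and statistically plausible, but nothing checked whether it is true.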
The Token, Not the Word
LLMs do not process words; they process tokens. A token is a chunk of text: a short, common word is often a single token, while a longer or rarer word is split into fragments. For example, unenforceable might be split into un, enforce, able. The model predicts each next token in sequence.
This matters to you because:
- Long prompts may be truncated if they exceed the model’s context window (its short-term memory).
- Unusual legal terms may be poorly handled if they were rare in the training data.
- Every instruction you add, including formatting instructions, uses up tokens in the context window, so keep your prompts focused.
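The points above can be sketched with a toy tokenizer. The vocabulary, the greedy longest-match rule, and the function names tokenize and fit_to_window are all assumptions made for illustration. Real tokenizers learn vocabularies of many thousands of fragments from data, and real tools differ in how they handle text that exceeds the context window (some cut it off, some drop the oldest text, some refuse the request).

```python
# Hypothetical vocabulary of word-fragments (real ones are learned and far larger).
VOCAB = ["un", "enforce", "able", "the", "clause", "is"]


def tokenize(word):
    """Greedily split a word into the longest matching vocabulary pieces."""
    tokens = []
    i = 0
    while i < len(word):
        match = None
        for piece in sorted(VOCAB, key=len, reverse=True):
            if word.startswith(piece, i):
                match = piece
                break
        if match is None:
            match = word[i]  # unknown text falls back to a single character
        tokens.append(match)
        i += len(match)
    return tokens


def fit_to_window(tokens, window):
    """Keep only the first `window` tokens; anything beyond is cut off."""
    return tokens[:window]


print(tokenize("unenforceable"))   # ['un', 'enforce', 'able']
print(len(tokenize("estoppel")))   # 8: a term missing from the vocabulary shatters into characters

prompt_tokens = [t for w in "the clause is unenforceable".split() for t in tokenize(w)]
print(fit_to_window(prompt_tokens, 4))  # ['the', 'clause', 'is', 'un']
```

In this sketch a four-word prompt costs six tokens, a rare term costs more tokens than a familiar one, and a window of four tokens cuts the prompt off mid-word.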
What Training Data Means
The model was trained on text that existed up to a certain date — its knowledge cutoff. It has no access to:
- Legislation amended after that date
- Cases decided after that date
- Your client’s documents, unless you paste them in
- Internal firm knowledge or your jurisdiction’s unreported decisions
- Any live database, court portal, or registry
Think of the model as a very well-read junior who finished their reading six to eighteen months ago, has not practised, and cannot check anything. Fluent. Impressive. Needs supervision.
Three Questions Before You Use AI Output
Apply these before using any AI-generated content in legal work:
- Is this output fact-sensitive? If yes: verify every specific claim independently.
- Does this rely on recent law? If yes: check the current version of the statute or rule.
- Is this output going to a client or a court? If yes: full professional review required — not a skim.
Legitimate Uses vs. High-Risk Uses
| Good uses of LLMs | High-risk uses of LLMs |
|---|---|
| Drafting first versions of standard letters | Citing cases or statutes without verification |
| Explaining legal concepts in plain English | Advising on specific facts without review |
| Summarising long documents you have read | Processing confidential client data on free tools |
| Generating a list of issues to consider | Treating AI output as a finished work product |
| Improving the clarity of your own drafts | Using AI for advice in fast-moving legal areas |
Quick Reference Glossary
| Term | Plain English meaning |
|---|---|
| LLM | Large Language Model — the underlying AI technology |
| Token | A chunk of text the model processes (roughly a word-fragment) |
| Context window | How much text the model can hold in working memory at once |
| Knowledge cutoff | The date beyond which the model has no knowledge (also called the training cutoff) |
| Prompt | The instruction or question you give the model |
| Hallucination | A confident, fluent, but factually incorrect output |
Part of the AI Foundations for Lawyers series — Chapter I.