llms.txt vs llms-full.txt: what's the difference?
They sound nearly identical. They live next to each other at the root of your site. They serve the same broad audience. But they do fundamentally different jobs — and knowing which one to ship (or whether to ship both) is the difference between AI models understanding your site and AI models actually answering questions about it.
The TL;DR
If you only read one paragraph: llms.txt is a small Markdown
file that tells AI models what your site is and where the important
pages live. llms-full.txt is a much larger Markdown file
that contains the actual text of those pages already extracted,
cleaned, and concatenated. The first is a table of contents; the second
is the book.
Marketing sites and small blogs usually only need llms.txt.
Documentation sites, API references, knowledge bases, and anything
technical where you actually want models to be able to answer
questions about your content should ship both.
Two files, two jobs
The reason there are two files instead of one comes down to a tradeoff every LLM has to make: context window space is expensive, but fetching pages is slow. Different situations call for different sides of that tradeoff.
When a user asks ChatGPT "what does Lab451 do," the model needs the
smallest possible amount of context to answer accurately. It doesn't need
your full pricing page, your terms of service, or every blog post — it
needs a sentence or two. llms.txt is exactly that: a tiny
file the model can fetch in milliseconds, read for a few hundred tokens, and use
to give a quick, accurate one-paragraph answer.
When a user asks ChatGPT "what's the best way to set up llms-full.txt
for a 500-page documentation site," the model needs much more. It needs
to understand your full documentation, find the relevant sections, and
synthesize specifics. Following links one by one from llms.txt
would mean ten or twenty separate fetches and a lot of wasted context.
llms-full.txt sidesteps that entire dance: download once,
answer in detail.
The two files don't compete — they complement each other. The model can grab whichever is the right tool for the question being asked. Some models check both in sequence; some pick one based on the query depth. Either way, having both available means you've covered both ends of the tradeoff.
Side by side
| | llms.txt | llms-full.txt |
|---|---|---|
| Purpose | Map of your site | Full text of your site |
| Format | Structured Markdown (H1 + blockquote + H2 link lists) | Free-form Markdown (concatenated page bodies) |
| Typical size | 1–10 KB | 100 KB – several MB |
| Token cost when read | ~250 – 2,500 tokens | ~25,000 – 1,000,000+ tokens |
| Update frequency | When site structure changes | Every meaningful content change |
| Best for | Any site (marketing, blog, SaaS landing) | Docs, API references, knowledge bases, technical content |
| Model behavior | Read once, follow links as needed | Read once, answer from memory |
| Hosted at | /llms.txt | /llms-full.txt |
| Required? | Recommended for everyone | Recommended for content-heavy sites |
Format differences in detail
Both files are Markdown, both are plain text, both live at the root of your domain. The differences are structural.
llms.txt: prescriptive structure
llms.txt follows a tight, parseable shape. There's exactly
one H1, an optional blockquote summary, optional free-form Markdown, and
then H2 sections each containing nothing but link lists. A spec-compliant
parser can extract these elements deterministically. Here's the canonical
shape:
# Lab451
> Lab451 generates llms.txt, llms-full.txt, sitemap.xml, and robots.txt
> for any public website in about 30 seconds.
## Docs
- [Getting started](https://lab451.org/docs/quickstart): Generate your first set of files
- [API reference](https://lab451.org/docs/api): Endpoints, authentication, rate limits
## Optional
- [Terms of service](https://lab451.org/terms)
- [Privacy policy](https://lab451.org/privacy)
Notice what's not there: no page content. No paragraphs from the Getting Started doc. No code samples from the API reference. Just titles, URLs, and one-line descriptions. The file is a finger pointing at the pages, not the pages themselves.
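To make "deterministically" concrete, here is a minimal Python sketch of a parser for the shape shown above. It is an illustration rather than a reference implementation: the function name `parse_llms_txt` and the returned dictionary layout are assumptions made for this example, and it ignores any free-form Markdown between the blockquote and the first H2.

```python
import re

# Matches "- [Title](https://url): optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Parse the H1 / blockquote / H2 link-list shape into a dictionary."""
    result = {"title": None, "summary": "", "sections": {}}
    summary_lines = []
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith(">") and current is None:
            # Blockquote summary lines appear before the first H2.
            summary_lines.append(line.lstrip(">").strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    {
                        "title": match["title"],
                        "url": match["url"],
                        "desc": match["desc"] or "",
                    }
                )
    result["summary"] = " ".join(summary_lines)
    return result
```

Fed the Lab451 example above, this returns the title, the two-line summary joined into one string, and two sections ("Docs" and "Optional") with two links each.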
llms-full.txt: prescriptive content, flexible structure
llms-full.txt takes a different approach. The format is much
looser — there's no required shape — but the content requirements
are stricter. It should contain:
- The full text of every important page on your site
- Cleaned of navigation, headers, footers, cookie banners, and chrome
- Converted to clean Markdown (or at minimum, structured plain text)
- Concatenated into a single file with clear section breaks between pages
Here's a shortened example of what one looks like in practice:
# Lab451 — Full Documentation
---
## Getting Started
Lab451 generates the four files AI models need to understand your site.
You give it a URL, choose a file type, and click Generate. The crawler
maps your site, extracts content, and produces spec-compliant output.
To get started, visit lab451.org and paste your domain. The free plan
handles sites up to 50 pages without an account...
---
## API Reference
### Authentication
All API requests require a Bearer token in the Authorization header.
Get your token from the Account page under "API Keys"...
### Endpoints
#### POST /api/generate
Generates a single file type for a given domain. Required parameters:
- `domain` — the target URL (must include https://)
- `fileType` — one of: llms, llms-full, sitemap, robots
- `maxPages` — page cap (defaults to plan limit)
...
The --- horizontal rule between sections is convention, not
requirement. What matters is that each page is identifiable as its own
chunk, the headings reflect the original page hierarchy, and the model
can navigate to any subsection by scanning H2s and H3s.
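As a rough illustration of "scanning H2s and H3s", the following Python sketch builds a heading outline of an llms-full.txt file and pulls out one section by name. The function names `outline` and `section_text` are invented for this example, and the sketch only recognizes `#`-style headings; how any particular model actually navigates the file is not something this code claims to reproduce.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # Markdown code-fence marker


def outline(markdown: str) -> list[tuple[int, str, int]]:
    """Return (level, heading text, 1-based line number) for each heading."""
    entries = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # skip '#' lines inside code fences
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            entries.append((len(match.group(1)), match.group(2).strip(), lineno))
    return entries


def section_text(markdown: str, heading: str) -> str:
    """Return the text under `heading`, up to the next heading of equal or higher rank."""
    lines = markdown.splitlines()
    heads = outline(markdown)
    for i, (level, text, lineno) in enumerate(heads):
        if text != heading:
            continue
        end = len(lines)
        for next_level, _, next_lineno in heads[i + 1:]:
            if next_level <= level:
                end = next_lineno - 1
                break
        body = lines[lineno:end]
        # Drop trailing blank lines and the '---' page separator, if present.
        while body and body[-1].strip() in ("", "---"):
            body.pop()
        return "\n".join(body).strip()
    return ""
```

If a section cannot be found by its heading text, that is a sign the headings are too generic ("Overview", "Introduction") to be useful as navigation.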
Size, budget, and context windows
The size difference between the two files is enormous, and it has real consequences for how models consume them.
A typical llms.txt file for a small-to-medium site is
between 1 KB and 10 KB. That's roughly 250 to 2,500 tokens — a tiny
fraction of any modern model's context window. Reading
llms.txt is essentially free, which is why models will
cheerfully fetch it on almost any query that touches your domain.
llms-full.txt is a different beast. A documentation site
with 200 pages of meaningful content might produce a 500 KB file,
around 125,000 tokens. That fits in the larger long-context windows
(Claude 4 and Gemini 2.5 Pro have room to spare; a 128K-token window
such as GPT-4o's is almost entirely consumed), but it's a real chunk
of context the model has to weigh against everything else in the
conversation.
The practical limits as of mid-2026:
| File size | Token equivalent | Status |
|---|---|---|
| Under 50 KB | Under ~12,500 tokens | Read fully by every major model |
| 50 KB – 500 KB | ~12,500 – 125,000 tokens | Read fully by long-context models; chunked or summarized by smaller ones |
| 500 KB – 2 MB | ~125,000 – 500,000 tokens | Read partially; models may retrieve only relevant sections |
| Over 2 MB | 500,000+ tokens | Usually retrieval-only; rarely loaded whole |
The honest takeaway: if your llms-full.txt is over a
megabyte, you're approaching the practical ceiling. Beyond that, models
increasingly fall back to retrieval-style consumption (grep for relevant
chunks) rather than holistic reading. That's not necessarily bad — it
still works — but it changes the equation. For the largest sites, the
answer isn't "bigger llms-full.txt"; it's "smarter chunking and
well-named sections."
Rule of thumb: aim for an llms-full.txt
under 500 KB if possible. If you're over that, scrutinize what's
actually in there. Old blog posts, deprecated docs, terms of service,
and changelogs rarely earn their place. The point is to give models
the content that actually answers questions, not every word you've
ever published.
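The token figures in this section all follow from one approximation: about 4 bytes per token, with 1 KB counted as 1,000 bytes. A small Python sketch of that arithmetic, with the tier boundaries taken from the table above (the function names and the shortened tier labels are made up for this example, and real tokenizers will deviate from the 4-byte average):

```python
KB = 1_000  # the article's figures assume 1 KB = 1,000 bytes


def estimate_tokens(size_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate; actual counts vary by tokenizer and content."""
    return round(size_bytes / bytes_per_token)


def size_tier(size_bytes: int) -> str:
    """Map a file size to the tiers in the table above."""
    if size_bytes < 50 * KB:
        return "read fully by every major model"
    if size_bytes <= 500 * KB:
        return "read fully by long-context models"
    if size_bytes <= 2_000 * KB:
        return "read partially"
    return "retrieval-only"
```

For a file on disk, `size_tier(Path("llms-full.txt").stat().st_size)` gives a quick check against the 500 KB target.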
Which AI models read which file
As of mid-2026, the picture is uneven but converging. Some crawlers
explicitly fetch both files; some only fetch llms.txt; some
only honor llms-full.txt when explicitly linked. The
practical state of play:
| Crawler | llms.txt | llms-full.txt |
|---|---|---|
| ChatGPT (GPTBot, OAI-SearchBot) | Yes | Yes (when discoverable) |
| Claude (ClaudeBot) | Yes | Yes |
| Perplexity (PerplexityBot) | Yes | Yes |
| Google (Googlebot, Google-Extended) | Indexed | Indexed |
| Bing / Copilot (Bingbot) | Indexed | Indexed |
| Grok (xAI-Bot) | Yes | Partial |
| Mistral, Meta | Partial | Partial |
"Yes" means the crawler reliably fetches the file and there's reasonable evidence it's used. "Indexed" means the file gets indexed alongside other site content but its specific use is unclear. "Partial" means fetching happens but isn't consistent across all queries.
The pragmatic conclusion: a crawler that fetches llms.txt
can follow a link from it to llms-full.txt if one is
listed in the Optional section. So even if a model doesn't crawl
llms-full.txt as a well-known URL, linking to it from your
llms.txt gives it a path to be discovered.
When to ship just llms.txt
There are good reasons to skip llms-full.txt entirely. Ship
only llms.txt if any of these apply:
- You're a marketing site. Your homepage, About, Pricing, and Contact pages don't need to be read in detail by an AI model. Users asking AI about you want a one-paragraph summary; llms.txt delivers that perfectly.
- You're a small blog. Individual posts are better read in their original context (where you have analytics, ads, related-posts widgets). Pointing models at the posts via llms.txt is enough; serving the text again in llms-full.txt just duplicates content without strategic benefit.
- Your content is rapidly changing. A news site, stock-tracker, or live-event dashboard would have to regenerate llms-full.txt constantly. The maintenance overhead outweighs the benefit; better to let crawlers hit the live pages via llms.txt.
- Your content is gated. If most of your pages require authentication or payment, you can't put their full text in a publicly served llms-full.txt. Listing the public-facing summaries in llms.txt is the right level of disclosure.
- Your content is primarily visual or interactive. Tools, calculators, configurators, and data visualizations don't flatten into text well. Pointing models at the tool URL via llms.txt is fine; trying to describe the UI in llms-full.txt is usually worse than nothing.
When to ship both
The case for shipping both is strongest when you have text-heavy content where the value is in the specifics — exact API parameters, exact installation steps, exact configuration syntax. Specifically, ship both when:
- You have documentation. If a user could plausibly ask an AI "how do I do X with your product," and the answer requires pointing to specific function signatures, command flags, or exact configuration syntax, you want llms-full.txt. Models will quote your docs back at users; you want them quoting from a canonical, clean source.
- You have an API. API references thrive in llms-full.txt. The whole point is that a model can pull up your endpoint table, parameter list, and response format in a single fetch and answer questions accurately.
- You have a knowledge base or help center. Support articles often answer "how do I X" questions that AI assistants now field on your behalf. Putting them in llms-full.txt means AI gives the same answer your support team would.
- You have evergreen technical content. Tutorials, guides, and walkthroughs benefit from being in llms-full.txt for the same reason: the value is in the specifics, and you want models quoting your version rather than a stale paraphrase.
- You want to be the canonical source on a topic. Being in llms-full.txt raises the odds that a model quotes your phrasing when summarizing the topic. If you've written the definitive guide on something, having it in llms-full.txt is the difference between "according to Lab451" and "according to a guide I read somewhere."
How they work together at inference time
Watching a real model handle a query that triggers both files is instructive. Here's a simplified trace of how a request like "how do I add llms.txt to a Next.js site" might flow through a system that supports both files:
1. The model needs to answer a specific technical question. It identifies potentially relevant sites (Lab451 included).
2. It fetches lab451.org/llms.txt first: small, fast, cheap. From this, it learns Lab451 is an llms.txt generator and that there's a "Docs" section containing relevant pages.
3. It sees that llms.txt mentions a /llms-full.txt in the Optional section. The model decides, based on query complexity, to fetch it.
4. It pulls down lab451.org/llms-full.txt, finds the "Adding llms.txt to your site" section, finds the Next.js example, and quotes the relevant configuration directly.
5. The user gets a precise answer with a citation pointing back to lab451.org.
Without llms-full.txt, step 3 would instead trigger a
chain of fetches — first the Docs index, then the "Adding llms.txt"
page, then the Next.js page — each one a separate request with its
own HTML parsing, navigation chrome stripping, and context cost.
The model probably still gets there, but it takes longer, costs more
context, and is more likely to grab the wrong content along the way.
Both files are tools. llms.txt is the cheap, fast tool
that handles 80% of queries. llms-full.txt is the
heavier tool that handles the 20% where specifics matter. Shipping
both means the model can pick the right one.
Generating llms-full.txt without going insane
The reason a lot of sites only ship llms.txt isn't that
they don't want both — it's that maintaining llms-full.txt
by hand is miserable. Concatenating every page's content into a single
file, stripping nav and chrome, keeping it in sync as the site evolves —
that's a job for a script, not a human.
A few practical approaches:
Static-site generators with built-in support
Mintlify, Fern, Docusaurus (via plugin), and most modern docs platforms
now ship llms-full.txt generation out of the box. If your
docs already build to static HTML, check whether your generator can also
emit llms-full.txt in the same build step. This is by far
the lowest-effort path.
Build-time scripts
For sites without built-in support, a build script can extract Markdown
from your content directory, strip frontmatter or normalize it, and
concatenate everything into a single file at /llms-full.txt.
This works especially well for Hugo, Eleventy, Astro, and Next.js sites
where content already lives as Markdown files.
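A build script along these lines can be quite short. The sketch below, in Python, assumes a hypothetical layout in which every page is a `.md` file under one content directory with optional YAML frontmatter delimited by `---` lines; the function names, the alphabetical ordering, and the choice to drop frontmatter entirely are all assumptions for illustration.

```python
import re
from pathlib import Path

# YAML frontmatter: a leading '---' line, any content, then a closing '---' line.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n.*?\n---[ \t]*\n", re.DOTALL)


def strip_frontmatter(text: str) -> str:
    """Remove a leading frontmatter block, if there is one."""
    return FRONTMATTER_RE.sub("", text, count=1)


def build_llms_full(content_dir: Path, site_title: str) -> str:
    """Concatenate every Markdown file under content_dir into one document."""
    parts = [f"# {site_title}"]
    for path in sorted(content_dir.rglob("*.md")):
        body = strip_frontmatter(path.read_text(encoding="utf-8")).strip()
        if body:
            parts.append(body)
    # A horizontal rule between pages keeps each page identifiable as a chunk.
    return "\n\n---\n\n".join(parts) + "\n"
```

Writing the result to the directory your site serves from, for example `Path("public/llms-full.txt").write_text(...)`, as the last step of the build keeps the file in sync with every deploy. Sorting by path is only a default; an explicit page order usually reads better.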
Crawl-based generation
For sites that don't have source Markdown — WordPress sites, headless CMS sites, or anything served dynamically — the right answer is to crawl your own site and convert each page's rendered HTML to clean Markdown. This is what Lab451 does: point it at your domain and it produces both files in one pass without you needing to write or maintain any scripts.
Hybrid: generate, then curate
The highest-quality llms-full.txt files combine automated
generation with editorial review. Auto-generate the file from your
docs source, then have a human pass over it once to remove anything
that shouldn't be there (deprecated content, internal-only notes,
accidental duplicates). This is overkill for most sites, but for
teams whose AI presence matters strategically, it's worth the
quarterly hour.
Common mistakes
A few specific failure modes to avoid:
Putting full page content in llms.txt
The biggest mistake by far. llms.txt is a map, not the
territory. If your llms.txt contains paragraphs of body
content, you've confused it with llms-full.txt. Move the
body content into llms-full.txt, and put only links and
descriptions in llms.txt.
Not linking to llms-full.txt from llms.txt
llms-full.txt isn't typically discovered by crawlers as
a well-known URL the way llms.txt is. The way crawlers
find it is by reading llms.txt and following a link.
So if you ship both, make sure llms.txt mentions
llms-full.txt in its Optional section.
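This is easy to verify automatically. A minimal Python sketch, assuming the link is written as an ordinary Markdown link whose URL path ends in `llms-full.txt` (the function name is invented for this example):

```python
import re
from urllib.parse import urlparse

# Captures the URL part of Markdown links: [text](url)
MD_LINK_URL_RE = re.compile(r"\]\(([^)\s]+)\)")


def links_to_llms_full(llms_txt: str) -> bool:
    """Return True if any Markdown link in llms.txt points at an llms-full.txt file."""
    for url in MD_LINK_URL_RE.findall(llms_txt):
        filename = urlparse(url).path.rsplit("/", 1)[-1]
        if filename == "llms-full.txt":
            return True
    return False
```

Running this as part of a build or CI step catches the case where someone regenerates llms.txt and drops the link.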
Leaving navigation chrome in llms-full.txt
If your llms-full.txt generator pulls page HTML and
converts to Markdown without first stripping the header, footer,
sidebar, and cookie banner, you end up with a file where 30% of the
content is "Home | About | Contact" repeated 200 times. Models will
still parse it, but they waste context on noise. Clean extraction is
table stakes.
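A sketch of the stripping step, using only Python's standard-library `html.parser`. It assumes the site marks its chrome with semantic tags (`nav`, `header`, `footer`, `aside`); cookie banners and sidebars built from plain `div` elements would need site-specific rules, and this version emits plain text rather than Markdown. The class and function names are made up for this example.

```python
from html.parser import HTMLParser

# Elements whose text should never reach llms-full.txt.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collect text that is not nested inside any of SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html: str) -> str:
    """Return the page text with navigation chrome removed, one chunk per line."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A quick sanity check on the output is to search it for your own nav labels; if "Home" or "Contact" shows up once per page, the extraction is not clean yet.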
Including pages that shouldn't be there
Search result pages, tag archives, pagination, login pages, and
user-account pages don't belong in llms-full.txt. Filter
them out at generation time. The rule of thumb: if a page has unique,
canonical, evergreen content that someone would want to read, it
belongs. Otherwise it doesn't.
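Filtering at generation time can be as simple as a URL check before a page is added. The Python sketch below uses path patterns that are illustrative guesses at common URL layouts (`/search`, `/tag/...`, `/page/2`, `/login`, `/account`), not a standard; treating every URL with a query string as non-canonical is likewise an assumption that will not suit every site.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns; adjust to your site's actual URL scheme.
EXCLUDED_PATH_PATTERNS = [
    r"^/search(/|$)",                   # search result pages
    r"^/tags?(/|$)",                    # tag archives
    r"/page/\d+/?$",                    # pagination
    r"^/(login|signin|account)(/|$)",   # login and user-account pages
]


def belongs_in_llms_full(url: str) -> bool:
    """Return False for URLs that look like search, archive, paging, or account pages."""
    parsed = urlparse(url)
    if parsed.query:  # e.g. ?q=... or ?page=2 usually means a non-canonical view
        return False
    return not any(re.search(p, parsed.path) for p in EXCLUDED_PATH_PATTERNS)
```

URL patterns only catch the obvious cases. Whether a page is "evergreen content someone would want to read" still takes a human glance at the resulting page list.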
Letting it go stale
An llms-full.txt that's six months out of date is worse
than no llms-full.txt at all — models will confidently
quote outdated pricing, deprecated API endpoints, and old product
names. Tie regeneration to your deploy pipeline if you can, or set a
monthly cron job. Make staleness impossible.
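One way to enforce this is a staleness check that fails a CI job or triggers regeneration. A minimal Python sketch, assuming you can obtain two timestamps from your own pipeline: when llms-full.txt was last generated and when site content last changed. The function name and the 30-day default (mirroring the monthly cron suggestion) are choices made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    generated_at: datetime,
    newest_content_change: datetime,
    max_age: timedelta = timedelta(days=30),
    now: Optional[datetime] = None,
) -> bool:
    """Stale if content changed after generation, or the file is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return newest_content_change > generated_at or (now - generated_at) > max_age
```

All three datetimes should be timezone-aware (or all naive); Python raises a `TypeError` when the two kinds are compared.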
Frequently asked questions
Can I have llms-full.txt without llms.txt?
Technically yes; practically no. llms-full.txt is
discovered via llms.txt in most crawler implementations.
Without an llms.txt linking to it, your
llms-full.txt sits unread. Always ship them together.
Is llms-full.txt just a single big Markdown file?
Yes. The whole point is that a model can fetch one URL and get everything. Splitting it across multiple files defeats the purpose. If your content is genuinely too big for one file, the answer is smarter content curation, not file splitting.
What if my site is mostly visual or interactive?
Skip llms-full.txt. There's no useful way to flatten a
Figma template gallery or an interactive calculator into Markdown.
llms.txt alone, pointing at descriptive pages, is the
right choice.
Does llms-full.txt hurt my SEO?
No. llms-full.txt isn't indexed as a web page in the
traditional sense — it's a resource file, like robots.txt
or sitemap.xml. Google and Bing index its existence but
don't treat it as a duplicate of your pages. The content within isn't
competing with your real pages for rankings.
Should I gate llms-full.txt behind authentication?
No. If you want models to read it, it has to be publicly fetchable.
If you have content you don't want models to see, leave it out of
llms-full.txt rather than trying to gate the file itself.
A gated llms-full.txt is the same as no
llms-full.txt.
What about pricing or rapidly-changing content?
Two options. The simpler one: leave volatile content out of
llms-full.txt entirely, and rely on the model to fetch
the live page via the llms.txt link when asked. The more
sophisticated one: regenerate llms-full.txt on a
short cadence (hourly cron for pricing, daily for changelogs) so it
stays close enough to current.
How big is too big?
Practical ceiling: around 2 MB. Past that, most models stop reading the file whole and switch to retrieval-style access on chunks. That still works, but you lose the "everything in one shot" property that's the file's whole point. Aim under 500 KB if you can.
Does Lab451 generate both?
Yes — Lab451's "Select All" mode generates llms.txt,
llms-full.txt, sitemap.xml, and
robots.txt in a single pass. Sites under 50 pages are
free, no signup required.
Generate both files in 30 seconds
Lab451 produces spec-compliant llms.txt and
llms-full.txt — plus sitemap.xml and
robots.txt — for any public website. Click Select
All, paste your URL, hit Generate.