The complete llms.txt guide for 2026
Everything you need to know about llms.txt — what it is, where it came
from, what AI models actually do with it, how to write one correctly, and the
mistakes that quietly tank the whole exercise.
What is llms.txt?
llms.txt is a plain-text file you put at the root of your domain — at
https://yoursite.com/llms.txt — that tells large language models
what your site is about, which pages matter, and how to navigate it. It's
written in Markdown. It's small. It exists for one reason: AI models have
finite context windows and can't read your whole site in one pass, so you
give them a curated map instead.
The standard was proposed by Jeremy Howard of Answer.AI in September 2024, and over the following eighteen months it became the de facto convention for what's now called Generative Engine Optimization, or GEO — the practice of getting your site cited by ChatGPT, Claude, Perplexity, Gemini, and the other large language models that have started to replace the traditional search box.
If you've ever written a robots.txt or a sitemap.xml,
you already understand the shape of llms.txt. It's the same idea —
a small file at a well-known URL that gives automated systems structured
hints about your site — except the audience is language models rather
than search-engine crawlers, and the format is Markdown rather than text
directives or XML.
Why it exists (the actual problem)
When you ask ChatGPT "what is FastHTML and how do I get started," ChatGPT might do three things behind the scenes: search the web, pick a handful of pages, and try to read enough of them to give you a sensible answer. The second and third steps are where modern sites fall over.
A typical modern web page is 200KB to several megabytes of HTML, CSS, JavaScript, third-party scripts, cookie banners, navigation chrome, and ads. Even a generous context window fills quickly when a model has to read several such pages to answer one question. The model has to either render the page through some kind of browser-like pipeline (slow, expensive, fragile) or strip the markup down to text (which loses structure, mangles tables, and drops semantics). Either way, by the time the actual content reaches the model, it's degraded.
Worse, the model has no idea which pages on your site are important.
Is your homepage the canonical statement of what you do? Is the About
page the one to read for context? Is your documentation under /docs,
/help, /wiki, or somewhere else? Without explicit
guidance, the model guesses. It often guesses wrong.
llms.txt solves both problems at once. It's a small,
pre-cleaned, Markdown-formatted document that says: "This is what this
site is. Here are the pages that matter. Here's where to go for deeper
information." The model spends a fraction of its context budget and walks
away with an accurate picture of your site.
Who actually reads llms.txt
This is the question every skeptical post about llms.txt opens with, and it deserves an honest answer. As of mid-2026, support is uneven but growing. Here's the realistic state of play:
| Crawler | User-agent | Status |
|---|---|---|
| ChatGPT / OpenAI | GPTBot, OAI-SearchBot | Fetches llms.txt |
| Claude / Anthropic | ClaudeBot, Claude-Web | Fetches llms.txt |
| Perplexity | PerplexityBot | Fetches llms.txt |
| Google (Gemini, AI Overviews) | Googlebot, Google-Extended | Indexes the file; weight unclear |
| Microsoft Copilot / Bing | Bingbot | Indexes the file; weight unclear |
| Grok / xAI | xAI-Bot | Fetches llms.txt |
| Mistral | MistralBot | Partial |
| Meta / Llama | Meta-ExternalAgent | Partial |
None of these companies have published a formal commitment that says
"we use llms.txt and weight it at X." What you'll see in practice is
that crawlers fetch the file, log it, and use it as one signal among
many. The honest framing is closer to schema.org markup in
2014 than to robots.txt in 2024: not strictly required, not
universally honored, but adopted quickly enough that not having
one is starting to look like a tell.
The defensible position in 2026: ship an
llms.txt because it costs you almost nothing, it makes
your site more legible to anything that does read it, and the
downside if no model ever reads it is zero. It's a hedge with no
premium.
The spec, line by line
The formal specification at llmstxt.org is short — short enough to walk through end to end. Here's the structure, with every section explained.
# Site or project name
> A one-paragraph blockquote summary.
> This is the only section besides the H1 that is parsed structurally.
Optional free-form Markdown details about the project. Paragraphs,
lists, anything except headings.
## Docs
- [Page name](https://example.com/page): Optional one-line description
- [Another page](https://example.com/another): What it covers
## Examples
- [Example](https://example.com/example): One-line context
## Optional
- [Less critical link](https://example.com/extra): Skippable if context is tight
The structural rules are precise and worth memorizing:
- Exactly one H1. This is the only required element. It's the name of the site or project, not a tagline.
- An optional blockquote. This is your "elevator pitch" summary. Models often quote it verbatim when asked "what is this site about." Make it good. One or two sentences, plain English.
- Free-form Markdown after the blockquote can contain paragraphs, lists, and emphasis — anything except additional headings, until you hit the H2 sections.
- H2 sections containing link lists. Each H2 is a category (Docs, Tutorials, API Reference, Blog, Examples). Each list item is a Markdown link, optionally followed by a colon and a one-line description.
- The "Optional" H2 is special. Links here can be skipped by parsers that need a shorter context. Use it for secondary material — appendices, deeper references, anything not essential to understanding what your site is.
What's not allowed: images, HTML, tables, code blocks, additional H1s, or nested headings inside the H2 link sections. Parsers expect Markdown text and Markdown lists only. The simpler you keep it, the more reliably it's read.
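To make the structural rules concrete, here is a minimal sketch of a parser that checks them. It is not the reference parser from llmstxt.org; the names parse_llms_txt and LlmsTxt are hypothetical, and the link pattern is a simplification that assumes one link per list line.

```python
import re
from dataclasses import dataclass, field

# One list item: "- [Title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[dict]] = field(default_factory=dict)
    problems: list[str] = field(default_factory=list)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse an llms.txt string and collect rule violations (hypothetical helper)."""
    doc = LlmsTxt()
    current = None  # name of the H2 section we are inside, if any
    h1_count = 0
    quote_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            h1_count += 1
            if h1_count == 1:
                doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections.setdefault(current, [])
        elif line.startswith("#"):
            # H3 and deeper headings are not part of the format.
            doc.problems.append(f"unexpected heading: {line}")
        elif line.startswith(">") and current is None:
            quote_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "description": match["desc"] or "",
                })
            else:
                doc.problems.append(f"non-link line in section {current!r}: {line}")
        # Anything else is free-form Markdown before the first H2, which is allowed.
    doc.summary = " ".join(quote_lines)
    if h1_count != 1:
        doc.problems.append(f"expected exactly one H1, found {h1_count}")
    return doc
```

Calling parse_llms_txt on your file and printing the problems list gives you a quick pre-deploy check. An empty list means the file follows the shape described above; it says nothing about whether the content is any good.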
llms.txt vs llms-full.txt
The base llms.txt file is a map. It tells a model
where to go, but to follow the links the model still has to fetch each
page. For documentation sites and other content where you want the model
to have everything in one shot, there's a companion file:
llms-full.txt.
llms-full.txt contains the actual content of every
page on your site, concatenated into a single Markdown document. No
navigation, no boilerplate, no chrome — just the words. A model can
download one file and have a complete picture of your product without
making fifty follow-up requests.
Here's a simplified mental model of the two:
| | llms.txt | llms-full.txt |
|---|---|---|
| Purpose | Table of contents | Full text |
| Typical size | 1–10 KB | 100 KB – several MB |
| Best for | Any site | Docs, knowledge bases, technical content |
| Update frequency | When structure changes | Every meaningful content change |
| Model behavior | Follow links as needed | Read once, answer in detail |
For a marketing site or a small blog, llms.txt alone is
plenty. For a documentation site, an API reference, a product manual, or
anything you actually want models to be able to answer questions
about, ship both. The pairing was popularized by Mintlify and
Anthropic in early 2025 and has stuck.
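If you already have your pages as clean Markdown, assembling llms-full.txt is mostly concatenation. The sketch below is one possible layout, not a standardized one: the function name build_llms_full, the "Source:" line, and the choice of one H2 per page are all assumptions made for illustration.

```python
def build_llms_full(site_name: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate pages into one Markdown document.

    `pages` is a list of (title, url, markdown_body) tuples. The layout
    (H1 for the site, H2 per page, a Source line) is an assumption, since
    no fixed structure for llms-full.txt is described here.
    """
    parts = [f"# {site_name}\n"]
    for title, url, body in pages:
        # Strip stray whitespace so pages are separated by exactly one blank line.
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}\n")
    return "\n".join(parts)
```

Note that this does not touch headings inside each page body. If your pages start with their own H1, you may want to demote their headings before concatenating so the combined document keeps a sensible outline.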
How it relates to robots.txt and sitemap.xml
These three files cover three different audiences with three different purposes. They complement each other; they don't replace each other.
| File | Audience | Question it answers |
|---|---|---|
| robots.txt | All automated crawlers | "What am I allowed to fetch?" |
| sitemap.xml | Search-engine indexers | "What pages exist and when did they change?" |
| llms.txt | Language models at inference time | "What is this site and what should I read?" |
A common confusion: llms.txt doesn't block anything.
It's not a permissions file. If you want to keep crawlers off certain
paths, that goes in robots.txt. If you want them to know
where your pages live for indexing, that's sitemap.xml.
llms.txt is purely a "here's what we are" document.
One subtle interaction: your llms.txt should not link to
pages you've blocked in robots.txt. Some crawlers will
silently drop the inconsistent links; others might log an error and
deprioritize the whole file. Keep the three in agreement.
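Checking that agreement by hand gets tedious, so here is a small sketch that uses Python's standard urllib.robotparser to list any llms.txt links that robots.txt disallows. The function name blocked_links is hypothetical, the default user agent is just one of the crawlers from the table above, and the sketch assumes every link lives on the same host as the robots.txt you pass in.

```python
import re
from urllib.robotparser import RobotFileParser

# Pull the URL out of every Markdown link: [Title](https://...)
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def blocked_links(llms_txt: str, robots_txt: str, user_agent: str = "GPTBot") -> list[str]:
    """Return llms.txt links that robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = LINK_URL_RE.findall(llms_txt)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Run it once per crawler you care about, since robots.txt rules can differ by user agent. An empty list for each one means the two files agree.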
Writing a good llms.txt
The mechanical rules above are necessary but not sufficient. A
good llms.txt — one that actually changes how
models talk about your site — follows a few principles that aren't in
the spec.
Lead with the summary that you want quoted back
The blockquote at the top is the single most important sentence in the file. When a model is asked "what is yoursite.com," there's a strong chance it will paraphrase your blockquote. Write it the way you'd want it to appear in a Perplexity answer or a ChatGPT citation. No marketing fluff, no superlatives, no "leading platform for." Plain language, what the site does, who it's for.
# Lab451
> Lab451 generates llms.txt, llms-full.txt, sitemap.xml, and robots.txt
> for any public website in about 30 seconds. Free for sites under 300
> pages. No signup required for the basic flow.
Compare that to what a marketing draft would have produced ("Lab451 is the leading AI-discoverability platform empowering websites to maximize their generative engine presence"). The first version is what a model will repeat. The second is what a model will quietly rewrite into the first.
Group links by intent, not by site architecture
The H2 sections shouldn't mirror your top nav. They should mirror the questions a user might ask a model. If someone asks "how do I get started with X," the model should find a section called Getting Started or Quickstart. If they ask "what does X cost," there should be a Pricing section. Think of the H2s as the answers to the questions you'd expect, not as a sitemap.
One-line descriptions earn their keep
The [Title](url): description pattern is optional, but the
description is often what tips a model toward citing one page over
another. Keep descriptions to a single line. State what the page covers,
not how good it is.
## Docs
- [Getting started](https://lab451.org/docs/quickstart): Generate your first set of files in under a minute.
- [API reference](https://lab451.org/docs/api): Endpoints, authentication, rate limits, response formats.
- [WordPress integration](https://lab451.org/docs/wordpress): Drop-in instructions for self-hosted and managed WordPress sites.
Use the Optional section for ballast
Old blog posts, deep references, legal pages, anything you'd be happy for a model to read if it had spare context but wouldn't lose sleep over. The Optional section is permission to include them without crowding the must-reads.
Keep the file under 10 KB if you can
The spec doesn't set a size limit, but in practice the entire
llms.txt for most sites should fit comfortably under 10 KB.
If yours is larger, you're probably listing pages that belong in the
sitemap, not the llms.txt. The map is curated; the sitemap is
comprehensive.
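If you generate the file from data rather than writing it by hand, the principles above are easy to enforce in code. This is a sketch under assumptions: render_llms_txt is a hypothetical helper, the 10 KB ceiling is this guide's rule of thumb rather than anything in the spec, and pushing the Optional section to the end is a choice made here for readability.

```python
def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
    max_bytes: int = 10_240,
) -> str:
    """Render an llms.txt string from (title, url, description) tuples.

    Raises ValueError when the result exceeds `max_bytes`, which defaults
    to the 10 KB rule of thumb (an assumption, not a spec limit).
    """
    # Keep caller order, but always emit "Optional" last.
    headings = [h for h in sections if h != "Optional"]
    if "Optional" in sections:
        headings.append("Optional")

    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading in headings:
        lines.append(f"## {heading}")
        for title, url, description in sections[heading]:
            entry = f"- [{title}]({url})"
            if description:
                entry += f": {description}"
            lines.append(entry)
        lines.append("")
    text = "\n".join(lines)

    size = len(text.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"llms.txt is {size} bytes, over the {max_bytes}-byte budget")
    return text
```

Failing loudly on size is deliberate: when the file outgrows the budget, the right fix is usually to curate the link list, not to raise the limit.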
Real examples worth copying
Three llms.txt files in the wild that get the format right and are worth studying:
- FastHTML — the original reference implementation by Jeremy Howard. Clean H1, tight blockquote, sensible H2 grouping, well-used Optional section. The canonical example.
- Anthropic Docs — large documentation site, organized by product surface. Notice how the H2s map to how a developer would ask for help, not how the docs are filed.
- Lab451 — short and direct. Useful as a template for marketing sites that don't need the full content treatment.
If you read these as raw text (open the URL directly, or right-click and choose View Source in a browser), you'll notice they all share something: there's no clever formatting. No bullet points masquerading as paragraphs, no headings used for emphasis, no nested structures. The spec rewards restraint.
Common mistakes
The errors that quietly tank an llms.txt are mostly format
violations the model parses silently and then ignores. Here's the list,
in rough order of how often we see them.
Treating the H1 as a tagline
The H1 is supposed to be the name of the site or project, full stop.
"# The world's best widget for marketers" is not a name.
"# WidgetCo" is. Save the positioning for the blockquote.
Skipping the blockquote
Without the blockquote, models lose the structural cue that tells them "this is the summary." They fall back to inferring it from the link descriptions or the page contents, which is the slow, lossy path you were trying to avoid in the first place.
Linking to pages that 404 or redirect
Sounds obvious; happens constantly. A model that follows a link in your
llms.txt and hits a 404 may deprioritize the whole file.
Treat llms.txt as production output and check it the same
way you check your sitemap.
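A link check like this can run in the same job that validates your sitemap. The sketch below uses only the standard library; head_status and find_bad_links are hypothetical names, and it assumes your server answers HEAD requests (some don't, in which case you'd swap in a GET). The fetch parameter exists so you can test the logic without touching the network.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0):
    """Return (status_code, final_url) for a HEAD request.

    urllib follows redirects on its own, so a final_url that differs from
    the requested one means the link redirected. Status is None when the
    host could not be reached at all.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        return err.code, url
    except urllib.error.URLError:
        return None, url


def find_bad_links(urls, fetch=head_status) -> dict[str, str]:
    """Map each problematic URL to a short reason; good links are omitted."""
    problems = {}
    for url in urls:
        status, final_url = fetch(url)
        if status is None:
            problems[url] = "unreachable"
        elif status >= 400:
            problems[url] = f"HTTP {status}"
        elif final_url != url:
            problems[url] = f"redirects to {final_url}"
    return problems
```

Feed it the URLs from your llms.txt and fail the build when the returned dict is non-empty. Redirects are reported separately from errors so you can update the link to its final destination rather than delete it.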
Adding headings inside the H2 sections
The structure is: H1, optional blockquote, optional free-form Markdown, then H2s with link lists. You can't have an H3 inside an H2 link list. If you need to group further, make more H2s.
Stuffing keywords into descriptions
The traditional SEO instinct is to pack the description with
target keywords. In an llms.txt context, this backfires —
models are trained to be suspicious of unnatural language and may
discount the file. Write descriptions the way you'd explain the page
to a colleague.
Forgetting to update it
An llms.txt that's six months out of date and links to a
pricing page that no longer exists is worse than no llms.txt
at all. Models will cheerfully repeat your outdated information. Build
regeneration into the same workflow you use for the sitemap.
Hosting, caching, and deployment
The file lives at /llms.txt at the root of your domain.
Not /static/llms.txt, not /.well-known/llms.txt,
not /seo/llms.txt. Crawlers look at the root. If you can
serve /robots.txt, you can serve /llms.txt —
the deployment story is identical.
A few practical notes:
- Content-Type: serve as text/plain; charset=utf-8 or text/markdown; charset=utf-8. Either works. Don't serve as application/octet-stream; some crawlers will refuse it.
- Caching: a Cache-Control: public, max-age=3600 header is fine. The file changes infrequently, so caching saves little bandwidth, but a one-hour lifetime keeps stale copies from lingering long after an update.
- HTTPS: serve over HTTPS. Most crawlers will follow a redirect from HTTP, but some increasingly treat insecure responses as a negative quality signal.
- Subdomains: each subdomain gets its own llms.txt. blog.example.com/llms.txt and docs.example.com/llms.txt are independent files. Models do not look "up" to the apex.
- Multilingual sites: the convention is still settling. The pragmatic answer for now is one llms.txt per language subdirectory (/en/llms.txt, /fr/llms.txt) plus a default at the root. Don't try to mix languages in one file.
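Most sites will set these headers in their web server or CDN config, but if you serve the file from application code, the notes above translate to something like the following. It's a sketch using Python's built-in http.server, which is fine for illustration and not meant for production; LLMS_TXT_PATH, llms_txt_headers, LlmsTxtHandler, and serve are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical location of the generated file on disk.
LLMS_TXT_PATH = Path("llms.txt")


def llms_txt_headers(body: bytes) -> dict[str, str]:
    """Response headers matching the hosting notes above."""
    return {
        "Content-Type": "text/plain; charset=utf-8",
        "Cache-Control": "public, max-age=3600",
        "Content-Length": str(len(body)),
    }


class LlmsTxtHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the root-level path is served; everything else is a 404.
        if self.path != "/llms.txt":
            self.send_error(404)
            return
        body = LLMS_TXT_PATH.read_bytes()
        self.send_response(200)
        for key, value in llms_txt_headers(body).items():
            self.send_header(key, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start a blocking development server on the given port."""
    HTTPServer(("", port), LlmsTxtHandler).serve_forever()
```

Call serve() and request http://localhost:8000/llms.txt to confirm the headers. Whatever stack you actually use, the same three headers are the part worth copying.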
Measuring whether it works
The honest answer here is that GEO measurement is in roughly the same place that SEO measurement was in 2002. There's no Google Search Console for "how often does ChatGPT cite you." There are a few proxies, and that's about it.
Things you can measure:
- Crawler hits on llms.txt in your server logs. Filter by user-agent (GPTBot, ClaudeBot, PerplexityBot, etc.). Frequency is your strongest signal that the file is being consumed.
- Referrer traffic from AI chat interfaces: chatgpt.com (formerly chat.openai.com), perplexity.ai, claude.ai, gemini.google.com. Numbers will be small but rising. Tag them in your analytics.
- Direct citation checks. Periodically ask each major LLM "what is yoursite.com" and see what they say. Save the answers. Track changes after you update llms.txt. This is manual, annoying, and the most reliable signal you'll get in 2026.
- Brand mention monitoring. Tools like Mention, Brandwatch, and newer GEO-focused services (Profound, Goodie, Otterly) scrape LLM responses at scale. The category is young; the tools are improving fast.
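The first of those proxies, crawler hits in your server logs, is the easiest to automate. Here is a rough sketch that counts requests for /llms.txt per AI crawler. It assumes your logs use the common "combined" access-log format; the function name count_llms_txt_hits is hypothetical, and the crawler list is just a few of the user agents from the table earlier in this guide.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent header (taken from the table above).
AI_CRAWLERS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Combined log format: ... "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(log_lines) -> Counter:
    """Count /llms.txt requests per known AI crawler."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        # Ignore any query string when comparing the path.
        if match["path"].split("?")[0] != "/llms.txt":
            continue
        for bot in AI_CRAWLERS:
            if bot in match["agent"]:
                hits[bot] += 1
                break
    return hits
```

Pass it an open log file (iterating a file yields lines) and compare the counts week over week. User-agent strings can be spoofed, so treat the numbers as a trend indicator rather than an audit.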
What you can't reliably measure: ranking, share-of-voice against competitors, or click-through rate from AI answers. The industry hasn't built that infrastructure yet. Anyone who claims they have is selling something.
Where this is heading in 2026 and beyond
A few trends worth watching, beyond the basic adoption curve:
Markdown shadow pages
The same proposal that introduced llms.txt also suggests
that important pages should be available at their URL with .md
appended — so https://example.com/docs/intro.html would
also be reachable at https://example.com/docs/intro.html.md
as a clean Markdown version. The FastHTML and nbdev ecosystems already
do this. Expect more documentation generators to follow.
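The URL mapping is simple enough to express in a few lines. In this sketch, markdown_url is a hypothetical helper. Appending .md comes straight from the paragraph above; the handling of URLs that end in a slash (appending index.html.md) follows how the llmstxt.org proposal describes it, but treat that detail as an assumption and check the proposal before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(url: str) -> str:
    """Return the URL where a clean Markdown version of `url` would live."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file name in the URL: assumed to map to index.html.md.
        path += "index.html.md"
    else:
        path += ".md"
    # Query string and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, markdown_url("https://example.com/docs/intro.html") gives "https://example.com/docs/intro.html.md", matching the example above.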
Pay-per-crawl economics
Cloudflare's mid-2025 launch of bot-payment infrastructure introduced
the idea that AI crawlers might pay micropayments for the content they
fetch. By late 2026, expect llms.txt files to start
including pricing metadata for their referenced URLs — a 402 Payment
Required handshake for the high-value pages. This is speculative but
plausible.
Verification and signed manifests
As llms.txt adoption grows, so does the incentive for sites to lie in
their own files. A pricing page that claims to be free, an outdated API spec
presented as current — the trust problem is obvious. Expect 2027-era
extensions that allow signed llms.txt files
(probably via DNS TXT records or HTTPS certificate metadata) for sites
that want to make verifiable claims about themselves.
Convergence with schema.org
There's an obvious overlap between llms.txt's "tell models
what this site is" purpose and schema.org's "tell search engines what
this entity is" purpose. The two haven't merged; they're solving
related problems with different tools. Watch for proposals that
reference schema entities from inside llms.txt, or
schema.org WebSite objects that point to llms.txt
locations.
Frequently asked questions
Do I need an llms.txt if I already have a sitemap.xml?
Yes. They serve different audiences and answer different questions. The sitemap tells indexers what exists. The llms.txt tells models what matters. Most sites should have both.
Will llms.txt help me rank in Google search?
Almost certainly not directly. Google has been clear that Googlebot doesn't use llms.txt as a ranking signal. Where it might help is in Google's AI Overviews, where Gemini does seem to weight clean, structured content sources. The honest answer: small, indirect, and not the main reason to ship one.
Should I block AI crawlers in robots.txt instead of writing an llms.txt?
That's a separate decision. If you don't want AI models reading your content, block them in robots.txt and skip llms.txt entirely. If you do want them reading your content, write an llms.txt that makes them efficient at it. The two strategies don't sit on a spectrum; they're mutually exclusive choices.
Can I dynamically generate llms.txt server-side?
Yes, and many large sites do. The file is just text — there's no requirement that it be a static file. A common pattern is to generate it at build time, cache it for an hour, and regenerate on content changes. Just make sure the response is fast (under 200ms) and stable (returns the same content for the same URL within a reasonable window).
What's the difference between llms.txt and ai.txt?
ai.txt was a 2023 proposal from Spawning that focused on
consent — telling AI training pipelines whether they could use your
content. llms.txt, proposed a year later by Answer.AI,
focuses on structure — telling AI models how to use your site
at inference time. They're orthogonal; some sites have both. The
industry's center of gravity is firmly on llms.txt.
How often should I regenerate llms.txt?
Whenever your site's structure changes meaningfully — new product pages, restructured documentation, retired sections. For active sites, monthly is a reasonable cadence. For mostly-static sites, regenerate when you ship a notable update. Tying it to your existing sitemap-regeneration workflow is the cleanest pattern.
Do I need to submit llms.txt anywhere?
No. Unlike sitemap.xml (which you can submit to Google Search Console),
there's no "submit to AI" button. Crawlers find /llms.txt
by convention, same as they find /robots.txt. Put it at
the right URL and the right crawlers will fetch it on their next
visit.
Generate yours in 30 seconds
Lab451 produces a spec-compliant llms.txt — plus
llms-full.txt, sitemap.xml, and
robots.txt — for any public website. Free for sites
under 300 pages. No signup, no plugin, no OAuth dance.