May 10, 2026 · 7 min read

What Is llms.txt and Why Do AI Engines Read It First

llms.txt is a plain text file that tells AI engines what your site is about. Here's what it contains, who reads it, and how to create one.


When Perplexity answers a question in your industry, it reads a file most sites don't know exists. Not your homepage. Not your sitemap. A plain text document called llms.txt, placed at your site's root, written specifically for AI language models to parse.

Most sites don't have one. The ones that do get cited more accurately — and more often.

A plain text file with one job: tell AI what your site is

llms.txt is a convention, not a formal standard. Jeremy Howard proposed the format in September 2024, and it spread quickly because it solved a real problem: AI language models needed a structured way to understand a site without crawling every page to piece it together.

The file sits at yourdomain.com/llms.txt. It uses Markdown formatting. At minimum, it contains a brief description of the site and links to key pages. Optionally, it declares what the AI can and cannot do with the content — search indexing, citations, training data.

What makes it different from every other SEO file is the intent. It's not defensive. You're not blocking anything. You're briefing the AI the same way you'd brief a journalist before an interview — here's who we are, here's what matters, here's where to look.

robots.txt tells crawlers where not to go. llms.txt tells AI what you are.

robots.txt has existed since 1994. It was designed for web spiders that traverse links and build search indexes — the architecture of the early web. The file is fundamentally a blocklist: /admin is off-limits, /api is off-limits, /staging is off-limits. Everything else is fair game.

AI language models work differently. They're not just indexing pages; they're building a representation of what your site means — what problems it solves, what claims it makes, what expertise it demonstrates. A blocklist doesn't help with that. What helps is a structured briefing.

llms.txt fills that gap. A site with a clear, accurate llms.txt is easier for a model to cite correctly in its answers. A site without one gets summarized by inference — the model reads what it can find, guesses at the rest, and produces a description that's often incomplete or slightly wrong. That wrong description then gets cited in AI answers, which get seen by thousands of people who never click through to verify it.

The two files aren't alternatives. They work together. robots.txt sets the access rules; llms.txt sets the narrative.
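The division of labor is easiest to see side by side. A minimal robots.txt for the same hypothetical site might read like this (the paths are illustrative, not a recommendation):

```text
User-agent: *
Disallow: /admin/
Disallow: /staging/

User-agent: GPTBot
Allow: /
```

Nothing here describes the site; it only fences off paths. The llms.txt at the same root is where the description lives.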

The format is straightforward. The content is what matters.

A minimal llms.txt looks like this:

# Indexora

> Indexora helps websites become discoverable to AI answer engines by generating
> technical SEO files and scoring AI-readiness via the GEO Score Analyzer.

## Key Pages
- [How It Works](/how-it-works): Overview of the GEO Score Analyzer and file generator
- [Pricing](/pricing): Plans and what's included
- [Blog](/blog): Technical articles on GEO optimization and AI visibility
- [FAQ](/faq): Common questions about AI crawling and file generation

## Permissions
- AI may use this content for search result generation and citations
- AI may not use this content for model training without a separate agreement

The # heading is the site name. The > blockquote is the elevator pitch — two to three sentences that capture what the site does. This is the passage most likely to be cited verbatim when someone asks an AI what Indexora is.

The ## Key Pages section gives the model structured navigation rather than making it guess which pages matter. Include the pages you'd want a journalist to read: your homepage, product pages, and any cornerstone content. Skip checkout flows, legal pages, and anything that exists for operations rather than communication.

Permissions are optional but worth including. As AI content licensing becomes contested — publishers are increasingly suing AI companies over training data — having an explicit statement in your llms.txt creates a record of what you did and didn't authorize.
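The structure is simple enough to assemble programmatically. As an illustration, here is a minimal Python sketch that builds an llms.txt string from a name, a description, and a page list; the function name and signature are this article's invention, not part of any spec:

```python
def build_llms_txt(name, description, pages, permissions=()):
    """Assemble a minimal llms.txt document as a string.

    pages: iterable of (title, path, summary) tuples.
    permissions: optional iterable of permission statements.
    """
    lines = [f"# {name}", ""]
    # The blockquote is the elevator pitch, the passage most likely
    # to be cited verbatim in AI answers.
    lines += [f"> {line}" for line in description.splitlines()]
    lines += ["", "## Key Pages"]
    lines += [f"- [{title}]({path}): {summary}" for title, path, summary in pages]
    if permissions:
        lines += ["", "## Permissions"]
        lines += [f"- {p}" for p in permissions]
    return "\n".join(lines) + "\n"
```

Write the returned string to your server root (or a framework's public/ directory) as part of a build step, and the file stays in sync with your site.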

GPTBot, ClaudeBot, and PerplexityBot already check for it

Three crawlers are most likely to read your llms.txt when they visit: OpenAI's GPTBot (used for ChatGPT's browsing capability and real-time answers), Anthropic's ClaudeBot, and PerplexityBot. All three have been observed prioritizing structured files at the domain root before parsing individual pages.

There's a catch worth understanding: the file only helps if those crawlers can reach your site. A robots.txt that contains User-agent: GPTBot followed by Disallow: / cancels out whatever llms.txt says. The crawlers respect robots.txt first. If you're blocked there, llms.txt is never read.

This problem is more common than it sounds. Cloudflare's "AI Scrapers and Crawlers" security feature, enabled by default on many zones, adds exactly those disallow rules to your robots.txt automatically. The site owner often doesn't know it's there. The result: an llms.txt file that no AI crawler ever reads, on a site whose owner believes it is AI-optimized.

Check your live robots.txt before assuming your setup is clean.
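That check can be done programmatically with Python's standard library robots.txt parser. A minimal sketch (the agent list and default URL are assumptions for illustration, not an official registry):

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

def blocked_ai_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that this robots.txt bars from the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch applies the most specific matching User-agent group,
    # falling back to the * group, and defaults to allowed.
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]
```

Fetch your live robots.txt (for example with urllib.request), pass its text in, and an empty list means none of the three crawlers is blocked.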

The file takes 10 minutes to write manually

Open a text editor. Paste the structure above. Fill in your site name, a two-sentence description of what you do, and five to eight links to your most important pages with one-line descriptions. Save as llms.txt. Upload to your server root via FTP, your hosting panel's file manager, or your deployment pipeline.

If you're on a framework like Next.js, the file goes in your public/ directory. Vercel, Netlify, and Cloudflare Pages all serve it correctly without any configuration.

If your site has more than 50 pages and you're not sure which ones an AI should prioritize, Indexora's GEO Score Analyzer builds the file for you — it reads your existing sitemap and meta descriptions and outputs a structured llms.txt as part of the AI-Ready File Generator package.

Check if yours exists before assuming it doesn't

Open a browser tab and go to yourdomain.com/llms.txt. One of three things happens: a 404 (the gap), a blank or near-empty file (worse than nothing, because it signals the format without providing any information), or a properly structured file.

If you get a 404, the fix is a text file and an upload. The format is documented publicly. There's no plugin required, no service to pay for, and no technical barrier beyond knowing what to write.
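The outcomes above can be told apart mechanically too. A sketch that classifies what a fetch of /llms.txt returned; the category names and heuristics are this article's, not part of any spec:

```python
def classify_llms_txt(status: int, body: str) -> str:
    """Classify a fetch of /llms.txt.

    Returns one of: "missing", "empty", "unstructured", "structured".
    """
    if status == 404:
        return "missing"   # the gap: nothing for AI crawlers to read
    stripped = body.strip()
    if not stripped:
        return "empty"     # signals the format without providing information
    lines = stripped.splitlines()
    has_title = lines[0].startswith("# ")
    has_pitch = any(line.startswith("> ") for line in lines)
    return "structured" if has_title and has_pitch else "unstructured"
```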

The format is young enough that a well-written llms.txt still stands out. Models encountering a clear, accurate briefing document at a domain root give that content higher confidence when constructing answers. That advantage closes as adoption grows and every site ships a generic template.

The window is open. It won't stay that way.


Frequently Asked Questions

What is llms.txt?
llms.txt is a plain text file placed at a website's root that gives AI language models a structured summary of the site's content, key pages, and content permissions. It was proposed by Jeremy Howard in 2024 and is read by AI crawlers like GPTBot, ClaudeBot, and PerplexityBot before they parse the rest of the site.

How is llms.txt different from robots.txt?
robots.txt tells web crawlers which pages they cannot access; it is a blocklist. llms.txt is a briefing document that proactively tells AI engines what the site is about, what its key pages are, and how its content can be used. The two files serve different purposes and work together, not as alternatives.

Which AI engines read llms.txt?
OpenAI's GPTBot (used by ChatGPT), Anthropic's ClaudeBot, and PerplexityBot have all been documented reading llms.txt files. Google's AI crawlers are also expected to support the format as adoption grows.

Do I need an llms.txt file for my website?
Not required, but increasingly important. Sites with a well-structured llms.txt are easier for AI engines to cite accurately. Sites without one get summarized by whatever the model can infer from crawling, which is often incomplete or wrong. If you want to appear correctly in AI-generated answers, llms.txt is one of the most direct ways to influence that.

How do I create an llms.txt file?
Write a plain Markdown file with your site name as an H1, a brief description as a blockquote, a list of key pages with short descriptions, and optional content permissions. Save it as llms.txt and upload it to your server root so it's accessible at yourdomain.com/llms.txt. The full spec is at llmstxt.org.