What are AI Readiness Files?

They are structured files you deploy at the root of your website to tell AI systems exactly who you are, what you offer, and where to send people. They speak a format AI already understands, so assistants stop guessing about your brand based on old training data.

Why does AI readiness matter now?

When someone wants a product recommendation, a service provider, or a straight answer, they often ask ChatGPT, Claude, Gemini, or Perplexity before they visit a website. If you are not giving those systems clear, current signals, they guess. And they often guess wrong.

How many files are in the kit?

The kit includes 17 purpose-built files across 7 categories: Identity & Permissions, Content Files, Map & Navigation, Intelligence Files, Research Files, Policy Files, and Operations Files. Each one helps AI understand and represent your site more accurately.

Where should I start?

Start with the critical files: robots.txt, ai.txt, llms.txt, and llms-full.txt. They deliver the biggest immediate impact and are what most AI systems look for first. Then add sitemaps, intelligence JSON files, policies, and operational files using the deployment checklist.

What does robots.txt do for AI?

robots.txt sits at the front door of your website and tells automated visitors (search engines, AI bots, and scrapers) which sections they can access. For AI readiness, it includes instructions for major crawlers like GPTBot and ClaudeBot, pointing them toward your llms.txt and ai-sitemap.xml while keeping checkout, login, and admin pages off limits.

What is ai.txt?

ai.txt is your brand's introduction to every AI system on the internet. Where robots.txt controls access, ai.txt goes further. It covers what you sell, what you are known for, your authoritative topics, and the rules for what AI systems can and cannot do with your content.

How is ai.txt different from robots.txt?

robots.txt controls access: which pages bots can crawl. ai.txt controls identity: who you are, what you offer, and your ground rules for AI use of your content. Think of robots.txt as the doorman and ai.txt as the briefing document you hand a journalist before an interview.

What is llms.txt?

llms.txt is a structured text file built for large language models. It gives AI a quick map of your website: company name, description, organized sections with direct links, and the most common questions your site answers. It follows the llmstxt.org community specification, written for assistants such as ChatGPT, Claude, Gemini, and Perplexity.

What's the difference between llms.txt and llms-full.txt?

llms.txt is the cheat sheet: a quick summary an AI can scan in seconds. llms-full.txt is the deep dive with rich descriptions, detailed Q&A, and full category explanations. If llms.txt is a Wikipedia summary, llms-full.txt is the full article.

Why do I need an ai-sitemap.xml?

A regular sitemap lists pages and timestamps. The ai-sitemap adds what each page is about, its content type, and a plain-English summary. It is the difference between a paper road map and GPS with points of interest. AI crawlers can understand pages without fetching and parsing each one individually.

What is ai-intent.json?

ai-intent.json maps real user questions (the things people type into AI assistants) to the best page on your website to answer each one. Without it, AI guesses which page to recommend. With it, you give AI a direct URL for every common query.

What are ai-entities.json and ai-schema.json for?

ai-entities.json is a structured catalog of the important parts of your site: products, categories, services, and key concepts. It powers AI knowledge graphs. ai-schema.json uses the Schema.org standard to describe your organization in machine-readable format so search engines and AI systems can identify you without ambiguity.

What are the RAG index files?

rag-index.json and rag-index.jsonl are ready-made indexes of your site built for AI research and retrieval pipelines. When a user asks a question, AI first pulls the most relevant documents from this index before generating an answer, which keeps responses grounded in your actual content.

What is a training data policy?

training-data-policy.txt sets formal rules for how AI companies can use your content, including model training, RAG indexing, and commercial use. It answers a practical question every brand should decide upfront: what are others allowed to do with your work?

What is ai-disclosure.txt?

ai-disclosure.txt is a public transparency report explaining how your organization uses AI, whether for content generation, recommendations, or customer interactions. It builds trust by answering those questions before someone has to ask.

How often should I update my AI readiness files?

It depends on the file. Update llms.txt and sitemaps monthly or when your site structure changes. Refresh ai.txt and llms-full.txt quarterly or when products change. Review policy files annually. The manifest.json file lists recommended update frequencies for every file in the kit.

How do Map & Navigation files help AI crawl my site?

They help AI discover and understand every page efficiently. ai-sitemap.xml adds content type, topics, and plain-English summaries to each URL so crawlers know what a page is about before fetching it. sitemap.md is the same structure in human-readable markdown, approachable for both teams and LLMs. Files in this section: ai-sitemap.xml and sitemap.md.

Plain-English reference

The AI Readiness Guide

A structured kit of files that tells ChatGPT, Claude, Gemini, and Perplexity exactly who you are before they guess wrong.

The world changed when AI assistants became the first stop for research. Today, when someone wants to find the best product to buy, the right service for their needs, or answers to a specific question, they often ask ChatGPT, Claude, Gemini, or Perplexity before they ever visit a website. Research from Princeton and Georgia Tech formalizes this shift as generative engines that synthesize answers from web sources.

AI crawlers such as GPTBot and ClaudeBot ingest site content using rules you publish in robots.txt. Without clear, current, structured signals from you, assistants guess, and often guess wrong, describing products incorrectly or sending users to better-indexed competitors.

AI Readiness Files solve this problem. They are structured files you deploy to your website, including formats like the llms.txt standard, that tell AI systems exactly who you are, what you offer, and where to send people. This guide explains every file in plain English, each with a real-world analogy and a direct answer to the question: what does an AI actually do with this file?

The 7 categories

The 17 files fall into seven natural categories, each playing a distinct role in helping AI understand and accurately represent your website. Structured content and citations can improve visibility in AI-generated answers. See sources.

Identity & Permissions

The files that introduce your brand and control which AI systems get access

2 files

Content Files

The files that give AI the full story about your site in readable text

2 files

Map & Navigation

The files that help AI navigate your site structure efficiently

2 files

Intelligence Files

The files that give AI deep knowledge of your entities, products, and user intents

4 files

Research Files

The files that power AI research and retrieval pipelines

2 files

Policy Files

The files that set the rules for how AI can use your content

2 files

Operations Files

The files that help your team deploy and maintain everything

3 files

Identity & Permissions

The files that introduce your brand and control which AI systems get access

robots.txt ✦ THE DOORMAN

Critical

Controls who gets in and where they can go. One of the oldest files on the web, upgraded for AI. Tells every crawler and AI bot exactly which sections of your site they can access, and points them toward your AI-specific readiness files.

Who reads it

All web crawlers and AI bots

Update when

When site structure changes

Think of it like…

“Imagine a library where some shelves are open to visitors and others are for staff only. robots.txt is the sign at the entrance that tells each visitor which sections they are free to browse.”

What AI systems do with this file

Checks it before crawling any page on your site
Follows Allow and Disallow rules to determine which pages to index
Discovers your AI-specific files through Sitemap references
Learns where to find your llms.txt, ai-sitemap.xml, and other readiness files

Deploy at: /robots.txt
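
As a rough illustration of the check a well-behaved crawler performs, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser module. The domain example.com, the blocked paths, and the rules themselves are assumptions made for illustration, not the kit's actual file. The pointer to llms.txt is written as a comment, which is only one assumed way of expressing it; the Sitemap lines are what the parser actually exposes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; paths and rules are illustrative only.
SAMPLE_ROBOTS_TXT = """\
# AI readiness files: https://example.com/llms.txt

User-agent: GPTBot
Disallow: /checkout/
Disallow: /login/
Disallow: /admin/
Allow: /

User-agent: ClaudeBot
Disallow: /checkout/
Disallow: /login/
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/ai-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def may_crawl(bot: str, path: str) -> bool:
    """Return True if the named bot is allowed to fetch the given path."""
    return parser.can_fetch(bot, f"https://example.com{path}")


if __name__ == "__main__":
    for bot in ("GPTBot", "ClaudeBot"):
        for path in ("/products/widgets", "/checkout/cart"):
            print(bot, path, may_crawl(bot, path))
    # Sitemap references are how a crawler discovers ai-sitemap.xml here.
    print(parser.site_maps())
```

Running a check like this against your own file before deploying is a quick way to confirm that checkout, login, and admin paths are blocked for the bots you name.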

ai.txt ✦ YOUR AI BUSINESS CARD

Critical

Purpose-built for the age of AI. Your brand's complete introduction to every AI system. Contains company identity, what you sell, what you are known for, authoritative topics, and explicit rules for what AI systems can and cannot do with your content.

Who reads it

AI crawlers, LLM training pipelines

Update when

Quarterly, or when products change

Think of it like…

“Think of ai.txt like the briefing document you would hand a journalist before a media interview. It covers everything you want them to know: your name, your story, your products, and your ground rules.”

What AI systems do with this file

Reads it to learn your brand identity, product categories, and authoritative topics
Uses it to decide how to represent your company in AI-generated answers
Follows your stated training data policy
Uses your own brand language and product names rather than guessing

Deploy at: /ai.txt

Content Files

The files that give AI the full story about your site in readable text

llms.txt ✦ THE CHEAT SHEET

Critical

A quick-read summary of your entire website, written specifically for large language models. Follows the llmstxt.org standard. Contains your company name, a short description, organized sections for every part of your site with direct links, and the most common questions people ask.

Who reads it

LLMs including ChatGPT, Claude, Gemini, and Perplexity

Update when

Monthly, or when site structure changes

Think of it like…

“llms.txt is like the table of contents and executive summary of your website combined into one short document that an AI can read in seconds, rather than crawling thousands of individual pages.”

What AI systems do with this file

Reads it to get an accurate, structured map of your entire website
Uses your section headers to understand how your site content is organized
Follows your links to find the right pages to recommend
Uses your priority queries to understand which questions your site answers best

Sources

llms.txt standard (llmstxt.org)

Deploy at: /llms.txt
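
To make the structure concrete, here is a minimal sketch that renders an llms.txt in the general shape the llmstxt.org specification describes: a top-level heading with the site name, a blockquote summary, and second-level sections containing link lists. The helper names (Link, Section, render_llms_txt), the company, and the URLs are all hypothetical, and the "Common questions" section is an assumption about how the kit might lay out its top questions.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, sections: list[Section]) -> str:
    """Render a site overview as markdown: H1 name, blockquote summary, H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site data used only to show the output shape.
    print(render_llms_txt(
        "Example Co",
        "Example Co sells cast iron cookware and publishes care guides.",
        [
            Section("Products", [Link("Cast iron pans", "https://example.com/products/cast-iron", "Full catalog")]),
            Section("Common questions", [Link("How do I season a pan?", "https://example.com/guides/seasoning")]),
        ],
    ))
```

Generating the file from structured data rather than editing it by hand makes the monthly refresh easier to keep in step with site changes.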

llms-full.txt ✦ THE DEEP DIVE

Critical

The extended edition. Rich context, detailed Q&A sections, full category explanations, and everything an AI system needs to speak knowledgeably. llms.txt is the back-of-book summary; llms-full.txt is the complete book.

Who reads it

RAG pipelines, AI knowledge bases, advanced LLMs

Update when

Quarterly, or when major product changes occur

Think of it like…

“If llms.txt is a Wikipedia article summary, llms-full.txt is the complete Wikipedia article with all sections, details, and references included.”

What AI systems do with this file

Reads it to generate rich, accurate answers about your products and services
Uses your Q&A sections to respond to common user questions correctly
Feeds it into RAG pipelines to create high-quality context
Uses your detailed descriptions to match the right products to the right users

Sources

llms.txt standard (llmstxt.org)

Deploy at: /llms-full.txt

Map & Navigation

The files that help AI navigate your site structure efficiently

ai-sitemap.xml ✦ THE GPS

High

An upgraded sitemap with AI-specific directions. Beyond standard URL + lastmod, every entry includes content-type labels, topic descriptions, and plain-English one-sentence summaries so AI knows exactly what each page contains before visiting.

Who reads it

AI crawlers, semantic search indexers

Update when

Monthly, or when new pages are added

Think of it like…

“The ai-sitemap.xml is like a museum floor guide that does not just show you which rooms exist, but describes every exhibit inside each one so visitors know exactly what they will find before they walk in.”

What AI systems do with this file

Uses it to efficiently discover and index all major pages
Reads content-type and topic tags to correctly categorize each page
Uses summaries to understand pages without needing to fetch and parse each one
Prioritizes pages based on your importance scores

Deploy at: /ai-sitemap.xml
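
The sketch below shows one way an entry with the extra fields could be generated using Python's standard xml.etree.ElementTree. The loc, lastmod, and priority elements come from the Sitemaps.org protocol; the ai: namespace URI and its contentType, topics, and summary elements are hypothetical names chosen for illustration, since the exact element names the kit uses are not given here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # standard sitemap namespace
AI_NS = "https://example.com/schemas/ai-sitemap"  # hypothetical extension namespace

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("ai", AI_NS)


def build_ai_sitemap(pages: list[dict]) -> str:
    """Render pages as a sitemap whose entries carry extra AI-facing metadata."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Standard sitemap fields
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = page["lastmod"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{page['priority']:.1f}"
        # Hypothetical AI-specific fields
        ET.SubElement(url, f"{{{AI_NS}}}contentType").text = page["content_type"]
        ET.SubElement(url, f"{{{AI_NS}}}topics").text = ", ".join(page["topics"])
        ET.SubElement(url, f"{{{AI_NS}}}summary").text = page["summary"]
    ET.indent(urlset)  # pretty-print; requires Python 3.9+
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_ai_sitemap([{
        "loc": "https://example.com/guides/seasoning",
        "lastmod": "2025-01-10",
        "priority": 0.8,
        "content_type": "guide",
        "topics": ["cast iron", "seasoning"],
        "summary": "Step-by-step instructions for seasoning a cast iron pan.",
    }]))
```

Keeping the extra fields in their own namespace means a parser that only understands the standard protocol can still read the loc and lastmod values.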

sitemap.md ✦ THE HUMAN-READABLE MAP

Medium

A plain-text, markdown-formatted overview of your entire website structure. No XML, no JSON, just organized sections, links, and conversational descriptions. The most approachable file for both humans and AI.

Who reads it

LLMs, developers, content teams

Update when

Monthly, or when site structure changes

Think of it like…

“If your website were a city, sitemap.md would be the welcome brochure handed out at the visitor center, organized, friendly, and giving you just enough information to navigate with confidence.”

What AI systems do with this file

Reads it to build an immediate mental model of your website structure
Uses section organization to understand how your content is grouped
References it when generating navigation suggestions for users
Shares it with content and developer teams as a living reference

Deploy at: /sitemap.md

Intelligence Files

The files that give AI deep knowledge of your entities, products, and user intents

ai-entities.json ✦ THE ENCYCLOPEDIA

High

A structured catalog of every significant element: product categories, subcategories, services, brands, and key concepts. Each entity has name, type, description, related URLs, and connections. Powers the knowledge graphs AI builds internally.

Who reads it

AI systems, knowledge graph builders

Update when

Quarterly, or when categories change

Think of it like…

“ai-entities.json is like the index of a comprehensive encyclopedia. Instead of page numbers, each entry links to a URL, includes a description, and connects to related topics.”

What AI systems do with this file

Builds a knowledge graph of your products, services, and categories
Correctly classifies what your site sells and how everything is organized
Understands relationships between categories, subcategories, and related items
Provides accurate product and service names rather than approximations

Deploy at: /ai-entities.json
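
As a small sketch of how entity records could become a graph, the example below links each entity to the ones it names as related. The record fields (id, name, type, url, related) and the sample entities are assumptions for illustration; the kit's actual schema may differ.

```python
from collections import defaultdict

# Hypothetical entity records; field names are illustrative assumptions.
ENTITIES = [
    {"id": "cookware", "name": "Cookware", "type": "category",
     "url": "https://example.com/cookware", "related": ["cast-iron-pans"]},
    {"id": "cast-iron-pans", "name": "Cast Iron Pans", "type": "subcategory",
     "url": "https://example.com/cookware/cast-iron", "related": ["cookware", "seasoning"]},
    {"id": "seasoning", "name": "Seasoning", "type": "concept",
     "url": "https://example.com/guides/seasoning", "related": ["cast-iron-pans"]},
]


def build_graph(entities: list[dict]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from each entity's 'related' list."""
    known = {entity["id"] for entity in entities}
    graph: dict[str, set[str]] = defaultdict(set)
    for entity in entities:
        for other in entity["related"]:
            if other in known:  # skip dangling references
                graph[entity["id"]].add(other)
                graph[other].add(entity["id"])
    return dict(graph)


if __name__ == "__main__":
    for entity_id, neighbors in build_graph(ENTITIES).items():
        print(entity_id, "->", sorted(neighbors))
```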

ai-intent.json ✦ THE TRAFFIC DIRECTOR

High

A lookup table mapping real user questions (the exact things people type into AI assistants) to the single best page on your website to answer them. Transforms AI from a general information source into a precise navigation tool.

Who reads it

AI assistants, conversational AI, AI search engines

Update when

Quarterly, or when site structure changes

Think of it like…

“ai-intent.json is like a knowledgeable store employee who knows the answer to every common customer question and knows exactly which aisle, shelf, or department to point that customer toward.”

What AI systems do with this file

Matches incoming user queries to the correct page on your site
Routes how-to questions to guides and educational content
Routes purchase-intent questions to the right product category pages
Sends users to the most relevant page rather than defaulting to homepage

Deploy at: /ai-intent.json
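
The lookup idea can be sketched in a few lines. In the example below, the file layout (an intents list with query and url fields, plus a fallback_url) is a hypothetical schema, and the word-overlap matching is a deliberately simple stand-in for whatever matching an assistant actually performs.

```python
import json
import re

# Hypothetical ai-intent.json content; field names are assumptions.
AI_INTENT_JSON = """
{
  "intents": [
    {"query": "how do I clean a cast iron pan", "url": "https://example.com/guides/cast-iron-care"},
    {"query": "best cast iron pan to buy", "url": "https://example.com/products/cast-iron"},
    {"query": "what is your return policy", "url": "https://example.com/policies/returns"}
  ],
  "fallback_url": "https://example.com/"
}
"""


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route_query(user_query: str, intent_file: dict) -> str:
    """Return the URL of the intent whose query shares the most words with the question."""
    asked = tokenize(user_query)
    best_url, best_score = intent_file["fallback_url"], 0.0
    for intent in intent_file["intents"]:
        known = tokenize(intent["query"])
        score = len(asked & known) / len(asked | known)  # Jaccard similarity
        if score > best_score:
            best_url, best_score = intent["url"], score
    return best_url


if __name__ == "__main__":
    intents = json.loads(AI_INTENT_JSON)
    print(route_query("How should I clean my cast iron pan?", intents))
    print(route_query("Which cast iron pan is best to buy?", intents))
```

A how-to question lands on the guide and a purchase question lands on the product page, while a question that matches nothing falls back to the homepage.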

ai-schema.json ✦ THE IDENTITY CARD

High

Uses the Schema.org standard (founded by Google, Microsoft, Yahoo, and Yandex) in a machine-readable JSON-LD format. Defines organization type, founding date, address, social profiles, contact info, and site search functionality with zero ambiguity.

Who reads it

Search engines, AI systems, developers

Update when

Annually, or when company information changes

Think of it like…

“ai-schema.json is like your company's official government registration filing, a formal, standardized record of who you are, what you do, and where to find you, recognized and understood by every system.”

What AI systems do with this file

Uses it to confidently identify your organization across different web properties
Feeds the structured data into knowledge graphs and entity resolution
Powers enhanced search results with star ratings, hours, and contact details
Links your website, social profiles, and other presence as one unified entity

Deploy at: /ai-schema.json
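
For a sense of what the JSON-LD might contain, the sketch below builds an Organization and a WebSite node with a SearchAction using Schema.org property names. The function name build_ai_schema, the company details, and the choice to group both nodes under @graph are assumptions for illustration; address and contact fields are left out to keep the example short.

```python
import json


def build_ai_schema(name: str, url: str, founding_date: str,
                    same_as: list[str], search_url_template: str) -> dict:
    """Build a JSON-LD document describing an organization and its site search."""
    organization_id = f"{url}#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": organization_id,
                "name": name,
                "url": url,
                "foundingDate": founding_date,
                "sameAs": same_as,  # social profiles and other official pages
            },
            {
                "@type": "WebSite",
                "@id": f"{url}#website",
                "url": url,
                "publisher": {"@id": organization_id},
                "potentialAction": {
                    "@type": "SearchAction",
                    "target": search_url_template,
                    "query-input": "required name=search_term_string",
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical company details.
    document = build_ai_schema(
        name="Example Co",
        url="https://example.com/",
        founding_date="2012-05-01",
        same_as=["https://www.linkedin.com/company/example-co"],
        search_url_template="https://example.com/search?q={search_term_string}",
    )
    print(json.dumps(document, indent=2))
```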

.well-known/ai-plugin.json ✦ THE PLUGIN BADGE

Medium

Lives in the .well-known folder, a standard location (RFC 8615) for site-wide machine-readable metadata. Declares your site as an official AI resource: name, description, contact information, product categories, and key URLs in a format AI assistant platforms recognize.

Who reads it

AI assistants, AI plugin registries, developer platforms

Update when

When contact information or product categories change

Think of it like…

“ai-plugin.json is your official membership badge in the AI ecosystem, a verified credential that tells platforms your site is a first-class source for answers about your products and services.”

What AI systems do with this file

Discovers your site through the standardized .well-known location
Reads official name, description, and contact details for plugin integrations
Uses declared categories to match your site to relevant user queries
Treats your site as a registered, authoritative source rather than an unknown URL

Deploy at: /.well-known/ai-plugin.json

Research Files

The files that power AI research and retrieval pipelines

rag-index.json ✦ THE RESEARCH DATABASE

High

A pre-built index designed for Retrieval-Augmented Generation (RAG). A JSON array of records (one per major page/section) containing URL, title, and topics covered. AI systems can load this directly into LlamaIndex, LangChain, Pinecone, Weaviate, etc.

Who reads it

RAG pipeline engineers, AI search builders

Update when

Monthly, or when new major pages are added

Think of it like…

“rag-index.json is like giving an AI researcher a pre-organized card catalog of your entire library. Instead of reading every book, they have a ready-made index they can search in milliseconds.”

What AI systems do with this file

Loads it directly into RAG frameworks and vector stores such as LlamaIndex, LangChain, or Pinecone
Quickly identifies the most relevant pages for any given user query
Builds a searchable knowledge base from your site's content
Retrieves real-time, accurate information to ground AI-generated answers

Deploy at: /rag-index.json

rag-index.jsonl ✦ THE STREAMLINED DATABASE

High

The same content as rag-index.json, but in JSON Lines (newline-delimited) format. Preferred by large-scale ML pipelines, OpenAI fine-tuning, and streaming vector DB ingestion because records can be processed one-by-one without loading the entire file into memory.

Who reads it

ML engineers, AI developers, vector database ingestion tools

Update when

Always keep in sync with rag-index.json

Think of it like…

“If rag-index.json is a bound book containing all your records, rag-index.jsonl is those same records on individual index cards, far easier to sort, filter, and process one entry at a time.”

What AI systems do with this file

Feeds directly into OpenAI fine-tuning datasets and similar ML workflows
Streams into vector databases one record at a time for memory-efficient processing
Powers LlamaIndex, LangChain, Pinecone, and Weaviate knowledge bases
Enables large datasets to be processed without loading everything into memory

Deploy at: /rag-index.jsonl
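
The relationship between the two index formats can be shown with a short sketch: the same records written once as a JSON array and once as JSON Lines, then read back one record at a time. The record fields (url, title, topics) follow the description of rag-index.json; the sample values and function names are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO

# Hypothetical index records: one per major page, as in rag-index.json.
RECORDS = [
    {"url": "https://example.com/guides/seasoning", "title": "Seasoning Guide",
     "topics": ["cast iron", "seasoning"]},
    {"url": "https://example.com/products/cast-iron", "title": "Cast Iron Pans",
     "topics": ["cookware", "cast iron"]},
]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def stream_jsonl(handle: TextIO) -> Iterator[dict]:
    """Yield records one at a time without reading the whole file into memory."""
    for line in handle:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


if __name__ == "__main__":
    jsonl_text = to_jsonl(RECORDS)
    # io.StringIO stands in for an open file such as rag-index.jsonl.
    for record in stream_jsonl(io.StringIO(jsonl_text)):
        print(record["title"], "->", record["url"])
```

Generating the .jsonl file from the .json file in the same build step is one simple way to keep the two in sync.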

Policy Files

The files that set the rules for how AI can use your content

ai-disclosure.txt ✦ THE TRANSPARENCY REPORT

Medium

Your public statement about how your organization uses AI in products, operations, and customer interactions. Like a privacy policy for the AI era. Answers: Is AI generating content? Making decisions? Can users reach a real person?

Who reads it

AI systems, customers, regulators

Update when

Annually, or when AI tool usage changes

Think of it like…

“ai-disclosure.txt is like the nutrition label on food packaging. It does not tell you whether the food is good or bad. It simply tells you exactly what is inside. That transparency builds trust.”

What AI systems do with this file

Reads it to accurately describe your organization's AI usage to curious users
Uses it to appropriately flag when content from your site is AI-assisted
References it when users ask whether your company uses AI
Evaluates your organization's transparency and trustworthiness

Deploy at: /ai-disclosure.txt

training-data-policy.txt ✦ THE LICENSE AGREEMENT

Medium

Your formal published position on whether AI companies can use your content to train their models. Clearly states what is permitted (e.g. real-time RAG) vs. what requires a license (commercial model training). Protects your organization legally.

Who reads it

AI model developers, AI companies, legal teams

Update when

Annually, or when policy changes

Think of it like…

“training-data-policy.txt is like a Creative Commons license for your website content in the AI era. It states clearly what others can do with your work so there is never any question about what is permitted.”

What AI systems do with this file

Reads it before deciding whether your content can be included in training datasets
Treats content as license-required if your policy restricts commercial model training
Respects your data-freshness warnings to avoid citing stale pricing or inventory data
Cites and links to your original source rather than reproducing content directly

Deploy at: /training-data-policy.txt

Operations Files

The files that help your team deploy and maintain everything

structured-data-guide.md ✦ THE DEVELOPER HANDBOOK

Low

The most technical file, written for your development team, not AI. A comprehensive step-by-step guide with ready-to-use JSON-LD examples for every major page type: Product, LocalBusiness, BlogPosting, Event, etc. Powers rich results and page-level understanding.

Who reads it

Web developers, SEO teams, AI readiness teams

Update when

When new page types are added to the site

Think of it like…

“structured-data-guide.md is like the wiring diagram for your home. Guests never see it, but your electrician absolutely needs it to make sure everything behind the walls works correctly.”

What AI systems do with this file

Reads the structured data embedded on each page to understand its specific content type
Uses Product and Offer schema to accurately describe your inventory and pricing
Uses LocalBusiness schema to surface your locations in local and map-based searches
Uses BlogPosting and Event schemas to correctly classify and date your published content

Deploy at: /structured-data-guide.md

manifest.json ✦ THE MASTER INVENTORY

Medium

The single source of truth for your entire AI readiness implementation. Lists every file deployed, what it does, its URL, format, intended audience, and update frequency. Includes a summary scorecard. Lets anyone instantly audit your readiness status.

Who reads it

Developers, deployment tools, AI readiness auditors

Update when

When new files are added to the set

Think of it like…

“manifest.json is like the table of contents for your entire AI readiness project. It tells anyone who opens it exactly what you have built, where everything lives, and how complete the implementation is.”

What AI systems do with this file

Reads it to discover all AI-specific files without needing to guess at URLs
Uses priority rankings to determine which files to read and index first
References update frequencies to know when to revisit and re-index each file
Allows auditing tools to verify complete and correct AI readiness implementation

Deploy at: /manifest.json
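
One practical use of the update frequencies is flagging files that are overdue for review. The sketch below assumes a hypothetical manifest layout (a files list with path, update_frequency, and last_updated fields) and approximate interval lengths; neither is taken from the kit's actual manifest.json.

```python
from datetime import date, timedelta

# Approximate review intervals; the exact day counts are an assumption.
REVIEW_INTERVALS = {
    "monthly": timedelta(days=30),
    "quarterly": timedelta(days=91),
    "annually": timedelta(days=365),
}

# Hypothetical manifest excerpt; field names are illustrative.
MANIFEST = {
    "files": [
        {"path": "/llms.txt", "update_frequency": "monthly", "last_updated": "2025-01-10"},
        {"path": "/ai.txt", "update_frequency": "quarterly", "last_updated": "2025-01-10"},
        {"path": "/training-data-policy.txt", "update_frequency": "annually", "last_updated": "2024-06-01"},
    ]
}


def files_due_for_review(manifest: dict, today: date) -> list[str]:
    """Return paths whose last update is at least one review interval old."""
    due = []
    for entry in manifest["files"]:
        last_updated = date.fromisoformat(entry["last_updated"])
        if today - last_updated >= REVIEW_INTERVALS[entry["update_frequency"]]:
            due.append(entry["path"])
    return due


if __name__ == "__main__":
    print(files_due_for_review(MANIFEST, date.today()))
```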

deployment-checklist.md ✦ THE LAUNCH PLAN

Internal

The practical playbook your team follows to move from files on a computer to files correctly serving live traffic. Organized in clear phases with verification steps for each critical file. Nothing is skipped, forgotten, or misconfigured.

Who reads it

Web teams, developers, marketing teams

Update when

One-time setup; update when new files are added

Think of it like…

“deployment-checklist.md is like a move-in checklist for a new office. It walks your team through every setup step so that six months later, nobody discovers the internet connection was never properly configured.”

What AI systems do with this file

This file is primarily for your internal team rather than for AI systems directly
Ensures all files are live on the web and returning correct HTTP status codes
Verifies robots.txt has been updated with the correct AI crawler access rules
Confirms that structured data is implemented correctly across all page types

Deploy at: /deployment-checklist.md
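
The status-code verification step could be automated along these lines. This is a minimal sketch using Python's standard urllib; the function names, the choice of HEAD requests, and the example.com base URL are assumptions, and some servers may require GET instead. The status lookup is passed in as a parameter so the check can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable

# The four files the guide marks as critical.
CRITICAL_FILES = ["/robots.txt", "/ai.txt", "/llms.txt", "/llms-full.txt"]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request, or 0 if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 or 500
    except urllib.error.URLError:
        return 0  # DNS failure, refused connection, timeout, and similar


def verify_deployment(base_url: str, paths: list[str],
                      get_status: Callable[[str], int] = fetch_status) -> dict[str, bool]:
    """Map each path to True when it is served with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: get_status(base + path) == 200 for path in paths}


if __name__ == "__main__":
    # Replace the hypothetical base URL with your own site before running.
    for path, ok in verify_deployment("https://example.com", CRITICAL_FILES).items():
        print("OK     " if ok else "MISSING", path)
```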

FAQ

Guide FAQ: questions about each section

Plain-English answers about each section of the AI Readiness Guide and the files it covers.

What are Identity & Permissions files for?

They introduce your brand to AI systems and control which crawlers can access your site. This section covers robots.txt, which sets crawler access rules and points bots to your AI files, and ai.txt, your brand's full introduction: identity, offerings, authoritative topics, and content-use rules. Deploy these first: they are the front door and briefing document for every AI system.

What are Content Files and why does AI need them?

Content Files give AI the full story about your site in readable text. llms.txt is the quick map with company name, sections, links, and top questions, following the llms.txt standard. llms-full.txt is the deep dive with rich descriptions and detailed Q&A. Together they let assistants understand your site without crawling thousands of pages.

What do Intelligence Files do for AI accuracy?

They give AI structured knowledge of your entities, user intents, and organization identity. ai-entities.json catalogs products, services, and key concepts. ai-intent.json maps real user questions to the best page on your site. ai-schema.json publishes Schema.org JSON-LD. .well-known/ai-plugin.json registers your site as an official AI resource. This is where precision happens: fewer guesses, better recommendations.

What are Research Files and who uses them?

Research Files power AI retrieval and RAG pipelines. rag-index.json is a pre-built JSON index of your major pages (URL, title, topics). rag-index.jsonl is the same data in JSON Lines format for streaming ingestion into vector databases and ML workflows. Engineers and AI search builders load these directly into LlamaIndex, LangChain, Pinecone, and similar tools.

Why do I need Policy Files for AI readiness?

They set the rules for how AI may use your content and build trust with users. training-data-policy.txt states what is permitted for model training, RAG indexing, and commercial use. ai-disclosure.txt explains how your organization uses AI in products and operations. Both protect you legally and signal transparency to customers and AI systems.

What are Operations Files and who are they for?

They help your team deploy, verify, and maintain the full kit. structured-data-guide.md gives developers JSON-LD examples for every page type. manifest.json is the master inventory and readiness scorecard. deployment-checklist.md is the phased launch playbook with verification steps. These are primarily for your web team, but manifest.json also helps auditors and AI systems discover what you have deployed.

Ready to generate your kit?

Use the AI Readiness skill or MCP server to research your site and generate all 17 files, then deploy them to your web root.

Research & standards

Sources cited on this page

Claims about how AI assistants and crawlers use web content are grounded in published research, open standards, and vendor documentation, not marketing guesswork.

Generative engine research

GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)
Foundational KDD 2024 study on how content structure and citations affect visibility in AI-generated answers (up to ~40% lift in controlled tests).

LLM site summary standard

llms.txt standard (llmstxt.org)
Community specification for a machine-readable site summary that LLMs can load in one context window.

AI permissions & sitemaps

ai.txt permissions standard (aitxt.org)
Open standard for machine-readable AI permissions, identity, and sitemap extensions at a site root.
Sitemaps.org protocol
Standard XML sitemap format extended by ai-sitemap.xml for AI-specific page metadata.

Structured data & plugins

Schema.org structured data
Collaborative vocabulary for machine-readable entity descriptions used by search engines and AI systems.
OpenAI plugin manifest format
Specification for .well-known/ai-plugin.json manifests that register a site as an AI-accessible resource.

MCP & agent tooling

Model Context Protocol
Open protocol for connecting AI assistants to tools, resources, and data sources via a standard interface.

AI crawler documentation

OpenAI GPTBot
How OpenAI's web crawler accesses sites for ChatGPT and training pipelines.
Anthropic ClaudeBot
Anthropic's documentation on Claude web crawling and opt-out.
Google-Extended
Google's robots.txt product token that controls whether crawled content can be used for Gemini and other generative AI products.
PerplexityBot
Perplexity's bot documentation for indexing and citation in AI answers.

About the author

Written by Russ Wittmann, SVP of Technology

SVP of Technology at Silverback Marketing. Helping brands become the source AI cites through GEO, AIO, technical SEO, and AI search strategy.
