Plain-English reference

The AI Readiness Guide

A structured kit of files that tells ChatGPT, Claude, Gemini, and Perplexity exactly who you are before they guess wrong.

The world changed when AI assistants became the first stop for research. Today, when someone wants to find the best product to buy, the right service for their needs, or answers to a specific question, they often ask ChatGPT, Claude, Gemini, or Perplexity before they ever visit a website. Research from Princeton and Georgia Tech formalizes this shift as generative engines that synthesize answers from web sources.

AI crawlers such as GPTBot and ClaudeBot ingest site content using rules you publish in robots.txt. Without clear, current, structured signals from you, assistants guess, and often guess wrong, describing products incorrectly or sending users to better-indexed competitors.

AI Readiness Files solve this problem. They are structured files you deploy to your website, including formats like the llms.txt standard, that tell AI systems exactly who you are, what you offer, and where to send people. This guide explains every file in plain English, each with a real-world analogy and a direct answer to the question: what does an AI actually do with this file?

The 7 categories

The 17 files fall into seven natural categories, each playing a distinct role in helping AI understand and accurately represent your website. Structured content and citations can improve visibility in AI-generated answers. See sources.

Identity & Permissions

The files that introduce your brand and control which AI systems get access

robots.txt THE DOORMAN

Critical

Controls who gets in and where they can go. One of the oldest files on the web, upgraded for AI. Tells every crawler and AI bot exactly which sections of your site they can access, and points them toward your AI-specific readiness files.

Who reads it

All web crawlers and AI bots

Update when

When site structure changes

Think of it like…

Imagine a library where some shelves are open to visitors and others are for staff only. robots.txt is the sign at the entrance that tells each visitor which sections they are free to browse.

What AI systems do with this file

  • Checks it before crawling any page on your site
  • Follows Allow and Disallow rules to determine which pages to index
  • Discovers your AI-specific files through Sitemap references
  • Learns where to find your llms.txt, ai-sitemap.xml, and other readiness files

Deploy at: /robots.txt
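
The access rules above can be tried offline before you publish them. This is a minimal sketch using Python's standard-library robots.txt parser; the sample file, its paths, and the example.com URLs are illustrative assumptions, not a recommended policy. Python's parser applies the first matching rule in a group, so the narrower Disallow line is listed before the broad Allow line; real crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com. Paths and sitemap URLs are illustrative.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /account/
Allow: /

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/ai-sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory, without a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# Ask the same question a crawler asks before fetching a page.
print(rules.can_fetch("GPTBot", "https://example.com/products/"))
print(rules.can_fetch("GPTBot", "https://example.com/account/settings"))

# Sitemap lines are how a crawler could discover ai-sitemap.xml.
print(rules.site_maps())
```

Running a check like this against your draft file is a quick way to confirm that a bot you want to admit is not blocked by an earlier rule.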

ai.txt YOUR AI BUSINESS CARD

Critical

Purpose-built for the age of AI. Your brand's complete introduction to every AI system. Contains company identity, what you sell, what you are known for, authoritative topics, and explicit rules for what AI systems can and cannot do with your content.

Who reads it

AI crawlers, LLM training pipelines

Update when

Quarterly, or when products change

Think of it like…

Think of ai.txt like the briefing document you would hand a journalist before a media interview. It covers everything you want them to know, from your name and your story to your products and your ground rules.

What AI systems do with this file

  • Reads it to learn your brand identity, product categories, and authoritative topics
  • Uses it to decide how to represent your company in AI-generated answers
  • Follows your stated training data policy
  • Uses your own brand language and product names rather than guessing

Deploy at: /ai.txt

Content Files

The files that give AI the full story about your site in readable text

llms.txt THE CHEAT SHEET

Critical

A quick-read summary of your entire website, written specifically for large language models. Follows the llmstxt.org standard. Company name, short description, organized sections for every part of your site with direct links, and the most common questions people ask.

Who reads it

LLMs including ChatGPT, Claude, Gemini, and Perplexity

Update when

Monthly, or when site structure changes

Think of it like…

llms.txt is like the table of contents and executive summary of your website combined into one short document that an AI can read in seconds, rather than crawling thousands of individual pages.

What AI systems do with this file

  • Reads it to get an accurate, structured map of your entire website
  • Uses your section headers to understand how your site content is organized
  • Follows your links to find the right pages to recommend
  • Uses your priority queries to understand which questions your site answers best

Deploy at: /llms.txt
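
To make the layout concrete, here is a small sketch that renders an llms.txt in the shape the llmstxt.org proposal describes: a top-level title, a blockquote summary, then sections of links. The `Link` class, the `render_llms_txt` function, the company name, and every URL are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional one-line description shown after the link


def render_llms_txt(name: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render a title, a blockquote summary, and one link list per section."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Illustrative content only.
sections = {
    "Products": [
        Link("Running shoes", "https://example.com/shop/running-shoes", "Full catalog with sizing"),
    ],
    "Help": [
        Link("Returns", "https://example.com/help/returns"),
    ],
}
text = render_llms_txt("Example Co", "Example Co sells running gear online.", sections)
print(text)
```

Generating the file from one data structure makes it easier to keep the section list in step with the site as pages are added.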

llms-full.txt THE DEEP DIVE

Critical

The extended edition. Rich context, detailed Q&A sections, full category explanations, and everything an AI system needs to speak knowledgeably. llms.txt is the back-of-book summary; llms-full.txt is the complete book.

Who reads it

RAG pipelines, AI knowledge bases, advanced LLMs

Update when

Quarterly, or when major product changes occur

Think of it like…

If llms.txt is a Wikipedia article summary, llms-full.txt is the complete Wikipedia article with all sections, details, and references included.

What AI systems do with this file

  • Reads it to generate rich, accurate answers about your products and services
  • Uses your Q&A sections to respond to common user questions correctly
  • Feeds it into RAG pipelines to create high-quality context
  • Uses your detailed descriptions to match the right products to the right users

Deploy at: /llms-full.txt

Map & Navigation

The files that help AI navigate your site structure efficiently

ai-sitemap.xml THE GPS

High

An upgraded sitemap with AI-specific directions. Beyond standard URL + lastmod, every entry includes content-type labels, topic descriptions, and plain-English one-sentence summaries so AI knows exactly what each page contains before visiting.

Who reads it

AI crawlers, semantic search indexers

Update when

Monthly, or when new pages are added

Think of it like…

The ai-sitemap.xml is like a museum floor guide that does not just show you which rooms exist, but describes every exhibit inside each one so visitors know exactly what they will find before they walk in.

What AI systems do with this file

  • Uses it to efficiently discover and index all major pages
  • Reads content-type and topic tags to correctly categorize each page
  • Uses summaries to understand pages without needing to fetch and parse each one
  • Prioritizes pages based on your importance scores

Deploy at: /ai-sitemap.xml
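
The sketch below shows one way the extra per-page labels could sit alongside the standard sitemap fields. The sitemaps.org namespace is the real one; the `ai` namespace URI and the `contentType`, `topics`, and `summary` element names are assumptions made for this example, since this guide does not specify the exact tags.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
AI_NS = "https://example.com/ns/ai-sitemap"  # hypothetical namespace for the AI fields

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("ai", AI_NS)


def build_ai_sitemap(pages: List[Dict[str, str]]) -> str:
    """Build a sitemap whose entries carry standard fields plus AI-oriented labels."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Standard sitemap fields.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = page["lastmod"]
        # Assumed extension fields describing the page for AI systems.
        ET.SubElement(url, f"{{{AI_NS}}}contentType").text = page["content_type"]
        ET.SubElement(url, f"{{{AI_NS}}}topics").text = page["topics"]
        ET.SubElement(url, f"{{{AI_NS}}}summary").text = page["summary"]
    return ET.tostring(urlset, encoding="unicode")


# Illustrative page entry.
xml_text = build_ai_sitemap([
    {
        "loc": "https://example.com/help/returns",
        "lastmod": "2025-01-15",
        "content_type": "help-article",
        "topics": "returns, refunds",
        "summary": "Explains how to return an item and when refunds are issued.",
    }
])
print(xml_text)
```

Keeping the extension fields in their own namespace means a parser that only understands the standard sitemap elements can still read `loc` and `lastmod`.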

sitemap.md THE HUMAN-READABLE MAP

Medium

A plain-text, markdown-formatted overview of your entire website structure. No XML, no JSON, just organized sections, links, and conversational descriptions. The most approachable file for both humans and AI.

Who reads it

LLMs, developers, content teams

Update when

Monthly, or when site structure changes

Think of it like…

If your website were a city, sitemap.md would be the welcome brochure handed out at the visitor center, organized, friendly, and giving you just enough information to navigate with confidence.

What AI systems do with this file

  • Reads it to build an immediate mental model of your website structure
  • Uses section organization to understand how your content is grouped
  • References it when generating navigation suggestions for users
  • Doubles as a living reference for your content and developer teams

Deploy at: /sitemap.md

Intelligence Files

The files that give AI deep knowledge of your entities, products, and user intents

ai-entities.json THE ENCYCLOPEDIA

High

A structured catalog of every significant element: product categories, subcategories, services, brands, and key concepts. Each entity has name, type, description, related URLs, and connections. Powers the knowledge graphs AI builds internally.

Who reads it

AI systems, knowledge graph builders

Update when

Quarterly, or when categories change

Think of it like…

ai-entities.json is like the index of a comprehensive encyclopedia. Instead of page numbers, each entry links to a URL, includes a description, and connects to related topics.

What AI systems do with this file

  • Builds a knowledge graph of your products, services, and categories
  • Correctly classifies what your site sells and how everything is organized
  • Understands relationships between categories, subcategories, and related items
  • Provides accurate product and service names rather than approximations

Deploy at: /ai-entities.json

ai-intent.json THE TRAFFIC DIRECTOR

High

A lookup table mapping real user questions (the exact things people type into AI assistants) to the single best page on your website to answer them. Transforms AI from a general information source into a precise navigation tool.

Who reads it

AI assistants, conversational AI, AI search engines

Update when

Quarterly, or when site structure changes

Think of it like…

ai-intent.json is like a knowledgeable store employee who knows the answer to every common customer question and knows exactly which aisle, shelf, or department to point that customer toward.

What AI systems do with this file

  • Matches incoming user queries to the correct page on your site
  • Routes how-to questions to guides and educational content
  • Routes purchase-intent questions to the right product category pages
  • Sends users to the most relevant page rather than defaulting to homepage

Deploy at: /ai-intent.json
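
The lookup idea can be sketched in a few lines. The JSON layout here (a list of records with `queries` and `url` keys) and the word-overlap scoring are assumptions for illustration; a real assistant would likely use semantic matching rather than shared words.

```python
import json
import re
from typing import List, Optional

# Hypothetical ai-intent.json content: example questions mapped to one best page.
INTENT_JSON = """
[
  {"queries": ["how do I return an item", "what is your return policy"],
   "url": "https://example.com/help/returns"},
  {"queries": ["best running shoes for beginners"],
   "url": "https://example.com/shop/running-shoes"}
]
"""


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_page(query: str, intents: List[dict]) -> Optional[str]:
    """Return the URL whose example question shares the most words with the query."""
    query_tokens = tokenize(query)
    best_url, best_score = None, 0.0
    for intent in intents:
        for example in intent["queries"]:
            example_tokens = tokenize(example)
            union = query_tokens | example_tokens
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(query_tokens & example_tokens) / len(union) if union else 0.0
            if score > best_score:
                best_url, best_score = intent["url"], score
    return best_url


intents = json.loads(INTENT_JSON)
print(best_page("What's your return policy?", intents))
print(best_page("running shoes for a beginner", intents))
print(best_page("weather tomorrow", intents))
```

A query with no words in common with any example returns `None`, which is the case where an assistant would fall back to its general knowledge or your homepage.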

ai-schema.json THE IDENTITY CARD

High

Uses the Schema.org vocabulary (founded by Google, Microsoft, Yahoo, and Yandex) in machine-readable JSON-LD format. Defines organization type, founding date, address, social profiles, contact info, and site search functionality with zero ambiguity.

Who reads it

Search engines, AI systems, developers

Update when

Annually, or when company information changes

Think of it like…

ai-schema.json is like your company's official government registration filing, a formal, standardized record of who you are, what you do, and where to find you, recognized and understood by every system.

What AI systems do with this file

  • Uses it to confidently identify your organization across different web properties
  • Feeds the structured data into knowledge graphs and entity resolution
  • Powers enhanced search results with star ratings, hours, and contact details
  • Links your website, social profiles, and other presence as one unified entity

Deploy at: /ai-schema.json
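
As a rough illustration, the snippet below assembles a JSON-LD document using Schema.org's `Organization`, `WebSite`, and `SearchAction` types. The company details, the URLs, and the `build_org_schema` helper are placeholders, and the exact set of properties your file needs is an assumption to adapt.

```python
import json
from typing import List


def build_org_schema(name: str, url: str, founding_date: str,
                     same_as: List[str], search_template: str) -> dict:
    """Describe an organization and its site search as one JSON-LD graph."""
    org_id = f"{url}#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": name,
                "url": url,
                "foundingDate": founding_date,
                "sameAs": same_as,  # social profiles that refer to the same entity
            },
            {
                "@type": "WebSite",
                "@id": f"{url}#website",
                "name": name,
                "url": url,
                "publisher": {"@id": org_id},
                "potentialAction": {
                    "@type": "SearchAction",
                    "target": search_template,
                    "query-input": "required name=search_term_string",
                },
            },
        ],
    }


# Placeholder values for illustration.
schema = build_org_schema(
    name="Example Co",
    url="https://example.com/",
    founding_date="2010-05-01",
    same_as=["https://www.linkedin.com/company/example-co"],
    search_template="https://example.com/search?q={search_term_string}",
)
print(json.dumps(schema, indent=2))
```

Linking the `WebSite` entry back to the `Organization` through a shared `@id` is what lets a consumer treat the two records as one entity.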

.well-known/ai-plugin.json THE PLUGIN BADGE

Medium

Lives in the .well-known folder, the standard location (defined in RFC 8615) for site-wide machine-readable metadata. Declares your site as an official AI resource: name, description, contact information, product categories, and key URLs in a format AI assistant platforms recognize.

Who reads it

AI assistants, AI plugin registries, developer platforms

Update when

When contact information or product categories change

Think of it like…

ai-plugin.json is your official membership badge in the AI ecosystem, a verified credential that tells platforms your site is a first-class source for answers about your products and services.

What AI systems do with this file

  • Discovers your site through the standardized .well-known location
  • Reads official name, description, and contact details for plugin integrations
  • Uses declared categories to match your site to relevant user queries
  • Treats your site as a registered, authoritative source rather than an unknown URL

Deploy at: /.well-known/ai-plugin.json

Research Files

The files that power AI research and retrieval pipelines

rag-index.json THE RESEARCH DATABASE

High

A pre-built index designed for Retrieval-Augmented Generation (RAG). A JSON array of records (one per major page/section) containing URL, title, and topics covered. AI systems can load this directly into LlamaIndex, LangChain, Pinecone, Weaviate, etc.

Who reads it

RAG pipeline engineers, AI search builders

Update when

Monthly, or when new major pages are added

Think of it like…

rag-index.json is like giving an AI researcher a pre-organized card catalog of your entire library. Instead of reading every book, they have a ready-made index they can search in milliseconds.

What AI systems do with this file

  • Loads it directly into RAG pipelines like LlamaIndex, LangChain, or Pinecone
  • Quickly identifies the most relevant pages for any given user query
  • Builds a searchable knowledge base from your site's content
  • Retrieves real-time, accurate information to ground AI-generated answers

Deploy at: /rag-index.json

rag-index.jsonl THE STREAMLINED DATABASE

High

The same content as rag-index.json, but in JSON Lines (newline-delimited) format. Preferred by large-scale ML pipelines, OpenAI fine-tuning, and streaming vector DB ingestion because records can be processed one-by-one without loading the entire file into memory.

Who reads it

ML engineers, AI developers, vector database ingestion tools

Update when

Always keep in sync with rag-index.json

Think of it like…

If rag-index.json is a bound book containing all your records, rag-index.jsonl is those same records on individual index cards, far easier to sort, filter, and process one entry at a time.

What AI systems do with this file

  • Feeds directly into OpenAI fine-tuning datasets and similar ML workflows
  • Streams into vector databases one record at a time for memory-efficient processing
  • Powers LlamaIndex, LangChain, Pinecone, and Weaviate knowledge bases
  • Enables large datasets to be processed without loading everything into memory

Deploy at: /rag-index.jsonl
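
The relationship between the two index files can be shown in a short sketch: convert the JSON array into one record per line, then read the records back one at a time. The record fields follow the description above (URL, title, topics); the sample values and function names are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO

# Hypothetical rag-index.json content: one record per major page.
RAG_INDEX_JSON = """
[
  {"url": "https://example.com/help/returns", "title": "Returns", "topics": ["returns", "refunds"]},
  {"url": "https://example.com/shop/running-shoes", "title": "Running shoes", "topics": ["shoes"]}
]
"""


def json_array_to_jsonl(array_text: str) -> str:
    """Convert a JSON array of records into JSON Lines, one record per line."""
    records = json.loads(array_text)
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one record at a time so the whole file never sits in memory."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


jsonl_text = json_array_to_jsonl(RAG_INDEX_JSON)

# io.StringIO stands in for an open file or a streamed HTTP response.
for record in iter_jsonl(io.StringIO(jsonl_text)):
    print(record["url"], record["topics"])
```

Generating the .jsonl file from the .json file, rather than editing both by hand, is one simple way to keep the two in sync.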

Policy Files

The files that set the rules for how AI can use your content

ai-disclosure.txt THE TRANSPARENCY REPORT

Medium

Your public statement about how your organization uses AI in products, operations, and customer interactions. Like a privacy policy for the AI era. Answers: Is AI generating content? Making decisions? Can users reach a real person?

Who reads it

AI systems, customers, regulators

Update when

Annually, or when AI tool usage changes

Think of it like…

ai-disclosure.txt is like the nutrition label on food packaging. It does not tell you whether the food is good or bad. It simply tells you exactly what is inside. That transparency builds trust.

What AI systems do with this file

  • Reads it to accurately describe your organization's AI usage to curious users
  • Uses it to appropriately flag when content from your site is AI-assisted
  • References it when users ask whether your company uses AI
  • Evaluates your organization's transparency and trustworthiness

Deploy at: /ai-disclosure.txt

training-data-policy.txt THE LICENSE AGREEMENT

Medium

Your formal published position on whether AI companies can use your content to train their models. Clearly states what is permitted (e.g. real-time RAG) vs. what requires a license (commercial model training). Protects your organization legally.

Who reads it

AI model developers, AI companies, legal teams

Update when

Annually, or when policy changes

Think of it like…

training-data-policy.txt is like a Creative Commons license for your website content in the AI era. It states clearly what others can do with your work so there is never any question about what is permitted.

What AI systems do with this file

  • Reads it before deciding whether your content can be included in training datasets
  • Treats content as license-required if your policy restricts commercial model training
  • Respects your data-freshness warnings to avoid citing stale pricing or inventory data
  • Cites and links to your original source rather than reproducing content directly

Deploy at: /training-data-policy.txt

Operations Files

The files that help your team deploy and maintain everything

structured-data-guide.md THE DEVELOPER HANDBOOK

Low

The most technical file, written for your development team, not AI. A comprehensive step-by-step guide with ready-to-use JSON-LD examples for every major page type: Product, LocalBusiness, BlogPosting, Event, etc. Powers rich results and page-level understanding.

Who reads it

Web developers, SEO teams, AI readiness teams

Update when

When new page types are added to the site

Think of it like…

structured-data-guide.md is like the wiring diagram for your home. Guests never see it, but your electrician absolutely needs it to make sure everything behind the walls works correctly.

What AI systems do with this file

  • Reads the structured data embedded on each page to understand its specific content type
  • Uses Product and Offer schema to accurately describe your inventory and pricing
  • Uses LocalBusiness schema to surface your locations in local and map-based searches
  • Uses BlogPosting and Event schemas to correctly classify and date your published content

Deploy at: /structured-data-guide.md

manifest.json THE MASTER INVENTORY

Medium

The single source of truth for your entire AI readiness implementation. Lists every file deployed, what it does, its URL, format, intended audience, and update frequency. Includes a summary scorecard. Lets anyone instantly audit your readiness status.

Who reads it

Developers, deployment tools, AI readiness auditors

Update when

When new files are added to the set

Think of it like…

manifest.json is like the table of contents for your entire AI readiness project. It tells anyone who opens it exactly what you have built, where everything lives, and how complete the implementation is.

What AI systems do with this file

  • Reads it to discover all AI-specific files without needing to guess at URLs
  • Uses priority rankings to determine which files to read and index first
  • References update frequencies to know when to revisit and re-index each file
  • Allows auditing tools to verify complete and correct AI readiness implementation

Deploy at: /manifest.json
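
A scorecard like the one described above could be computed from the manifest itself. The manifest layout, field names, and the three sample entries below are assumptions for illustration, not the kit's actual schema.

```python
# Hypothetical manifest layout; the field names are assumptions for this example.
MANIFEST = {
    "files": [
        {"path": "/robots.txt", "format": "text", "audience": "crawlers",
         "update": "when site structure changes", "deployed": True},
        {"path": "/llms.txt", "format": "markdown", "audience": "LLMs",
         "update": "monthly", "deployed": True},
        {"path": "/rag-index.jsonl", "format": "jsonl", "audience": "ML engineers",
         "update": "monthly", "deployed": False},
    ]
}


def readiness_score(manifest: dict) -> dict:
    """Summarize how many listed files are deployed and which are still missing."""
    files = manifest["files"]
    missing = [entry["path"] for entry in files if not entry["deployed"]]
    deployed = len(files) - len(missing)
    percent = round(100 * deployed / len(files)) if files else 0
    return {"deployed": deployed, "total": len(files), "percent": percent, "missing": missing}


scorecard = readiness_score(MANIFEST)
print(scorecard)
```

The `missing` list gives an auditor the exact paths to chase, which is more actionable than a percentage alone.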

deployment-checklist.md THE LAUNCH PLAN

Internal

The practical playbook your team follows to move from files on a computer to files correctly serving live traffic. Organized in clear phases with verification steps for each critical file. Nothing is skipped, forgotten, or misconfigured.

Who reads it

Web teams, developers, marketing teams

Update when

One-time setup; update when new files are added

Think of it like…

deployment-checklist.md is like a move-in checklist for a new office. It walks your team through every setup step so that six months later, nobody discovers the internet connection was never properly configured.

What AI systems do with this file

  • This file is primarily for your internal team rather than for AI systems directly
  • Ensures all files are live on the web and returning correct HTTP status codes
  • Verifies robots.txt has been updated with the correct AI crawler access rules
  • Confirms that structured data is implemented correctly across all page types

Deploy at: /deployment-checklist.md
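
The "files are live and returning correct HTTP status codes" step lends itself to a small script. This is a sketch under assumptions: the function names are hypothetical, a HEAD request is assumed to be enough to confirm a file is served, and the demonstration uses a stub instead of real requests so it runs offline.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url: str) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the host is unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as err:  # 4xx and 5xx responses are raised as exceptions
        return err.code
    except OSError:  # DNS failures, refused connections, timeouts
        return 0


def check_deployment(base_url: str, paths: Iterable[str],
                     fetch_status: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each path to True when the file answers with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch_status(base + path) == 200 for path in paths}


# Offline demonstration: a stub stands in for real HTTP requests.
def stub_status(url: str) -> int:
    return 200 if url.endswith(("/robots.txt", "/llms.txt")) else 404


report = check_deployment(
    "https://example.com/",
    ["/robots.txt", "/llms.txt", "/ai.txt"],
    fetch_status=stub_status,
)
print(report)
```

Against a live site you would call `check_deployment` with only the base URL and the list of paths, leaving `fetch_status` at its default.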

FAQ

Guide FAQ: questions about each section

Plain-English answers about each section of the AI Readiness Guide and the files it covers.

What are Identity & Permissions files for?

They introduce your brand to AI systems and control which crawlers can access your site. This section covers robots.txt, which sets crawler access rules and points bots to your AI files, and ai.txt, your brand's full introduction: identity, offerings, authoritative topics, and content-use rules. Deploy these first: they are the front door and briefing document for every AI system.

What are Content Files and why does AI need them?

Content Files give AI the full story about your site in readable text. llms.txt is the quick map with company name, sections, links, and top questions, following the llms.txt standard. llms-full.txt is the deep dive with rich descriptions and detailed Q&A. Together they let assistants understand your site without crawling thousands of pages.

How do Map & Navigation files help AI crawl my site?

They help AI discover and understand every page efficiently. ai-sitemap.xml adds content type, topics, and plain-English summaries to each URL so crawlers know what a page is about before fetching it. sitemap.md is the same structure in human-readable markdown, approachable for both teams and LLMs.

What do Intelligence Files do for AI accuracy?

They give AI structured knowledge of your entities, user intents, and organization identity. ai-entities.json catalogs products, services, and key concepts. ai-intent.json maps real user questions to the best page on your site. ai-schema.json publishes Schema.org JSON-LD. .well-known/ai-plugin.json registers your site as an official AI resource. This is where precision happens: fewer guesses, better recommendations.

What are Research Files and who uses them?

Research Files power AI retrieval and RAG pipelines. rag-index.json is a pre-built JSON index of your major pages (URL, title, topics). rag-index.jsonl is the same data in JSON Lines format for streaming ingestion into vector databases and ML workflows. Engineers and AI search builders load these directly into LlamaIndex, LangChain, Pinecone, and similar tools.

Why do I need Policy Files for AI readiness?

They set the rules for how AI may use your content and build trust with users. training-data-policy.txt states what is permitted for model training, RAG indexing, and commercial use. ai-disclosure.txt explains how your organization uses AI in products and operations. Both protect you legally and signal transparency to customers and AI systems.

What are Operations Files and who are they for?

They help your team deploy, verify, and maintain the full kit. structured-data-guide.md gives developers JSON-LD examples for every page type. manifest.json is the master inventory and readiness scorecard. deployment-checklist.md is the phased launch playbook with verification steps. These are primarily for your web team, but manifest.json also helps auditors and AI systems discover what you have deployed.

Ready to generate your kit?

Use the AI Readiness skill or MCP server to research your site and generate all 17 files, then deploy them to your web root.

Research & standards

Sources cited on this page

Claims about how AI assistants and crawlers use web content are grounded in published research, open standards, and vendor documentation, not marketing guesswork.

Generative engine research

LLM site summary standard

AI permissions & sitemaps

Structured data & plugins

MCP & agent tooling

AI crawler documentation