What are AI Readiness Files?

They are structured files you deploy at the root of your website to tell AI systems exactly who you are, what you offer, and where to send people. They speak a format AI already understands, so assistants stop guessing about your brand based on old training data.

Where should I start?

Start with the critical files: robots.txt, ai.txt, llms.txt, and llms-full.txt. They deliver the biggest immediate impact and are what most AI systems look for first. Then add sitemaps, intelligence JSON files, policies, and operational files using the deployment checklist.

What is ai.txt?

ai.txt is your brand's introduction to every AI system on the internet. Where robots.txt controls access, ai.txt goes further. It covers what you sell, what you are known for, your authoritative topics, and the rules for what AI systems can and cannot do with your content.

What is llms.txt?

llms.txt is a structured text file built for large language models. It gives AI a quick map of your website: company name, description, organized sections with direct links, and the most common questions your site answers. It follows the llmstxt.org community standard, which is designed for any LLM that can fetch a URL, including the models behind ChatGPT, Claude, Gemini, and Perplexity.

What's the difference between llms.txt and llms-full.txt?

llms.txt is the cheat sheet: a quick summary an AI can scan in seconds. llms-full.txt is the deep dive with rich descriptions, detailed Q&A, and full category explanations. If llms.txt is a Wikipedia summary, llms-full.txt is the full article.

Why do I need an ai-sitemap.xml?

A regular sitemap lists pages and timestamps. The ai-sitemap adds what each page is about, its content type, and a plain-English summary. It is the difference between a paper road map and GPS with points of interest. AI crawlers can understand pages without fetching and parsing each one individually.

What are ai-entities.json and ai-schema.json for?

ai-entities.json is a structured catalog of the important parts of your site: products, categories, services, and key concepts. It powers AI knowledge graphs. ai-schema.json uses the Schema.org standard to describe your organization in machine-readable format so search engines and AI systems can identify you without ambiguity.

What are the RAG index files?

rag-index.json and rag-index.jsonl are ready-made indexes of your site built for AI research and retrieval pipelines. When a user asks a question, AI first pulls the most relevant documents from this index before generating an answer, which keeps responses grounded in your actual content.

What is a training data policy?

training-data-policy.txt sets formal rules for how AI companies can use your content, including model training, RAG indexing, and commercial use. It answers a practical question every brand should decide upfront: what are others allowed to do with your work?

How often should I update my AI readiness files?

It depends on the file. Update llms.txt and sitemaps monthly or when your site structure changes. Refresh ai.txt and llms-full.txt quarterly or when products change. Review policy files annually. The manifest.json file lists recommended update frequencies for every file in the kit.

Complete reference

AI Readiness FAQ

The complete FAQ for the AI Readiness Kit: every file explained, how to use the GitHub skill, connect the MCP server, deploy to your site, and get cited by ChatGPT, Claude, Gemini, and Perplexity.

Section 1 of 11

AI Readiness Fundamentals

Why structured site files matter when AI assistants are the first stop for research, recommendations, and answers.

What is AI readiness?

AI readiness means publishing structured files at your website root so large language models and AI crawlers can understand who you are, what you offer, and how to represent your brand accurately. Instead of guessing from stale training data or random page snippets, assistants can load purpose-built signals: identity files, content summaries, entity catalogs, intent maps, and policy statements.

The goal is not to trick AI systems. It is to give them the same clarity you would give a journalist, analyst, or partner before they write about your company.

Sources

GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)

Why does AI readiness matter now?

When someone wants a product recommendation, a service provider, or a straight answer, they often ask ChatGPT, Claude, Gemini, or Perplexity before they visit a website. If you are not giving those systems clear, current signals, they guess. And they often guess wrong: outdated products, incorrect positioning, or a competitor cited instead of you.

Generative Engine Optimization (GEO) research formalizes this shift. Content structure, authoritative citations, and machine-readable summaries measurably affect whether your brand appears in AI-generated answers.

Sources

GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)

Which AI systems use these files?

The kit targets the assistants and crawlers most teams care about today: OpenAI (GPTBot, ChatGPT browsing), Anthropic (ClaudeBot), Google (Google-Extended for Gemini), and Perplexity (PerplexityBot). Files like llms.txt follow an open community standard designed for any LLM that can fetch a URL. JSON entity and intent files help RAG pipelines regardless of which model sits on top.

No vendor guarantees a specific ranking in AI answers. What you control is signal quality: clear identity, structured entities, crawl permissions, and citable summaries that reduce ambiguity.

Is AI readiness the same as SEO?

They overlap but are not identical. Traditional SEO optimizes for search engine result pages: titles, meta descriptions, Core Web Vitals, backlinks. AI readiness optimizes for how generative systems synthesize answers: plain-text summaries (llms.txt), permission and identity files (ai.txt), entity graphs (ai-entities.json), question-to-URL maps (ai-intent.json), and RAG indexes.

Strong SEO helps. Structured data (Schema.org JSON-LD) helps both. The AI Readiness Kit adds a parallel layer of files many SEO audits never mention, because search consoles were built for blue links, not conversational citations.

Do I need a developer to become AI-ready?

You need someone who can upload files to your web root and verify HTTP status codes, but you do not need a custom app. The kit ships as markdown templates, JSON schemas, and plain-text files. The GitHub skill and MCP server automate research and generation; your web team (or host) deploys the output.

Operational files like deployment-checklist.md and manifest.json exist specifically so non-engineers can track progress while developers handle DNS, CDN, and robots.txt merges.

How is this different from just having good website copy?

Good copy lives inside HTML pages designed for humans in a browser. AI crawlers often fetch partial content, skip JavaScript-heavy sections, or infer meaning from navigation crumbs. Readiness files live at predictable root URLs (/llms.txt, /ai.txt) in formats models are trained to treat as authoritative context.

Think of website copy as the conversation inside your store. Readiness files are the fact sheet, floor plan, and policy binder you hand to every AI assistant before it speaks about you publicly.

Sources

llms.txt standard (llmstxt.org)

Section 2 of 11

This Site & The Kit

What ai.silverbackmarketing.com is, what the kit includes, and how the pieces fit together.

What is the AI Readiness Kit?

The AI Readiness Kit is a set of 17 purpose-built files organized in 7 categories. You deploy them at the root of any website so AI systems understand exactly who you are, what you offer, and how to represent you. The kit is free, open source, and maintained by Silverback Marketing.

Categories cover identity and permissions, content summaries, site maps, intelligence JSON, research indexes, policy statements, and operational checklists. Each file has a defined audience (crawlers, LLMs, developers, legal) and update cadence.

What is ai.silverbackmarketing.com?

This site is the canonical home for the kit: documentation, live examples of every file deployed on our own domain, the hosted MCP server, and the complete guide. We eat our own cooking. The robots.txt, llms.txt, ai-entities.json, and other files you see in /public are production references you can curl or view in a browser.

The site also publishes JSON-LD, FAQ schema, speakable markup, and downloadable PDF and markdown versions so both humans and AI systems can consume the same facts.

Homepage

How many files are in the kit and what are the categories?

The kit includes 17 files across 7 categories:

Identity & Permissions (2 files): The files that introduce your brand and control which AI systems get access
Content Files (2 files): The files that give AI the full story about your site in readable text
Map & Navigation (2 files): The files that help AI navigate your site structure efficiently
Intelligence Files (4 files): The files that give AI deep knowledge of your entities, products, and user intents
Research Files (2 files): The files that power AI research and retrieval pipelines
Policy Files (2 files): The files that set the rules for how AI can use your content
Operations Files (3 files): The files that help your team deploy and maintain everything

File-by-file guide

Is the kit really free?

Yes. The skill, sample files, MCP server endpoint, and documentation are free to use. The GitHub repository is public. The hosted MCP server at ai.silverbackmarketing.com requires no API key.

Silverback Marketing publishes the kit as part of our GEO and AI search practice. If you want hands-on implementation help, that is a separate services conversation, but the files themselves are not paywalled.

GitHub repository

What live examples can I inspect on this domain?

Every critical file is deployed here so you can verify format and content before generating your own. Examples include /robots.txt, /ai.txt, /llms.txt, /llms-full.txt, /ai-sitemap.xml, /manifest.json, /rag-index.json, and /.well-known/ai-plugin.json.

Use curl -I on any path to confirm 200 OK responses. The deployment-checklist.md in the repo lists verification commands for each file type.

Section 3 of 11

The AI Readiness Skill

How the GitHub skill researches a domain, classifies site type, and generates all output files.

What is the AI Readiness skill?

The skill is a packaged workflow (SKILL.md plus supporting specs) in the public GitHub repository. It instructs a capable coding agent to research a target domain, classify the business type (SaaS, e-commerce, healthcare, local services, and more), and generate all kit files with domain-specific content.

You can run it in Claude Code, Cursor, VS Code Copilot, or any agent that loads skills from the repo. The MCP server exposes the same instructions programmatically via get_skill_instructions.

GitHub: silverbackmarketing/ai-readiness

How does the skill research a website?

The workflow follows a structured checklist: fetch the homepage and key pages, inspect existing robots.txt and sitemap.xml, identify products and services, map navigation, extract entity candidates, and note gaps in structured data. It uses web fetch tools available to the agent rather than a proprietary crawler.

Site classification (via get_site_classification_guide) adjusts tone, compliance notes, and file emphasis. A healthcare site gets different policy language than a D2C shop.

Homepage and about page analysis
Existing SEO and schema audit
Product, service, and location extraction
Competitor and category context (when available)
Output of all 18 files (the 17 kit files plus a README.md) in generation order

What files does the skill generate?

The full workflow produces every file in the kit: robots.txt, ai.txt, llms.txt, llms-full.txt, ai-sitemap.xml, sitemap.md, ai-entities.json, ai-intent.json, ai-schema.json, rag-index.json, rag-index.jsonl, ai-disclosure.txt, training-data-policy.txt, .well-known/ai-plugin.json, structured-data-guide.md, manifest.json, and deployment-checklist.md. It also writes a README.md for the output folder, which is why the tooling refers to 18 output files while the kit itself counts 17.

Files are written to a local output directory. You review, edit brand-sensitive sections, then upload to production.

MCP: list_output_files tool

How do I install and run the skill manually?

Clone or download the GitHub repository. Point your coding agent at SKILL.md in app/lib/mcp/data/ (bundled in the repo) or load the skill path in Claude Code. Provide a target URL and ask the agent to execute the full workflow.

Without MCP, the agent reads specs from file-specs.md and writes outputs locally. With MCP connected, it can call generate_ai_readiness_files and get_file_spec on demand.

Clone https://github.com/silverbackmarketing/ai-readiness
Open the project in Cursor or Claude Code
Connect the MCP server (optional but recommended)
Prompt: "Generate AI readiness files for example.com using the skill workflow"
Review outputs, then deploy to your web root

MCP setup

Can I customize generated content before deploying?

You should. The skill produces a strong first draft grounded in public site content, but legal disclaimers, pricing, proprietary claims, and brand voice need human review. Policy files (training-data-policy.txt, ai-disclosure.txt) especially require stakeholder sign-off.

Treat generation like a technical SEO audit export: 80% done automatically, 20% refined by someone who knows the business.

What is the difference between the skill and the MCP server?

The skill is documentation plus rules: how to research, classify, and write files. The MCP server is a live API that exposes those rules as tools any connected agent can call without copying files into context.

Same brain, two interfaces. Local skill for offline or air-gapped workflows. Hosted MCP for one-URL setup in Cursor, Claude Code, Codex, and VS Code.

Sources

Model Context Protocol

Connect MCP

Section 4 of 11

MCP Server & Coding Agents

Connect Cursor, Claude Code, VS Code, Codex, Antigravity, or Claude Desktop to the hosted MCP endpoint.

What is the AI Readiness MCP server?

The Model Context Protocol (MCP) server is a hosted HTTP endpoint at https://ai.silverbackmarketing.com/api/mcp. It exposes tools, bundled resources, and prompts so coding agents can generate kit files, read specifications, and run the full workflow without cloning the repository.

Transport is Streamable HTTP, and no authentication is required. Any MCP-capable client can connect with a config entry pointing at the URL.

Sources

Model Context Protocol

MCP section on homepage

What tools does the MCP server expose?

Six tools cover the full workflow:

generate_ai_readiness_files(url): start the 18-file workflow for any domain
list_output_files(): list all output files in generation order
get_file_spec(filename): detailed spec for one file (llms.txt, ai-entities.json, etc.)
get_skill_instructions(): full research and generation workflow (SKILL.md)
get_site_classification_guide(): site-type taxonomy (SaaS, e-commerce, healthcare, etc.)
generate_rag_jsonl(rag_index_json): convert rag-index.json to JSONL for embedding pipelines
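
To make the tool list concrete, here is a minimal Python sketch of the request body an MCP client sends when it calls one of these tools. MCP uses JSON-RPC 2.0 with the tools/call method. The argument key "filename" is taken from the get_file_spec(filename) signature listed above and is an assumption about the server's actual input schema; a real client also performs the MCP initialize handshake before sending this, which the sketch leaves out.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: get_file_spec accepts its argument under the key "filename".
payload = build_tool_call("get_file_spec", {"filename": "llms.txt"})
print(json.dumps(payload, indent=2))
```

In practice your coding agent builds and sends this for you; the sketch only shows what travels over the wire.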

Which coding agents support this MCP server?

Any client with Streamable HTTP MCP support works. We document configs for Claude Code, Cursor, VS Code / GitHub Copilot, OpenAI Codex, Google Antigravity, and Claude Desktop (via mcp-remote stdio bridge).

Each client uses a slightly different config file path and JSON shape. The homepage MCP section includes copy-paste configs for all of them.

Claude Code: .mcp.json with type http
Cursor: .cursor/mcp.json
VS Code: .vscode/mcp.json
Codex: ~/.codex/config.toml
Antigravity: serverUrl key (not url)
Claude Desktop: npx mcp-remote proxy

Setup instructions

Do I need an API key to use the MCP server?

No. The production endpoint is open. Rate limits may apply at extreme volume, but normal agency and developer usage requires no signup.

You are responsible for reviewing generated content before publishing it to your domain. The server generates drafts; you own what goes live.

How do I connect Cursor to the MCP server?

Create .cursor/mcp.json in your project root (or add via Cursor Settings → MCP) with the ai-readiness server URL. Refresh MCP servers or restart Cursor. In agent mode, prompt: "Use ai-readiness to generate files for example.com".

URL: https://ai.silverbackmarketing.com/api/mcp
Config key: mcpServers.ai-readiness.url
Verify tools appear in the MCP panel before running generation
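
The config key above implies a small JSON document. As a sketch, the Python below builds that shape and prints it so you can save the output as .cursor/mcp.json. The shape is inferred from the key path mcpServers.ai-readiness.url; check the copy-paste config on the homepage before relying on it.

```python
import json

# Endpoint URL as documented on this page.
MCP_URL = "https://ai.silverbackmarketing.com/api/mcp"


def cursor_mcp_config(url: str = MCP_URL) -> dict:
    """Return the config shape implied by the key mcpServers.ai-readiness.url."""
    return {"mcpServers": {"ai-readiness": {"url": url}}}


# Text you could save as .cursor/mcp.json in the project root.
config_text = json.dumps(cursor_mcp_config(), indent=2)
print(config_text)
```

If the file already lists other servers, add the ai-readiness entry alongside them instead of overwriting the file.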

Why does Claude Desktop need mcp-remote?

Claude Desktop speaks MCP over stdio to local processes, not remote HTTP URLs directly. The mcp-remote package proxies the hosted HTTP server to a local stdio interface. Add it via npx in claude_desktop_config.json.

Requires Node.js 18+. Fully quit and reopen Claude Desktop after config changes.
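
A hedged sketch of the proxy entry, written in Python so it can be printed and pasted into claude_desktop_config.json. The command and argument layout (npx running mcp-remote with the server URL as its argument) follows the commonly documented mcp-remote usage and is an assumption here; the homepage config is the authoritative version.

```python
import json

# Endpoint URL as documented on this page.
MCP_URL = "https://ai.silverbackmarketing.com/api/mcp"


def claude_desktop_config(url: str = MCP_URL) -> dict:
    """Build a stdio server entry that launches mcp-remote through npx."""
    return {
        "mcpServers": {
            "ai-readiness": {
                "command": "npx",
                # Assumption: mcp-remote takes the remote URL as its first argument.
                "args": ["mcp-remote", url],
            }
        }
    }


desktop_config_text = json.dumps(claude_desktop_config(), indent=2)
print(desktop_config_text)
```

Merge the ai-readiness entry into any existing mcpServers object rather than replacing the whole file.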

Sources

Model Context Protocol

What example prompts work well with MCP?

Be explicit about the domain and the tool namespace:

Generate AI readiness files for example.com
Use ai-readiness to list all 18 output files
Get the file spec for llms.txt from ai-readiness
Use ai-readiness get_site_classification_guide for a B2B SaaS homepage

Can I self-host the MCP server?

Yes. The server ships with this Next.js application. Deploy the repo to your own infrastructure and point clients at your /api/mcp endpoint. Bundled skill data is compiled at build time via scripts/bundle-mcp-data.mjs.

Self-hosting makes sense for enterprises with private network requirements. Most teams use the public endpoint for convenience.

GitHub repository

Section 5 of 11

Identity & Permissions Files

robots.txt and ai.txt: crawler access, brand identity, and content-use rules.

What does robots.txt do for AI readiness?

robots.txt sits at the front door of your website and tells automated visitors which sections they may access. For AI readiness, it includes explicit rules for GPTBot, ClaudeBot, Google-Extended, and PerplexityBot, points crawlers toward llms.txt and ai-sitemap.xml, and keeps checkout, login, and admin paths disallowed.

Important: the kit ships a reference robots.txt. You must merge AI crawler directives into your live file rather than replacing production rules blindly.

Sources

Guide: robots.txt

What is ai.txt?

ai.txt is your brand's introduction to AI systems. Where robots.txt controls access, ai.txt describes identity, offerings, authoritative topics, contact paths, and rules for how AI may use your content. It follows the open ai.txt permissions standard.

Deploy at /ai.txt with text/plain content type. Add an Allow rule for /ai.txt in robots.txt so crawlers can reach it early.

Sources

ai.txt permissions standard (aitxt.org)

Live ai.txt

How is ai.txt different from robots.txt?

robots.txt is access control: which paths bots may fetch. ai.txt is identity and policy: who you are, what you sell, and your ground rules for training, RAG, and citation.

Analogy: robots.txt is the doorman. ai.txt is the briefing document you hand a journalist before an interview.

What is .well-known/ai-plugin.json?

This manifest registers your site as an official AI-accessible resource in the plugin discovery format. It includes name, description, auth requirements, and API references. The kit uses it to point models at the live MCP server and documentation URLs.

Deploy at /.well-known/ai-plugin.json with application/json content type.

Sources

OpenAI plugin manifest format

Live manifest

Section 6 of 11

Content & Navigation Files

llms.txt, llms-full.txt, ai-sitemap.xml, and sitemap.md for discovery and context.

What is llms.txt?

llms.txt is a structured plain-text summary of your site for large language models. It includes company name, description, organized sections with links, and top questions your site answers. It follows the llmstxt.org community standard used across the AI ecosystem.

Deploy at /llms.txt. Keep it scannable: one context window, no HTML noise.
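
For a sense of the layout, the sketch below renders a small llms.txt in the shape described by llmstxt.org: an H1 with the site name, a blockquote summary, then H2 sections containing markdown link lists. The company name, URLs, and the render_llms_txt helper are illustrative placeholders, not output from the kit.

```python
def render_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            # One markdown link per line, followed by a short description.
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data for illustration only.
llms_text = render_llms_txt(
    "Example Co",
    "Example Co sells widgets to small manufacturers.",
    {
        "Products": [
            ("Pricing", "https://example.com/pricing", "Plans and costs"),
            ("Catalog", "https://example.com/catalog", "Full widget range"),
        ],
        "Support": [
            ("FAQ", "https://example.com/faq", "Common questions"),
        ],
    },
)
print(llms_text)
```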

Sources

llms.txt standard (llmstxt.org)

Live llms.txt

What is llms-full.txt and when should I use it?

llms-full.txt is the deep dive: rich descriptions, detailed Q&A, category explanations, and extended context. Use it when models need more than a cheat sheet but should not crawl hundreds of pages.

If llms.txt is a Wikipedia summary, llms-full.txt is the full article. Update quarterly or when major products change.

Sources

llms.txt standard (llmstxt.org)

Why do I need ai-sitemap.xml?

A regular sitemap lists URLs and timestamps. ai-sitemap.xml adds content type, topic tags, and plain-English summaries per URL so AI crawlers understand pages before fetching them.

It extends the standard sitemaps.org format with AI-specific metadata. Submit it to Google Search Console alongside your standard sitemap.

What is sitemap.md for?

sitemap.md is the same site structure in human-readable markdown. Developers, content teams, and LLMs can parse it without XML tooling. It is the welcome brochure for your site architecture.

Update monthly when you add major sections or products.

Which content files should I deploy first?

Priority order for most sites: robots.txt and ai.txt (identity and access), then llms.txt and llms-full.txt (context), then ai-sitemap.xml and sitemap.md (discovery). These six deliver the largest immediate lift before JSON intelligence files.

How it works

Section 7 of 11

Intelligence & Research Files

Structured JSON for entities, intents, schema, and RAG pipelines.

What is ai-entities.json?

A structured catalog of products, services, categories, locations, and key concepts on your site. It powers knowledge graph-style reasoning: which offerings relate to which topics, and what language you use to describe them.

Keep entities aligned with how customers actually search, not internal org chart names.

What is ai-intent.json?

Maps real user questions to the best URL on your site to answer each one. Without it, AI guesses which page to recommend. With it, you supply a direct preferred URL per query pattern.

Include informational, navigational, and transactional intents with confidence levels.
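
As an illustration of the idea, the sketch below models a tiny intent map and looks up the preferred URL for a question. The field names (query_pattern, type, url, confidence) and the substring matching are assumptions made for this example; the kit's file spec defines the real schema.

```python
# Hypothetical ai-intent.json content; field names are assumptions for illustration.
INTENT_MAP = {
    "intents": [
        {"query_pattern": "pricing", "type": "transactional",
         "url": "https://example.com/pricing", "confidence": 0.9},
        {"query_pattern": "return policy", "type": "informational",
         "url": "https://example.com/returns", "confidence": 0.8},
        {"query_pattern": "contact", "type": "navigational",
         "url": "https://example.com/contact", "confidence": 0.7},
    ]
}


def best_url(question: str, intent_map: dict):
    """Return the highest-confidence URL whose pattern appears in the question."""
    q = question.lower()
    matches = [i for i in intent_map["intents"] if i["query_pattern"] in q]
    if not matches:
        return None
    return max(matches, key=lambda i: i["confidence"])["url"]


print(best_url("What is your return policy?", INTENT_MAP))
print(best_url("Do you ship to Mars?", INTENT_MAP))
```

A consuming system would likely match on meaning rather than substrings; the point is that each query pattern resolves to one preferred URL.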

What is ai-schema.json?

Publishes Schema.org JSON-LD describing your organization, products, and key pages in a machine-readable format search engines and AI systems already consume.

Align ai-schema.json with on-page JSON-LD to avoid conflicting entity descriptions.
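
One way to check that alignment is to compare the fields both documents share. The sketch below does this for two Schema.org Organization objects; the sample values and the conflicting_fields helper are hypothetical, and a real audit would also need to handle nested objects and @graph arrays.

```python
def conflicting_fields(site_level: dict, on_page: dict) -> dict:
    """Return fields present in both JSON-LD objects whose values differ."""
    shared = site_level.keys() & on_page.keys()
    return {
        key: (site_level[key], on_page[key])
        for key in sorted(shared)
        if site_level[key] != on_page[key]
    }


# Hypothetical ai-schema.json entry and homepage JSON-LD for the same organization.
ai_schema_org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
}
homepage_jsonld = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Company, Inc.",
    "url": "https://example.com",
    "sameAs": ["https://www.linkedin.com/company/example"],
}

conflicts = conflicting_fields(ai_schema_org, homepage_jsonld)
print(conflicts)
```

Here the two sources disagree on the organization name, which is exactly the kind of mismatch worth resolving before deploying.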

Sources

Schema.org structured data

What are rag-index.json and rag-index.jsonl?

Pre-built indexes of your major pages for retrieval-augmented generation. Each entry includes URL, title, topics, and summary text. JSONL format streams into vector databases and embedding pipelines.

Engineering teams load these into LlamaIndex, LangChain, Pinecone, and similar tools so answers stay grounded in your content.
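
The relationship between the two files can be sketched in a few lines of Python: JSONL is simply one JSON object per line. The per-entry fields (url, title, topics, summary) come from the description above, while the top-level "documents" key is an assumption for this example and may not match the kit's actual rag-index.json layout.

```python
import json


def rag_index_to_jsonl(rag_index: dict) -> str:
    """Serialize each index entry as one JSON object per line."""
    # Assumption: entries live under a top-level "documents" key.
    records = rag_index["documents"]
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


# Hypothetical index content for illustration.
sample_index = {
    "documents": [
        {"url": "https://example.com/pricing", "title": "Pricing",
         "topics": ["plans", "billing"], "summary": "Plan tiers and costs."},
        {"url": "https://example.com/faq", "title": "FAQ",
         "topics": ["support"], "summary": "Answers to common questions."},
    ]
}

jsonl_text = rag_index_to_jsonl(sample_index)
print(jsonl_text, end="")
```

Line-delimited records let an embedding pipeline read and process one document at a time instead of loading the whole index into memory.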

Do intelligence files replace on-page JSON-LD?

No. On-page JSON-LD remains essential for rich results and page-level entity markup. ai-schema.json, ai-entities.json, and ai-intent.json complement HTML by giving AI systems a single bundle at predictable root URLs.

Use structured-data-guide.md in the kit for page-type examples (homepage, product, FAQ, blog).

Sources

Schema.org structured data

Guide: Intelligence Files

Section 8 of 11

Policy & Operations Files

Legal clarity, transparency, deployment checklists, and team playbooks.

What is training-data-policy.txt?

Sets formal rules for how AI companies may use your content: model training, RAG indexing, commercial use, and attribution requirements. Every brand should decide these rules upfront rather than accepting silent scraping.

Have legal review before publishing. Policies vary by industry (healthcare, finance, government).

What is ai-disclosure.txt?

A public transparency statement explaining how your organization uses AI in products, content, support, or operations. It builds trust with customers and signals honesty to AI systems summarizing your brand.

Update when your AI usage changes materially.

What is manifest.json?

The master inventory of every readiness file: deploy path, content type, purpose, audience, update frequency, and priority. It doubles as a readiness scorecard for audits.

AI systems and internal teams can fetch one URL to see what you have deployed.

Live manifest.json

What is deployment-checklist.md?

A phased launch playbook: upload order, curl verification commands, robots.txt merge steps, Search Console submission, and CDN cache notes. Designed so project managers can track progress while developers execute.

Follow it line by line on first deploy. Reuse verification commands after updates.

What is structured-data-guide.md?

Developer documentation with JSON-LD examples for common page types in your stack. It bridges marketing readiness files and on-page implementation.

Share with whoever maintains your CMS or frontend templates.

Sources

Schema.org structured data

Section 9 of 11

Deployment & Verification

Getting files live, validating responses, and avoiding common launch mistakes.

Where do I upload AI readiness files?

Every file goes at your website root (or .well-known/ for ai-plugin.json) so predictable URLs work: https://yourdomain.com/llms.txt, https://yourdomain.com/ai.txt, etc.

Do not bury them in /assets or /downloads. AI crawlers and standards expect root paths.

How do I verify files deployed correctly?

Use HTTP HEAD or GET checks for each URL. Expect 200 OK, correct Content-Type, and no accidental redirects to HTML login pages.

curl -I https://yourdomain.com/llms.txt
curl -I https://yourdomain.com/ai-entities.json
Confirm text/plain, application/json, or application/xml as appropriate
Fetch in an incognito window to bypass auth cookies

deployment-checklist.md
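
The checks above can also be scripted. The Python sketch below separates the network call from the evaluation so the rules are easy to read: status must be 200, the final URL must match the requested one, and the Content-Type must fit the file extension. The extension-to-type table and function names are assumptions for this example; the kit's deployment-checklist.md remains the reference.

```python
import urllib.request

# Assumed mapping from file extension to the Content-Type prefix to expect.
EXPECTED_TYPES = {
    ".txt": "text/plain",
    ".json": "application/json",
    ".xml": "application/xml",
}


def evaluate(url: str, status: int, content_type: str, final_url: str) -> list:
    """Return a list of problems found in one response; empty means it passed."""
    problems = []
    if status != 200:
        problems.append(f"expected 200, got {status}")
    if final_url != url:
        problems.append(f"redirected to {final_url}")
    suffix = "." + url.rsplit(".", 1)[-1]
    expected = EXPECTED_TYPES.get(suffix)
    if expected and not content_type.startswith(expected):
        problems.append(f"expected {expected}, got {content_type}")
    return problems


def fetch_head(url: str):
    """Send a HEAD request; urlopen raises HTTPError for 4xx and 5xx responses."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.headers.get("Content-Type", ""), response.geturl()


# Offline demonstration: a JSON file served as HTML by an SPA fallback rule.
url = "https://yourdomain.com/ai-entities.json"
print(evaluate(url, 200, "text/html; charset=utf-8", url))
```

To run it against a live site, pass the tuple returned by fetch_head(url) into evaluate for each readiness URL.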

How do I merge robots.txt without breaking SEO?

Never overwrite production robots.txt with the kit template alone. Copy AI-specific User-agent blocks and Sitemap lines into your existing file. Preserve current Disallow rules, crawl-delay settings, and sitemap references.

Test in staging first. One bad Disallow rule can block your entire site.
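
One way to test a merged file before it goes live is to parse it with Python's standard urllib.robotparser and assert which paths the AI crawlers can reach. The sample robots.txt, the path lists, and the audit_robots helper are illustrative assumptions. Note that Python's parser applies the first matching rule in file order, so the sample lists Disallow lines before Allow: /; crawlers that use longest-match rules may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]

# Hypothetical merged robots.txt for illustration.
MERGED = """\
User-agent: *
Disallow: /checkout
Disallow: /admin

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: PerplexityBot
Disallow: /checkout
Disallow: /admin
Allow: /

Sitemap: https://example.com/ai-sitemap.xml
"""


def audit_robots(robots_txt: str, must_allow: list, must_block: list, agents=AI_AGENTS) -> list:
    """Return a list of access problems for the given crawler names."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent in agents:
        for path in must_allow:
            if not parser.can_fetch(agent, path):
                problems.append(f"{agent} is blocked from {path}")
        for path in must_block:
            if parser.can_fetch(agent, path):
                problems.append(f"{agent} can fetch {path}")
    return problems


print(audit_robots(MERGED, ["/", "/llms.txt"], ["/checkout/cart"]))
# The classic mistake: a blanket Disallow that shuts out every bot.
print(audit_robots("User-agent: *\nDisallow: /\n", ["/", "/llms.txt"], ["/checkout/cart"]))
```

An empty list means the file behaves as intended for the paths you listed; it does not prove the file is correct for paths you did not test.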

Should I submit ai-sitemap.xml to Google Search Console?

Yes. Add it under Sitemaps in Search Console and Bing Webmaster Tools alongside your standard sitemap.xml. AI metadata tags do not replace classic SEO sitemaps; they extend them.

Sources

Sitemaps.org protocol

What are the most common deployment mistakes?

Teams often hit these issues on first launch:

Files uploaded to a subdirectory instead of web root
CDN serving a stale cached 404 after upload
robots.txt accidentally set to Disallow: / for all bots
JSON files served as text/html due to SPA fallback rules
llms.txt truncated or wrapped in HTML layout templates

Section 10 of 11

Maintenance & Governance

Update cadences, ownership, and keeping AI signals aligned with the live site.

How often should I update AI readiness files?

Cadence depends on the file. manifest.json lists recommended frequencies. As a rule: llms.txt, sitemaps, and rag indexes monthly or when structure changes; ai.txt and llms-full.txt quarterly or when offerings change; policy files annually or when regulations change.

llms.txt, ai-sitemap.xml, sitemap.md, rag-index.*: monthly
ai.txt, llms-full.txt, ai-entities.json, ai-intent.json: quarterly
training-data-policy.txt, ai-disclosure.txt: annually
robots.txt: when routes or crawler policy changes
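
The cadence list above lends itself to a simple reminder script. This Python sketch flags files whose last update is older than their cadence. The day counts (30, 90, and 365) and the last_updated input are assumptions for illustration; in practice the dates could come from manifest.json or your deploy history. robots.txt is left out because its updates are event-driven.

```python
from datetime import date

# Assumed day counts for each cadence label.
CADENCE_DAYS = {"monthly": 30, "quarterly": 90, "annually": 365}

# Cadences as listed above.
FILE_CADENCE = {
    "llms.txt": "monthly",
    "ai-sitemap.xml": "monthly",
    "sitemap.md": "monthly",
    "rag-index.json": "monthly",
    "rag-index.jsonl": "monthly",
    "ai.txt": "quarterly",
    "llms-full.txt": "quarterly",
    "ai-entities.json": "quarterly",
    "ai-intent.json": "quarterly",
    "training-data-policy.txt": "annually",
    "ai-disclosure.txt": "annually",
}


def files_due(last_updated: dict, today: date) -> list:
    """Return files never updated or older than their cadence, sorted by name."""
    due = []
    for name, cadence in FILE_CADENCE.items():
        updated = last_updated.get(name)
        if updated is None or (today - updated).days >= CADENCE_DAYS[cadence]:
            due.append(name)
    return sorted(due)


# Hypothetical example: everything was last deployed on 1 January 2025.
deployed = {name: date(2025, 1, 1) for name in FILE_CADENCE}
print(files_due(deployed, date(2025, 2, 15)))
```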

Who should own AI readiness internally?

Best results come from a trio: marketing or SEO owns messaging and entity accuracy, web/dev owns deployment and JSON-LD, legal/compliance owns policy files. manifest.json makes handoffs visible.

Treat updates like a sitemap refresh, not a one-time project.

What triggers an out-of-cycle update?

Rebrand, new product line, pricing model change, merger, regulatory event, or a major site migration all require immediate refreshes. ai-intent.json especially drifts when URLs change.

After any migration, rerun curl checks on every readiness URL.

How do I track readiness over time?

Use manifest.json as your scorecard. Mark files deployed, last updated, and priority. Some teams add a simple spreadsheet mirror for stakeholders who do not read JSON.

Regenerate via the skill or MCP after major site changes rather than hand-editing dozens of files.

Section 11 of 11

GEO, Citations & AI Search

Research-backed tactics for visibility in AI-generated answers.

What is Generative Engine Optimization (GEO)?

GEO is the practice of optimizing content and structure for visibility in AI-generated answers, not just traditional search result pages. Research from Princeton, Georgia Tech, and collaborators shows structured citations, quotations, and statistics can significantly increase inclusion in generative responses in controlled experiments.

AI readiness files are infrastructure for GEO: they supply the structured facts models cite.

Sources

GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)

Will AI readiness files guarantee citations?

No vendor guarantees placement in ChatGPT, Claude, Gemini, or Perplexity answers. Files reduce ambiguity and improve discoverability. Combined with authoritative content, clear entities, and earned mentions elsewhere, they increase the odds your brand is described accurately when cited.

How do readiness files interact with RAG?

Retrieval-augmented generation pulls documents before answering. rag-index.json and rag-index.jsonl give RAG pipelines a curated starting set. llms.txt and llms-full.txt provide low-token summaries when full crawls are expensive.

If you build custom AI products on your content, these files accelerate indexing.

What else helps besides readiness files?

Files are necessary infrastructure, not the whole strategy. Also invest in authoritative original research, clear service pages, consistent NAP and entity data, PR and backlinks, FAQ content with real answers, and Schema.org markup on key templates.

AI systems cite sources humans already trust. Readiness files make you easier to trust accurately.

Still have questions?

The plain-English guide walks through every file with analogies and priorities. Connect the MCP server to generate a custom kit for your domain.

Research & standards

Sources cited on this page

Claims about how AI assistants and crawlers use web content are grounded in published research, open standards, and vendor documentation, not marketing guesswork.

Generative engine research

GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)
Foundational KDD 2024 study on how content structure and citations affect visibility in AI-generated answers (up to ~40% lift in controlled tests).

LLM site summary standard

llms.txt standard (llmstxt.org)
Community specification for a machine-readable site summary that LLMs can load in one context window.

AI permissions & sitemaps

ai.txt permissions standard (aitxt.org)
Open standard for machine-readable AI permissions, identity, and sitemap extensions at a site root.
Sitemaps.org protocol
Standard XML sitemap format extended by ai-sitemap.xml for AI-specific page metadata.

Structured data & plugins

Schema.org structured data
Collaborative vocabulary for machine-readable entity descriptions used by search engines and AI systems.
OpenAI plugin manifest format
Specification for .well-known/ai-plugin.json manifests that register a site as an AI-accessible resource.

MCP & agent tooling

Model Context Protocol
Open protocol for connecting AI assistants to tools, resources, and data sources via a standard interface.

AI crawler documentation

OpenAI GPTBot
How OpenAI's web crawler accesses sites for ChatGPT and training pipelines.
Anthropic ClaudeBot
Anthropic's documentation on Claude web crawling and opt-out.
Google-Extended
Google's robots.txt product token that controls whether crawled content may be used for Gemini and other generative AI products.
PerplexityBot
Perplexity's bot documentation for indexing and citation in AI answers.

About the author

Written by Russ Wittmann, SVP of Technology

SVP of Technology at Silverback Marketing. Helping brands become the source AI cites through GEO, AIO, technical SEO, and AI search strategy.

