# AI Readiness FAQ

> The complete FAQ for the AI Readiness Kit: every file explained, how to use the GitHub skill, connect the MCP server, deploy to your site, and get cited by ChatGPT, Claude, Gemini, and Perplexity.

- **Site:** https://ai.silverbackmarketing.com
- **HTML version:** https://ai.silverbackmarketing.com/faq
- **Markdown version:** https://ai.silverbackmarketing.com/faq.md
- **Full guide:** https://ai.silverbackmarketing.com/guide
- **Skill repo:** https://github.com/silverbackmarketing/ai-readiness
- **Questions:** 57

## Table of contents

- [AI Readiness Fundamentals](#ai-readiness-fundamentals)
- [This Site & The Kit](#this-site-the-kit)
- [The AI Readiness Skill](#the-ai-readiness-skill)
- [MCP Server & Coding Agents](#mcp-server-coding-agents)
- [Identity & Permissions Files](#identity-permissions-files)
- [Content & Navigation Files](#content-navigation-files)
- [Intelligence & Research Files](#intelligence-research-files)
- [Policy & Operations Files](#policy-operations-files)
- [Deployment & Verification](#deployment-verification)
- [Maintenance & Governance](#maintenance-governance)
- [GEO, Citations & AI Search](#geo-citations-ai-search)

## AI Readiness Fundamentals

Why structured site files matter when AI assistants are the first stop for research, recommendations, and answers.

### What is AI readiness?

AI readiness means publishing structured files at your website root so large language models and AI crawlers can understand who you are, what you offer, and how to represent your brand accurately. Instead of guessing from stale training data or random page snippets, assistants can load purpose-built signals: identity files, content summaries, entity catalogs, intent maps, and policy statements.

The goal is not to trick AI systems. It is to give them the same clarity you would give a journalist, analyst, or partner before they write about your company.

**Sources**

- [GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)](https://arxiv.org/abs/2311.09735)

**Related:**

- [Read the full guide](https://ai.silverbackmarketing.com/guide)
- [See all 17 files](https://ai.silverbackmarketing.com/#files)

### Why does AI readiness matter now?

When someone wants a product recommendation, a service provider, or a straight answer, they often ask ChatGPT, Claude, Gemini, or Perplexity before they visit a website. If you are not giving those systems clear, current signals, they guess. And they often guess wrong: outdated products, incorrect positioning, or a competitor cited instead of you.

Generative Engine Optimization (GEO) research formalizes this shift. Content structure, authoritative citations, and machine-readable summaries measurably affect whether your brand appears in AI-generated answers.

**Sources**

- [GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)](https://arxiv.org/abs/2311.09735)

### Which AI systems use these files?

The kit targets the assistants and crawlers most teams care about today: OpenAI (GPTBot, ChatGPT browsing), Anthropic (ClaudeBot), Google (Google-Extended for Gemini), and Perplexity (PerplexityBot). Files like llms.txt follow an open community standard designed for any LLM that can fetch a URL. JSON entity and intent files help RAG pipelines regardless of which model sits on top.

No vendor guarantees a specific ranking in AI answers. What you control is signal quality: clear identity, structured entities, crawl permissions, and citable summaries that reduce ambiguity.

**Sources**

- [OpenAI GPTBot](https://platform.openai.com/docs/gptbot)
- [Anthropic ClaudeBot](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-i-give-feedback-or-request-removal)
- [Google-Extended](https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended)
- [PerplexityBot](https://docs.perplexity.ai/guides/bots)

### Is AI readiness the same as SEO?

They overlap but are not identical. Traditional SEO optimizes for search engine result pages: titles, meta descriptions, Core Web Vitals, backlinks. AI readiness optimizes for how generative systems synthesize answers: plain-text summaries (llms.txt), permission and identity files (ai.txt), entity graphs (ai-entities.json), question-to-URL maps (ai-intent.json), and RAG indexes.

Strong SEO helps. Structured data (Schema.org JSON-LD) helps both. The AI Readiness Kit adds a parallel layer of files many SEO audits never mention, because search consoles were built for blue links, not conversational citations.

**Sources**

- [Schema.org structured data](https://schema.org/)
- [GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)](https://arxiv.org/abs/2311.09735)

### Do I need a developer to become AI-ready?

You need someone who can upload files to your web root and verify HTTP status codes, but you do not need a custom app. The kit ships as markdown templates, JSON schemas, and plain-text files. The GitHub skill and MCP server automate research and generation; your web team (or host) deploys the output.

Operational files like deployment-checklist.md and manifest.json exist specifically so non-engineers can track progress while developers handle DNS, CDN, and robots.txt merges.

**Related:**

- [How it works](https://ai.silverbackmarketing.com/#how)
- [Download the skill](https://github.com/silverbackmarketing/ai-readiness)

### How is this different from just having good website copy?

Good copy lives inside HTML pages designed for humans in a browser. AI crawlers often fetch partial content, skip JavaScript-heavy sections, or infer meaning from navigation crumbs. Readiness files live at predictable root URLs (/llms.txt, /ai.txt) in formats designed for models to read as authoritative context.

Think of website copy as the conversation inside your store. Readiness files are the fact sheet, floor plan, and policy binder you hand to every AI assistant before it speaks about you publicly.

**Sources**

- [llms.txt standard (llmstxt.org)](https://llmstxt.org/)

## This Site & The Kit

What ai.silverbackmarketing.com is, what the kit includes, and how the pieces fit together.

### What is the AI Readiness Kit?

The AI Readiness Kit is a set of 17 purpose-built files organized in 7 categories. You deploy them at the root of any website so AI systems understand exactly who you are, what you offer, and how to represent you. The kit is free, open source, and maintained by Silverback Marketing.

Categories cover identity and permissions, content summaries, site maps, intelligence JSON, research indexes, policy statements, and operational checklists. Each file has a defined audience (crawlers, LLMs, developers, legal) and update cadence.

**Related:**

- [Explore all files](https://ai.silverbackmarketing.com/#files)
- [Full plain-English guide](https://ai.silverbackmarketing.com/guide)

### What is ai.silverbackmarketing.com?

This site is the canonical home for the kit: documentation, live examples of every file deployed on our own domain, the hosted MCP server, and the complete guide. We eat our own cooking. The robots.txt, llms.txt, ai-entities.json, and other files you see in /public are production references you can curl or view in a browser.

The site also publishes JSON-LD, FAQ schema, speakable markup, and downloadable PDF and markdown versions so both humans and AI systems can consume the same facts.

**Related:**

- [Homepage](https://ai.silverbackmarketing.com/)

### How many files are in the kit and what are the categories?

The kit includes 17 files across 7 categories:

- Identity & Permissions (2 files): The files that introduce your brand and control which AI systems get access
- Content Files (2 files): The files that give AI the full story about your site in readable text
- Map & Navigation (2 files): The files that help AI navigate your site structure efficiently
- Intelligence Files (4 files): The files that give AI deep knowledge of your entities, products, and user intents
- Research Files (2 files): The files that power AI research and retrieval pipelines
- Policy Files (2 files): The files that set the rules for how AI can use your content
- Operations Files (3 files): The files that help your team deploy and maintain everything

**Related:**

- [File-by-file guide](https://ai.silverbackmarketing.com/guide)

### Is the kit really free?

Yes. The skill, sample files, MCP server endpoint, and documentation are free to use. The GitHub repository is public. The hosted MCP server at ai.silverbackmarketing.com requires no API key.

Silverback Marketing publishes the kit as part of our GEO and AI search practice. If you want hands-on implementation help, that is a separate services conversation, but the files themselves are not paywalled.

**Related:**

- [GitHub repository](https://github.com/silverbackmarketing/ai-readiness)

### What live examples can I inspect on this domain?

Every critical file is deployed here so you can verify format and content before generating your own. Examples include /robots.txt, /ai.txt, /llms.txt, /llms-full.txt, /ai-sitemap.xml, /manifest.json, /rag-index.json, and /.well-known/ai-plugin.json.

Use curl -I on any path to confirm 200 OK responses. The deployment-checklist.md in the repo lists verification commands for each file type.

**Related:**

- [View llms.txt](https://ai.silverbackmarketing.com/llms.txt)
- [View manifest.json](https://ai.silverbackmarketing.com/manifest.json)

## The AI Readiness Skill

How the GitHub skill researches a domain, classifies site type, and generates all output files.

### What is the AI Readiness skill?

The skill is a packaged workflow (SKILL.md plus supporting specs) in the public GitHub repository. It instructs a capable coding agent to research a target domain, classify the business type (SaaS, e-commerce, healthcare, local services, and more), and generate all kit files with domain-specific content.

You can run it in Claude Code, Cursor, VS Code with GitHub Copilot, or any agent that loads skills from the repo. The MCP server exposes the same instructions programmatically via get_skill_instructions.

**Related:**

- [GitHub: silverbackmarketing/ai-readiness](https://github.com/silverbackmarketing/ai-readiness)

### How does the skill research a website?

The workflow follows a structured checklist: fetch the homepage and key pages, inspect existing robots.txt and sitemap.xml, identify products and services, map navigation, extract entity candidates, and note gaps in structured data. It uses web fetch tools available to the agent rather than a proprietary crawler.

Site classification (via get_site_classification_guide) adjusts tone, compliance notes, and file emphasis. A healthcare site gets different policy language than a D2C shop.

- Homepage and about page analysis
- Existing SEO and schema audit
- Product, service, and location extraction
- Competitor and category context (when available)
- Output of all 18 files (the 17 kit files plus README.md) in generation order

### What files does the skill generate?

The full workflow produces 18 outputs: the 17 kit files (robots.txt, ai.txt, llms.txt, llms-full.txt, ai-sitemap.xml, sitemap.md, ai-entities.json, ai-intent.json, ai-schema.json, rag-index.json, rag-index.jsonl, ai-disclosure.txt, training-data-policy.txt, .well-known/ai-plugin.json, structured-data-guide.md, manifest.json, and deployment-checklist.md) plus a README.md for the output folder.

Files are written to a local output directory. You review, edit brand-sensitive sections, then upload to production.

**Related:**

- [MCP: list_output_files tool](https://ai.silverbackmarketing.com/#mcp)

### How do I install and run the skill manually?

Clone or download the GitHub repository. Point your coding agent at SKILL.md in app/lib/mcp/data/ (bundled in the repo) or load the skill path in Claude Code. Provide a target URL and ask the agent to execute the full workflow.

Without MCP, the agent reads specs from file-specs.md and writes outputs locally. With MCP connected, it can call generate_ai_readiness_files and get_file_spec on demand.

- Clone https://github.com/silverbackmarketing/ai-readiness
- Open the project in Cursor or Claude Code
- Connect the MCP server (optional but recommended)
- Prompt: "Generate AI readiness files for example.com using the skill workflow"
- Review outputs, then deploy to your web root

**Related:**

- [MCP setup](https://ai.silverbackmarketing.com/#mcp)

### Can I customize generated content before deploying?

You should. The skill produces a strong first draft grounded in public site content, but legal disclaimers, pricing, proprietary claims, and brand voice need human review. Policy files (training-data-policy.txt, ai-disclosure.txt) especially require stakeholder sign-off.

Treat generation like a technical SEO audit export: 80% done automatically, 20% refined by someone who knows the business.

### What is the difference between the skill and the MCP server?

The skill is documentation plus rules: how to research, classify, and write files. The MCP server is a live API that exposes those rules as tools any connected agent can call without copying files into context.

Same brain, two interfaces. Local skill for offline or air-gapped workflows. Hosted MCP for one-URL setup in Cursor, Claude Code, Codex, and VS Code.

**Sources**

- [Model Context Protocol](https://modelcontextprotocol.io/)

**Related:**

- [Connect MCP](https://ai.silverbackmarketing.com/#mcp)

## MCP Server & Coding Agents

Connect Cursor, Claude Code, VS Code, Codex, Antigravity, or Claude Desktop to the hosted MCP endpoint.

### What is the AI Readiness MCP server?

The Model Context Protocol (MCP) server is a hosted HTTP endpoint at https://ai.silverbackmarketing.com/api/mcp. It exposes tools, bundled resources, and prompts so coding agents can generate kit files, read specifications, and run the full workflow without cloning the repository.

Transport is Streamable HTTP. No authentication is required. Any MCP-capable client can connect with a JSON config pointing at the URL.

**Sources**

- [Model Context Protocol](https://modelcontextprotocol.io/)

**Related:**

- [MCP section on homepage](https://ai.silverbackmarketing.com/#mcp)

### What tools does the MCP server expose?

Six tools cover the full workflow:

- generate_ai_readiness_files(url): start the 18-file workflow for any domain
- list_output_files(): list all output files in generation order
- get_file_spec(filename): detailed spec for one file (llms.txt, ai-entities.json, etc.)
- get_skill_instructions(): full research and generation workflow (SKILL.md)
- get_site_classification_guide(): site-type taxonomy (SaaS, e-commerce, healthcare, etc.)
- generate_rag_jsonl(rag_index_json): convert rag-index.json to JSONL for embedding pipelines
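
As a rough illustration of what a client sends when it calls one of these tools, the sketch below builds an MCP tools/call JSON-RPC request body for generate_ai_readiness_files. The tool name and its url argument come from the list above; the request id and the example.com target are placeholders. A real client also performs the MCP initialize handshake first and handles the Streamable HTTP response, both of which are omitted here.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # placeholder id; any unique value works
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


payload = build_tool_call(
    "generate_ai_readiness_files",
    {"url": "https://example.com"},  # placeholder target domain
)
print(json.dumps(payload, indent=2))
```

In practice the MCP client inside your coding agent builds this payload for you; the sketch only shows what travels over the wire.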

### Which coding agents support this MCP server?

Any client with Streamable HTTP MCP support works. We document configs for Claude Code, Cursor, VS Code / GitHub Copilot, OpenAI Codex, Google Antigravity, and Claude Desktop (via mcp-remote stdio bridge).

Each client uses a slightly different config file path and JSON shape. The homepage MCP section includes copy-paste configs for all of them.

- Claude Code: .mcp.json with type http
- Cursor: .cursor/mcp.json
- VS Code: .vscode/mcp.json
- Codex: ~/.codex/config.toml
- Antigravity: serverUrl key (not url)
- Claude Desktop: npx mcp-remote proxy
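
The differences in the list above can be sketched as a small config builder. This is a minimal illustration, not the official copy-paste configs: the mcpServers wrapper is documented here only for Cursor and is assumed for the other clients, and the Antigravity shape is inferred from the serverUrl note. VS Code and Codex are left out because their config formats differ further.

```python
import json

MCP_URL = "https://ai.silverbackmarketing.com/api/mcp"
SERVER_NAME = "ai-readiness"


def build_client_config(client):
    """Return a JSON-serializable MCP config for one client (assumed shapes)."""
    if client == "claude-code":        # written to .mcp.json
        entry = {"type": "http", "url": MCP_URL}
    elif client == "cursor":           # written to .cursor/mcp.json
        entry = {"url": MCP_URL}
    elif client == "antigravity":      # serverUrl key instead of url
        entry = {"serverUrl": MCP_URL}
    elif client == "claude-desktop":   # stdio bridge through mcp-remote
        entry = {"command": "npx", "args": ["mcp-remote", MCP_URL]}
    else:
        raise ValueError(f"no config sketch for client: {client}")
    return {"mcpServers": {SERVER_NAME: entry}}


print(json.dumps(build_client_config("cursor"), indent=2))
```

Check the homepage MCP section for the exact file each client expects before saving any of these shapes.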

**Related:**

- [Setup instructions](https://ai.silverbackmarketing.com/#mcp)

### Do I need an API key to use the MCP server?

No. The production endpoint is open. Rate limits may apply at extreme volume, but normal agency and developer usage requires no signup.

You are responsible for reviewing generated content before publishing it to your domain. The server generates drafts; you own what goes live.

### How do I connect Cursor to the MCP server?

Create .cursor/mcp.json in your project root (or add via Cursor Settings → MCP) with the ai-readiness server URL. Refresh MCP servers or restart Cursor. In agent mode, prompt: "Use ai-readiness to generate files for example.com".

- URL: https://ai.silverbackmarketing.com/api/mcp
- Config key: mcpServers.ai-readiness.url
- Verify tools appear in the MCP panel before running generation

### Why does Claude Desktop need mcp-remote?

Claude Desktop speaks MCP over stdio to local processes, not remote HTTP URLs directly. The mcp-remote package proxies the hosted HTTP server to a local stdio interface. Add it via npx in claude_desktop_config.json.

Requires Node.js 18+. Fully quit and reopen Claude Desktop after config changes.

**Sources**

- [Model Context Protocol](https://modelcontextprotocol.io/)

### What example prompts work well with MCP?

Be explicit about the domain and the tool namespace:

- Generate AI readiness files for example.com
- Use ai-readiness to list all 18 output files
- Get the file spec for llms.txt from ai-readiness
- Use ai-readiness get_site_classification_guide for a B2B SaaS homepage

### Can I self-host the MCP server?

Yes. The server ships with this Next.js application. Deploy the repo to your own infrastructure and point clients at your /api/mcp endpoint. Bundled skill data is compiled at build time via scripts/bundle-mcp-data.mjs.

Self-hosting makes sense for enterprises with private network requirements. Most teams use the public endpoint for convenience.

**Related:**

- [GitHub repository](https://github.com/silverbackmarketing/ai-readiness)

## Identity & Permissions Files

robots.txt and ai.txt: crawler access, brand identity, and content-use rules.

### What does robots.txt do for AI readiness?

robots.txt sits at the front door of your website and tells automated visitors which sections they may access. For AI readiness, it includes explicit rules for GPTBot, ClaudeBot, Google-Extended, and PerplexityBot, points crawlers toward llms.txt and ai-sitemap.xml, and keeps checkout, login, and admin paths disallowed.

Important: the kit ships a reference robots.txt. You must merge AI crawler directives into your live file rather than replacing production rules blindly.
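
One way to sanity-check merged rules before they go live is Python's standard urllib.robotparser. The sketch below parses an illustrative robots.txt (the rules and the example.com domain are placeholders, not the kit's reference file) and asks whether named AI crawlers may fetch specific paths.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; merge your own production directives instead.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: PerplexityBot
Allow: /llms.txt
Disallow: /checkout
Disallow: /login
Disallow: /admin

Sitemap: https://example.com/ai-sitemap.xml
"""


def load_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network call."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
for path in ("/llms.txt", "/checkout/cart", "/admin"):
    allowed = rules.can_fetch("GPTBot", "https://example.com" + path)
    print(f"GPTBot {path}: {'allowed' if allowed else 'blocked'}")
```

Python's parser applies the first matching rule in file order, which can differ from how individual crawlers resolve conflicts, so treat the result as a smoke test rather than a guarantee.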

**Sources**

- [OpenAI GPTBot](https://platform.openai.com/docs/gptbot)
- [Anthropic ClaudeBot](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-i-give-feedback-or-request-removal)
- [Google-Extended](https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended)
- [PerplexityBot](https://docs.perplexity.ai/guides/bots)

**Related:**

- [Guide: robots.txt](https://ai.silverbackmarketing.com/guide#guide-robots-txt)

### What is ai.txt?

ai.txt is your brand's introduction to AI systems. Where robots.txt controls access, ai.txt describes identity, offerings, authoritative topics, contact paths, and rules for how AI may use your content. It follows the open ai.txt permissions standard.

Deploy at /ai.txt with a text/plain content type. Reference it in a robots.txt Allow rule so crawlers can discover it early.

**Sources**

- [ai.txt permissions standard (aitxt.org)](https://aitxt.org/)

**Related:**

- [Live ai.txt](https://ai.silverbackmarketing.com/ai.txt)

### How is ai.txt different from robots.txt?

robots.txt is access control: which paths bots may fetch. ai.txt is identity and policy: who you are, what you sell, and your ground rules for training, RAG, and citation.

Analogy: robots.txt is the doorman. ai.txt is the briefing document you hand a journalist before an interview.

### What is .well-known/ai-plugin.json?

This manifest registers your site as an official AI-accessible resource in the plugin discovery format. It includes name, description, auth requirements, and API references. The kit uses it to point models at the live MCP server and documentation URLs.

Deploy at /.well-known/ai-plugin.json with application/json content type.
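
A minimal sketch of such a manifest, using field names from the OpenAI plugin manifest format. The values (company name, description, API URL) are placeholders, and the required-field list is an assumption chosen for illustration; the kit's generated file may carry more fields.

```python
import json

# Assumed minimum set of fields for this sketch.
REQUIRED_FIELDS = (
    "schema_version", "name_for_human", "name_for_model",
    "description_for_human", "description_for_model", "auth", "api",
)


def build_plugin_manifest(name, description, api_url):
    """Build a plugin-style manifest dict with no authentication."""
    return {
        "schema_version": "v1",
        "name_for_human": name,
        "name_for_model": name.lower().replace(" ", "_"),
        "description_for_human": description,
        "description_for_model": description,
        "auth": {"type": "none"},
        "api": {"type": "openapi", "url": api_url},
    }


def missing_fields(manifest):
    """List required fields that the manifest does not define."""
    return [field for field in REQUIRED_FIELDS if field not in manifest]


manifest = build_plugin_manifest(
    "Example Co",                          # placeholder brand
    "Facts and documentation for Example Co.",
    "https://example.com/openapi.json",    # hypothetical API description URL
)
print(json.dumps(manifest, indent=2))
```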

**Sources**

- [OpenAI plugin manifest format](https://platform.openai.com/docs/plugins/plugin-manifest)

**Related:**

- [Live manifest](https://ai.silverbackmarketing.com/.well-known/ai-plugin.json)

## Content & Navigation Files

llms.txt, llms-full.txt, ai-sitemap.xml, and sitemap.md for discovery and context.

### What is llms.txt?

llms.txt is a structured plain-text summary of your site for large language models. It includes company name, description, organized sections with links, and top questions your site answers. It follows the llmstxt.org community standard used across the AI ecosystem.

Deploy at /llms.txt. Keep it scannable: one context window, no HTML noise.
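
To make the structure concrete, here is a small sketch that assembles an llms.txt in the llmstxt.org layout: an H1 title, a blockquote summary, then H2 sections containing link lists. The company name, URLs, and notes are placeholders.

```python
def build_llms_txt(name, summary, sections):
    """Render llms.txt text.

    sections maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    "Example Co",
    "A short description of what the site offers.",
    {
        "Docs": [
            ("Pricing", "https://example.com/pricing", "Plans and costs"),
            ("FAQ", "https://example.com/faq", "Top questions answered"),
        ],
    },
)
print(llms_txt)
```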

**Sources**

- [llms.txt standard (llmstxt.org)](https://llmstxt.org/)

**Related:**

- [Live llms.txt](https://ai.silverbackmarketing.com/llms.txt)

### What is llms-full.txt and when should I use it?

llms-full.txt is the deep dive: rich descriptions, detailed Q&A, category explanations, and extended context. Use it when models need more than a cheat sheet but should not crawl hundreds of pages.

If llms.txt is a Wikipedia summary, llms-full.txt is the full article. Update quarterly or when major products change.

**Sources**

- [llms.txt standard (llmstxt.org)](https://llmstxt.org/)

### Why do I need ai-sitemap.xml?

A regular sitemap lists URLs and timestamps. ai-sitemap.xml adds content type, topic tags, and plain-English summaries per URL so AI crawlers understand pages before fetching them.

It extends the standard sitemaps.org format with AI-specific metadata. Submit it to Google Search Console alongside your standard sitemap.
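
The idea of extending a standard sitemap can be sketched with Python's xml.etree.ElementTree. The sitemaps.org namespace is real; the ai namespace URI and the contentType, topics, and summary element names are hypothetical stand-ins for whatever the kit's file spec defines.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
AI_NS = "https://example.com/ns/ai-sitemap"  # hypothetical namespace URI


def build_ai_sitemap(pages):
    """Return sitemap XML with extra AI metadata elements per URL."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("ai", AI_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = page["lastmod"]
        # Hypothetical AI-specific fields: content type, topics, summary.
        ET.SubElement(url, f"{{{AI_NS}}}contentType").text = page["content_type"]
        ET.SubElement(url, f"{{{AI_NS}}}topics").text = ", ".join(page["topics"])
        ET.SubElement(url, f"{{{AI_NS}}}summary").text = page["summary"]
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_ai_sitemap([
    {
        "loc": "https://example.com/pricing",
        "lastmod": "2026-01-15",
        "content_type": "pricing",
        "topics": ["plans", "billing"],
        "summary": "Plan tiers and what each one includes.",
    }
])
print(xml_text)
```

Use get_file_spec for ai-sitemap.xml to confirm the actual element names before generating a production file.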

**Sources**

- [Sitemaps.org protocol](https://www.sitemaps.org/)
- [ai.txt permissions standard (aitxt.org)](https://aitxt.org/)

### What is sitemap.md for?

sitemap.md is the same site structure in human-readable markdown. Developers, content teams, and LLMs can parse it without XML tooling. It is the welcome brochure for your site architecture.

Update monthly when you add major sections or products.

### Which content files should I deploy first?

Priority order for most sites: robots.txt and ai.txt (identity and access), then llms.txt and llms-full.txt (context), then ai-sitemap.xml and sitemap.md (discovery). These six deliver the largest immediate lift before JSON intelligence files.

**Related:**

- [How it works](https://ai.silverbackmarketing.com/#how)

## Intelligence & Research Files

Structured JSON for entities, intents, schema, and RAG pipelines.

### What is ai-entities.json?

A structured catalog of products, services, categories, locations, and key concepts on your site. It powers knowledge graph-style reasoning: which offerings relate to which topics, and what language you use to describe them.

Keep entities aligned with how customers actually search, not internal org chart names.

### What is ai-intent.json?

ai-intent.json maps real user questions to the best URL on your site for answering each one. Without it, AI guesses which page to recommend. With it, you supply a direct preferred URL per query pattern.

Include informational, navigational, and transactional intents with confidence levels.
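
A quick validation pass can catch malformed entries before deployment. The sketch below assumes a hypothetical entry shape (question, url, intent, and a numeric confidence between 0 and 1); the kit's actual schema may use different key names or labelled confidence levels.

```python
INTENT_TYPES = {"informational", "navigational", "transactional"}


def validate_intents(entries):
    """Return human-readable problems found in ai-intent style entries."""
    problems = []
    for index, entry in enumerate(entries):
        intent = entry.get("intent")
        if intent not in INTENT_TYPES:
            problems.append(f"entry {index}: unknown intent {intent!r}")
        confidence = entry.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            problems.append(f"entry {index}: confidence must be between 0 and 1")
        if not str(entry.get("url", "")).startswith("https://"):
            problems.append(f"entry {index}: url should be an absolute https URL")
    return problems


entries = [
    {"question": "How much does it cost?", "url": "https://example.com/pricing",
     "intent": "transactional", "confidence": 0.9},
    {"question": "Where is the login page?", "url": "/login",
     "intent": "navigation", "confidence": 1.2},
]
for problem in validate_intents(entries):
    print(problem)
```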

### What is ai-schema.json?

ai-schema.json publishes Schema.org JSON-LD describing your organization, products, and key pages in a machine-readable format that search engines and AI systems already consume.

Align ai-schema.json with on-page JSON-LD to avoid conflicting entity descriptions.
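
A simple way to spot misalignment is to compare shared properties between the root file and the JSON-LD embedded on a page. The sketch below uses real Schema.org property names, but the sample values and the choice of which keys to compare are assumptions.

```python
def find_conflicts(root_schema, page_schema, keys=("name", "url", "description")):
    """Return {key: (root_value, page_value)} where both define the key differently."""
    conflicts = {}
    for key in keys:
        if key in root_schema and key in page_schema:
            if root_schema[key] != page_schema[key]:
                conflicts[key] = (root_schema[key], page_schema[key])
    return conflicts


# Placeholder Organization data from ai-schema.json and from on-page JSON-LD.
root_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
}
page_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Company",
    "url": "https://example.com",
}
conflicts = find_conflicts(root_schema, page_schema)
print(conflicts)
```

An empty result means the compared properties agree; any entry shows the two competing values to reconcile.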

**Sources**

- [Schema.org structured data](https://schema.org/)

### What are rag-index.json and rag-index.jsonl?

Pre-built indexes of your major pages for retrieval-augmented generation. Each entry includes URL, title, topics, and summary text. JSONL format streams into vector databases and embedding pipelines.

Engineering teams load these into LlamaIndex, LangChain, Pinecone, and similar tools so answers stay grounded in your content.
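
The relationship between the two files can be sketched in a few lines: each entry of the JSON index becomes one line of JSONL. The top-level entries key and the per-entry key names are assumptions based on the fields described above; this is not the implementation behind the generate_rag_jsonl tool.

```python
import json


def rag_index_to_jsonl(index):
    """Serialize each index entry as one JSON object per line."""
    entries = index["entries"]  # hypothetical top-level key
    return "".join(json.dumps(entry, ensure_ascii=False) + "\n" for entry in entries)


rag_index = {
    "entries": [
        {"url": "https://example.com/pricing", "title": "Pricing",
         "topics": ["plans", "billing"], "summary": "Plan tiers and costs."},
        {"url": "https://example.com/faq", "title": "FAQ",
         "topics": ["support"], "summary": "Answers to common questions."},
    ]
}
jsonl_text = rag_index_to_jsonl(rag_index)
print(jsonl_text, end="")
```

Because every line is a standalone JSON object, an embedding pipeline can stream the file without loading the whole index into memory.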

### Do intelligence files replace on-page JSON-LD?

No. On-page JSON-LD remains essential for rich results and page-level entity markup. ai-schema.json, ai-entities.json, and ai-intent.json complement HTML by giving AI systems a single bundle at predictable root URLs.

Use structured-data-guide.md in the kit for page-type examples (homepage, product, FAQ, blog).

**Sources**

- [Schema.org structured data](https://schema.org/)

**Related:**

- [Guide: Intelligence Files](https://ai.silverbackmarketing.com/guide#guide-category-intelligence-files)

## Policy & Operations Files

Legal clarity, transparency, deployment checklists, and team playbooks.

### What is training-data-policy.txt?

training-data-policy.txt sets formal rules for how AI companies may use your content: model training, RAG indexing, commercial use, and attribution requirements. Every brand should decide these rules upfront rather than accepting silent scraping.

Have legal review before publishing. Policies vary by industry (healthcare, finance, government).

### What is ai-disclosure.txt?

A public transparency statement explaining how your organization uses AI in products, content, support, or operations. It builds trust with customers and signals honesty to AI systems summarizing your brand.

Update when your AI usage changes materially.

### What is manifest.json?

The master inventory of every readiness file: deploy path, content type, purpose, audience, update frequency, and priority. It doubles as a readiness scorecard for audits.

AI systems and internal teams can fetch one URL to see what you have deployed.

**Related:**

- [Live manifest.json](https://ai.silverbackmarketing.com/manifest.json)

### What is deployment-checklist.md?

A phased launch playbook: upload order, curl verification commands, robots.txt merge steps, Search Console submission, and CDN cache notes. Designed so project managers can track progress while developers execute.

Follow it line by line on first deploy. Reuse verification commands after updates.

### What is structured-data-guide.md?

Developer documentation with JSON-LD examples for common page types in your stack. It bridges marketing readiness files and on-page implementation.

Share with whoever maintains your CMS or frontend templates.

**Sources**

- [Schema.org structured data](https://schema.org/)

## Deployment & Verification

Getting files live, validating responses, and avoiding common launch mistakes.

### Where do I upload AI readiness files?

Every file goes at your website root (or .well-known/ for ai-plugin.json) so predictable URLs work: https://yourdomain.com/llms.txt, https://yourdomain.com/ai.txt, etc.

Do not bury them in /assets or /downloads. AI crawlers and standards expect root paths.

### How do I verify files deployed correctly?

Use HTTP HEAD or GET checks for each URL. Expect 200 OK, correct Content-Type, and no accidental redirects to HTML login pages.

- curl -I https://yourdomain.com/llms.txt
- curl -I https://yourdomain.com/ai-entities.json
- Confirm text/plain, application/json, or application/xml as appropriate
- Fetch in an incognito window to bypass auth cookies
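
The same checks can be scripted with Python's standard library. In this sketch the path-to-type table is an illustrative subset, and head_check is a hypothetical helper that sends one HEAD request; evaluate holds the pass/fail logic so it can be exercised without network access.

```python
import urllib.error
import urllib.request

# Illustrative subset; extend with every readiness file you deploy.
EXPECTED_TYPES = {
    "/llms.txt": "text/plain",
    "/ai.txt": "text/plain",
    "/ai-entities.json": "application/json",
    "/ai-sitemap.xml": "application/xml",
}


def evaluate(status, content_type, expected_type):
    """Return a list of problems for one response; empty means healthy."""
    problems = []
    if status != 200:
        problems.append(f"expected 200, got {status}")
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type != expected_type:
        problems.append(f"expected {expected_type}, got {media_type or 'no Content-Type'}")
    return problems


def head_check(base_url, path, expected_type, timeout=10):
    """Send a HEAD request and evaluate status and Content-Type."""
    request = urllib.request.Request(base_url.rstrip("/") + path, method="HEAD")
    try:
        # urlopen follows redirects, so a bounce to an HTML login page
        # surfaces here as a Content-Type mismatch.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return evaluate(response.status, response.headers.get("Content-Type"), expected_type)
    except urllib.error.HTTPError as error:
        return evaluate(error.code, error.headers.get("Content-Type"), expected_type)
```

Calling head_check("https://yourdomain.com", "/llms.txt", "text/plain") returns an empty list when the file is served correctly, and a list of problems otherwise.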

**Related:**

- [deployment-checklist.md](https://github.com/silverbackmarketing/ai-readiness/blob/main/public/ai-readiness-files/deployment-checklist.md)

### How do I merge robots.txt without breaking SEO?

Never overwrite production robots.txt with the kit template alone. Copy AI-specific User-agent blocks and Sitemap lines into your existing file. Preserve current Disallow rules, crawl-delay settings, and sitemap references.

Test in staging first. One bad Disallow rule can block your entire site.
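
The merge can be sketched as an append-only operation: existing lines are kept verbatim, and AI blocks are added only for user agents the file does not already mention. The function name, the sample rules, and the example.com URLs are illustrative assumptions.

```python
def merge_ai_blocks(existing, ai_blocks, sitemap_urls=()):
    """Append missing AI crawler blocks and Sitemap lines to robots.txt text.

    existing: current robots.txt content.
    ai_blocks: dict mapping a user agent to its list of rule lines.
    """
    present = {
        line.split(":", 1)[1].strip().lower()
        for line in existing.splitlines()
        if line.strip().lower().startswith("user-agent:")
    }
    merged = existing.rstrip("\n").splitlines()
    for agent, rules in ai_blocks.items():
        if agent.lower() in present:
            continue  # leave hand-written production rules untouched
        merged += ["", f"User-agent: {agent}", *rules]
    for url in sitemap_urls:
        line = f"Sitemap: {url}"
        if line not in merged:
            merged += ["", line]
    return "\n".join(merged) + "\n"


existing = (
    "User-agent: *\nDisallow: /admin\nCrawl-delay: 5\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
blocks = {
    "GPTBot": ["Allow: /llms.txt", "Disallow: /admin"],
    "ClaudeBot": ["Allow: /llms.txt", "Disallow: /admin"],
}
merged = merge_ai_blocks(existing, blocks, ["https://example.com/ai-sitemap.xml"])
print(merged)
```

Running the function a second time on its own output changes nothing, which makes it safe to rerun after each kit update. Review the result by hand before deploying, since an agent that already has a block is skipped entirely.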

**Sources**

- [OpenAI GPTBot](https://platform.openai.com/docs/gptbot)
- [Anthropic ClaudeBot](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-i-give-feedback-or-request-removal)
- [Google-Extended](https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended)
- [PerplexityBot](https://docs.perplexity.ai/guides/bots)

### Should I submit ai-sitemap.xml to Google Search Console?

Yes. Add it under Sitemaps in Search Console and Bing Webmaster Tools alongside your standard sitemap.xml. AI metadata tags do not replace classic SEO sitemaps; they extend them.

**Sources**

- [Sitemaps.org protocol](https://www.sitemaps.org/)

### What are the most common deployment mistakes?

Teams often hit these issues on first launch:

- Files uploaded to a subdirectory instead of web root
- CDN serving a stale cached 404 after upload
- robots.txt accidentally set to Disallow: / for all bots
- JSON files served as text/html due to SPA fallback rules
- llms.txt truncated or wrapped in HTML layout templates

## Maintenance & Governance

Update cadences, ownership, and keeping AI signals aligned with the live site.

### How often should I update AI readiness files?

Cadence depends on the file. manifest.json lists recommended frequencies. As a rule: llms.txt, sitemaps, and rag indexes monthly or when structure changes; ai.txt and llms-full.txt quarterly or when offerings change; policy files annually or when regulations change.

- llms.txt, ai-sitemap.xml, sitemap.md, rag-index.*: monthly
- ai.txt, llms-full.txt, ai-entities.json, ai-intent.json: quarterly
- training-data-policy.txt, ai-disclosure.txt: annually
- robots.txt: when routes or crawler policy changes
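
These cadences lend themselves to a simple overdue check. The sketch below hard-codes the schedule from the list above and approximates monthly, quarterly, and annually as 30, 91, and 365 days; the last-updated dates are sample input you would pull from manifest.json or your deploy logs.

```python
from datetime import date

CADENCE_DAYS = {"monthly": 30, "quarterly": 91, "annually": 365}  # approximations
FILE_CADENCE = {
    "llms.txt": "monthly", "ai-sitemap.xml": "monthly", "sitemap.md": "monthly",
    "rag-index.json": "monthly", "rag-index.jsonl": "monthly",
    "ai.txt": "quarterly", "llms-full.txt": "quarterly",
    "ai-entities.json": "quarterly", "ai-intent.json": "quarterly",
    "training-data-policy.txt": "annually", "ai-disclosure.txt": "annually",
}


def overdue_files(last_updated, today):
    """Return sorted file names whose age exceeds their cadence."""
    overdue = []
    for name, updated in last_updated.items():
        limit = CADENCE_DAYS[FILE_CADENCE[name]]
        if (today - updated).days > limit:
            overdue.append(name)
    return sorted(overdue)


last_updated = {
    "llms.txt": date(2025, 1, 1),
    "ai.txt": date(2025, 1, 1),
    "ai-disclosure.txt": date(2024, 1, 1),
}
print(overdue_files(last_updated, today=date(2025, 2, 15)))
```

robots.txt is left out on purpose because it is updated on events (route or policy changes), not on a calendar.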

### Who should own AI readiness internally?

Best results come from a trio: marketing or SEO owns messaging and entity accuracy, web/dev owns deployment and JSON-LD, legal/compliance owns policy files. manifest.json makes handoffs visible.

Treat updates like a sitemap refresh, not a one-time project.

### What triggers an out-of-cycle update?

Rebrand, new product line, pricing model change, merger, regulatory event, or a major site migration all require immediate refreshes. ai-intent.json especially drifts when URLs change.

After any migration, rerun curl checks on every readiness URL.

### How do I track readiness over time?

Use manifest.json as your scorecard. Mark files deployed, last updated, and priority. Some teams add a simple spreadsheet mirror for stakeholders who do not read JSON.
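
A scorecard summary can be derived from the manifest in a few lines. The entry shape used here (path, priority, deployed) is a hypothetical simplification; the real manifest.json carries more fields per file.

```python
def readiness_scorecard(files):
    """Summarize how many manifest entries are deployed and which are missing."""
    deployed = [entry for entry in files if entry.get("deployed")]
    missing = [entry["path"] for entry in files if not entry.get("deployed")]
    percent = round(100 * len(deployed) / len(files)) if files else 0
    return {
        "deployed": len(deployed),
        "total": len(files),
        "percent": percent,
        "missing": missing,
    }


manifest_files = [
    {"path": "/robots.txt", "priority": "high", "deployed": True},
    {"path": "/llms.txt", "priority": "high", "deployed": True},
    {"path": "/ai-intent.json", "priority": "medium", "deployed": False},
]
scorecard = readiness_scorecard(manifest_files)
print(scorecard)
```

The resulting dictionary is easy to paste into the spreadsheet mirror for stakeholders who do not read JSON.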

Regenerate via the skill or MCP after major site changes rather than hand-editing dozens of files.

## GEO, Citations & AI Search

Research-backed tactics for visibility in AI-generated answers.

### What is Generative Engine Optimization (GEO)?

GEO is the practice of optimizing content and structure for visibility in AI-generated answers, not just traditional search result pages. Research from Princeton, Georgia Tech, and collaborators shows that adding citations, quotations, and statistics can significantly increase visibility in generative responses, by up to roughly 40% in controlled experiments.

AI readiness files are infrastructure for GEO: they supply the structured facts models cite.

**Sources**

- [GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)](https://arxiv.org/abs/2311.09735)

### Will AI readiness files guarantee citations?

No vendor guarantees placement in ChatGPT, Claude, Gemini, or Perplexity answers. Files reduce ambiguity and improve discoverability. Combined with authoritative content, clear entities, and earned mentions elsewhere, they increase the odds your brand is described accurately when cited.

### How do readiness files interact with RAG?

Retrieval-augmented generation pulls documents before answering. rag-index.json and rag-index.jsonl give RAG pipelines a curated starting set. llms.txt and llms-full.txt provide low-token summaries when full crawls are expensive.

If you build custom AI products on your content, these files accelerate indexing.

### What else helps besides readiness files?

Files are necessary infrastructure, not the whole strategy. Also invest in authoritative original research, clear service pages, consistent NAP and entity data, PR and backlinks, FAQ content with real answers, and Schema.org markup on key templates.

AI systems cite sources humans already trust. Readiness files make you easier to trust accurately.

**Sources**

- [GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)](https://arxiv.org/abs/2311.09735)
- [Schema.org structured data](https://schema.org/)

**Related:**

- [Sources cited on this site](https://ai.silverbackmarketing.com/guide#sources)
- [Silverback Marketing](https://www.silverbackmarketing.com)

## Research & standards

Claims on this page cite published research, open standards, and vendor documentation.

### Generative engine research

- [GEO: Generative Engine Optimization (Princeton, Georgia Tech, et al.)](https://arxiv.org/abs/2311.09735) — Foundational KDD 2024 study on how content structure and citations affect visibility in AI-generated answers (up to ~40% lift in controlled tests).

### LLM site summary standard

- [llms.txt standard (llmstxt.org)](https://llmstxt.org/) — Community specification for a machine-readable site summary that LLMs can load in one context window.

### AI permissions & sitemaps

- [ai.txt permissions standard (aitxt.org)](https://aitxt.org/) — Open standard for machine-readable AI permissions, identity, and sitemap extensions at a site root.
- [Sitemaps.org protocol](https://www.sitemaps.org/) — Standard XML sitemap format extended by ai-sitemap.xml for AI-specific page metadata.

### Structured data & plugins

- [Schema.org structured data](https://schema.org/) — Collaborative vocabulary for machine-readable entity descriptions used by search engines and AI systems.
- [OpenAI plugin manifest format](https://platform.openai.com/docs/plugins/plugin-manifest) — Specification for .well-known/ai-plugin.json manifests that register a site as an AI-accessible resource.

### MCP & agent tooling

- [Model Context Protocol](https://modelcontextprotocol.io/) — Open protocol for connecting AI assistants to tools, resources, and data sources via a standard interface.

### AI crawler documentation

- [OpenAI GPTBot](https://platform.openai.com/docs/gptbot) — How OpenAI's web crawler accesses sites for ChatGPT and training pipelines.
- [Anthropic ClaudeBot](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-i-give-feedback-or-request-removal) — Anthropic's documentation on Claude web crawling and opt-out.
- [Google-Extended](https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended) — Google's crawler used for Gemini and other generative AI product use cases.
- [PerplexityBot](https://docs.perplexity.ai/guides/bots) — Perplexity's bot documentation for indexing and citation in AI answers.

## About the author

Written by [Russ Wittmann](https://www.linkedin.com/in/russwittmann/), SVP of Technology at Silverback Marketing, who helps brands become the source AI cites through GEO, AIO, technical SEO, and AI search strategy.

© 2026 Silverback Marketing. All rights reserved.