
About the author
Written by Russ Wittmann, SVP of Technology
SVP of Technology at Silverback Marketing. Helping brands become the source AI cites through GEO, AIO, technical SEO, and AI search strategy.
Complete reference
The complete FAQ for the AI Readiness Kit: every file explained, how to use the GitHub skill, connect the MCP server, deploy to your site, and get cited by ChatGPT, Claude, Gemini, and Perplexity.

Section 1 of 11
Why structured site files matter when AI assistants are the first stop for research, recommendations, and answers.
AI readiness means publishing structured files at your website root so large language models and AI crawlers can understand who you are, what you offer, and how to represent your brand accurately. Instead of guessing from stale training data or random page snippets, assistants can load purpose-built signals: identity files, content summaries, entity catalogs, intent maps, and policy statements.
The goal is not to trick AI systems. It is to give them the same clarity you would give a journalist, analyst, or partner before they write about your company.
When someone wants a product recommendation, a service provider, or a straight answer, they often ask ChatGPT, Claude, Gemini, or Perplexity before they visit a website. If you are not giving those systems clear, current signals, they guess. And they often guess wrong: outdated products, incorrect positioning, or a competitor cited instead of you.
Generative Engine Optimization (GEO) research formalizes this shift. Content structure, authoritative citations, and machine-readable summaries measurably affect whether your brand appears in AI-generated answers.
The kit targets the assistants and crawlers most teams care about today: OpenAI (GPTBot, ChatGPT browsing), Anthropic (ClaudeBot), Google (Google-Extended for Gemini), and Perplexity (PerplexityBot). Files like llms.txt follow an open community standard designed for any LLM that can fetch a URL. JSON entity and intent files help RAG pipelines regardless of which model sits on top.
No vendor guarantees a specific ranking in AI answers. What you control is signal quality: clear identity, structured entities, crawl permissions, and citable summaries that reduce ambiguity.
AI readiness and traditional SEO overlap but are not identical. Traditional SEO optimizes for search engine result pages: titles, meta descriptions, Core Web Vitals, backlinks. AI readiness optimizes for how generative systems synthesize answers: plain-text summaries (llms.txt), permission and identity files (ai.txt), entity graphs (ai-entities.json), question-to-URL maps (ai-intent.json), and RAG indexes.
Strong SEO helps. Structured data (Schema.org JSON-LD) helps both. The AI Readiness Kit adds a parallel layer of files many SEO audits never mention, because search consoles were built for blue links, not conversational citations.
You need someone who can upload files to your web root and verify HTTP status codes, but you do not need a custom app. The kit ships as markdown templates, JSON schemas, and plain-text files. The GitHub skill and MCP server automate research and generation; your web team (or host) deploys the output.
Operational files like deployment-checklist.md and manifest.json exist specifically so non-engineers can track progress while developers handle DNS, CDN, and robots.txt merges.
Good copy lives inside HTML pages designed for humans in a browser. AI crawlers often fetch partial content, skip JavaScript-heavy sections, or infer meaning from navigation crumbs. Readiness files live at predictable root URLs (/llms.txt, /ai.txt) in plain-text and JSON formats that crawlers and models can parse without rendering a page.
Think of website copy as the conversation inside your store. Readiness files are the fact sheet, floor plan, and policy binder you hand to every AI assistant before it speaks about you publicly.
Section 2 of 11
What ai.silverbackmarketing.com is, what the kit includes, and how the pieces fit together.
The AI Readiness Kit is a set of 17 purpose-built files organized in 7 categories. You deploy them at the root of any website so AI systems understand exactly who you are, what you offer, and how to represent you. The kit is free, open source, and maintained by Silverback Marketing.
Categories cover identity and permissions, content summaries, site maps, intelligence JSON, research indexes, policy statements, and operational checklists. Each file has a defined audience (crawlers, LLMs, developers, legal) and update cadence.
This site is the canonical home for the kit: documentation, live examples of every file deployed on our own domain, the hosted MCP server, and the complete guide. We eat our own cooking. The robots.txt, llms.txt, ai-entities.json, and other files you see in /public are production references you can curl or view in a browser.
The site also publishes JSON-LD, FAQ schema, speakable markup, and downloadable PDF and markdown versions so both humans and AI systems can consume the same facts.
The kit includes 17 files across 7 categories:
Yes, the kit is free. The skill, sample files, MCP server endpoint, and documentation are free to use. The GitHub repository is public. The hosted MCP server at ai.silverbackmarketing.com requires no API key.
Silverback Marketing publishes the kit as part of our GEO and AI search practice. If you want hands-on implementation help, that is a separate services conversation, but the files themselves are not paywalled.
Every critical file is deployed here so you can verify format and content before generating your own. Examples include /robots.txt, /ai.txt, /llms.txt, /llms-full.txt, /ai-sitemap.xml, /manifest.json, /rag-index.json, and /.well-known/ai-plugin.json.
Use curl -I on any path to confirm 200 OK responses. The deployment-checklist.md in the repo lists verification commands for each file type.
Section 3 of 11
How the GitHub skill researches a domain, classifies site type, and generates all output files.
The skill is a packaged workflow (SKILL.md plus supporting specs) in the public GitHub repository. It instructs a capable coding agent to research a target domain, classify the business type (SaaS, e-commerce, healthcare, local services, and more), and generate all kit files with domain-specific content.
You can run it in Claude Code, Cursor, VS Code Copilot, or any agent that loads skills from the repo. The MCP server exposes the same instructions programmatically via get_skill_instructions.
The workflow follows a structured checklist: fetch the homepage and key pages, inspect existing robots.txt and sitemap.xml, identify products and services, map navigation, extract entity candidates, and note gaps in structured data. It uses web fetch tools available to the agent rather than a proprietary crawler.
Site classification (via get_site_classification_guide) adjusts tone, compliance notes, and file emphasis. A healthcare site gets different policy language than a D2C shop.
The full workflow produces every file in the kit: robots.txt, ai.txt, llms.txt, llms-full.txt, ai-sitemap.xml, sitemap.md, ai-entities.json, ai-intent.json, ai-schema.json, rag-index.json, rag-index.jsonl, ai-disclosure.txt, training-data-policy.txt, .well-known/ai-plugin.json, structured-data-guide.md, manifest.json, deployment-checklist.md, and README.md for the output folder.
Files are written to a local output directory. You review, edit brand-sensitive sections, then upload to production.
Clone or download the GitHub repository. Point your coding agent at SKILL.md in app/lib/mcp/data/ (bundled in the repo) or load the skill path in Claude Code. Provide a target URL and ask the agent to execute the full workflow.
Without MCP, the agent reads specs from file-specs.md and writes outputs locally. With MCP connected, it can call generate_ai_readiness_files and get_file_spec on demand.
You should review generated files before publishing them. The skill produces a strong first draft grounded in public site content, but legal disclaimers, pricing, proprietary claims, and brand voice need human review. Policy files (training-data-policy.txt, ai-disclosure.txt) especially require stakeholder sign-off.
Treat generation like a technical SEO audit export: 80% done automatically, 20% refined by someone who knows the business.
The skill is documentation plus rules: how to research, classify, and write files. The MCP server is a live API that exposes those rules as tools any connected agent can call without copying files into context.
Same brain, two interfaces. Local skill for offline or air-gapped workflows. Hosted MCP for one-URL setup in Cursor, Claude Code, Codex, and VS Code.
Section 4 of 11
Connect Cursor, Claude Code, VS Code, Codex, Antigravity, or Claude Desktop to the hosted MCP endpoint.
The Model Context Protocol (MCP) server is a hosted HTTP endpoint at https://ai.silverbackmarketing.com/api/mcp. It exposes tools, bundled resources, and prompts so coding agents can generate kit files, read specifications, and run the full workflow without cloning the repository.
Transport is Streamable HTTP. Authentication is none. Any MCP-capable client can connect with a JSON config pointing at the URL.
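As a rough illustration of what a client sends over Streamable HTTP, the sketch below builds (but does not send) a JSON-RPC initialize request for the endpoint. The protocol version string, client name, and the helper name build_initialize_request are assumptions made for this example; real MCP clients construct and negotiate this for you.

```python
import json

MCP_URL = "https://ai.silverbackmarketing.com/api/mcp"  # endpoint named in this guide


def build_initialize_request(client_name: str, request_id: int = 1) -> dict:
    """Return URL, headers, and body for an MCP initialize call (nothing is sent)."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumed version string; use whatever your client library supports.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }
    headers = {
        "Content-Type": "application/json",
        # A Streamable HTTP server may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    # No Authorization header: the guide states authentication is none.
    return {"url": MCP_URL, "headers": headers, "body": json.dumps(body)}


request = build_initialize_request("demo-client")
print(request["body"])
```

In practice you would rarely hand-roll this; the point is only that the connection needs a URL and no credentials.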
Six tools cover the full workflow:
Any client with Streamable HTTP MCP support works. We document configs for Claude Code, Cursor, VS Code / GitHub Copilot, OpenAI Codex, Google Antigravity, and Claude Desktop (via mcp-remote stdio bridge).
Each client uses a slightly different config file path and JSON shape. The homepage MCP section includes copy-paste configs for all of them.
No API key or account is required. The production endpoint is open. Rate limits may apply at extreme volume, but normal agency and developer usage requires no signup.
You are responsible for reviewing generated content before publishing it to your domain. The server generates drafts; you own what goes live.
Create .cursor/mcp.json in your project root (or add via Cursor Settings → MCP) with the ai-readiness server URL. Refresh MCP servers or restart Cursor. In agent mode, prompt: "Use ai-readiness to generate files for example.com".
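A minimal sketch of generating that file, assuming Cursor's URL-based mcpServers shape for remote servers. The function names are hypothetical, and the copy-paste config on the homepage remains the authoritative version.

```python
import json
from pathlib import Path


def cursor_mcp_config(server_name: str, url: str) -> dict:
    """Build the mcpServers mapping for a remote (URL-based) MCP server."""
    return {"mcpServers": {server_name: {"url": url}}}


def write_cursor_config(project_root: Path, config: dict) -> Path:
    """Write .cursor/mcp.json under project_root and return its path."""
    target = project_root / ".cursor" / "mcp.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target


config = cursor_mcp_config("ai-readiness", "https://ai.silverbackmarketing.com/api/mcp")
print(json.dumps(config, indent=2))
```

Note that write_cursor_config overwrites any existing file, so merge by hand if the project already defines other MCP servers.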
Claude Desktop speaks MCP over stdio to local processes, not remote HTTP URLs directly. The mcp-remote package proxies the hosted HTTP server to a local stdio interface. Add it via npx in claude_desktop_config.json.
Requires Node.js 18+. Fully quit and reopen Claude Desktop after config changes.
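The bridge entry can be sketched as follows, assuming the common mcp-remote pattern of launching the package through npx with the remote URL as its argument. The "other-server" entry and the helper names are invented for illustration; the merge helper exists so that servers already in claude_desktop_config.json are preserved.

```python
import json


def claude_desktop_entry(url: str) -> dict:
    """Server entry that launches mcp-remote via npx as a stdio bridge."""
    # "-y" lets npx install the package without an interactive prompt.
    return {"command": "npx", "args": ["-y", "mcp-remote", url]}


def merge_server(existing: dict, name: str, entry: dict) -> dict:
    """Return a copy of the config with one server added, keeping the others."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
current = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = merge_server(
    current,
    "ai-readiness",
    claude_desktop_entry("https://ai.silverbackmarketing.com/api/mcp"),
)
print(json.dumps(updated, indent=2))
```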
Be explicit about the domain and the tool namespace:
Yes, you can self-host the MCP server. It ships with this Next.js application. Deploy the repo to your own infrastructure and point clients at your /api/mcp endpoint. Bundled skill data is compiled at build time via scripts/bundle-mcp-data.mjs.
Self-hosting makes sense for enterprises with private network requirements. Most teams use the public endpoint for convenience.
Section 5 of 11
robots.txt and ai.txt: crawler access, brand identity, and content-use rules.
robots.txt sits at the front door of your website and tells automated visitors which sections they may access. For AI readiness, it includes explicit rules for GPTBot, ClaudeBot, Google-Extended, and PerplexityBot, points crawlers toward llms.txt and ai-sitemap.xml, and keeps checkout, login, and admin paths disallowed.
Important: the kit ships a reference robots.txt. You must merge AI crawler directives into your live file rather than replacing production rules blindly.
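One way to approach that merge is sketched below: append a block for each AI crawler that the live file does not already mention, and leave every existing line untouched. The disallowed paths and the sample live file are placeholders, and the block layout is an assumption rather than the kit's exact template.

```python
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]


def existing_user_agents(robots_txt: str) -> set:
    """Collect lowercased User-agent tokens already declared in the file."""
    agents = set()
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            agents.add(value.split("#")[0].strip().lower())
    return agents


def merge_ai_blocks(live: str, disallow: list) -> str:
    """Append one block per missing AI crawler; never edit existing lines."""
    present = existing_user_agents(live)
    blocks = []
    for bot in AI_CRAWLERS:
        if bot.lower() in present:
            continue  # leave hand-written rules for this bot alone
        # Disallow lines come first so first-match parsers honor them too.
        lines = [f"User-agent: {bot}"]
        lines += [f"Disallow: {path}" for path in disallow]
        lines.append("Allow: /")
        blocks.append("\n".join(lines))
    if not blocks:
        return live
    return live.rstrip("\n") + "\n\n" + "\n\n".join(blocks) + "\n"


# Placeholder live file: it already has a GPTBot rule that must survive.
live = "User-agent: *\nDisallow: /admin/\n\nUser-agent: GPTBot\nDisallow: /\n"
merged = merge_ai_blocks(live, ["/checkout/", "/login/", "/admin/"])
print(merged)
```

Because existing blocks are skipped rather than rewritten, a restrictive rule such as the GPTBot block above stays in place until someone changes it deliberately.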
ai.txt is your brand's introduction to AI systems. Where robots.txt controls access, ai.txt describes identity, offerings, authoritative topics, contact paths, and rules for how AI may use your content. It follows the open ai.txt permissions standard.
Deploy at /ai.txt with text/plain content type. Link it from robots.txt Allow rules so crawlers discover it early.
robots.txt is access control: which paths bots may fetch. ai.txt is identity and policy: who you are, what you sell, and your ground rules for training, RAG, and citation.
Analogy: robots.txt is the doorman. ai.txt is the briefing document you hand a journalist before an interview.
The .well-known/ai-plugin.json manifest registers your site as an official AI-accessible resource in the plugin discovery format. It includes name, description, auth requirements, and API references. The kit uses it to point models at the live MCP server and documentation URLs.
Deploy at /.well-known/ai-plugin.json with application/json content type.
Section 7 of 11
Structured JSON for entities, intents, schema, and RAG pipelines.
ai-entities.json is a structured catalog of products, services, categories, locations, and key concepts on your site. It powers knowledge graph-style reasoning: which offerings relate to which topics, and what language you use to describe them.
Keep entities aligned with how customers actually search, not internal org chart names.
ai-intent.json maps real user questions to the best URL on your site to answer each one. Without it, AI guesses which page to recommend. With it, you supply a direct preferred URL per query pattern.
Include informational, navigational, and transactional intents with confidence levels.
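To make the idea of an intent map concrete, here is a small sketch with three entries, a validator, and a naive lookup. Every field name (pattern, type, url, confidence), the example URLs, and the substring matching are assumptions for illustration; consult the kit's file spec for the real ai-intent.json schema.

```python
from typing import Optional

# Hypothetical entries; the real schema may use different keys.
intents = [
    {"pattern": "what is ai readiness", "type": "informational",
     "url": "https://example.com/guide", "confidence": "high"},
    {"pattern": "pricing", "type": "transactional",
     "url": "https://example.com/pricing", "confidence": "medium"},
    {"pattern": "contact", "type": "navigational",
     "url": "https://example.com/contact", "confidence": "high"},
]

ALLOWED_TYPES = {"informational", "navigational", "transactional"}


def validate(entries: list) -> list:
    """Return readable problems; an empty list means the map looks sane."""
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"entry {i}: unknown intent type {entry.get('type')!r}")
        if not str(entry.get("url", "")).startswith("https://"):
            problems.append(f"entry {i}: url should be an absolute https URL")
        key = str(entry.get("pattern", "")).lower()
        if key in seen:
            problems.append(f"entry {i}: duplicate pattern {key!r}")
        seen.add(key)
    return problems


def preferred_url(question: str, entries: list) -> Optional[str]:
    """Naive substring match from a question to its preferred URL."""
    lowered = question.lower()
    for entry in entries:
        if entry["pattern"] in lowered:
            return entry["url"]
    return None


print(validate(intents))
print(preferred_url("What is your pricing?", intents))
```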
ai-schema.json publishes Schema.org JSON-LD describing your organization, products, and key pages in a machine-readable format search engines and AI systems already consume.
Align ai-schema.json with on-page JSON-LD to avoid conflicting entity descriptions.
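A quick way to spot drift between the two is to compare a handful of fields, as in this sketch. The two Organization objects are made-up sample data, and the list of compared fields is an arbitrary starting point rather than a rule from the kit.

```python
def find_conflicts(root_schema: dict, on_page: dict,
                   fields: tuple = ("@type", "name", "url", "description")) -> dict:
    """Map each differing field to its (ai-schema.json, on-page) value pair."""
    return {
        field: (root_schema.get(field), on_page.get(field))
        for field in fields
        if root_schema.get(field) != on_page.get(field)
    }


# Sample data: the name differs between the root file and the page markup.
root_schema = {"@context": "https://schema.org", "@type": "Organization",
               "name": "Example Co", "url": "https://example.com"}
on_page = {"@context": "https://schema.org", "@type": "Organization",
           "name": "Example Company, Inc.", "url": "https://example.com"}

print(find_conflicts(root_schema, on_page))
```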
rag-index.json and rag-index.jsonl are pre-built indexes of your major pages for retrieval-augmented generation. Each entry includes URL, title, topics, and summary text. JSONL format streams into vector databases and embedding pipelines.
Engineering teams load these into LlamaIndex, LangChain, Pinecone, and similar tools so answers stay grounded in your content.
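The JSONL layout is simple enough to sketch: one JSON object per line, so a pipeline can read entries one at a time instead of loading the whole index. The key names below mirror the fields described above (URL, title, topics, summary), but their exact spelling in rag-index.jsonl is an assumption, and the entries are sample data.

```python
import io
import json

# Sample entries; key names are assumed from the field list in this guide.
entries = [
    {"url": "https://example.com/", "title": "Home",
     "topics": ["overview"], "summary": "What the company does."},
    {"url": "https://example.com/pricing", "title": "Pricing",
     "topics": ["plans", "billing"], "summary": "Plan tiers and costs."},
]


def write_jsonl(records, stream) -> None:
    """Write one compact JSON object per line."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(stream):
    """Yield records lazily, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


buffer = io.StringIO()  # stands in for a file opened on rag-index.jsonl
write_jsonl(entries, buffer)
buffer.seek(0)
loaded = list(read_jsonl(buffer))
print(len(loaded), "entries round-tripped")
```

Each yielded record can then be passed to whatever embedding or indexing step your stack uses.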
No, these files do not replace on-page JSON-LD, which remains essential for rich results and page-level entity markup. ai-schema.json, ai-entities.json, and ai-intent.json complement HTML by giving AI systems a single bundle at predictable root URLs.
Use structured-data-guide.md in the kit for page-type examples (homepage, product, FAQ, blog).
Section 8 of 11
Legal clarity, transparency, deployment checklists, and team playbooks.
training-data-policy.txt sets formal rules for how AI companies may use your content: model training, RAG indexing, commercial use, and attribution requirements. Every brand should decide these rules upfront rather than accepting silent scraping.
Have legal review before publishing. Policies vary by industry (healthcare, finance, government).
ai-disclosure.txt is a public transparency statement explaining how your organization uses AI in products, content, support, or operations. It builds trust with customers and signals honesty to AI systems summarizing your brand.
Update when your AI usage changes materially.
manifest.json is the master inventory of every readiness file: deploy path, content type, purpose, audience, update frequency, and priority. It doubles as a readiness scorecard for audits.
AI systems and internal teams can fetch one URL to see what you have deployed.
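As a sketch of the scorecard idea, the snippet below summarizes a manifest-like inventory into a deployed count and a list of missing high-priority files. The key names (path, content_type, priority, deployed) and the sample rows are hypothetical; the real manifest.json may be shaped differently.

```python
# Hypothetical inventory rows modeled on the fields listed in this guide.
manifest = [
    {"path": "/llms.txt", "content_type": "text/plain",
     "priority": "high", "deployed": True},
    {"path": "/ai.txt", "content_type": "text/plain",
     "priority": "high", "deployed": True},
    {"path": "/ai-entities.json", "content_type": "application/json",
     "priority": "high", "deployed": False},
]


def scorecard(files: list) -> dict:
    """Summarize deployment progress for a stakeholder update."""
    deployed = [f for f in files if f["deployed"]]
    missing_high = [
        f["path"] for f in files
        if not f["deployed"] and f["priority"] == "high"
    ]
    return {
        "deployed": len(deployed),
        "total": len(files),
        "missing_high_priority": missing_high,
    }


print(scorecard(manifest))
```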
deployment-checklist.md is a phased launch playbook: upload order, curl verification commands, robots.txt merge steps, Search Console submission, and CDN cache notes. Designed so project managers can track progress while developers execute.
Follow it line by line on first deploy. Reuse verification commands after updates.
structured-data-guide.md is developer documentation with JSON-LD examples for common page types in your stack. It bridges marketing readiness files and on-page implementation.
Share with whoever maintains your CMS or frontend templates.
Section 9 of 11
Getting files live, validating responses, and avoiding common launch mistakes.
Every file goes at your website root (or .well-known/ for ai-plugin.json) so predictable URLs work: https://yourdomain.com/llms.txt, https://yourdomain.com/ai.txt, etc.
Do not bury them in /assets or /downloads. AI crawlers and standards expect root paths.
Use HTTP HEAD or GET checks for each URL. Expect 200 OK, correct Content-Type, and no accidental redirects to HTML login pages.
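Those three checks (status, content type, no redirect) can be scripted with the standard library, as in this sketch. Only the two content types this guide states explicitly are listed; extend EXPECTED for the other files once you have confirmed their types. The function names are illustrative, and check_site makes live network requests, so it is defined here but not called.

```python
import urllib.error
import urllib.request

# Content types stated in this guide; add the remaining files yourself.
EXPECTED = {
    "/ai.txt": "text/plain",
    "/.well-known/ai-plugin.json": "application/json",
}


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def evaluate(status: int, content_type: str, expected_type: str) -> list:
    """Return the problems found for one response; empty means it passed."""
    problems = []
    if 300 <= status < 400:
        problems.append(f"redirect ({status}); file should be served directly")
    elif status != 200:
        problems.append(f"unexpected status {status}")
    # Ignore parameters such as "; charset=utf-8" when comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != expected_type:
        problems.append(
            f"content type {media_type or 'missing'!r}, expected {expected_type!r}"
        )
    return problems


def head(url: str, timeout: float = 10.0) -> tuple:
    """Send a HEAD request and return (status, Content-Type header)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Content-Type", "")


def check_site(base_url: str) -> dict:
    """Check every expected path under base_url (performs network calls)."""
    return {
        path: evaluate(*head(base_url.rstrip("/") + path), expected)
        for path, expected in EXPECTED.items()
    }
```

Calling check_site("https://yourdomain.com") returns a mapping of path to problems, with an empty list for each file that passes.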
Never overwrite production robots.txt with the kit template alone. Copy AI-specific User-agent blocks and Sitemap lines into your existing file. Preserve current Disallow rules, crawl-delay settings, and sitemap references.
Test in staging first. One bad Disallow rule can block your entire site.
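A staging sanity check along these lines can catch the "one bad rule" case before it ships. The sketch uses Python's urllib.robotparser, which applies rules in file order and may differ from how individual crawlers resolve conflicts, so treat it as a smoke test rather than proof. The two robots.txt strings are sample inputs.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]


def blocked_from(robots_txt: str, url: str, bots=AI_BOTS) -> list:
    """Return the bots that this robots.txt text would stop from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


# Sample inputs: a sensible merged file and one with a site-wide block.
good = (
    "User-agent: *\nDisallow: /admin/\n\n"
    "User-agent: GPTBot\nDisallow: /checkout/\nAllow: /\n"
)
bad = "User-agent: *\nDisallow: /\n"

print(blocked_from(good, "https://example.com/"))
print(blocked_from(good, "https://example.com/checkout/cart"))
print(blocked_from(bad, "https://example.com/"))
```

An empty list for the homepage and a non-empty list for private paths is the pattern to look for.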
Yes, submit ai-sitemap.xml. Add it under Sitemaps in Search Console and Bing Webmaster Tools alongside your standard sitemap.xml. AI metadata tags do not replace classic SEO sitemaps; they extend them.
Teams often hit these issues on first launch:
Section 10 of 11
Update cadences, ownership, and keeping AI signals aligned with the live site.
Cadence depends on the file. manifest.json lists recommended frequencies. As a rule: llms.txt, sitemaps, and rag indexes monthly or when structure changes; ai.txt and llms-full.txt quarterly or when offerings change; policy files annually or when regulations change.
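The rule of thumb above can be turned into a simple overdue report. This sketch approximates monthly, quarterly, and annual as 30, 90, and 365 days, covers only a few of the files, and uses invented last-updated dates; manifest.json remains the source for the recommended frequencies.

```python
from datetime import date, timedelta

# Approximate cadences in days, following the rule of thumb in this guide.
CADENCE_DAYS = {
    "llms.txt": 30,
    "ai-sitemap.xml": 30,
    "rag-index.json": 30,
    "ai.txt": 90,
    "llms-full.txt": 90,
    "training-data-policy.txt": 365,
    "ai-disclosure.txt": 365,
}


def overdue(last_updated: dict, today: date) -> list:
    """Return the files whose age exceeds their cadence, sorted by name."""
    due = []
    for name, updated in last_updated.items():
        limit = CADENCE_DAYS.get(name)
        if limit is not None and today - updated > timedelta(days=limit):
            due.append(name)
    return sorted(due)


# Invented dates for illustration.
last_updated = {
    "llms.txt": date(2025, 4, 1),
    "ai.txt": date(2025, 4, 1),
    "training-data-policy.txt": date(2024, 1, 1),
}
print(overdue(last_updated, today=date(2025, 6, 1)))
```

Event-driven refreshes (a rebrand, a migration) still override the calendar.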
Best results come from a trio: marketing or SEO owns messaging and entity accuracy, web/dev owns deployment and JSON-LD, legal/compliance owns policy files. manifest.json makes handoffs visible.
Treat updates like a sitemap refresh, not a one-time project.
Rebrand, new product line, pricing model change, merger, regulatory event, or a major site migration all require immediate refreshes. ai-intent.json especially drifts when URLs change.
After any migration, rerun curl checks on every readiness URL.
Use manifest.json as your scorecard. Mark files deployed, last updated, and priority. Some teams add a simple spreadsheet mirror for stakeholders who do not read JSON.
Regenerate via the skill or MCP after major site changes rather than hand-editing dozens of files.
Section 11 of 11
Research-backed tactics for visibility in AI-generated answers.
GEO is the practice of optimizing content and structure for visibility in AI-generated answers, not just traditional search result pages. Research from Princeton, Georgia Tech, and collaborators shows structured citations, quotations, and statistics can significantly increase inclusion in generative responses in controlled experiments.
AI readiness files are infrastructure for GEO: they supply the structured facts models cite.
No vendor guarantees placement in ChatGPT, Claude, Gemini, or Perplexity answers. Files reduce ambiguity and improve discoverability. Combined with authoritative content, clear entities, and earned mentions elsewhere, they increase the odds your brand is described accurately when cited.
Retrieval-augmented generation pulls documents before answering. rag-index.json and rag-index.jsonl give RAG pipelines a curated starting set. llms.txt and llms-full.txt provide low-token summaries when full crawls are expensive.
If you build custom AI products on your content, these files accelerate indexing.
Files are necessary infrastructure, not the whole strategy. Also invest in authoritative original research, clear service pages, consistent NAP and entity data, PR and backlinks, FAQ content with real answers, and Schema.org markup on key templates.
AI systems cite sources humans already trust. Readiness files make you easier to trust accurately.
The plain-English guide walks through every file with analogies and priorities. Connect the MCP server to generate a custom kit for your domain.
Research & standards
Claims about how AI assistants and crawlers use web content are grounded in published research, open standards, and vendor documentation, not marketing guesswork.
GEO (Generative Engine Optimization) research: Foundational KDD 2024 study on how content structure and citations affect visibility in AI-generated answers (up to ~40% lift in controlled tests).
llms.txt: Community specification for a machine-readable site summary that LLMs can load in one context window.
ai.txt: Open standard for machine-readable AI permissions, identity, and sitemap extensions at a site root.
XML sitemaps: Standard XML sitemap format extended by ai-sitemap.xml for AI-specific page metadata.
Schema.org: Collaborative vocabulary for machine-readable entity descriptions used by search engines and AI systems.
ai-plugin.json: Specification for .well-known/ai-plugin.json manifests that register a site as an AI-accessible resource.
Model Context Protocol (MCP): Open protocol for connecting AI assistants to tools, resources, and data sources via a standard interface.
GPTBot: How OpenAI's web crawler accesses sites for ChatGPT and training pipelines.
ClaudeBot: Anthropic's documentation on Claude web crawling and opt-out.
Google-Extended: Google's crawler used for Gemini and other generative AI product use cases.
PerplexityBot: Perplexity's bot documentation for indexing and citation in AI answers.