# Training Corpus

*Technical implementation* · *AI Search Infrastructure*

## Definition

A training corpus is the complete dataset of text used to pre-train a large language model. For models like GPT, Claude, and Gemini, this includes vast amounts of web content, books, and structured data collected up to a specific cutoff date.

## Why It Matters for AI Search

Brand content that appears in a model's training corpus becomes part of what that model "knows" about the world, independent of any real-time retrieval. This is a separate channel from citation via retrieval-augmented generation (RAG).

A brand that was well represented in quality web content before a model's training cutoff has a baseline presence in that model's knowledge that newer brands lack. Publishing high-quality, widely referenced content is therefore a long-term investment in training corpus presence, not just in retrieval visibility.

## Relevant Plate Lunch Collective Services

- [Citation-Ready Content](https://www.platelunchcollective.com/services/citation-ready-content)
- [AI SEO](https://www.platelunchcollective.com/services/ai-seo)