# Tokenization

> Tokenization is the process of breaking text into smaller units — tokens — that a language model can process.

*Technical implementation* · *AI Search Infrastructure*

## Definition

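Tokenization splits text into smaller units, called tokens, and maps each one to an integer ID that a language model can process. Depending on the model's tokenizer, a token may be a whole word, a subword fragment, or a single character. Most modern LLMs use subword tokenization schemes such as BPE (Byte Pair Encoding), which start from characters or bytes and repeatedly merge the most frequent adjacent pair into a new token.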
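To make the subword idea concrete, here is a minimal sketch of BPE-style merge learning in Python. It is a toy illustration, not the tokenizer of any particular model: the function names (`learn_bpe_merges`, `apply_merge`, `tokenize`) and the sample corpus are assumptions made up for this example, and production tokenizers add details such as byte-level fallback, pre-tokenization, and special tokens.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Start with each word split into single characters, weighted by frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; ties go to the pair seen first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word using the new merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split a word into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, chosen only for illustration.
corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe_merges(corpus, 4)
print(merges)                      # [('e', 's'), ('es', 't'), ('l', 'o'), ('lo', 'w')]
print(tokenize("lowest", merges))  # ['low', 'est']
```

Note that "lowest" never appears in the toy corpus, yet it splits cleanly into two learned subwords. That is the practical appeal of subword schemes: unseen words decompose into known pieces instead of becoming a single unknown token.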

## Why It Matters for AI Search

A model's context window is measured in tokens, so tokenization determines how much of that window a given piece of content consumes. When a long document exceeds the limit, it is either truncated or split into chunks. For content strategists, this explains why shorter, denser content often performs better in AI extraction than long, discursive pieces: the model works within a limited token budget, and content that answers the query early is more likely to fall inside that budget.
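The two outcomes for over-long content, truncation and chunking, can be sketched as follows. This is a rough illustration under stated assumptions: it uses whitespace splitting as a stand-in tokenizer (real subword tokenizers usually produce more tokens than there are words), and the helper names and the `overlap` parameter are hypothetical rather than taken from any specific system.

```python
def count_tokens(text):
    """Approximate token count. Assumption: one token per whitespace-separated word."""
    return len(text.split())


def truncate_to_budget(text, max_tokens):
    """Keep only the first `max_tokens` tokens; everything after is dropped."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])


def chunk_by_budget(text, max_tokens, overlap=0):
    """Split text into consecutive chunks of at most `max_tokens` tokens.

    `overlap` repeats that many tokens from the end of one chunk at the
    start of the next, so context is not lost at chunk boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be between 0 and max_tokens - 1")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        # Stop once this chunk has reached the end of the text.
        if start + max_tokens >= len(tokens):
            break
    return chunks


text = "one two three four five six seven"
print(count_tokens(text))                   # 7
print(truncate_to_budget(text, 3))          # one two three
print(chunk_by_budget(text, 3))             # ['one two three', 'four five six', 'seven']
print(chunk_by_budget(text, 4, overlap=1))  # ['one two three four', 'four five six seven']
```

With truncation, anything past the budget is simply lost, which is why an answer placed in the opening sentences survives while one buried at the end may not.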

## Related Terms

<CardGroup cols={2}>
  <Card title="Chunking" href="/ai-search-glossary/chunking" />

  <Card title="Context window" href="/ai-search-glossary/context-window" />

  <Card title="Foundation model" href="/ai-search-glossary/foundation-model" />

  <Card title="Inference" href="/ai-search-glossary/inference" />

  <Card title="Embedding" href="/ai-search-glossary/embedding" />
</CardGroup>

## Relevant Plate Lunch Collective Services

[AI SEO](https://www.platelunchcollective.com/services/ai-seo)  [Citation-Ready Content](https://www.platelunchcollective.com/services/citation-ready-content)
