Pre-Training Corpus
The pre-training corpus is the large body of text an LLM is trained on before fine-tuning; it determines the model’s baseline knowledge and associations.
Technical implementation · AI Search Infrastructure
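To make "associations" concrete, here is a minimal sketch that counts how often two words appear in the same sentence of a tiny corpus. It is only an illustration: `TOY_CORPUS`, `cooccurrence_counts`, and `association` are hypothetical names invented for this example, and real LLM pre-training learns statistical patterns through next-token prediction over a vastly larger dataset, not through explicit pair counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical three-sentence "corpus" (an assumption for illustration only).
TOY_CORPUS = [
    "paris is the capital of france",
    "the capital of france is paris",
    "berlin is the capital of germany",
]


def cooccurrence_counts(corpus):
    """Count how many sentences contain each unordered pair of distinct words."""
    counts = Counter()
    for sentence in corpus:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        words = sorted(set(sentence.split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def association(counts, a, b):
    """Return the co-occurrence count for two words, in either order."""
    return counts[tuple(sorted((a, b)))]


counts = cooccurrence_counts(TOY_CORPUS)
print(association(counts, "paris", "france"))   # 2
print(association(counts, "paris", "germany"))  # 0
```

In this toy setting, "paris" is tied to "france" only because the corpus says so twice, and it has no link to "germany" because no sentence pairs them. The same dependence on what the text contains is what the definition above describes, at a far larger scale.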