Training Corpus
A training corpus is the complete dataset of text used to pre-train a large language model.
Technical implementation · AI Search Infrastructure
A training corpus is the complete dataset of text used to pre-train a large language model.