A cross-encoder is the model architecture used in reranking. It takes a query-chunk pair as joint input — processing both together — and outputs a relevance score. Unlike a bi-encoder, it can attend to the interaction between query and chunk, making it dramatically more accurate at relevance scoring.
Cross-encoders are too slow to run over a full index at query time, which is why retrieval pipelines separate first-pass retrieval (bi-encoder speed) from reranking (cross-encoder accuracy). The cross-encoder sees far fewer candidates — the top-k from first pass — and can afford to be thorough. Content that answers a query completely and specifically scores better under cross-encoder reranking than content that is merely topically adjacent.