Definition

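A cross-encoder is the model architecture typically used for reranking. It takes a query-chunk pair as a single joint input, processing both texts together, and outputs a relevance score. A bi-encoder, by contrast, encodes the query and each chunk separately and compares the resulting vectors, for example with cosine similarity. Because the cross-encoder reads both texts at once, it can attend to the interactions between query and chunk, which generally makes it considerably more accurate at relevance scoring.

That accuracy comes at a cost. Nothing can be precomputed: every query-chunk pair needs its own full pass through the model at query time, so running a cross-encoder over a full index is too slow. This is why retrieval pipelines separate first-pass retrieval, which gets bi-encoder speed because chunk embeddings are computed ahead of time, from reranking, which gets cross-encoder accuracy. The cross-encoder sees far fewer candidates, only the top-k from the first pass, so it can afford to be thorough. Content that answers a query completely and specifically scores better under cross-encoder reranking than content that is merely topically adjacent.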
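The two-stage split described above can be sketched in a few lines of Python. This is a minimal, self-contained sketch under loud assumptions: both "models" are hypothetical bag-of-words stand-ins, not neural networks. `embed` plays the bi-encoder role (each text is encoded on its own, so chunk vectors can be precomputed), and `joint_score` plays the cross-encoder role (it reads the query and chunk together, so it can check whether query words appear in the same order). The function names, the scoring features, and `k=3` are illustrative choices, not part of any specific library or product.

```python
import math
from collections import Counter

# Hypothetical toy scorers: bag-of-words stand-ins for trained neural models.


def embed(text: str) -> Counter:
    """Bi-encoder stand-in: encodes one text in isolation (word counts)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def joint_score(query: str, chunk: str) -> float:
    """Cross-encoder stand-in: sees query and chunk together, so it can use
    interaction signals a bag-of-words embedding discards. Here: query-term
    coverage plus the share of query bigrams found in order in the chunk."""
    q, c = query.lower().split(), chunk.lower().split()
    q_terms = set(q)
    coverage = sum(1 for t in q_terms if t in c) / len(q_terms) if q_terms else 0.0
    q_bigrams = list(zip(q, q[1:]))
    c_bigrams = set(zip(c, c[1:]))
    in_order = (
        sum(1 for bg in q_bigrams if bg in c_bigrams) / len(q_bigrams)
        if q_bigrams
        else 0.0
    )
    return coverage + in_order


def build_index(corpus: list[str]) -> list[tuple[str, Counter]]:
    """Offline step: chunk embeddings do not depend on the query."""
    return [(chunk, embed(chunk)) for chunk in corpus]


def first_pass(query: str, index: list[tuple[str, Counter]], k: int) -> list[str]:
    """Stage 1: embed only the query, rank every chunk by cosine, keep top-k."""
    q_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: score each (query, chunk) pair jointly; affordable since k is small."""
    return sorted(candidates, key=lambda chunk: joint_score(query, chunk), reverse=True)


if __name__ == "__main__":
    corpus = [
        "string to python list with split",
        "convert a python list to string with join",
        "python list comprehension tutorial",
        "baking sourdough bread at home",
    ]
    query = "python list to string"
    candidates = first_pass(query, build_index(corpus), k=3)
    print("first pass:", candidates)
    print("reranked:  ", rerank(query, candidates))
```

Running it, the first pass ranks the split-based chunk first (cosine about 0.82 versus 0.71), because it contains every query word in a shorter text even though it describes the opposite conversion. The joint scorer then promotes the join-based chunk, which matches the query's word order and actually answers it. In a real pipeline the toy `joint_score` would be replaced by a trained model, for example the `CrossEncoder` class in the sentence-transformers library, whose `predict` method scores a list of (query, chunk) pairs.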

Reranking

Bi-encoder

First-pass retrieval

Retrieval pipeline

Cosine similarity

Relevant Plate Lunch Collective Services

AI SEO