A bi-encoder is the model architecture used in first-pass retrieval. It encodes the query and each document chunk independently into vectors, then compares them using cosine similarity. It is fast enough for large-scale search but less accurate than a cross-encoder, because it cannot attend to the interaction between query and chunk.
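As a rough illustration of the encode-then-compare pattern, the sketch below uses a toy `encode` function (a bag-of-words count vector) as a stand-in for a learned neural encoder. The names `encode`, `cosine`, `retrieve`, and the sample chunks are all hypothetical; the point is only that chunk vectors are built without ever seeing the query, and relevance is a cosine similarity between two independently produced vectors.

```python
import math
import re
from collections import Counter


def encode(text: str) -> Counter:
    """Toy stand-in for a neural encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical corpus, used only for illustration.
chunks = [
    "Bi-encoders embed queries and chunks separately.",
    "Cross-encoders read the query and chunk together.",
    "Bananas are rich in potassium.",
]

# Chunk vectors are computed once, independently of any query.
chunk_vectors = [encode(c) for c in chunks]


def retrieve(query: str, k: int = 2) -> list[tuple[float, str]]:
    """Encode the query on its own, then rank chunks by cosine similarity."""
    q = encode(query)
    scored = [(cosine(q, v), c) for v, c in zip(chunk_vectors, chunks)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Calling `retrieve("how do bi-encoders embed chunks")` ranks the first chunk highest, since it shares the most terms with the query. A real system would swap `encode` for a trained embedding model and the linear scan for a vector index.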
Bi-encoders make large-scale semantic search possible: chunk vectors can be computed once at indexing time, so a query costs only one encoder pass plus a vector comparison. Scoring every query-chunk pair jointly over millions of documents would be too slow for real-time queries. Their limitation is that they represent query and document independently, so they miss relevance signals that only appear when the two are read together. This is why reranking with a cross-encoder exists as a second stage.
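The two-stage idea can be sketched with toy scorers. Everything here is hypothetical and greatly simplified: `bi_score` compares token sets that were built independently (set overlap standing in for cosine similarity over learned embeddings), while `cross_score` reads query and chunk together and counts shared word pairs, a crude analogue of the joint attention a real cross-encoder performs. The example is chosen so that the first stage cannot separate two candidates but the second stage can.

```python
import re


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bi_score(query_vec: set, chunk_vec: set) -> float:
    """Stage 1 (toy): overlap between two independently built token sets."""
    union = query_vec | chunk_vec
    return len(query_vec & chunk_vec) / len(union) if union else 0.0


def cross_score(query: str, chunk: str) -> int:
    """Stage 2 (toy): sees both texts at once; counts query bigrams found in the chunk."""
    q, c = tokens(query), tokens(chunk)
    chunk_bigrams = set(zip(c, c[1:]))
    return sum(1 for bigram in zip(q, q[1:]) if bigram in chunk_bigrams)


def first_stage(query: str, chunk_vecs: list[set], k: int) -> list[int]:
    """Cheap pass over every chunk; returns indices of the top-k candidates."""
    q_vec = set(tokens(query))
    order = sorted(
        range(len(chunk_vecs)),
        key=lambda i: bi_score(q_vec, chunk_vecs[i]),
        reverse=True,
    )
    return order[:k]


def search(query: str, chunks: list[str], chunk_vecs: list[set], k: int = 2) -> list[str]:
    """Retrieve k candidates cheaply, then rerank only those with the joint scorer."""
    candidates = first_stage(query, chunk_vecs, k)
    return sorted(
        (chunks[i] for i in candidates),
        key=lambda chunk: cross_score(query, chunk),
        reverse=True,
    )


# Hypothetical corpus; the first two chunks contain the same words in a different order.
chunks = [
    "man bites dog in the park",
    "dog bites man in the park",
    "cat sleeps",
]
chunk_vecs = [set(tokens(c)) for c in chunks]  # precomputed, query-independent
```

For the query `"dog bites man"`, the first stage gives the first two chunks identical scores because their token sets are the same, while `search` places `"dog bites man in the park"` first because the joint scorer sees the word order. The expensive scorer runs on only `k` candidates rather than the whole corpus, which is the reason for splitting the work into two stages.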