Bi-encoder

Technical implementation · AI Search Infrastructure

Definition

A bi-encoder is the model architecture used in first-pass retrieval. It encodes the query and each document chunk independently into vectors, then compares them, typically using cosine similarity. Because chunk vectors can be computed ahead of time, a bi-encoder is fast enough for large-scale search, but it is less accurate than a cross-encoder because it cannot attend to the interaction between query and chunk.
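
The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: `toy_encode` is a hypothetical stand-in for a trained embedding model (it just hashes words into a small vector), and the chunk texts, `DIM`, and `search` are assumptions made for the example. What it is meant to show is the shape of the architecture: chunks are encoded once without seeing any query, and the query is encoded separately at search time.

```python
import math
import zlib

DIM = 64  # toy vector size (assumption); trained models use far larger vectors


def toy_encode(text: str) -> list[float]:
    """Hypothetical stand-in for a neural encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


chunks = [
    "bi-encoders embed queries and documents separately",
    "plate lunch comes with rice and macaroni salad",
    "cross-encoders read the query and document together",
]

# Offline step: each chunk is encoded once, with no knowledge of future queries.
chunk_vectors = [toy_encode(c) for c in chunks]


def search(query: str, top_k: int = 2) -> list[tuple[float, str]]:
    """Encode the query on its own, then rank chunks by cosine similarity."""
    query_vector = toy_encode(query)
    scored = [(cosine(query_vector, v), c) for v, c in zip(chunk_vectors, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


results = search("how do bi-encoders embed documents")
```

Because the two encoding passes never see each other's input, the chunk vectors can be stored in an index and reused for every query. That reuse is where the speed comes from, and the lack of interaction is where the accuracy is lost.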

Why It Matters for AI Search

Bi-encoders make large-scale semantic search possible. Because document vectors are computed once and indexed ahead of time, only the query has to be encoded at search time; scoring every query-document pair jointly over millions of documents would be too slow for real-time queries. Their limitation is that they represent query and document independently, so they miss relevance signals that only appear when the two are read together. This is why reranking with a cross-encoder exists as a second stage.
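
The two-stage pattern can be sketched as follows. Everything here is a toy under stated assumptions: `encode` is a hypothetical bag-of-words encoder standing in for a bi-encoder, and `joint_score` is a hypothetical stand-in for a cross-encoder that simply counts word pairs shared in the same order. Trained bi-encoders are not as blind to word order as this toy is; the example exaggerates the gap to show where each stage sits in the pipeline.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Toy independent encoder (assumption): a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def joint_score(query: str, chunk: str) -> float:
    """Toy stand-in for a cross-encoder (assumption): reads both texts together
    and rewards adjacent word pairs that occur in the same order in each."""
    q, c = query.lower().split(), chunk.lower().split()
    q_pairs = set(zip(q, q[1:]))
    c_pairs = set(zip(c, c[1:]))
    return len(q_pairs & c_pairs) / max(len(q_pairs), 1)


chunks = [
    "the dog bites the man",
    "the man bites the dog",
    "rice and macaroni salad",
]
chunk_vectors = [encode(c) for c in chunks]  # computed once, ahead of any query


def retrieve(query: str, top_k: int) -> list[str]:
    """Stage 1: cheap scoring of every chunk against the query vector."""
    query_vector = encode(query)
    order = sorted(
        range(len(chunks)),
        key=lambda i: cosine(query_vector, chunk_vectors[i]),
        reverse=True,
    )
    return [chunks[i] for i in order[:top_k]]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: costlier joint scoring, applied only to the shortlist."""
    return sorted(candidates, key=lambda c: joint_score(query, c), reverse=True)


query = "man bites dog"
candidates = retrieve(query, top_k=2)
final = rerank(query, candidates)
```

In this toy, the first stage gives both "bites" sentences the same score because they contain the same words, so it cannot say which one matches the query. The second stage reads query and chunk together and promotes "the man bites the dog". The joint scorer only ever runs on the shortlist, which keeps the expensive step bounded by `top_k` rather than by corpus size.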

Related Terms

Cross-encoder

Reranking

First-pass retrieval

Embedding

Cosine similarity
