Topic coherence is the degree to which all content within a chunk or section addresses the same underlying topic. High topic coherence produces tight embeddings that retrieve consistently for their target sub-query. Low topic coherence produces diffuse embeddings that retrieve weakly across multiple sub-queries.
Topic coherence is the content-side property that determines embedding quality. Research confirms that explicitly incorporating topic structure into embedding construction reduces retrieval of off-topic chunks by statistically significant margins, and that the penalty for low-coherence passages increases as embedding models improve. Models are getting better at detecting low coherence, not worse.
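To make the tight-versus-diffuse distinction concrete, here is a minimal sketch that scores a chunk by the mean pairwise cosine similarity of its sentences. It assumes a toy bag-of-words vector in place of a real embedding model, and the names `bag_of_words`, `cosine`, and `coherence_score` are hypothetical, chosen for illustration. It is a rough proxy for topic coherence, not an established metric or the method used in the research mentioned above.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(sentence: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coherence_score(sentences: list[str]) -> float:
    """Mean pairwise cosine similarity across sentences (hypothetical proxy)."""
    vectors = [bag_of_words(s) for s in sentences]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # a single sentence is trivially coherent with itself
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# A chunk that stays on one topic.
focused = [
    "Chunk size affects embedding quality.",
    "Embedding quality drops when chunk size grows.",
    "Smaller chunk size keeps embedding quality high.",
]

# A chunk that drifts across unrelated topics.
mixed = [
    "Chunk size affects embedding quality.",
    "Our office moved to Berlin last spring.",
    "The cafeteria menu changes every Monday.",
]

print(coherence_score(focused) > coherence_score(mixed))  # prints True
```

In practice the word-count vectors would be swapped for sentence embeddings from whichever model the retrieval pipeline uses. The comparison logic stays the same: a chunk whose sentences point in similar directions scores high, while a chunk that mixes topics scores low.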