ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives
Abstract
ECI_{sem}, a semantic residual variant of Effective Contrastive Information, ranks candidate negative sources for dense retrieval using frozen target-encoder embeddings and no training; on MS MARCO negative sources, its rankings match the strongest aggregate BEIR transfer results.
Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream evaluation. We propose ECI_{sem}, a semantic residual variant of Effective Contrastive Information (ECI) that ranks candidate negative sources using frozen target-encoder embeddings. ECI_{sem} is training-free, not label-free: each scored example requires a query, a labeled positive, and an explicit candidate negative. ECI_{sem} builds a weighted residual information matrix from target consistency, semantic locality, lexical residuality, and a log-determinant diversity objective. On MS MARCO negative sources, in-family ECI_{sem} ranks LLM negatives highest among non-hybrid sources and Dense+LLM highest among hybrid sources, matching the strongest aggregate BEIR transfer results across DistilBERT, E5-base, and Contriever. Controlled ablations show that this alignment depends on using the target encoder family, while additional ablations show stability under sample-size, temperature, tokenizer, and IDF-corpus perturbations. The theory gives a local linearized link to loss reduction, while the empirical study treats downstream evaluation as the final test.
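The abstract does not give the formula for the weighted residual information matrix or the log-determinant objective, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each scored example yields a residual vector and a scalar weight (standing in for the combined target-consistency, semantic-locality, and lexical-residuality terms), that the matrix is a weighted sum of outer products, and that diversity is measured as log det(I + M / ridge). The function names, the `ridge` parameter, and the toy vectors are hypothetical and not taken from the paper.

```python
import numpy as np


def residual_information_matrix(residuals, weights):
    """Weighted sum of outer products, sum_i w_i * r_i r_i^T (assumed form)."""
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Scale each residual row by its weight, then contract over examples.
    return (residuals * weights[:, None]).T @ residuals


def logdet_diversity(info_matrix, ridge=1.0):
    """log det(I + M / ridge); larger when residuals span more directions."""
    dim = info_matrix.shape[0]
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _sign, logdet = np.linalg.slogdet(np.eye(dim) + info_matrix / ridge)
    return logdet


# Toy 2-d residuals (hypothetical): one source covers two directions,
# the other repeats the same direction twice.
diverse_residuals = [[1.0, 0.0], [0.0, 1.0]]
redundant_residuals = [[1.0, 0.0], [1.0, 0.0]]
uniform_weights = [1.0, 1.0]  # placeholder for the paper's weighting terms

diverse_score = logdet_diversity(
    residual_information_matrix(diverse_residuals, uniform_weights)
)
redundant_score = logdet_diversity(
    residual_information_matrix(redundant_residuals, uniform_weights)
)
```

In this toy setup the source whose residuals cover two orthogonal directions scores higher than the one that repeats a single direction, which is the behavior a log-determinant diversity term is generally meant to reward. How the paper actually defines the residuals and weights is not stated in the abstract.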
Community
This paper introduces a training-free metric for evaluating hard negatives.
Get this paper in your agent:
hf papers read 2603.20990