MiniLM-L6 Yat FFN Swap

This repo contains Yat student replacements for all six feed-forward (FFN) blocks of sentence-transformers/all-MiniLM-L6-v2.

The checkpoints were produced on Kaggle with a three-phase pipeline:

  1. Phase 1: data-free random/on-shell Yat distillation for every FFN block.
  2. Phase 2: real-activation fine-tuning for every FFN block.
  3. Phase 3: patch all six blocks and run a small MTEB STS evaluation.

The published model is a lightweight patch over the base MiniLM model, not a full copy of it. The loader code in yat_minilm.py downloads the base model, loads the Phase-2 Yat checkpoints, and replaces the feed_forward_chunk of every BERT layer with the corresponding Yat block.
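The patching step can be illustrated with a small, dependency-free sketch. It is not the code in yat_minilm.py: ToyLayer, make_student_ffn, and patch_layers are hypothetical names, and plain Python lists stand in for tensors. The sketch only shows the general idea of overriding feed_forward_chunk on each layer instance while the rest of the layer stays untouched.

```python
class ToyLayer:
    """Hypothetical stand-in for a transformer layer (not the real BERT layer)."""

    def feed_forward_chunk(self, attention_output):
        # Original "teacher" FFN: here just adds 1.0 to every value.
        return [x + 1.0 for x in attention_output]

    def forward(self, attention_output):
        # The layer always routes through self.feed_forward_chunk,
        # so overriding that attribute changes the layer's behaviour.
        return self.feed_forward_chunk(attention_output)


def make_student_ffn(scale):
    """Build a hypothetical replacement FFN that multiplies by `scale`."""
    def student_ffn(attention_output):
        return [scale * x for x in attention_output]
    return student_ffn


def patch_layers(layers, students):
    """Replace feed_forward_chunk on each layer with its matching student."""
    if len(layers) != len(students):
        raise ValueError("need exactly one student per layer")
    for layer, student in zip(layers, students):
        # An instance attribute shadows the method defined on the class.
        layer.feed_forward_chunk = student
    return layers


layers = [ToyLayer() for _ in range(6)]
before = layers[0].forward([1.0, -1.0])

students = [make_student_ffn(float(i + 1)) for i in range(6)]
patch_layers(layers, students)
after_first = layers[0].forward([1.0, -1.0])
after_last = layers[5].forward([1.0, -1.0])
```

Patching per instance keeps the base model's attention and normalization weights as they are, which is consistent with shipping only the replacement FFN checkpoints.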

Results

Phase 1 mean rho: 0.005847

Phase 2 mean rho: 0.098501 -> 0.001715

MTEB STS scores:

Task           Baseline   Yat-swapped
STSBenchmark   0.820325   0.816818
STS12          0.723690   0.720878
STS16          0.789895   0.789110
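To make the comparison easier to read, the short script below computes the per-task drop from the scores in the table above. The numbers are copied from the table; the helper name score_drops is only illustrative. The absolute drops come out to roughly 0.0035, 0.0028, and 0.0008, each under 0.5% of the corresponding baseline.

```python
# STS scores copied from the table above: (baseline, Yat-swapped).
scores = {
    "STSBenchmark": (0.820325, 0.816818),
    "STS12": (0.723690, 0.720878),
    "STS16": (0.789895, 0.789110),
}


def score_drops(table):
    """Return {task: (absolute drop, drop as a fraction of the baseline)}."""
    drops = {}
    for task, (baseline, swapped) in table.items():
        absolute = baseline - swapped
        drops[task] = (absolute, absolute / baseline)
    return drops


drops = score_drops(scores)
mean_drop = sum(absolute for absolute, _ in drops.values()) / len(drops)

for task, (absolute, relative) in drops.items():
    print(f"{task}: -{absolute:.6f} ({relative:.2%} of baseline)")
print(f"mean absolute drop: {mean_drop:.6f}")
```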

Usage

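from yat_minilm import load_model

# Downloads the base MiniLM model and patches in the Phase-2 Yat FFN checkpoints.
model = load_model("azettaai/minilm-l6-yat-ffn-swap")

# Encode two sentences; the result has one embedding row per input sentence.
emb = model.encode(["hello world", "yat swapped minilm"])
print(emb.shape)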
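Sentence embeddings like these are usually compared with cosine similarity, which is also how STS-style benchmarks typically score sentence pairs. The sketch below is a minimal, dependency-free version that works on any two equal-length sequences of floats; the three vectors are hypothetical stand-ins, not real model output. In practice each vector would be one row of the array returned by model.encode.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical stand-in vectors; real embeddings would come from model.encode(...).
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [-3.0, 0.0, 1.0]  # orthogonal to emb_a

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```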

Files

  • phase2/block0.safetensors ... phase2/block5.safetensors: final Yat FFN replacements.
  • phase1/: random/on-shell warm-start checkpoints.
  • scripts/: Kaggle scripts used to train and evaluate the model.
  • yat_minilm.py: loader and patching code.