MiniLM-L6 Yat FFN Swap

This repo contains Yat student replacements for all 6 feed-forward blocks of sentence-transformers/all-MiniLM-L6-v2.

The checkpoints were produced on Kaggle with a three-phase pipeline:

Phase 1: data-free random/on-shell Yat distillation for every FFN block.
Phase 2: real-activation fine-tuning for every FFN block.
Phase 3: patch all six blocks and run a small MTEB STS evaluation.

The published model is a lightweight patch over the base MiniLM model. The loader code in yat_minilm.py downloads the base model, loads the Phase-2 Yat checkpoints, and replaces the feed_forward_chunk of every BERT layer.
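The patching step can be illustrated with a minimal, dependency-free sketch. It is not the code in yat_minilm.py: `ToyLayer`, `make_student`, and `patch_ffn_blocks` are hypothetical names, the arithmetic is a placeholder, and a real loader would operate on the encoder layers of the base model with trained Yat modules. The sketch only shows the mechanism of rebinding `feed_forward_chunk` per layer instance.

```python
import types


class ToyLayer:
    """Hypothetical stand-in for a BERT encoder layer; only the FFN hook is modelled."""

    def feed_forward_chunk(self, attention_output):
        # Placeholder for the original FFN path (not the real BERT math).
        return [2.0 * x for x in attention_output]


def make_student(scale):
    """Return a hypothetical student FFN; a real one would be a trained module."""
    def student(attention_output):
        return [scale * x for x in attention_output]
    return student


def patch_ffn_blocks(layers, students):
    """Replace feed_forward_chunk on each layer instance with its own student."""
    if len(layers) != len(students):
        raise ValueError("need exactly one student per layer")
    for layer, student in zip(layers, students):
        # Bind the student through a default argument so each layer keeps its own.
        def patched(self, attention_output, _student=student):
            return _student(attention_output)
        # Rebinding on the instance leaves the class and other layers untouched.
        layer.feed_forward_chunk = types.MethodType(patched, layer)
    return layers


layers = [ToyLayer() for _ in range(6)]            # six blocks, as in MiniLM-L6
students = [make_student(float(i)) for i in range(6)]
patch_ffn_blocks(layers, students)
outputs = [layer.feed_forward_chunk([1.0, 2.0]) for layer in layers]
```

Patching instances rather than the class keeps an unpatched baseline easy to construct for side-by-side evaluation.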

Results

Phase 1 mean rho: 0.005847

Phase 2 mean rho: 0.098501 -> 0.001715

MTEB STS scores:

Task	Baseline	Yat-swapped
STSBenchmark	0.820325	0.816818
STS12	0.723690	0.720878
STS16	0.789895	0.789110
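To make the size of the gap easier to read, the short sketch below recomputes the per-task difference (Yat-swapped minus baseline) from the scores in the table. The variable names are illustrative; the numbers are copied from the table above.

```python
# Scores from the table: task -> (baseline, Yat-swapped).
scores = {
    "STSBenchmark": (0.820325, 0.816818),
    "STS12": (0.723690, 0.720878),
    "STS16": (0.789895, 0.789110),
}

# Absolute difference per task; negative means the swapped model scores lower.
deltas = {task: swapped - base for task, (base, swapped) in scores.items()}

# Relative difference as a percentage of the baseline score.
relative = {task: 100.0 * deltas[task] / scores[task][0] for task in scores}

mean_delta = sum(deltas.values()) / len(deltas)

for task in scores:
    print(f"{task}: {deltas[task]:+.6f} ({relative[task]:+.2f}%)")
print(f"mean: {mean_delta:+.6f}")
```

On these three tasks the swapped model trails the baseline by less than 0.004 in every case.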

Usage

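from yat_minilm import load_model

# Download the base model and patch in the Phase-2 Yat FFN blocks.
model = load_model("azettaai/minilm-l6-yat-ffn-swap")

# Encode two sentences and inspect the shape of the embedding array.
emb = model.encode(["hello world", "yat swapped minilm"])
print(emb.shape)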
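Embeddings like these are usually compared with cosine similarity in STS-style tasks. The sketch below assumes that convention and uses short placeholder vectors instead of real model output, so it runs without downloading anything; `cosine_similarity` is a helper written for this example, not part of yat_minilm.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for two rows of the embedding array.
emb_a = [0.1, 0.3, -0.2]
emb_b = [0.2, 0.1, -0.4]
score = cosine_similarity(emb_a, emb_b)
print(f"cosine similarity: {score:.4f}")
```

With real output, the same function could be applied to `emb[0]` and `emb[1]`, assuming `encode` returns one vector per input sentence.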

Files

phase2/block0.safetensors ... phase2/block5.safetensors: final Yat FFN replacements.
phase1/: random/on-shell warm-start checkpoints.
scripts/: Kaggle scripts used to train and evaluate the model.
yat_minilm.py: loader and patching code.
