Qwen3-4B SWE-Gym Moto KL02 hardmulti qwen36scheduler capped v1 adapter

LoRA adapter checkpoint from the Qwen3 GRPO patch investigation on SWE-Gym Moto tasks.

This is a durability upload for the qwen36scheduler-capped-v1 continuation, not a promoted frontier checkpoint. It starts from the prior hard-multi SFT adapter and adds a capped teacher mix containing Qwen3.6 scheduler analog rows.

Source checkpoint

/mnt/disks/unslothai/datta0/cache/qwen3-grpo-patch/20260605_125656_swegym_q4b-kl02-sft20k-hardmulti-qwen36scheduler-capped-v1_ee83de6/checkpoints/best_holdout

Local run

Run: 20260605_125656_swegym_q4b-kl02-sft20k-hardmulti-qwen36scheduler-capped-v1_ee83de6
Stage: post_sft
Training data: runs/distill_hardmulti20k_plus_qwen36scheduler_capped_v1/sft_pass.jsonl
Initial adapter: /mnt/disks/unslothai/datta0/cache/qwen3-grpo-patch/20260605_045145_swegym_q4b-kl02-sft20k-hardmulti_10e3a3b/checkpoints/best_holdout
Built-in greedy gate from checkpoint metadata: 7/35, mean reward 0.3706, patch-applied rate 25/35
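
The gate counts above can be turned into rates with a few lines of Python. This is only an arithmetic sketch: the helper name `gate_rate` is hypothetical, and reading the first fraction (7/35) as a count of gate tasks is an assumption, since the checkpoint metadata does not say what it measures.

```python
# Counts copied from the gate line above; the variable names are illustrative.
GATE_TASKS = 35
FIRST_COUNT = 7        # assumption: number of gate tasks counted by the "7/35" figure
PATCH_APPLIED = 25     # tasks where the generated patch applied


def gate_rate(count: int, total: int) -> float:
    """Return count/total as a fraction in [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


if __name__ == "__main__":
    print(f"7/35  = {gate_rate(FIRST_COUNT, GATE_TASKS):.1%}")    # 20.0%
    print(f"25/35 = {gate_rate(PATCH_APPLIED, GATE_TASKS):.1%}")  # 71.4%
```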

Notes

Use this adapter with the matching base model, Qwen/Qwen3-4B-Instruct-2507 (finetuned via unsloth/Qwen3-4B-Instruct-2507), and the repo's search/replace SWE-Gym evaluation harness. This checkpoint does not replace the hard-multi SFT frontier unless a later pass@8 evaluation shows otherwise.
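
A minimal sketch of attaching the adapter to its base model with the `transformers` and `peft` libraries. It assumes the adapter repo stores weights in the standard PEFT layout, which this card does not confirm. The function name `load_adapter_model` is hypothetical, and the repo's evaluation harness may load the checkpoint differently.

```python
# Repo IDs taken from this model card.
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER_ID = (
    "imdatta0/qwen3-4b-swegym-moto-kl02-sft20k-hardmulti-"
    "qwen36scheduler-capped-v1-adapter"
)


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and wrap it with the LoRA adapter (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Assumption: the adapter repo follows the standard PEFT file layout.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode; disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 4B-parameter model.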

Model tree for imdatta0/qwen3-4b-swegym-moto-kl02-sft20k-hardmulti-qwen36scheduler-capped-v1-adapter

Base model: Qwen/Qwen3-4B-Instruct-2507
Finetuned: unsloth/Qwen3-4B-Instruct-2507
Adapter: this model