Qwen3-4B SWE-Gym KL02 SFT20K Hard-Multi Best Holdout

This repository contains a PEFT LoRA adapter checkpoint, not a standalone base model. It was produced during the Qwen3 SWE-Gym RL investigation in /home/datta0/codes/ai/qwen3-grpo-patch.

Base Model

  • unsloth/Qwen3-4B-Instruct-2507

Checkpoint

Local source checkpoint before upload:

/mnt/disks/unslothai/datta0/cache/qwen3-grpo-patch/20260605_045145_swegym_q4b-kl02-sft20k-hardmulti_10e3a3b/checkpoints/best_holdout

This is the current best trainable Qwen3-4B adapter identified in the investigation: KL-GRPO beta 0.02 with hard-multi 20k Coder-30B teacher SFT.

Evaluation Notes

Held-out SWE-Gym patch-evaluation results recorded locally:

Evaluation Greedy Selected@1 Pass@8 Multi-file pass@8
Seed 9012, 20k context 9/35 10/35 16/35 4/17
Seed 5678, 20k context 10/35 12/35 13/35 4/17

The robust claim from this checkpoint is replicated multi-file pass@8 of 4/17. The larger overall 16/35 run is a measured result, but should not be treated as fully robust without further re-sampling.

Intended Use

This checkpoint is for research on SWE-style patch generation using a search/replace edit contract. It should be loaded as a PEFT adapter on top of the base model.

Limitations

  • This is an experimental research adapter.
  • The evaluation is on a small held-out slice and is sensitive to sampling and routing choices.
  • It is not a general-purpose coding assistant release.
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for imdatta0/qwen3-4b-swegym-kl02-sft20k-hardmulti-best-holdout

Adapter
(436)
this model