Qwen3-4B SWE-Gym Hard-Anchor Process-Local Epsilon Adapter

This repository contains the best_holdout PEFT LoRA adapter from the Qwen3 SWE-Gym patch-repair investigation.

This checkpoint is not a general instruction model. It is an experimental adapter for search/replace patch generation, evaluated on the held-out SWE-Gym-style code-repair harness used in the project.

Base Model

Base: unsloth/Qwen3-4B-Instruct-2507
Adapter type: LoRA
PEFT version recorded in config: 0.19.1
Local source checkpoint: /mnt/disks/unslothai/datta0/cache/qwen3-grpo-patch/20260606_123149_swegym_q4b-kl02-sft20k-hardanchor-processlocal-eps-v1-lr5e7_9a576cd/checkpoints/best_holdout

Training Context

This adapter started from the 4B hard-multi frontier adapter and received a tiny process-local SFT epsilon update (20 optimizer steps at a learning rate of 5e-7) while preserving the hard-multi anchor rows.

Run details:

Run tag: 20260606_123149_swegym_q4b-kl02-sft20k-hardanchor-processlocal-eps-v1-lr5e7_9a576cd
Stage: post_sft
Examples: 40
Optimizer steps: 20
Learning rate: 5e-7
Built-in first-sample held-out pass@1: 7/35
Built-in mean reward: 0.3497
Built-in patch-applied rate: 0.6857
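
The reported rates can be sanity-checked with a little arithmetic. The sketch below takes 35 as the number of held-out tasks (the denominator of the pass@1 figure) and assumes the patch-applied rate is measured over the same 35 tasks; that second point is an assumption, not something the run details state.

```python
from fractions import Fraction

HELDOUT_TASKS = 35  # denominator of the reported pass@1 (7/35)

# First-sample pass@1 as reported in the run details.
pass_at_1 = Fraction(7, HELDOUT_TASKS)

# Reported patch-applied rate.
# ASSUMPTION: it is measured over the same 35 held-out tasks.
reported_applied_rate = 0.6857
implied_applied = round(reported_applied_rate * HELDOUT_TASKS)

print(f"pass@1 = {float(pass_at_1):.4f}")
print(
    f"implied applied patches = {implied_applied}/{HELDOUT_TASKS} "
    f"= {implied_applied / HELDOUT_TASKS:.4f}"
)
```

Under that assumption, a rate of 0.6857 corresponds to 24 of 35 patches applying cleanly.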

The seed9012 pass@8 validation run was interrupted before it completed, so this adapter should not be treated as a promoted replacement for the hard-multi frontier adapter until a full resample has been completed.
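
For context, pass@k is commonly estimated per task from n samples, c of which pass, as 1 - C(n-c, k) / C(n, k). Whether the project harness uses this estimator is not stated in this card; the function name `pass_at_k` and the per-task counts below are hypothetical and serve only as an illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them pass."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains no pass.
    # math.comb returns 0 when n - c < k, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-task (n, c) counts, for illustration only.
per_task = [(8, 0), (8, 1), (8, 3)]
mean_pass_at_8 = sum(pass_at_k(n, c, 8) for n, c in per_task) / len(per_task)
print(f"mean pass@8 over {len(per_task)} hypothetical tasks: {mean_pass_at_8:.4f}")
```

The estimator needs at least k samples per task, so an interrupted run can leave some tasks without a defined pass@8 value.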

Loading

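from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/Qwen3-4B-Instruct-2507"
adapter_id = "imdatta0/qwen3-4b-swegym-hardanchor-processlocal-eps-v1-best-holdout"

# The tokenizer is read from the adapter repo; if that repo ships no tokenizer files, load it from base_id instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
# Load the base model first (device_map="auto" requires the accelerate package).
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base, adapter_id)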

Caveats

This is an experimental research checkpoint. The training and evaluation setup used a specific retrieval and search/replace editing contract. Results are not directly comparable to general coding benchmarks without reproducing that harness.
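
To give a sense of what a search/replace editing contract can look like, the sketch below applies a single edit only when the search text matches exactly once. This is a hypothetical, simplified contract: the function name `apply_search_replace` and the exactly-one-match rule are assumptions for illustration, not the actual format used by the project harness.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search/replace edit; reject missing or ambiguous matches."""
    if not search:
        raise ValueError("search text must be non-empty")
    matches = source.count(search)
    if matches != 1:
        # ASSUMPTION: an edit that does not match exactly once is treated as
        # "patch not applied" rather than being applied at an arbitrary site.
        raise ValueError(f"expected exactly 1 match, found {matches}")
    return source.replace(search, replace, 1)


original = "def add(a, b):\n    return a - b\n"
patched = apply_search_replace(original, "return a - b", "return a + b")
print(patched)
```

Under a strict rule like this, a generated patch can fail to apply even when its intent is right, which is one reason to track a patch-applied rate separately from pass@1.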
