
GPT-OSS-20B Heretic (Scanner V1.1)

This is a decensored version of openai/gpt-oss-20b, made using a version of Heretic that is not currently available.

Trial 142 Results:

  • Refusals: 8/100 (Primary Goal)
  • KL Divergence: 0.94
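
For readers unfamiliar with the two metrics, here is a minimal sketch of how a refusal rate and a KL divergence between next-token distributions can be computed. The logits are made-up illustrative values, and both the direction of the divergence (original model relative to modified model) and the helper names are assumptions; the exact procedure used by Heretic for this trial is not stated here.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits for one harmless prompt (not real model output).
original_logits = [2.0, 1.0, 0.1]
modified_logits = [1.5, 1.2, 0.3]

p = softmax(original_logits)   # original model
q = softmax(modified_logits)   # abliterated model
kl = kl_divergence(p, q)

# Refusal rate from the reported counts: 8 refusals out of 100 prompts.
refusal_rate = 8 / 100

print(f"KL divergence: {kl:.4f} nats, refusal rate: {refusal_rate:.0%}")
```

A lower KL divergence means the modified model's predictions stay closer to the original's, so the two numbers together describe the trade-off between removing refusals and preserving behavior.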

Abliteration Parameters

Parameter                          Value
direction_index                    16.60
attn.o_proj.max_weight             1.47
attn.o_proj.max_weight_position    9.62
attn.o_proj.min_weight             1.37
attn.o_proj.min_weight_distance    8.09
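
One plausible reading of the four attn.o_proj parameters is a per-layer ablation weight that peaks at max_weight_position and falls off linearly to min_weight over min_weight_distance layers, with no intervention beyond that distance. The sketch below illustrates that reading only. The function name ablation_weight, the linear falloff, and the 24-layer count are assumptions, and the Heretic version used for this model may apply the parameters differently.

```python
def ablation_weight(layer, max_weight, max_weight_position, min_weight, min_weight_distance):
    """Hypothetical per-layer weight: linear falloff from the peak position."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # layer lies outside the kernel and is left untouched
    # Interpolate from max_weight (distance 0) to min_weight (distance == min_weight_distance).
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)

# Values taken from the parameter table above.
params = dict(
    max_weight=1.47,
    max_weight_position=9.62,
    min_weight=1.37,
    min_weight_distance=8.09,
)

NUM_LAYERS = 24  # assumed layer count for gpt-oss-20b

weights = [ablation_weight(layer, **params) for layer in range(NUM_LAYERS)]
for layer, w in enumerate(weights):
    print(f"layer {layer:2d}: weight {w:.3f}")
```

Under this reading, every layer inside the kernel receives a weight between min_weight and max_weight, and layers outside it receive zero.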

Methodology

This model was abliterated with a targeted intervention on the attn.o_proj layers, focusing on layers 10 and above, where layer scanning identified the refusal directions. The mlp.down_proj layers were excluded from the intervention because the scan showed that they contributed negligible divergence.
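
The core operation behind abliteration can be sketched as a projection: the component of a weight matrix's output that lies along a refusal direction is subtracted out. The example below is a generic illustration on a tiny hand-written matrix, not the code used for this model. The matrix, the direction vector, and the helper names are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ablate(W, direction, weight=1.0):
    """Return W - weight * r (r^T W), where r is the unit refusal direction.

    W has shape (d_out, d_in) and r lives in the output space.
    """
    r = normalize(direction)
    rows, cols = len(W), len(W[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - weight * r[i] * proj[j] for j in range(cols)] for i in range(rows)]

# Hypothetical 3x2 projection matrix and refusal direction.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
direction = [1.0, 0.0, 1.0]

W_abl = ablate(W, direction, weight=1.0)
y = matvec(W_abl, [0.5, -1.0])

# With weight=1.0 the output has no remaining component along the direction.
r = normalize(direction)
residual = sum(a * b for a, b in zip(r, y))
print(f"component along refusal direction: {residual:.2e}")
```

With weight=1.0 the component along the direction is removed exactly. A weight above 1, such as the values in the parameter table, overshoots and leaves a component of the opposite sign scaled by (1 - weight).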

Usage

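from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arnomatic/gpt-oss-20b-heretic-scannerV1-1"

# device_map="auto" places the weights on the available GPU(s); it requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Generate a story about..."
# Move the inputs to the model's device; a hard-coded "cuda" fails when the model sits elsewhere.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
# Drop special tokens so only readable text is printed.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))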