Instructions for using rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4 with libraries, notebooks, and local apps.

Libraries

How to use rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so it is visible when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4")
model = AutoModelForImageTextToText.from_pretrained("rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4

SGLang

How to use rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip-based SGLang example above (the container publishes the same port, 30000).

Docker Model Runner
How to use rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4 with Docker Model Runner:
```
docker model run hf.co/rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4
```

Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5 NVFP4 (ModelOpt)

This repository contains a Hugging Face export of crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5, quantized to NVFP4 with NVIDIA ModelOpt and aimed at inference on Blackwell-class GPUs.

Relationship to the source model

Source model: crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5
This repo is a quantized derivative of that model.
Architecture in exported config: Qwen3_5ForConditionalGeneration
Format: standard Hugging Face checkpoint with hf_quant_config.json

Quantization summary

Quantizer: NVIDIA ModelOpt 0.41.0
Output quantization: NVFP4
Weight format used: qformat=nvfp4_mlp_only
KV cache format: fp8
Output dtype metadata: bfloat16
Export format: Hugging Face checkpoint
Main weight file: model.safetensors
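
The settings summarized above are recorded in hf_quant_config.json. As a minimal sketch, assuming a local copy of that file, the helper below loads it and prints every setting as a dotted key path. It deliberately makes no assumption about which keys ModelOpt writes; the function names and the local path are illustrative, not part of this repository.

```python
import json
from pathlib import Path


def flatten(obj, prefix=""):
    """Flatten nested dicts into {'a.b.c': value} pairs; lists are kept as values."""
    items = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            items.update(flatten(value, path))
        else:
            items[path] = value
    return items


def describe_quant_config(path):
    """Load a quantization config JSON file and return its flattened settings."""
    with Path(path).open(encoding="utf-8") as fh:
        return flatten(json.load(fh))


if __name__ == "__main__":
    # Assumed location: a local download of this repository in the current directory.
    config_path = Path("hf_quant_config.json")
    if config_path.exists():
        for name, value in describe_quant_config(config_path).items():
            print(f"{name}: {value}")
```

Comparing the printed values against the summary above is a quick way to confirm that a downloaded copy matches this description.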

Quantization environment

Quantization was completed in a WSL2 Ubuntu 24.04 environment with NVIDIA drivers already configured on the host.

Primary container used for PTQ/export:

nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4

Runtime and tooling details of the successful export:

TensorRT-Model-Optimizer source checkout
transformers 5.3.0.dev0, injected from source so that the qwen3_5 model type is recognized
local patch to the official examples/llm_ptq/hf_ptq.py flow so that image calibration works on this non-Nemotron multimodal model

PTQ calibration settings used in the successful run

calibration mode: image-text calibration
calib_size=128
calib_seq=8192
batch_size=1
peak observed single-GPU memory during quantization: about 25.55 GB
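
To put the calibration settings above in perspective, the following back-of-the-envelope calculation estimates the calibration workload. It assumes calib_seq is the maximum token length per sample, so the token count is an upper bound rather than a measured value.

```python
import math

# Values copied from the calibration settings listed above.
calib_size = 128   # number of calibration samples
calib_seq = 8192   # assumed: maximum tokens per calibration sample
batch_size = 1

# Upper bound on tokens seen during calibration (real samples may be shorter).
max_calib_tokens = calib_size * calib_seq

# Number of forward passes needed to consume every sample.
num_batches = math.ceil(calib_size / batch_size)

print(f"max calibration tokens: {max_calib_tokens:,}")
print(f"calibration batches:    {num_batches}")
```

Under that assumption the run processes at most 1,048,576 tokens in 128 forward passes.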

Files in this repo

model.safetensors
config.json
hf_quant_config.json
processor_config.json
tokenizer.json
tokenizer_config.json
generation_config.json
chat_template.jinja
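
After downloading the repository, a short check can confirm that every file listed above is present. This is a sketch only: the file names are copied from the list, while the function name and the directory passed to it are hypothetical.

```python
from pathlib import Path

# File names copied from the "Files in this repo" list.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "hf_quant_config.json",
    "processor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "generation_config.json",
    "chat_template.jinja",
]


def missing_files(repo_dir):
    """Return the expected file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local path; replace with the actual download location.
    print(missing_files("./Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4"))
```

An empty list means the local copy contains all eight files.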

Serving status

Validated locally with:

SGLang 0.5.9
transformers 5.3.0.dev0
--quantization modelopt_fp4
--attention-backend triton
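
The two flags above can be combined with the standard SGLang launch command. The sketch below only assembles and prints the command line; it uses the flags named in this section, while the host and port defaults and the choice to pass the repository id as the model path are assumptions that may need adjusting.

```python
import shlex


def build_launch_command(model_path, host="0.0.0.0", port=30000):
    """Assemble an SGLang launch command using the flags validated above."""
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--host", host,
        "--port", str(port),
        "--quantization", "modelopt_fp4",
        "--attention-backend", "triton",
    ]


cmd = build_launch_command(
    "rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4"
)
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
# To start the server from Python instead: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids quoting problems if the model path contains spaces.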

Local validation covered:

text chat in Chinese, Japanese, English, and mixed prompts
multimodal image understanding
simple concurrent requests
long-context retrieval
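
The exact validation scripts are not part of this card. As an illustration of what a simple concurrent-request check might look like, the sketch below fans a few multilingual prompts out over a thread pool. The send function here is a stand-in that echoes its input; in a real check it would be replaced by an HTTP call to the server.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(send, prompts, max_workers=4):
    """Call send(prompt) for every prompt in parallel; results keep prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, prompts))


def fake_send(prompt):
    # Stand-in for a real request to the chat completions endpoint.
    return f"echo: {prompt}"


prompts = ["你好", "こんにちは", "Hello"]
results = run_concurrently(fake_send, prompts)
for prompt, result in zip(prompts, results):
    print(prompt, "->", result)
```

Because pool.map preserves input order, each result can be matched to its prompt even when requests finish out of order.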

Example SGLang launch

A tested container image for this model family is available at:

rhoninseiei/sglang-qwen35-nvfp4:sglang0.5.9-transformers5.3.0dev0

Example launch:

docker run -d \
  --name sglang_qwen35_nvfp4 \
  --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -e MODEL_PATH=/models/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4-ModelOpt \
  -p 31000:30000 \
  -v /path/to/models:/models \
  rhoninseiei/sglang-qwen35-nvfp4:sglang0.5.9-transformers5.3.0dev0
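
Once the container is running, the server should be reachable on host port 31000, since the launch above maps 31000 to the container's port 30000. The client below is a minimal sketch using only the standard library. It assumes the image exposes SGLang's OpenAI-compatible /v1/chat/completions endpoint and that the served model name matches the value passed in; both should be verified against the running server.

```python
import json
import urllib.request


def build_chat_payload(model, text, image_url=None):
    """Build an OpenAI-style chat request with optional image input."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(base_url, payload, timeout=60):
    """POST the payload to the chat completions endpoint and return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_chat_payload(
        # Assumed model name; the server may report a different one.
        model="rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4",
        text="Describe this image in one sentence.",
    )
    print(json.dumps(payload, indent=2))
    # Uncomment once the server is up (host port 31000 per the launch above):
    # reply = post_chat("http://localhost:31000", payload)
    # print(reply["choices"][0]["message"]["content"])
```

Separating payload construction from the network call makes it easy to inspect the request before sending it.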

Notes

This checkpoint is intended for ModelOpt FP4 / NVFP4 aware runtimes.
In local testing, the stable vLLM release available at the time did not support this exact Qwen3_5ForConditionalGeneration architecture, even though ModelOpt/NVFP4 checkpoints are supported more generally.
The included chat_template.jinja was adjusted so thinking output is suppressed by default for cleaner chat responses.
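
If reasoning text still appears in responses, for example when the model is served with a different chat template, a client can strip it after the fact. The helper below is a sketch that assumes Qwen-style `<think>...</think>` delimiters; how the adjusted template itself suppresses thinking is not documented here, and the function name is illustrative.

```python
import re

# Matches one reasoning block plus any whitespace that follows it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text):
    """Remove <think>...</think> blocks and surrounding whitespace from text."""
    return _THINK_BLOCK.sub("", text).strip()


print(strip_thinking("<think>\nplan the answer\n</think>\n\nThe answer is 4."))
```

Text without any reasoning block passes through unchanged apart from trimmed leading and trailing whitespace.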

Disclaimer

This is an unofficial quantized redistribution of the source model.
Users must review and comply with the original model license, upstream runtime licenses, and any applicable distribution or export restrictions.
No claim is made that every runtime or every hardware target will load this checkpoint unchanged.

Safetensors

Model size: 7B params
Tensor type: BF16, F8_E4M3

Model tree for rhoninseiei/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5-NVFP4

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: trohrbaugh/Qwen3.5-9B-heretic-v2
Quantized: Crownelius/Crow-9B-HERETIC-4.6
Quantized (20): this model