The DECKARD's Brain - Gemma 4, 31b - NVFP4

NVFP4-quantized version of kabachuha/Gemma-4-The-Deckards-Brain-31B, a merge of two heretic Gemma 4 31B dense models with thinking capabilities.

Calibration was done on r/writingprompts story continuations using the modelopt library.

For 5090: Assuming you don't run heavy processes, you can easily fully fit up to ~49000 tokens context into a single GPU. This gives around 60+ tokengen/s.

Convertation to GGUF is made easily with llama.cpp's convert_hf_to_gguf.py.

The prompt format is fully inherited from DavidAU's The Deckard Thinking, meaning the chat template will have thinking by default. To disable, override the chat template with chat_template-instruct.jinja.

Thinking is highly recommended to not be ever turned off! If you are role-playing in non-English, this is crucial to bring your language output distribution closer to the English/Japanese training set, otherwise it will be much more bland and more censored!

For the rest of the information, see the original model card.

Downloads last month
63
Safetensors
Model size
18B params
Tensor type
F16
·
F8_E4M3
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kabachuha/Gemma-4-The-Deckards-Brain-31B-NVFP4

Quantized
(4)
this model