Instructions to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF", filename="G4-MeroMero-26B-A4B-it-uncensored-heretic-BF16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Use Docker
docker model run hf.co/llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with Ollama:
ollama run hf.co/llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
- Unsloth Studio
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF to start chatting
- Pi
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with Docker Model Runner:
docker model run hf.co/llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
- Lemonade
How to use llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF-Q4_K_M
List all available models
lemonade list
🚨⚠️ I HAVE REACHED HUGGING FACE'S FREE STORAGE LIMIT ⚠️🚨
I can no longer upload new models unless I can cover the cost of additional storage.
I host 70+ free models as an independent contributor and this work is unpaid.
Without your support, no more new models can be uploaded.
🎉 Patreon (Monthly) | ☕ Ko-fi (One-time)
Every contribution goes directly toward Hugging Face storage fees to keep models free for everyone.
GGUF quantizations of llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic.
88% fewer refusals (12/100 Uncensored vs 99/100 Original) while preserving model quality (0.0152 KL divergence).
❤️ Support My Work
Creating these models takes significant time, work and compute. If you find them useful consider supporting me:
| Platform | Link | What you get |
|---|---|---|
| 🎉 Patreon | Monthly support | Priority model requests |
| ☕ Ko-fi | One-time tip | My eternal gratitude |
Your help will motivate me and would go into further improving my workflow and coverings fees for storage, compute and may even help uncensoring bigger model with rental Cloud GPUs.
This is a decensored version of zerofata/G4-MeroMero-26B-A4B, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method
Abliteration parameters
| Parameter | Value |
|---|---|
| start_layer_index | 15 |
| end_layer_index | 26 |
| preserve_good_behavior_weight | 0.3274 |
| steer_bad_behavior_weight | 0.0005 |
| overcorrect_relative_weight | 0.6647 |
| neighbor_count | 15 |
Targeted components
- attn.o_proj
Performance
| Metric | This model | Original model (G4-MeroMero-26B-A4B) |
|---|---|---|
| KL divergence | 0.0152 | 0 (by definition) |
| Refusals | ✅ 12/100 | ❌ 99/100 |
Lower refusals indicate fewer content restrictions, while lower KL divergence indicates more closeness to the original model's baseline. Higher refusals cause more rejections, objections, pushbacks, lecturing, censorship, softening and deflections.
MMLU test results:
Original:
============================================================
Total questions: 7021
Correct: 5758
Accuracy: 0.8201 (82.01%)
Parse failures: 9
============================================================
Tested subject scores:
- professional_law: 0.6841 (537/785)
- moral_scenarios: 0.6991 (309/442)
- miscellaneous: 0.9191 (352/383)
- professional_psychology: 0.8829 (279/316)
- high_school_psychology: 0.9556 (258/270)
- high_school_macroeconomics: 0.8934 (176/197)
- elementary_mathematics: 0.8804 (162/184)
- moral_disputes: 0.8333 (145/174)
- prehistory: 0.9070 (156/172)
- philosophy: 0.8365 (133/159)
- high_school_biology: 0.9605 (146/152)
- professional_accounting: 0.7692 (110/143)
- clinical_knowledge: 0.8714 (122/140)
- high_school_microeconomics: 0.9265 (126/136)
- nutrition: 0.8815 (119/135)
- professional_medicine: 0.8433 (113/134)
- conceptual_physics: 0.8672 (111/128)
- high_school_mathematics: 0.4803 (61/127)
- human_aging: 0.7931 (92/116)
- security_studies: 0.7946 (89/112)
- high_school_statistics: 0.8018 (89/111)
- marketing: 0.9725 (106/109)
- high_school_world_history: 0.8962 (95/106)
- sociology: 0.9029 (93/103)
- high_school_government_and_politics: 0.9505 (96/101)
- high_school_geography: 0.9394 (93/99)
- high_school_chemistry: 0.8144 (79/97)
- high_school_us_history: 0.9158 (87/95)
- virology: 0.5393 (48/89)
- college_medicine: 0.8068 (71/88)
- world_religions: 0.8636 (76/88)
- high_school_physics: 0.7024 (59/84)
- electrical_engineering: 0.7901 (64/81)
- astronomy: 0.9114 (72/79)
- logical_fallacies: 0.8158 (62/76)
- high_school_european_history: 0.9041 (66/73)
- anatomy: 0.8451 (60/71)
- college_biology: 0.9219 (59/64)
- human_sexuality: 0.8594 (55/64)
- formal_logic: 0.6875 (44/64)
- public_relations: 0.7049 (43/61)
- international_law: 0.9333 (56/60)
- college_physics: 0.7544 (43/57)
- college_mathematics: 0.6182 (34/55)
- econometrics: 0.7407 (40/54)
- jurisprudence: 0.8679 (46/53)
- high_school_computer_science: 0.9423 (49/52)
- machine_learning: 0.8462 (44/52)
- medical_genetics: 0.9216 (47/51)
- global_facts: 0.5294 (27/51)
- management: 0.9000 (45/50)
- us_foreign_policy: 0.9400 (47/50)
- college_chemistry: 0.5532 (26/47)
- abstract_algebra: 0.7234 (34/47)
- business_ethics: 0.7826 (36/46)
- college_computer_science: 0.8000 (36/45)
- computer_security: 0.8140 (35/43)
Heretic:
============================================================
Total questions: 7021
Correct: 5698
Accuracy: 0.8116 (81.16%)
Parse failures: 6
============================================================
Tested subject scores:
- professional_law: 0.6510 (511/785)
- moral_scenarios: 0.7059 (312/442)
- miscellaneous: 0.9164 (351/383)
- professional_psychology: 0.8861 (280/316)
- high_school_psychology: 0.9519 (257/270)
- high_school_macroeconomics: 0.8985 (177/197)
- elementary_mathematics: 0.8696 (160/184)
- moral_disputes: 0.8276 (144/174)
- prehistory: 0.8953 (154/172)
- philosophy: 0.8428 (134/159)
- high_school_biology: 0.9539 (145/152)
- professional_accounting: 0.6853 (98/143)
- clinical_knowledge: 0.9000 (126/140)
- high_school_microeconomics: 0.9265 (126/136)
- nutrition: 0.8815 (119/135)
- professional_medicine: 0.8134 (109/134)
- conceptual_physics: 0.8516 (109/128)
- high_school_mathematics: 0.4803 (61/127)
- human_aging: 0.8276 (96/116)
- security_studies: 0.7946 (89/112)
- high_school_statistics: 0.7658 (85/111)
- marketing: 0.9725 (106/109)
- high_school_world_history: 0.8868 (94/106)
- sociology: 0.8932 (92/103)
- high_school_government_and_politics: 0.9505 (96/101)
- high_school_geography: 0.9394 (93/99)
- high_school_chemistry: 0.7526 (73/97)
- high_school_us_history: 0.9158 (87/95)
- virology: 0.5169 (46/89)
- college_medicine: 0.8409 (74/88)
- world_religions: 0.8750 (77/88)
- high_school_physics: 0.6786 (57/84)
- electrical_engineering: 0.8025 (65/81)
- astronomy: 0.9114 (72/79)
- logical_fallacies: 0.7763 (59/76)
- high_school_european_history: 0.8904 (65/73)
- anatomy: 0.8732 (62/71)
- college_biology: 0.8906 (57/64)
- human_sexuality: 0.9219 (59/64)
- formal_logic: 0.6875 (44/64)
- public_relations: 0.7213 (44/61)
- international_law: 0.9333 (56/60)
- college_physics: 0.6842 (39/57)
- college_mathematics: 0.5636 (31/55)
- econometrics: 0.7222 (39/54)
- jurisprudence: 0.8491 (45/53)
- high_school_computer_science: 0.9423 (49/52)
- machine_learning: 0.8077 (42/52)
- medical_genetics: 0.9216 (47/51)
- global_facts: 0.4706 (24/51)
- management: 0.8800 (44/50)
- us_foreign_policy: 0.9400 (47/50)
- college_chemistry: 0.4894 (23/47)
- abstract_algebra: 0.7447 (35/47)
- business_ethics: 0.8261 (38/46)
- college_computer_science: 0.8222 (37/45)
- computer_security: 0.8605 (37/43)
MMLU - Massive Multitask Language Understanding, multiple-choice questions across 57 subjects (math, history, law, medicine, etc.).
Quantizations
For the K-quants below, selected Gemma 4 attention and FFN tensors are kept at higher precision where useful.
Gemma 4 does not use the ssm_alpha, ssm_beta, or ssm_out tensors found in some Qwen-style hybrid/SSM architectures. Instead, these GGUFs preserve key Gemma 4 attention projection tensors at higher precision.
Q6_Kuses a higher-quality XL-style layout:attn_q,attn_k,attn_v, andattn_outputare kept asQ8_0.ffn_gate,ffn_up, andffn_downare kept asQ8_0.ffn_down_expsis requested asQ6_Kwhere supported. Some tensors may fall back toQ8_0due to Gemma 4 tensor shape constraints.
Q5_K_M,Q5_K_S,Q4_K_M, andQ4_K_Skeep the main attention projection tensors asQ8_0:attn_qattn_kattn_vattn_output
Q3_K_LandQ3_K_Mkeep the main attention projection tensors asBF16:attn_qattn_kattn_vattn_output
This helps preserve Gemma 4’s attention path at higher precision, especially for lower-bit quants, while avoiding large file-size increases from unnecessarily up-quantizing the largest MoE expert tensors.
| Filename | Quant | Description |
|---|---|---|
| G4-MeroMero-26B-A4B-it-uncensored-heretic-BF16.gguf | BF16 | Full precision |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q8_0.gguf | Q8_0 | Near-lossless, recommended |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q6_K.gguf | Q6_K | Excellent quality |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q5_K_M.gguf | Q5_K_M | Good balance |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q5_K_S.gguf | Q5_K_S | Smaller Q5 |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q4_K_M.gguf | Q4_K_M | Good for limited VRAM |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q4_K_S.gguf | Q4_K_S | Smaller Q4 |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q3_K_L.gguf | Q3_K_L | Low VRAM, decent quality |
| G4-MeroMero-26B-A4B-it-uncensored-heretic-Q3_K_M.gguf | Q3_K_M | Low VRAM, smaller |
Vision Projector
| Filename | Quant | Description |
|---|---|---|
| G4-MeroMero-26B-A4B-it-uncensored-heretic-mmproj-BF16.gguf | BF16 | Native precision |
A Vision Projector File is Required for vision/multimodal capabilities. Use alongside any quantization above.
Usage
Works with llama.cpp, LM Studio, Ollama, and other GGUF-compatible tools.
Mero Mero
Gemma4 26B A4BGod, this model was difficult to work with.
Google cooked, there wasn't a lot to improve but there was a lot to break.
This model is a finetune that was merged back into the original instruct. It feels a lot like the original instruct. However, reasoning is more structured, using less tokens during RP and this model generally has a slightly less verbose / flowery writing style.
Main weakness of this model I think is the swipe variety hasn't improved. Logic and repetition I think are roughly on par with the original.
Supports both thinking and non thinking.
Creation Process: SFT > Merge
SFT on approx 35 million tokens.
Despite using 35 million tokens, this dataset is fairly modest in size. Trainable is somewhere in the rough ballpark of 15 million. The extra tokens are from a new multi turn RP dataset that I train last turn only.
Feels like Google left the instruct model at the razor's edge of overfitting. Finetune it at all and it feels like it'll rapidly lose intelligence, despite taking the writing style nicely. Hard to tell if you're overfitting or underfitting.
My solution was to blast the model with my data anyway to ensure it picked up the new reasoning format and writing style and then merge that back into the instruct to heal the logic damage. There's still room for a better merge that keeps more of the writing style and potentially using the base model to undo some of the overfitting.
Trained using Axolotl.
Mergekit Config
models:
- model: google/gemma-4-26B-A4B-it
parameters:
weight: 0.5
- model: ApocalypseParty/G4-26B-SFT-6
parameters:
weight: 0.5
merge_method: linear
dtype: bfloat16
Axolotl Config
# Gemma 4 26B-A4B MoE QLoRA with ScatterMoE kernels
#
# Validated: 50 steps on FineTome-100k, loss 8.8 -> 1.8, single RTX 5090 (32GB)
# torch_compile=true: 21 GiB peak VRAM, ~230 tok/s, 336s total
#
# Key notes:
# - Max sequence length on 32GB GPU: 2048 (micro_batch_size=1, SDP attention).
# 4096 seq_len OOMs due to head_dim=512 math SDP materializing full score matrix.
# Use 48GB+ GPUs for longer sequences or multi-GPU with FSDP.
base_model: google/gemma-4-26B-A4B-it
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
- axolotl.integrations.kernels.KernelsPlugin
- axolotl.integrations.liger.LigerPlugin
use_kernels: true
use_scattermoe: true
cut_cross_entropy: true
experts_implementation: scattermoe
liger_layer_norm: true
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_rms_norm_gated: true
strict: false
datasets:
- path: ./data/gemma_4_sft_5_masked_20260415_082234.jsonl
val_set_size: 0.02
output_dir: ./G4-26B-SFT-6
sequence_len: 10756
pad_to_sequence_len: true
sample_packing: true
load_in_4bit: false
#quantize_moe_experts: true
adapter: lora
lora_r: 128
lora_alpha: 128
peft_use_rslora: true
lora_dropout: 0.0
freeze_mm_modules: true
# Restrict LoRA to text backbone only (skip vision/audio encoders)
# using regex to match only the text decoder attention projections.
lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'
# MoE expert LoRA (3D Parameter tensors, not nn.Linear)
lora_target_parameters:
- experts.gate_up_proj
- experts.down_proj
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false
#bnb_config_kwargs:
# bnb_4bit_use_double_quant: true
wandb_project: G4-26B-SFT
wandb_name: G4-26B-SFT-6
gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 2
optimizer: adamw_torch_fused
lr_scheduler: constant_with_warmup
learning_rate: 1e-5
max_grad_norm: 1.0
bf16: auto
tf32: true
#gradient_checkpointing: true
#activation_offloading: true
logging_steps: 1
# FA2 not supported
sdp_attention: true
#flex_attention: true
#torch_compile: true
flash_attention: false
warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 4
weight_decay: 0.01
special_tokens:
fsdp_config:
fsdp_version: 2
offload_params: false
cpu_ram_efficient_loading: false
auto_wrap_policy: TRANSFORMER_BASED_WRAP
transformer_layer_cls_to_wrap: Gemma4TextDecoderLayer
state_dict_type: FULL_STATE_DICT
sharding_strategy: FULL_SHARD
reshard_after_forward: true
activation_checkpointing: true
- Downloads last month
- 15,217
3-bit
4-bit
5-bit
6-bit
8-bit
16-bit
Model tree for llmfan46/G4-MeroMero-26B-A4B-it-uncensored-heretic-GGUF
Base model
google/gemma-4-26B-A4B