How to use from llama.cpp
Install with Homebrew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
# Run inference directly in the terminal:
llama-cli -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
Install with WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
# Run inference directly in the terminal:
llama-cli -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
Use a pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
Use Docker
docker model run hf.co/Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF:
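Because llama-server exposes an OpenAI-compatible API, a running server can also be queried from the command line. The following is a minimal sketch. It assumes the server was started with one of the commands above, listens on llama-server's default address http://localhost:8080, and that `curl` is installed. `LLAMA_URL` and `build_chat_payload` are illustrative names, not part of llama.cpp.

```shell
#!/usr/bin/env sh
# Sketch: send one chat request to a running llama-server.
# Assumption: default address http://localhost:8080 (override with LLAMA_URL).
LLAMA_URL="${LLAMA_URL:-http://localhost:8080}"

# Hypothetical helper: build an OpenAI-style chat-completions body for one user message.
# The prompt is inserted verbatim, so it must not contain double quotes or backslashes.
build_chat_payload() {
  printf '{"messages":[{"role":"user","content":"%s"}],"max_tokens":512}' "$1"
}

# Only send the request if the server answers its /health endpoint.
if curl -sf "$LLAMA_URL/health" >/dev/null 2>&1; then
  curl -s "$LLAMA_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_chat_payload 'What is the sum of all prime numbers between 1 and 50?')"
  echo
else
  echo "llama-server is not reachable at $LLAMA_URL" >&2
fi
```

For prompts that contain quotes or newlines, build the JSON body with a dedicated tool such as `jq` instead of `printf`.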
Gemma-4 E2B Gemini 3.1 Pro Reasoning Distill - GGUF

GGUF quantized versions of Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill.

Model Description

This is Google's instruction-tuned Gemma-4 E2B (2B) model, LoRA fine-tuned on Gemini 3.1 Pro reasoning datasets to improve its chain-of-thought reasoning.

Training Data

Training Configuration

  • Base Model: google/gemma-4-E2B-it (via unsloth/gemma-4-E2B-it)
  • Method: LoRA fine-tuning
  • LoRA Config: r=8, alpha=8, dropout=0.1
  • Learning Rate: 5e-5
  • Epochs: 0.5 (about half a pass over the training data)
  • Framework: Unsloth + TRL
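For reference, the LoRA settings above plug into the standard LoRA weight update. This sketch assumes the default scaling factor $\alpha/r$ (not the rank-stabilized variant); $W_0$, $A$, $B$, $d_{\text{in}}$ and $d_{\text{out}}$ are generic symbols, not values stated on this card:

```latex
\begin{aligned}
W' &= W_0 + \frac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\; A \in \mathbb{R}^{r \times d_{\text{in}}} \\
\frac{\alpha}{r} &= \frac{8}{8} = 1 \\
\text{trainable parameters per adapted matrix} &= r\,(d_{\text{in}} + d_{\text{out}}) = 8\,(d_{\text{in}} + d_{\text{out}})
\end{aligned}
```

With $\alpha = r$, the low-rank update $BA$ is added to the frozen weights without extra scaling.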

Available Quantizations

| Filename | Quant type | Size | Description |
|---|---|---|---|
| gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill.gguf | BF16 | ~5 GB | Unquantized (16-bit), best quality |
| gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-Q8_0.gguf | Q8_0 | ~2.5 GB | High quality |
| gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-Q5_K_M.gguf | Q5_K_M | ~2 GB | Balanced (recommended) |
| gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-Q4_K_M.gguf | Q4_K_M | ~1.7 GB | Good quality, smaller |
| gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-Q3_K_M.gguf | Q3_K_M | ~1.4 GB | Smallest |
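Every file in the table is served from the repository's `resolve/main` path, the same URL pattern the wget command in the next section uses. The following is a small sketch that maps a quant type to its filename and download URL. `REPO`, `BASE`, `gguf_filename` and `gguf_url` are hypothetical names chosen for illustration, and the filenames are taken from the table.

```shell
#!/usr/bin/env sh
# Hypothetical helpers: map a quant type from the table to its GGUF filename and URL.
REPO="Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF"
BASE="gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill"

gguf_filename() {
  case "$1" in
    BF16) echo "$BASE.gguf" ;;                        # unquantized file has no suffix
    Q8_0|Q5_K_M|Q4_K_M|Q3_K_M) echo "$BASE-$1.gguf" ;;
    *) echo "unknown quant type: $1" >&2; return 1 ;; # not listed in the table
  esac
}

gguf_url() {
  name=$(gguf_filename "$1") || return 1
  echo "https://huggingface.co/$REPO/resolve/main/$name"
}

# Example: print the download URL for the Q4_K_M file.
gguf_url Q4_K_M
```

To download a file, pass the URL to wget, e.g. `wget "$(gguf_url Q8_0)"`.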

Usage with llama.cpp

# Download a quantized model
wget https://huggingface.co/Ayodele01/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-GGUF/resolve/main/gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-Q5_K_M.gguf

# Run with llama.cpp
./llama-cli -m gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-Q5_K_M.gguf \
  -p "What is the sum of all prime numbers between 1 and 50?" \
  -n 512
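The example prompt has a checkable answer. This POSIX shell sketch computes it by trial division, so the model's final answer can be compared against it. The function name `sum_primes_upto` is illustrative.

```shell
#!/usr/bin/env sh
# Sum all primes p with 2 <= p <= n using trial division up to sqrt(p).
sum_primes_upto() {
  n=$1; sum=0; i=2
  while [ "$i" -le "$n" ]; do
    is_prime=1; d=2
    while [ $((d * d)) -le "$i" ]; do
      if [ $((i % d)) -eq 0 ]; then is_prime=0; break; fi
      d=$((d + 1))
    done
    [ "$is_prime" -eq 1 ] && sum=$((sum + i))
    i=$((i + 1))
  done
  echo "$sum"
}

sum_primes_upto 50   # prints 328
```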

Usage with Ollama

Create a file named Modelfile with the following contents:

FROM ./gemma-4-E2B-Gemini-3.1-Pro-Reasoning-Distill-Q5_K_M.gguf

TEMPLATE """<bos><start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
"""

PARAMETER stop "<end_of_turn>"
PARAMETER temperature 0.7

Then build and run the model:

ollama create gemma4-e2b-reasoning -f Modelfile
ollama run gemma4-e2b-reasoning
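Once the model has been created, it can also be queried through Ollama's local REST API instead of the interactive prompt. This is a minimal sketch. It assumes the Ollama server is running on its default address http://localhost:11434 and that `curl` is installed. `OLLAMA_URL` and `build_generate_payload` are illustrative names.

```shell
#!/usr/bin/env sh
# Sketch: one non-streaming request to Ollama's /api/generate endpoint.
# Assumption: default address http://localhost:11434 (override with OLLAMA_URL).
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="gemma4-e2b-reasoning"   # name passed to `ollama create` above

# Hypothetical helper: build the request body.
# The prompt is inserted verbatim, so avoid double quotes and backslashes in it.
build_generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$MODEL" "$1"
}

# Only send the request if the server responds (/api/tags lists local models).
if curl -sf "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_generate_payload 'Is 91 a prime number?')"
  echo
else
  echo "Ollama is not reachable at $OLLAMA_URL" >&2
fi
```

Setting `"stream": false` returns the whole completion as a single JSON object rather than a stream of partial responses.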

License

This model is released under the Gemma License.

Related Models

Model size: 5B params (GGUF)
Architecture: gemma4