distilgpt2-grok-coder-reasoning

Model Details

Model Description

This model is a fully fine-tuned version of DistilGPT2, trained on an uncapped curriculum of Grok-4-level distillation traces, creative and logic datasets, coding data, and mature internet discourse. It is intended as a responsive, analytical model for structured reasoning and logic emulation.

Training used a high peak learning rate with a cosine decay schedule over a large multi-repository corpus spanning programming and theoretical domains. Sequences were processed at the model's maximum context window of 1024 tokens.

  • Developed by: GODsStrongestSoldier
  • Model type: Causal Language Model (Transformer Decoder)
  • Language: English
  • License: Apache 2.0
  • Finetuned from model: distilgpt2

Datasets Used for Fine-Tuning

This model was trained comprehensively on the full, uncapped contents of the following datasets:


Training Details

Training Procedure

The model underwent full fine-tuning without adapters or LoRA layers; all parameters of the base model were updated. The training harness parsed heavily nested dataset repositories and enforced a strict shape constraint, producing contiguous sequences of exactly 1024 tokens each, which fills the DistilGPT2 context window.
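
The fixed-length sequence construction described above can be illustrated with a minimal sketch. The actual training harness is not published in this card, so the function name `pack_into_blocks`, the toy token lists, and the choice to drop the trailing remainder are all assumptions made for illustration.

```python
from itertools import chain

BLOCK_SIZE = 1024  # context window stated in this card


def pack_into_blocks(tokenized_docs, block_size=BLOCK_SIZE):
    """Concatenate token-id lists and split them into equal-length blocks.

    Assumption: the trailing remainder that does not fill a whole block
    is dropped, so every returned block has exactly `block_size` tokens.
    """
    flat = list(chain.from_iterable(tokenized_docs))
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


# Toy input: three "documents" of fake token ids (hypothetical data).
docs = [list(range(700)), list(range(900)), list(range(600))]
blocks = pack_into_blocks(docs)
print(len(blocks), {len(b) for b in blocks})  # 2 {1024}
```

Here 2200 toy tokens yield two full blocks, and the remaining 152 tokens are discarded. Padding the last block instead would be an equally plausible design, at the cost of wasted compute on pad tokens.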

To maximize adaptation to the Grok-level reasoning data, a peak learning rate of 3e-4 was used with a 5% warmup phase and a cosine scheduler.
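
As a rough sketch of how such a schedule behaves, the function below computes a learning rate with linear warmup followed by cosine decay to zero. The function name `lr_at_step` and the total of 1000 steps are hypothetical, since the card does not report the number of optimizer steps. The exact rounding of the warmup length in the real trainer may also differ.

```python
import math


def lr_at_step(step, total_steps, peak_lr=3e-4, warmup_ratio=0.05):
    """Linear warmup to `peak_lr`, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to the peak.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: progress runs from 0 (end of warmup) to 1 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not stated in this card
for s in (0, 25, 50, 525, 1000):
    print(s, f"{lr_at_step(s, TOTAL_STEPS):.2e}")
```

With these assumed values the rate reaches 3e-4 at step 50, falls to about half of that at the midpoint of the decay, and ends at zero.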

Hardware

  • Environment: Kaggle
  • Accelerators: Dual NVIDIA T4 GPUs (15GB VRAM each)

Hyperparameters

  • Epochs: 1
  • Context Window / Block Size: 1024
  • Per-Device Batch Size: 4
  • Gradient Accumulation Steps: 16
  • Effective Global Batch Size: 128
  • Peak Learning Rate: 3e-04
  • Learning Rate Scheduler: Cosine
  • Warmup Ratio: 0.05
  • Optimizer: Fused AdamW (adamw_torch_fused)
  • Mixed Precision: fp16
  • Gradient Checkpointing: Enabled
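
The listed values map onto keyword arguments of the Hugging Face `transformers.TrainingArguments` class roughly as sketched below. The original training script is not included in this card, so this mapping is an assumption. The dictionary is kept as plain Python so it runs without the library installed; in practice it would be passed as `TrainingArguments(output_dir=..., **training_kwargs)`. The arithmetic also assumes data-parallel training across both GPUs, which is what makes the stated effective batch size of 128 work out.

```python
NUM_GPUS = 2       # dual T4 setup listed under Hardware
BLOCK_SIZE = 1024  # tokens per sequence

# Assumed mapping of the card's hyperparameters to TrainingArguments names.
training_kwargs = {
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 16,
    "learning_rate": 3e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "optim": "adamw_torch_fused",
    "fp16": True,
    "gradient_checkpointing": True,
}

# Sequences per optimizer step: per-device batch x accumulation x devices.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_GPUS
)
tokens_per_step = effective_batch * BLOCK_SIZE
print(effective_batch, tokens_per_step)  # 128 131072
```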