Not-For-All-Audiences

distilgpt2-grok-coder-reasoning

Model Details

Model Description

This model is a fully fine-tuned version of DistilGPT2, trained on the full, uncapped contents of a corpus that combines Grok-4-level distillation traces, creative and logic datasets, coding data, and mature internet discourse. It is intended as a small, responsive model for structured reasoning and code-oriented generation.

Training used a peak learning rate of 3e-4 with a cosine decay schedule. The corpus, drawn from multiple dataset repositories and spanning programming and theoretical domains, was processed at the model's maximum context window of 1024 tokens.

Developed by: GODsStrongestSoldier
Model type: Causal Language Model (Transformer Decoder)
Language: English
License: Apache 2.0
Finetuned from model: distilgpt2
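
A minimal usage sketch, assuming the checkpoint is published under the repository id shown on the model page (`11-47/Distil-Grok4.4-Minute.Codex.NSFW-0.1B`) and loads through the standard `transformers` causal-LM classes. The `generate` helper, its defaults, and the sampling settings are illustrative choices, not settings recommended by the author.

```python
MODEL_ID = "11-47/Distil-Grok4.4-Minute.Codex.NSFW-0.1B"  # assumed repository id
MAX_CONTEXT = 1024  # DistilGPT2 context window, in tokens


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Sample a continuation of `prompt` from the fine-tuned checkpoint."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    # Truncate the prompt so that prompt + new tokens fit in the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_CONTEXT - max_new_tokens,
    )
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # illustrative sampling settings
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizers define no pad token
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("def fibonacci(n):")` downloads the weights on first use and returns the prompt followed by the sampled continuation.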

Datasets Used for Fine-Tuning

This model was trained on the full, uncapped contents of the following datasets:

Training Details

Training Procedure

The model underwent full fine-tuning, with no adapters or LoRA layers: every parameter of the base model was updated. The training harness parsed heavily nested dataset repositories and enforced a fixed shape, so that every training sequence sent to the GPU was a contiguous block of exactly 1024 tokens, the full DistilGPT2 context window.
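
The fixed-shape constraint can be illustrated with a small packing routine. This is a sketch of one common approach (concatenate tokenized documents, then slice into equal blocks), not the author's actual harness; the function name `pack_into_blocks`, the end-of-text separator between documents, and the choice to drop the trailing remainder are all assumptions.

```python
from typing import Iterable, Iterator, List

BLOCK_SIZE = 1024   # DistilGPT2 context window
GPT2_EOS_ID = 50256  # id of <|endoftext|> in the GPT-2 vocabulary


def pack_into_blocks(
    token_streams: Iterable[List[int]],
    block_size: int = BLOCK_SIZE,
    eos_id: int = GPT2_EOS_ID,
) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield blocks of exactly `block_size` ids."""
    buffer: List[int] = []
    for ids in token_streams:
        buffer.extend(ids)
        buffer.append(eos_id)  # assumed separator between documents
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Any remainder shorter than block_size is dropped (assumption), so every
    # emitted block has the same shape and needs no padding.


# Two hypothetical "documents" of 1500 and 600 token ids.
blocks = list(pack_into_blocks([[1] * 1500, [2] * 600]))
```

Because every block has the same length, batches can be stacked into a single tensor without padding or attention-mask bookkeeping.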

To adapt the model quickly to the Grok-level reasoning data, training used a high peak learning rate (3e-4) with a 5% warmup phase and a cosine scheduler.
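
The shape of such a schedule can be sketched in plain Python. The function below assumes linear warmup from zero followed by a half-cosine decay to zero, which is the usual form of a cosine scheduler with warmup; the total step count of 1000 is hypothetical, since the card does not state the number of optimizer steps.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 3e-4,
          warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not stated in the card
schedule = [lr_at(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

Here `schedule` rises to the 3e-4 peak over roughly the first 5% of steps and then decays smoothly toward zero by the final step.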

Hardware

Environment: Kaggle
Accelerators: Dual NVIDIA T4 GPUs (15GB VRAM each)

Hyperparameters

Epochs: 1
Context Window / Block Size: 1024
Per-Device Batch Size: 4
Gradient Accumulation Steps: 16
Effective Global Batch Size: 128
Peak Learning Rate: 3e-04
Learning Rate Scheduler: Cosine
Warmup Ratio: 0.05
Optimizer: Fused AdamW (adamw_torch_fused)
Mixed Precision: fp16
Gradient Checkpointing: Enabled
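
For readers who want to reproduce a similar setup, the values above map onto keyword names used by `transformers.TrainingArguments`. The sketch below keeps them in a plain dictionary and checks the batch-size arithmetic; the GPU count of 2 comes from the Hardware section, and passing the dictionary to `TrainingArguments` together with an `output_dir` is an assumption about how the run was configured, not the author's confirmed script.

```python
NUM_GPUS = 2        # dual T4 setup listed under Hardware
BLOCK_SIZE = 1024   # tokens per training sequence

# Keyword names follow transformers.TrainingArguments; values are from the card.
training_kwargs = {
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 16,
    "learning_rate": 3e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "optim": "adamw_torch_fused",
    "fp16": True,
    "gradient_checkpointing": True,
}

# Sequences contributing to each optimizer step across all devices.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_GPUS
)

# Tokens consumed per optimizer step when every sequence is a full block.
tokens_per_step = effective_batch_size * BLOCK_SIZE
```

With these values, 4 sequences per device times 16 accumulation steps times 2 GPUs gives the stated effective batch size of 128, or 131,072 tokens per optimizer step.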

Safetensors

Model size: 81.9M params
Tensor type: F32
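
As a rough guide to what these figures imply, the weight footprint can be estimated from the parameter count and the bytes per element. This is a back-of-the-envelope sketch: the helper name `weights_size_mb` is hypothetical, and the estimate ignores file metadata and any runtime overhead.

```python
PARAMS = 81.9e6  # parameter count reported on the model page

BYTES_PER_ELEMENT = {"F32": 4, "F16": 2}


def weights_size_mb(num_params: float, dtype: str) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e6


f32_mb = weights_size_mb(PARAMS, "F32")  # stored precision
f16_mb = weights_size_mb(PARAMS, "F16")  # if cast to half precision at load time
```

Stored as F32 the weights come to roughly 328 MB; casting to half precision would halve that.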

Model tree for 11-47/Distil-Grok4.4-Minute.Codex.NSFW-0.1B

Base model: distilbert/distilgpt2
Fine-tunes of the base model: 1486 (including this model)
Quantizations: 1 model