Model Submission Request: Veyra3-5M Base and Veyra-30M Base
Submitted by: Veyra AI
Models
Veyra3-5M Base
Hugging Face Model Page: https://huggingface.co/veyra-ai/veyra3-5m-base
Veyra3-5M Base is a compact 4.5M parameter pretrained causal language model based on a small Gemma4-style architecture. It was trained from scratch as a proof-of-concept model.
| Property | Value |
|---|---|
| Model ID | veyra-ai/veyra3-5m-base |
| Parameters | ~4.5M |
| Model Type | Base / Pretrained |
| Architecture | Gemma4-style causal LM |
| Context Length | 4096 |
| Vocabulary Size | 4096 |
| Training Tokens | ~350M |
| Training Data | Cosmopedia v2 |
Veyra-30M Base
Hugging Face Model Page: https://huggingface.co/veyra-ai/veyra-30m-base-5b-tokens
Veyra-30M Base is a 36.2M parameter pretrained causal language model trained from scratch using a custom Veyra architecture. Training used approximately 5B tokens across Cosmopedia v2, FineWeb-Edu, and Python-Edu style data, with a later context-length continuation stage from 512 to 1024 tokens.
| Property | Value |
|---|---|
| Model ID | veyra-ai/veyra-30m-base-5b-tokens |
| Parameters | ~36.2M |
| Model Type | Base / Pretrained |
| Architecture | Custom Veyra causal LM |
| Context Length | 1024 |
| Vocabulary Size | 8192 |
| Training Tokens | ~5B |
| Training Data | Cosmopedia v2, FineWeb-Edu, Python-Edu |
Evaluation Setup
Both models were evaluated with lm-evaluation-harness 0.4.12 on an NVIDIA L4 GPU in float16 precision. All listed scores are zero-shot.
Benchmarks:
- HellaSwag
- ARC-Easy
- ARC-Challenge
- PIQA
- ArithMark-2.0
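For anyone who wants to reproduce the setup above, here is a minimal sketch that builds an `lm_eval` command line for each model. It assumes the standard `lm_eval` CLI flags (`--model hf`, `--model_args`, `--tasks`, `--num_fewshot`, `--device`) behave in 0.4.12 as they do in earlier 0.4.x releases, and that the task ids are `hellaswag`, `arc_easy`, `arc_challenge`, and `piqa`. The `build_eval_command` helper and the `cuda:0` device string are hypothetical, not taken from the submission. ArithMark-2.0 is left out because its task id is not given here.

```python
import shlex

# Task ids assumed to match the harness registry; ArithMark-2.0 is omitted
# because the submission does not state its task id.
TASKS = ["hellaswag", "arc_easy", "arc_challenge", "piqa"]

MODEL_IDS = [
    "veyra-ai/veyra3-5m-base",
    "veyra-ai/veyra-30m-base-5b-tokens",
]


def build_eval_command(model_id, tasks=TASKS, dtype="float16", device="cuda:0"):
    """Hypothetical helper: return the lm_eval argument list for one model."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id},dtype={dtype}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", "0",  # all reported scores are zero-shot
        "--device", device,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so this sketch needs no GPU.
    for model_id in MODEL_IDS:
        print(shlex.join(build_eval_command(model_id)))
```

The commands are printed rather than executed, so they can be checked against the installed harness version before any GPU time is spent.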
Results (acc_norm)
| Model | AVG | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2.0 |
|---|---|---|---|---|---|---|
| veyra-ai/veyra3-5m-base | 29.71 | 25.83 | 26.98 | 24.57 | 49.35 | 21.84 |
| veyra-ai/veyra-30m-base-5b-tokens | 34.09 | 28.56 | 35.69 | 24.23 | 58.38 | 23.60 |
Results (acc)
| Model | AVG | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2.0 |
|---|---|---|---|---|---|---|
| veyra-ai/veyra3-5m-base | 29.53 | 25.44 | 26.05 | 19.88 | 52.18 | 24.08 |
| veyra-ai/veyra-30m-base-5b-tokens | 34.20 | 27.58 | 37.42 | 20.65 | 59.96 | 25.40 |
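The AVG column in both tables can be sanity-checked with a short script. This sketch assumes AVG is the unweighted mean of the five per-benchmark scores, which the submission does not state explicitly. The `mean_score` and `check_averages` names are illustrative only.

```python
# Scores copied from the two results tables, in column order:
# HellaSwag, ARC-Easy, ARC-Challenge, PIQA, ArithMark.
# Each entry is (per-task scores, reported AVG).
REPORTED = {
    "acc_norm": {
        "veyra-ai/veyra3-5m-base": ([25.83, 26.98, 24.57, 49.35, 21.84], 29.71),
        "veyra-ai/veyra-30m-base-5b-tokens": ([28.56, 35.69, 24.23, 58.38, 23.60], 34.09),
    },
    "acc": {
        "veyra-ai/veyra3-5m-base": ([25.44, 26.05, 19.88, 52.18, 24.08], 29.53),
        "veyra-ai/veyra-30m-base-5b-tokens": ([27.58, 37.42, 20.65, 59.96, 25.40], 34.20),
    },
}


def mean_score(scores):
    """Unweighted mean of the per-task scores (assumed definition of AVG)."""
    return sum(scores) / len(scores)


def check_averages(reported=REPORTED, tolerance=0.005):
    """Return True if every reported AVG matches the mean to two decimals."""
    return all(
        abs(mean_score(scores) - avg) < tolerance
        for models in reported.values()
        for scores, avg in models.values()
    )


if __name__ == "__main__":
    for metric, models in REPORTED.items():
        for model_id, (scores, avg) in models.items():
            print(f"{metric:8s} {model_id}: mean={mean_score(scores):.2f} reported={avg:.2f}")
    print("all averages consistent:", check_averages())
```

Under that assumption, all four reported averages agree with the per-task scores to two decimal places.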
Notes
Veyra2 was an intermediate experimental line that did not perform as strongly as expected, so I have not submitted it. I am working on larger Veyra3 models that I expect to perform considerably better. I can provide more detail on these scores if needed.
Done! Should both be up now