Model Submission Request: Veyra3-5M Base and Veyra-30M Base

#4
by Jdudeo - opened

Submitted by: Veyra AI

Models

Veyra3-5M Base

Hugging Face Model Page: https://huggingface.co/veyra-ai/veyra3-5m-base

Veyra3-5M Base is a compact, 4.5M-parameter pretrained causal language model built on a small Gemma4-style architecture. It was trained from scratch as a proof of concept.

| Property | Value |
| --- | --- |
| Model ID | veyra-ai/veyra3-5m-base |
| Parameters | ~4.5M |
| Model Type | Base / Pretrained |
| Architecture | Gemma4-style causal LM |
| Context Length | 4096 |
| Vocabulary Size | 4096 |
| Training Tokens | ~350M |
| Training Data | Cosmopedia v2 |

Veyra-30M Base

Hugging Face Model Page: https://huggingface.co/veyra-ai/veyra-30m-base-5b-tokens

Veyra-30M Base is a 36.2M-parameter pretrained causal language model trained from scratch using a custom Veyra architecture. Training used approximately 5B tokens drawn from Cosmopedia v2, FineWeb-Edu, and Python-Edu-style data, followed by a continuation stage that extended the context length from 512 to 1024 tokens.

| Property | Value |
| --- | --- |
| Model ID | veyra-ai/veyra-30m-base-5b-tokens |
| Parameters | ~36.2M |
| Model Type | Base / Pretrained |
| Architecture | Custom Veyra causal LM |
| Context Length | 1024 |
| Vocabulary Size | 8192 |
| Training Tokens | ~5B |
| Training Data | Cosmopedia v2, FineWeb-Edu, Python-Edu |
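
For a rough sense of how much data each model saw relative to its size, the approximate figures quoted above can be turned into a tokens-per-parameter ratio. This is only a back-of-the-envelope sketch: the inputs are the rounded values from the post, and the helper name `tokens_per_parameter` is made up for illustration.

```python
# Approximate figures as quoted in the submission (both are rounded).
MODELS = {
    "veyra-ai/veyra3-5m-base": {"params": 4.5e6, "tokens": 350e6},
    "veyra-ai/veyra-30m-base-5b-tokens": {"params": 36.2e6, "tokens": 5e9},
}


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Return the number of training tokens seen per model parameter."""
    return tokens / params


if __name__ == "__main__":
    for model_id, info in MODELS.items():
        ratio = tokens_per_parameter(info["tokens"], info["params"])
        print(f"{model_id}: ~{ratio:.1f} tokens per parameter")
```

With these rounded inputs the ratios come out to roughly 78 and 138 tokens per parameter respectively.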

Evaluation Setup

Both models were evaluated with lm-evaluation-harness 0.4.12 on an NVIDIA L4 GPU in float16 precision. All listed scores are zero-shot.

Benchmarks:

  • HellaSwag
  • ARC-Easy
  • ARC-Challenge
  • PIQA
  • ArithMark-2.0
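
A setup like the one described could be reproduced with something along these lines. This is a sketch, not the exact command used for the submission: it assembles an `lm_eval` command line for the four benchmarks that ship with lm-evaluation-harness under the task names `hellaswag`, `arc_easy`, `arc_challenge`, and `piqa`. ArithMark-2.0 is left out because its task name is not given in the post, and the `cuda:0` device and `auto` batch size are assumptions. The post also does not say whether the custom architectures need extra model arguments. The helper name `build_lm_eval_command` is hypothetical.

```python
import shlex

# Standard lm-evaluation-harness task names for four of the five benchmarks.
# ArithMark-2.0 is omitted: its task name is not stated in the post.
TASKS = ("hellaswag", "arc_easy", "arc_challenge", "piqa")


def build_lm_eval_command(model_id: str, tasks=TASKS) -> list[str]:
    """Build a zero-shot, float16 lm_eval invocation for a Hugging Face model."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id},dtype=float16",
        "--tasks", ",".join(tasks),
        "--num_fewshot", "0",    # all listed scores are zero-shot
        "--device", "cuda:0",    # assumption: single GPU
        "--batch_size", "auto",  # assumption: let the harness pick
    ]


if __name__ == "__main__":
    for model_id in (
        "veyra-ai/veyra3-5m-base",
        "veyra-ai/veyra-30m-base-5b-tokens",
    ):
        # Print a shell-safe command line; run it manually or via subprocess.
        print(shlex.join(build_lm_eval_command(model_id)))
```

Printing the command rather than executing it keeps the sketch free of GPU and download requirements; the printed line can be pasted into a shell where lm-evaluation-harness is installed.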

Results: acc_norm

| Model | AVG | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2 |
| --- | --- | --- | --- | --- | --- | --- |
| veyra-ai/veyra3-5m-base | 29.71 | 25.83 | 26.98 | 24.57 | 49.35 | 21.84 |
| veyra-ai/veyra-30m-base-5b-tokens | 34.09 | 28.56 | 35.69 | 24.23 | 58.38 | 23.60 |

Results: acc

| Model | AVG | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2 |
| --- | --- | --- | --- | --- | --- | --- |
| veyra-ai/veyra3-5m-base | 29.53 | 25.44 | 26.05 | 19.88 | 52.18 | 24.08 |
| veyra-ai/veyra-30m-base-5b-tokens | 34.20 | 27.58 | 37.42 | 20.65 | 59.96 | 25.40 |
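
The AVG column appears to be the unweighted mean of the five benchmark scores, rounded to two decimals. The post does not state this explicitly, so treat it as an inference from the numbers. The short check below recomputes it from the listed values; the dictionary layout and the helper name `mean_score` are illustrative only.

```python
# Per-benchmark scores in the order:
# HellaSwag, ARC-Easy, ARC-Challenge, PIQA, ArithMark-2
# Each entry maps (metric, model) -> (scores, reported AVG).
REPORTED = {
    ("acc_norm", "veyra-ai/veyra3-5m-base"):
        ([25.83, 26.98, 24.57, 49.35, 21.84], 29.71),
    ("acc_norm", "veyra-ai/veyra-30m-base-5b-tokens"):
        ([28.56, 35.69, 24.23, 58.38, 23.60], 34.09),
    ("acc", "veyra-ai/veyra3-5m-base"):
        ([25.44, 26.05, 19.88, 52.18, 24.08], 29.53),
    ("acc", "veyra-ai/veyra-30m-base-5b-tokens"):
        ([27.58, 37.42, 20.65, 59.96, 25.40], 34.20),
}


def mean_score(scores: list[float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for (metric, model_id), (scores, reported_avg) in REPORTED.items():
        computed = mean_score(scores)
        status = "matches" if computed == reported_avg else "differs from"
        print(f"{metric:8s} {model_id}: computed {computed:.2f} {status} reported {reported_avg:.2f}")
```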

Notes

Veyra2 was an intermediate experimental line that did not perform as strongly as expected, so I have not submitted it. I am working on larger Veyra3 models that should perform considerably better. I can provide more detail on these scores if needed.

Axiomic Labs org

Done! Should both be up now

Datdanboi25 changed discussion status to closed
