Space: AxiomicLabs/Open_SLM_Leaderboard

Model Submission Request: Veyra3-5M Base and Veyra-30M Base

by Jdudeo - opened 3 days ago

Discussion

Jdudeo

3 days ago

Submitted by: Veyra AI

Models

Veyra3-5M Base

Hugging Face Model Page: https://huggingface.co/veyra-ai/veyra3-5m-base

Veyra3-5M Base is a compact 4.5M-parameter pretrained causal language model based on a small Gemma4-style architecture. It was trained from scratch as a proof of concept.

Property	Value
Model ID	veyra-ai/veyra3-5m-base
Parameters	~4.5M
Model Type	Base / Pretrained
Architecture	Gemma4-style causal LM
Context Length	4096
Vocabulary Size	4096
Training Tokens	~350M
Training Data	Cosmopedia v2

Veyra-30M Base

Hugging Face Model Page: https://huggingface.co/veyra-ai/veyra-30m-base-5b-tokens

Veyra-30M Base is a 36.2M parameter pretrained causal language model trained from scratch using a custom Veyra architecture. Training used approximately 5B tokens across Cosmopedia v2, FineWeb-Edu, and Python-Edu style data, with a later context-length continuation stage from 512 to 1024 tokens.

Property	Value
Model ID	veyra-ai/veyra-30m-base-5b-tokens
Parameters	~36.2M
Model Type	Base / Pretrained
Architecture	Custom Veyra causal LM
Context Length	1024
Vocabulary Size	8192
Training Tokens	~5B
Training Data	Cosmopedia v2, FineWeb-Edu, Python-Edu

Evaluation Setup

Evaluated with lm-evaluation-harness 0.4.12 on an NVIDIA L4 GPU using float16 precision. All listed scores are zero-shot.

Benchmarks:

HellaSwag
ARC-Easy
ARC-Challenge
PIQA
ArithMark-2.0
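
For anyone who wants to reproduce the setup, here is a rough sketch of how the runs could be launched. It only builds the command line and does not run anything. The flags are the usual lm-evaluation-harness 0.4.x CLI options, but `--batch_size auto` and the `cuda:0` device are my assumptions, not settings stated in this post. ArithMark-2.0 is left out because its harness task name is not given here.

```python
import shlex

# Task names as registered in lm-evaluation-harness.
# ArithMark-2.0 is omitted: its task name is not stated in this post.
STANDARD_TASKS = ["hellaswag", "arc_easy", "arc_challenge", "piqa"]


def build_eval_command(model_id, tasks=STANDARD_TASKS, device="cuda:0"):
    """Return the argv list for a zero-shot float16 lm_eval run (sketch only)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id},dtype=float16",
        "--tasks", ",".join(tasks),
        "--num_fewshot", "0",
        "--device", device,        # assumption: single GPU at index 0
        "--batch_size", "auto",    # assumption: batch size is not stated
    ]


for model_id in ("veyra-ai/veyra3-5m-base", "veyra-ai/veyra-30m-base-5b-tokens"):
    # Print a copy-pasteable shell command for each submitted model.
    print(shlex.join(build_eval_command(model_id)))
```

If a model ships custom modeling code, `trust_remote_code=True` may also be needed in `--model_args`; whether that applies here is not stated.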

Results — acc_norm

Model	AVG	HellaSwag	ARC-Easy	ARC-Challenge	PIQA	ArithMark-2
veyra-ai/veyra3-5m-base	29.71	25.83	26.98	24.57	49.35	21.84
veyra-ai/veyra-30m-base-5b-tokens	34.09	28.56	35.69	24.23	58.38	23.60

Results — acc

Model	AVG	HellaSwag	ARC-Easy	ARC-Challenge	PIQA	ArithMark-2
veyra-ai/veyra3-5m-base	29.53	25.44	26.05	19.88	52.18	24.08
veyra-ai/veyra-30m-base-5b-tokens	34.20	27.58	37.42	20.65	59.96	25.40
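
The AVG column in both tables appears to be the unweighted mean of the five benchmark scores. That is inferred from the numbers rather than stated anywhere, so the small check below is only a sanity test of that assumption, using the values copied from the tables.

```python
# Per-benchmark scores copied from the tables above, in column order:
# HellaSwag, ARC-Easy, ARC-Challenge, PIQA, ArithMark-2
SCORES = {
    "acc_norm": {
        "veyra-ai/veyra3-5m-base": [25.83, 26.98, 24.57, 49.35, 21.84],
        "veyra-ai/veyra-30m-base-5b-tokens": [28.56, 35.69, 24.23, 58.38, 23.60],
    },
    "acc": {
        "veyra-ai/veyra3-5m-base": [25.44, 26.05, 19.88, 52.18, 24.08],
        "veyra-ai/veyra-30m-base-5b-tokens": [27.58, 37.42, 20.65, 59.96, 25.40],
    },
}


def mean_score(scores):
    """Unweighted mean of the benchmark scores, formatted to two decimals."""
    return f"{sum(scores) / len(scores):.2f}"


for metric, models in SCORES.items():
    for model_id, scores in models.items():
        print(metric, model_id, mean_score(scores))
```

The printed means match the AVG values listed in the tables (29.71 and 34.09 for acc_norm, 29.53 and 34.20 for acc).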

Notes

Veyra2 was an intermediate experimental line that did not perform as strongly as expected, so I have not submitted it. I am working on larger Veyra3 models that should perform considerably better. I can provide more detail on these scores if needed.

Datdanboi25

Axiomic Labs org 3 days ago

Done! Should both be up now

Datdanboi25 changed discussion status to closed 3 days ago
