Add EvalEval community eval results

#16

YAML Metadata Error: Invalid content in Eval Result file .eval_results/hle.yaml

Check out the documentation for more information.

Task ID "none" does not match any task in dataset "cais/hle". Available: none
.eval_results/gpqa-diamond.yaml ADDED
@@ -0,0 +1,9 @@
+ - dataset:
+     id: Idavidrein/gpqa
+     task_id: diamond
+   date: '2026-04-19'
+   notes: GPQA Diamond
+   source:
+     name: EvalEval
+     url: https://huggingface.co/datasets/evaleval/EEE_datastore/blob/b11a260fe158662bb63b4a144be2b5690615414d/flat/objects/9c/77/9c7740fd-9cad-445c-9681-eb576e2a110e.json
+   value: 71.7171717172
.eval_results/hle.yaml ADDED
@@ -0,0 +1,8 @@
+ - dataset:
+     id: cais/hle
+     task_id: none
+   date: '2025-08-13'
+   source:
+     name: EvalEval
+     url: https://huggingface.co/datasets/evaleval/EEE_datastore/blob/b11a260fe158662bb63b4a144be2b5690615414d/flat/objects/e9/03/e90308ab-e898-4434-aa93-13d131887737.json
+   value: 8.12
.eval_results/mmlu-pro.yaml ADDED
@@ -0,0 +1,7 @@
+ - dataset:
+     id: TIGER-Lab/MMLU-Pro
+     task_id: mmlu_pro
+   source:
+     name: EvalEval
+     url: https://huggingface.co/datasets/evaleval/EEE_datastore/blob/b11a260fe158662bb63b4a144be2b5690615414d/flat/objects/07/43/074397da-f5e5-441c-bc45-ac16b599ca7c.json
+   value: 81.4
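
The validation error reported for .eval_results/hle.yaml says that its task_id ("none") does not match any task known for cais/hle. A check of that kind could be sketched roughly as below. This is a hypothetical illustration, not the Hub's actual validator: the function name `find_unknown_tasks` and the `known_tasks` mapping are assumptions, and the task lists in it are made up to mirror the message (in particular, "Available: none" is read here as an empty task list).

```python
# Hypothetical sketch of a task_id check; not the Hub's real validation code.
def find_unknown_tasks(entries, known_tasks):
    """Return one message per entry whose task_id is not listed for its dataset."""
    problems = []
    for entry in entries:
        dataset = entry["dataset"]
        available = known_tasks.get(dataset["id"], [])
        if dataset["task_id"] not in available:
            listed = ", ".join(available) if available else "none"
            problems.append(
                f'Task ID "{dataset["task_id"]}" does not match any task '
                f'in dataset "{dataset["id"]}". Available: {listed}'
            )
    return problems


# Entries mirror the three files added in this PR (source and date fields omitted).
entries = [
    {"dataset": {"id": "Idavidrein/gpqa", "task_id": "diamond"}, "value": 71.7171717172},
    {"dataset": {"id": "cais/hle", "task_id": "none"}, "value": 8.12},
    {"dataset": {"id": "TIGER-Lab/MMLU-Pro", "task_id": "mmlu_pro"}, "value": 81.4},
]

# Assumed task lists, for illustration only.
known_tasks = {
    "Idavidrein/gpqa": ["diamond"],
    "cais/hle": [],
    "TIGER-Lab/MMLU-Pro": ["mmlu_pro"],
}

problems = find_unknown_tasks(entries, known_tasks)
for message in problems:
    print(message)
```

Under these assumptions only the cais/hle entry is flagged, which matches the single error shown at the top of this PR.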