🏝️ On Vacation

75 2 32

Erik

Tralalabs

https://huggingface.co/openai-community/gpt2-medium

AI & ML interests

pretraining from scratch

Recent Activity

updated a dataset about 4 hours ago

SupraLarps/rizzaurapretraining

published a dataset about 4 hours ago

SupraLarps/rizzaurapretraining

new activity about 4 hours ago

SupraLarps/dumblabs-1234b:Yeah

Organizations

Tralalabs's collections 15

AI Datasets

Datasets for training AI models.

HuggingFaceH4/stack-exchange-preferences

Viewer • Updated Mar 8, 2023 • 10.8M • 12.4k • 134
HuggingFaceTB/stackexchange_2025_md

Updated Mar 25, 2025 • 5.59k • 3
HuggingFaceGECLM/StackExchange_Mar2023

Viewer • Updated Mar 16, 2023 • 19.4M • 5.3k • 5
mlfoundations-dev/stackexchange_fitness

Viewer • Updated Dec 23, 2024 • 28.3k • 55 • 2

Claude (and Others) Generated Datasets

nothingiisreal/Claude-3-Opus-Instruct-15K

Viewer • Updated Jul 17, 2024 • 29.5k • 142 • 19
mlfoundations-dev/oh-dcft-v3.1-claude-3-5-sonnet-20241022

Viewer • Updated Jan 11, 2025 • 1.05M • 7
mlfoundations-dev/oh-dcft-v3.1-claude-3-5-sonnet-20241022_test

Viewer • Updated Dec 29, 2024 • 1k • 6
mfielding92/claude-3.7-sonnet-reasoning

Viewer • Updated Mar 24, 2025 • 179 • 68 • 35

Chinese Datasets

Skywork/SkyPile-150B

Viewer • Updated Dec 7, 2023 • 1.76M • 9.86k • 407
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

Paper • 2308.10755 • Published Aug 21, 2023 • 1
fjcanyue/wikipedia-zh-cn

Viewer • Updated 29 days ago • 5.55M • 1.03k • 27
0xDing/wikipedia-cn-20230720-filtered

Viewer • Updated Jul 23, 2023 • 255k • 2.28k • 170

My Stuff

Tralalabs/diverse-trala-bpe-tokenizer-32k

Updated May 28
Tralalabs/diverse-dataset-it-162x

Viewer • Updated May 28 • 162 • 30
Tralalabs/gpt-neo-125m-diverse-qca-lora

Updated May 28

Korean Machine Learning Datasets

Datasets for AI and neural networks for Korean generation.

HAERAE-HUB/KOREAN-WEBTEXT

Viewer • Updated May 31, 2024 • 1.28M • 476 • 47
eliceai/korean-webtext-edu

Preview • Updated Jan 1 • 899 • 7
jojo0217/korean_rlhf_dataset

Viewer • Updated Sep 25, 2023 • 107k • 89 • 30
seyoungsong/Open-Korean-Historical-Corpus

Preview • Updated Oct 29, 2025 • 509 • 8

TralalabsLM

Tralalabs's model family called "TralalabsLM". A collection.

Tralalabs/TralalabsLM-160M-Test

Text Generation • 0.2B • Updated May 20 • 56

BLOOM GGUF

GGUF files of BLOOM and BLOOMz up to 7B

Tralalabs/bloomz-3b-Q8_0-GGUF

Text Generation • 3B • Updated Apr 25 • 14
Tralalabs/bloomz-3b-Q4_K_M-GGUF

Text Generation • 3B • Updated Apr 25 • 11
Tralalabs/bloomz-560m-Q8_0-GGUF

Text Generation • 0.6B • Updated Apr 25 • 50
Tralalabs/bloom-560m-Q8_0-GGUF

Text Generation • 0.6B • Updated Apr 25 • 14

Qwen 3

All the Qwen 3 family (including Qwen-Image models released in 2025-2026)

Qwen/Qwen3-0.6B

Text Generation • 0.8B • Updated Jul 26, 2025 • 27.4M • 1.36k
Qwen/Qwen3-1.7B

Text Generation • 2B • Updated Jul 26, 2025 • 5.75M • 491
Qwen/Qwen3-4B-Instruct-2507

Text Generation • 4B • Updated Sep 17, 2025 • 5.4M • 887
Qwen/Qwen3-8B

Text Generation • 8B • Updated Jul 26, 2025 • 13.6M • 1.17k

PII

PII stuff on Hugging Face

kalyan-ks/ettin-68m-nemotron-pii

Token Classification • 68.5M • Updated May 22 • 599 • 7
fastino/gliner2-privacy-filter-PII-multi

Token Classification • 0.3B • Updated 6 days ago • 38.7k • 48
OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1

Token Classification • 0.4B • Updated Jan 13 • 63.5k • 18
bardsai/eu-pii-anonimization-multilang

Token Classification • 0.3B • Updated May 13 • 3.2k • 15

Dutch AI DATASETS Collection

goldfish-models/fish-food

Viewer • Updated May 6 • 1.99B • 4.34k • 2
HuggingFaceFW/fineweb-2

Viewer • Updated Oct 27, 2025 • 4.48B • 99.1k • 827
epfml/FineWeb2-HQ

Viewer • Updated Feb 19, 2025 • 380M • 24.7k • 69

CHEETAH

Tralalabs's model family "CHEETAH" comes in different sizes, fine-tuned on different datasets.

Tralalabs/CHEETAH-350M-LoRA

Text Generation • Updated 29 days ago • 55
Tralalabs/CHEETAH-350M-Merged-FP16

Text Generation • 0.4B • Updated 29 days ago • 146
Tralalabs/CHEETAH-350M-Merged-FP16-Q6_K-GGUF

Text Generation • 0.4B • Updated 28 days ago • 156
mradermacher/CHEETAH-350M-Merged-FP16-i1-GGUF

Text Generation • 0.4B • Updated 28 days ago • 776

My datasets

Tralalabs/qa-dataset-130x

Viewer • Updated May 25 • 129 • 11
Tralalabs/qa-dataset-263x

Viewer • Updated May 25 • 263 • 9
Tralalabs/qa-dataset-436x-deduped

Viewer • Updated May 26 • 338 • 12
Tralalabs/qa-dataset-436x

Viewer • Updated May 26 • 436 • 11

Spanish Pretraining and Finetuning Datasets

(Pretraining and fine-tuning datasets in Spanish)

HuggingFaceFW/fineweb-2

Viewer • Updated Oct 27, 2025 • 4.48B • 99.1k • 827
epfml/FineWeb2-HQ

Viewer • Updated Feb 19, 2025 • 380M • 24.7k • 69
allenai/c4

Viewer • Updated Jan 9, 2024 • 10.4B • 1.03M • 601
LHF/escorpius

Viewer • Updated Jan 5, 2023 • 1.25M • 494 • 17

Dataset Library

A dataset library.

HuggingFaceFW/fineweb-edu

Viewer • Updated Jul 11, 2025 • 3.5B • 383k • 1.17k
HuggingFaceFW/fineweb

Viewer • Updated Jul 11, 2025 • 52.5B • 255k • 2.91k
HuggingFaceFW/finephrase

Viewer • Updated Mar 31 • 1.02B • 419k • 130
HuggingFaceFW/finepdfs

Viewer • Updated Apr 3 • 476M • 80.3k • 882

PicoLM

The PicoLM family made by Tralalabs

Tralalabs/PicoLM-15M

Text Generation • 19M • Updated Mar 7 • 18 • 1
Tralalabs/PicoLM-0.5M

Text Generation • 479k • Updated Mar 10 • 78 • 2
