Collections
Collections including paper arxiv:2402.17764
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 156
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 14
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 59
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47

- peteromallet/dataclaw-peteromallet
  Viewer • Updated • 549 • 335 • 300
- Qwen/Qwen3.5-35B-A3B
  Image-Text-to-Text • 36B • Updated • 2.83M • • 1.44k
- Nanbeige/Nanbeige4.1-3B
  Text Generation • 4B • Updated • 84.8k • • 1.11k
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 630

- A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
  Paper • 2309.06497 • Published • 7
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 630
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 252

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 86
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 156
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Qwen/Qwen3.6-35B-A3B
  Image-Text-to-Text • 36B • Updated • 5.91M • • 2.02k
- deepseek-ai/DeepSeek-V4-Pro
  Text Generation • 862B • Updated • 5.56M • • 4.66k
- moonshotai/Kimi-K2.6
  Image-Text-to-Text • 1.1T • Updated • 3.13M • • 1.41k
- openai/privacy-filter
  Token Classification • 1B • Updated • 306k • • 1.61k

- Attention Is All You Need
  Paper • 1706.03762 • Published • 124
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
  Paper • 2210.04186 • Published