-
Qwen/Qwen3.6-35B-A3B
Image-Text-to-Text • 36B • Updated • 5.85M • 2.04k -
deepseek-ai/DeepSeek-V4-Pro
Text Generation • 862B • Updated • 5.52M • 4.7k -
moonshotai/Kimi-K2.6
Image-Text-to-Text • 1.1T • Updated • 3.14M • 1.42k -
openai/privacy-filter
Token Classification • 1B • Updated • 318k • 1.62k
Collections
Discover the best community collections!
Collections including paper arxiv:2509.26507
-
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Paper • 2309.06497 • Published • 7 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 630 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 252
-
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 328 -
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Paper • 2601.00393 • Published • 132 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 181
-
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 305 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
Paper2Video: Automatic Video Generation from Scientific Papers
Paper • 2510.05096 • Published • 120 -
TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance
Paper • 2309.03736 • Published
-
DoPE: Denoising Rotary Position Embedding
Paper • 2511.09146 • Published • 98 -
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper • 2511.19365 • Published • 66 -
Latent Collaboration in Multi-Agent Systems
Paper • 2511.20639 • Published • 127 -
Video Generation Models Are Good Latent Reward Models
Paper • 2511.21541 • Published • 49
-
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
Paper • 2602.07085 • Published • 190 -
Seriki/FastHTML
Updated • 3 • 1 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 452 -
AI Can Learn Scientific Taste
Paper • 2603.14473 • Published • 429
-
Attention Is All You Need
Paper • 1706.03762 • Published • 125 -
Scaling Laws for Neural Language Models
Paper • 2001.08361 • Published • 10 -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
Paper • 2210.04186 • Published
-
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 665 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 550 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 452 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 350