fastino/gliner2-privacy-filter-PII-multi • Token Classification • 0.3B • Updated 16 days ago • 30.5k • 42
OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains • Paper • 2606.14702 • Published 7 days ago • 29
Bernini: Latent Semantic Planning for Video Diffusion • Paper • 2605.22344 • Published 29 days ago • 18
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources • Paper • 2605.29250 • Published 22 days ago • 76
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini • Paper • 2605.27295 • Published 24 days ago • 23
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV • Paper • 2605.26244 • Published 25 days ago • 38
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding • Paper • 2605.27365 • Published 24 days ago • 143
Lance: Unified Multimodal Modeling by Multi-Task Synergy • Paper • 2605.18678 • Published May 18 • 78