Trimming the Long-Tail of Visual World Modeling Evaluation Paper • 2606.24256 • Published 13 days ago • 41
GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems Paper • 2606.28187 • Published 10 days ago • 13
BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery Paper • 2606.20997 • Published 17 days ago • 8
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 15 days ago • 96
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces Paper • 2604.04017 • Published Apr 5 • 8
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published Jun 4 • 44 • 4
Brick-Composer: Using MLLMs for Assembly with Diverse Bricks Paper • 2606.05445 • Published Jun 3 • 8