Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
Paper • 2606.26027 • Published • 18
Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do