Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
Abstract
Agentic Chain-of-Thought Steering (ACTS) formulates reasoning steering as a Markov decision process to enable efficient, controllable chain-of-thought reasoning with token savings.
Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving how the model thinks implicit. In this paper, we propose Agentic Chain-of-Thought Steering (ACTS), which formulates reasoning steering as a Markov decision process where a controller agent adaptively steers a frozen reasoner during inference. At each step, the controller observes the reasoning trace and remaining thinking budget, then issues a steering action consisting of a reasoning strategy and a steering phrase that initiates the next reasoner step. This enables budget-aware strategy control for efficient reasoning while preserving the reasoner's generation continuity. We initialize the controller agent from synthetic steering trajectories that we construct with multi-budget augmentation, and further optimize it via reinforcement learning with budget-conditioned reward shaping. Experiments across multiple benchmarks show that ACTS matches full-thinking performance with substantial token savings, and enables controllable accuracy-efficiency trade-offs across different reasoners and tasks. The code is available at https://github.com/Andree-9/ACTS.
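The control loop described in the abstract can be illustrated with a minimal sketch. This is not the released ACTS implementation: the names `State`, `SteeringAction`, `steer`, `toy_controller`, and `toy_reasoner` are hypothetical, the strategy labels are invented for illustration, tokens are approximated by whitespace-separated words, and both the controller and the frozen reasoner are replaced by deterministic stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SteeringAction:
    strategy: str  # hypothetical strategy label, e.g. "decompose" or "conclude"
    phrase: str    # steering phrase that initiates the next reasoner step


@dataclass
class State:
    question: str
    trace: List[str] = field(default_factory=list)  # reasoning steps so far
    remaining_budget: int = 0                       # thinking tokens left


# Assumed interfaces: the controller maps a state to an action; the frozen
# reasoner continues a text prefix for at most `max_tokens` tokens.
Controller = Callable[[State], SteeringAction]
Reasoner = Callable[[str, int], str]


def steer(question: str, controller: Controller, reasoner: Reasoner,
          budget: int, max_steps: int = 16) -> State:
    """Run one steering episode under a thinking budget (word-count proxy)."""
    state = State(question=question, remaining_budget=budget)
    for _ in range(max_steps):
        if state.remaining_budget <= 0:
            break
        # The controller observes the trace and the remaining budget.
        action = controller(state)
        # The reasoner continues from the steering phrase, so the phrase
        # becomes the start of the next step rather than a separate prompt.
        prefix = " ".join([question, *state.trace, action.phrase])
        continuation = reasoner(prefix, state.remaining_budget)
        # Charge the phrase and the continuation against the budget,
        # truncating so the budget never goes negative.
        words = f"{action.phrase} {continuation}".split()[: state.remaining_budget]
        state.trace.append(" ".join(words))
        state.remaining_budget -= len(words)
        if action.strategy == "conclude":
            break
    return state


def toy_controller(state: State) -> SteeringAction:
    """Stand-in policy: wrap up once the remaining budget is small."""
    if state.remaining_budget <= 10:
        return SteeringAction("conclude", "Therefore, the answer is")
    return SteeringAction("decompose", "Let me break this down:")


def toy_reasoner(prefix: str, max_tokens: int) -> str:
    """Stand-in for a frozen LLM: emits up to five placeholder tokens."""
    return " ".join(["tok"] * min(5, max_tokens))
```

Calling `steer("What is 2 + 2?", toy_controller, toy_reasoner, budget=20)` runs one exploratory step and then a concluding step. In a real system the controller would be a trained policy and the reasoner an actual language model with a proper tokenizer; the sketch only shows how the state, the action, and the budget accounting fit together.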