Instructions for using bue0912/ToolOmni-Qwen3-4B with libraries and local apps.

Libraries

How to use bue0912/ToolOmni-Qwen3-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bue0912/ToolOmni-Qwen3-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bue0912/ToolOmni-Qwen3-4B")
# The checkpoint is a causal language model, so load it with AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bue0912/ToolOmni-Qwen3-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use bue0912/ToolOmni-Qwen3-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bue0912/ToolOmni-Qwen3-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bue0912/ToolOmni-Qwen3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
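
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the response parsing assumes the usual OpenAI-compatible shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the server answers in the OpenAI chat-completions format.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("http://localhost:8000", "bue0912/ToolOmni-Qwen3-4B", "What is the capital of France?")` should return the reply as a string. Changing `base_url` should be enough to target any other OpenAI-compatible server.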

Use Docker

docker model run hf.co/bue0912/ToolOmni-Qwen3-4B

SGLang

How to use bue0912/ToolOmni-Qwen3-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "bue0912/ToolOmni-Qwen3-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bue0912/ToolOmni-Qwen3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "bue0912/ToolOmni-Qwen3-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bue0912/ToolOmni-Qwen3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use bue0912/ToolOmni-Qwen3-4B with Docker Model Runner:
```
docker model run hf.co/bue0912/ToolOmni-Qwen3-4B
```

ToolOmni

ToolOmni is a tool-use language model released with the ACL 2026 Main Conference paper "ToolOmni: Enabling Open-World Tool Use via Agentic Learning with Proactive Retrieval and Grounded Execution".

This checkpoint is built on Qwen/Qwen3-4B-Instruct and is designed for open-world tool use. The model is trained to proactively retrieve relevant tools and to generate grounded multi-step tool calls that complete downstream tasks.

Model Description

Model type: Causal language model for tool use
Base model: Qwen/Qwen3-4B-Instruct
Paper venue: ACL 2026 Main Conference
Codebase: training, evaluation, retrieval, and tool-execution utilities are available in the public repository at https://github.com/Huangsz2021/ToolOmni

Intended Uses

This model is intended for:

research on tool-use agents
benchmarking open-world tool retrieval and grounded execution
studying retrieval-augmented and execution-aware training
reproducing the ToolOmni evaluation pipeline

The model is expected to work best together with the ToolOmni codebase, retriever, and tool execution environment.

Training

ToolOmni follows an agentic learning framework with:

proactive tool retrieval
grounded tool execution
reinforcement learning for multi-step tool-use behavior

The training and evaluation pipeline is released in the ToolOmni repository.
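
To make the retrieve-then-execute pattern concrete, here is a toy sketch. It is not taken from the ToolOmni codebase: the tool pool, the word-overlap retriever, and the JSON call format are all hypothetical, and "grounded" is read loosely here as "only tools that were actually retrieved may be executed". The reinforcement learning stage is not shown.

```python
import json

# Hypothetical tool pool: name -> (description, callable). Not ToolOmni's real tools.
TOOL_POOL = {
    "get_weather": ("return the current weather for a city",
                    lambda city: f"Sunny in {city}"),
    "convert_currency": ("convert an amount between two currencies",
                         lambda amount, rate: amount * rate),
    "search_flights": ("search flights between two cities",
                       lambda origin, dest: [f"{origin}->{dest} 09:00"]),
}


def retrieve_tools(query, pool, top_k=2):
    """Toy retriever: rank tools by word overlap between query and description."""
    query_words = set(query.lower().split())
    scored = []
    for name, (description, _) in pool.items():
        overlap = len(query_words & set(description.lower().split()))
        scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for overlap, name in scored[:top_k] if overlap > 0]


def execute_call(call_json, pool, allowed):
    """Run a JSON tool call, but only if the tool is in the retrieved set."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in allowed:
        return {"error": f"tool '{name}' was not retrieved"}
    _, function = pool[name]
    return {"result": function(**call["arguments"])}


# One retrieve -> call -> observe step; a real agent would loop until done.
retrieved = retrieve_tools("what is the weather in a city", TOOL_POOL)
observation = execute_call(
    '{"name": "get_weather", "arguments": {"city": "Paris"}}', TOOL_POOL, retrieved
)
```

In a multi-step run, the observation would be appended to the conversation and the model would decide whether to retrieve again, call another tool, or answer.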

Evaluation

ToolOmni is evaluated on ToolBench-style benchmarks in two settings:

with-api-list / golden-tool settings
open-domain settings without golden tool lists

Please refer to the project repository and paper for the detailed evaluation protocol and benchmark results.
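
As a rough illustration of how the two settings differ, the sketch below chooses the candidate tools for a query. It is an assumption about the general setup, not the ToolOmni evaluation code: `candidate_tools` and `keyword_retriever` are hypothetical names, and the retriever is a trivial stand-in.

```python
def keyword_retriever(query, tool_pool, top_k=2):
    """Hypothetical stand-in retriever: keep tools whose name words appear in the query."""
    query_words = set(query.lower().split())
    hits = [name for name in sorted(tool_pool) if query_words & set(name.split("_"))]
    return hits[:top_k]


def candidate_tools(query, tool_pool, retriever, golden_tools=None):
    """Pick the tools made available for one query.

    golden_tools given  -> with-api-list setting: use the provided list as-is.
    golden_tools absent -> open-domain setting: retrieve from the whole pool.
    """
    if golden_tools is not None:
        return list(golden_tools)
    return retriever(query, tool_pool)


pool = ["get_weather", "search_flights", "convert_currency"]
query = "find flights to Paris"

with_api_list = candidate_tools(query, pool, keyword_retriever,
                                golden_tools=["search_flights"])
open_domain = candidate_tools(query, pool, keyword_retriever)
```

In the open-domain case the quality of the retriever directly limits which tools can be called, which is why the two settings are reported separately.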

Repository

Paper: https://arxiv.org/abs/2604.13787
Code: https://github.com/Huangsz2021/ToolOmni
Model: https://huggingface.co/bue0912/ToolOmni-Qwen3-4B
Dataset: https://huggingface.co/datasets/bue0912/ToolOmni-Data
Collection: https://huggingface.co/collections/bue0912/toolomni

Citation

@misc{huang2026toolomnienablingopenworldtool,
      title={ToolOmni: Enabling Open-World Tool Use via Agentic learning with Proactive Retrieval and Grounded Execution}, 
      author={Shouzheng Huang and Meishan Zhang and Baotian Hu and Min Zhang},
      year={2026},
      eprint={2604.13787},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.13787}, 
}