Model Card for outputs_sft
outputs_sft is a LoRA adapter produced by supervised fine-tuning (SFT) of Qwen/Qwen3-4B. It was trained with TRL and PEFT.
Quick start
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import AutoPeftModelForCausalLM
REPO_ID = "outputs_sft" # Replace with your Hub repo if different
# Load the base model and attach the LoRA adapter at load time (the adapter is applied, not merged into the weights)
model = AutoPeftModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, use_fast=True)
# The model is already placed on devices above, so device_map is not repeated here
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
prompt = "If you had a time machine and could go only once, where and when would you go? Explain your reasoning."
out = pipe(prompt, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=0.7)[0]["generated_text"]
print(out)
# Alternatively, if you already merged the LoRA and saved the full model weights:
# model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")
# tokenizer = AutoTokenizer.from_pretrained(REPO_ID, use_fast=True)
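The commented alternative above assumes the adapter has already been merged into the base weights. A minimal sketch of that merge step follows, assuming the adapter repo also contains the tokenizer files. The helper name `merge_and_save` and the output folder in the usage comment are hypothetical, not part of the original notebook.

```python
def merge_and_save(adapter_id: str, output_dir: str) -> None:
    """Merge a LoRA adapter into its base model and save standalone weights."""
    # Imported lazily so that defining this helper does not require the libraries.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Loads the base model named in the adapter config, then attaches the adapter.
    # The base is loaded unquantized here, because merging into 4-bit weights
    # can introduce rounding errors.
    model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)

    # Fold the LoRA deltas into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(adapter_id).save_pretrained(output_dir)


# Example call (hypothetical paths; downloads the base model when run):
# merge_and_save("outputs_sft", "./outputs_sft_merged")
```

After merging, the output folder can be loaded with `AutoModelForCausalLM.from_pretrained` without PEFT installed.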
Intended uses & limitations
Intended uses
- General instruction following and helpful assistant style responses.
- Short-form reasoning and everyday Q&A.
- Creative writing, drafting, and rewriting.
Limitations
- Not evaluated for safety-critical or high-stakes domains.
- May produce inaccurate, biased, or undesired content.
- Long-chain reasoning may require specialized training.
Bias, risks, and limitations: Outputs may reflect biases present in training data. Review before use in production.
Training data
- Dataset not auto-detected from the notebook. Please document your data sources.
Training procedure
This model was trained with SFT using TRL/PEFT.
PEFT / LoRA Config
- lora_dropout: 0.05
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
Precision & Quantization
- load_in_4bit: True
- bnb_4bit_compute_dtype: float32
- dtype: float32
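The LoRA and quantization settings listed above could be expressed as config objects roughly as follows. This is a sketch, not the original training code. The card does not report the LoRA rank or alpha, so `r` and `lora_alpha` are left as required arguments, and `task_type` is an assumption based on the model being a causal LM.

```python
# Values copied from the card.
LORA_KWARGS = {
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_compute_dtype": "float32",
}


def build_configs(r: int, lora_alpha: int):
    """Return (BitsAndBytesConfig, LoraConfig).

    r and lora_alpha are NOT listed in the card; the caller must supply them.
    """
    # Imported lazily so the constants above can be inspected without the libraries.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=QUANT_KWARGS["load_in_4bit"],
        bnb_4bit_compute_dtype=getattr(torch, QUANT_KWARGS["bnb_4bit_compute_dtype"]),
    )
    lora_config = LoraConfig(r=r, lora_alpha=lora_alpha, **LORA_KWARGS)
    return quant_config, lora_config
```

The quantization config would typically be passed as `quantization_config` when loading the base model, and the LoRA config handed to the trainer or to `get_peft_model`.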
Key hyperparameters
- num_train_epochs: 2
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- gradient_accumulation_steps: 2
- learning_rate: 9e-4
- lr_scheduler_type: cosine
- logging_steps: 2
- save_steps: 8
- save_strategy: steps
- bf16: True
- fp16: False
- seed: 42
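To make these numbers concrete, the sketch below computes the effective batch size and the learning rate a plain cosine schedule would give at a few points in training. It assumes a single GPU and no warmup, neither of which the card states, and `total_steps=100` is an arbitrary illustrative value.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH_SIZE = 10
GRADIENT_ACCUMULATION_STEPS = 2
LEARNING_RATE = 9e-4


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step (num_devices is an assumption)."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def cosine_lr(step: int, total_steps: int, peak_lr: float = LEARNING_RATE) -> float:
    """Half-cosine decay from peak_lr to 0, assuming zero warmup steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size())
for step in (0, 50, 100):
    print(f"step {step}: lr = {cosine_lr(step, total_steps=100):.2e}")
```

With one device, each optimizer step sees 10 × 2 = 20 samples. The schedule starts at the peak rate, passes through half of it at the midpoint, and reaches zero at the final step.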
Hardware & runtime
- GPU not detected from notebook logs.
Framework versions
- PEFT: 0.17.0
- TRL: 0.21.0
- Transformers: 4.55.1
- PyTorch: 2.8.0
- Datasets: 3.6.0
- Tokenizers: 0.21.4
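To check a local environment against these versions, a small standard-library sketch such as the following may help. The helper name `version_mismatches` is hypothetical, and the PyPI distribution names (for example `torch` for PyTorch) are assumed to match the installed packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
EXPECTED = {
    "peft": "0.17.0",
    "trl": "0.21.0",
    "transformers": "4.55.1",
    "torch": "2.8.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.4",
}


def version_mismatches(expected=EXPECTED):
    """Map package name -> (expected, installed) for every package that differs.

    installed is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop local build tags such as "+cu128" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches().items():
    print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the adapter will fail to load; it only flags where the environment differs from the one used for training.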
Example prompts
- Explain diffusion models to a 12-year-old.
- Write a polite email asking for an extension on a project.
- Summarize the following text in 3 bullet points: ...
Evaluation
No formal evaluation metrics were logged in the notebook. If you run evaluations (e.g., on MT-Bench, MMLU, or a domain-specific set), please add the results here in a Model Index block or a table.
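As one illustration of a domain-specific metric, the sketch below scores generations by comparing the last number in each output against a reference answer. This is a hypothetical metric for numeric question answering, not something the notebook ran, and the sample strings are made up for illustration.

```python
import re
from typing import Optional, Sequence

# Matches integers and decimals, allowing thousands separators such as 1,200.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number appearing in text, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions whose final number equals the reference number."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for pred, ref in zip(predictions, references):
        value = last_number(pred)
        if value is not None and value == last_number(ref):
            hits += 1
    return hits / len(predictions)


# Made-up sample data for illustration only.
preds = ["The total is 18.", "So she pays $1,200", "I am not sure"]
refs = ["18", "1200", "7"]
print(f"exact match: {exact_match_accuracy(preds, refs):.3f}")
```

Taking the last number is a simple heuristic; outputs that restate intermediate values after the final answer would be scored incorrectly.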
Pushing to the Hub
from huggingface_hub import HfApi, create_repo, upload_folder
REPO_ID = "outputs_sft" # e.g., "YourUsername/outputs_sft"
# 1) Create the repo (once)
# create_repo(REPO_ID, repo_type="model", private=False)
# 2) Upload your adapter or merged model folder
upload_folder(
repo_id=REPO_ID,
folder_path="./outputs_sft", # change to your output dir
commit_message="Add SFT model",
)
License: Set license in the YAML header to a license compatible with the base model and your data (e.g., apache-2.0, mit, or the specific Qwen license if required).
Citations
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
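@misc{peft,
title = {{PEFT}: State-of-the-art Parameter-Efficient Fine-Tuning methods},
author = {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan},
year = 2022,
howpublished = {\url{https://github.com/huggingface/peft}}
}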
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and others",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
year = "2020"
}