Instructions to use llmware/dragon-llama-3.1-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use llmware/dragon-llama-3.1-gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="llmware/dragon-llama-3.1-gguf", filename="dragon-llama31.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use llmware/dragon-llama-3.1-gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf llmware/dragon-llama-3.1-gguf # Run inference directly in the terminal: llama-cli -hf llmware/dragon-llama-3.1-gguf
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf llmware/dragon-llama-3.1-gguf # Run inference directly in the terminal: llama-cli -hf llmware/dragon-llama-3.1-gguf
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf llmware/dragon-llama-3.1-gguf # Run inference directly in the terminal: ./llama-cli -hf llmware/dragon-llama-3.1-gguf
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf llmware/dragon-llama-3.1-gguf # Run inference directly in the terminal: ./build/bin/llama-cli -hf llmware/dragon-llama-3.1-gguf
Use Docker
docker model run hf.co/llmware/dragon-llama-3.1-gguf
- LM Studio
- Jan
- Ollama
How to use llmware/dragon-llama-3.1-gguf with Ollama:
ollama run hf.co/llmware/dragon-llama-3.1-gguf
- Unsloth Studio
How to use llmware/dragon-llama-3.1-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for llmware/dragon-llama-3.1-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for llmware/dragon-llama-3.1-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for llmware/dragon-llama-3.1-gguf to start chatting
- Docker Model Runner
How to use llmware/dragon-llama-3.1-gguf with Docker Model Runner:
docker model run hf.co/llmware/dragon-llama-3.1-gguf
- Lemonade
How to use llmware/dragon-llama-3.1-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull llmware/dragon-llama-3.1-gguf
Run and chat with the model
lemonade run user.dragon-llama-3.1-gguf-{{QUANT_TAG}}List all available models
lemonade list
DRAGON-LLAMA-3.1-GGUF
dragon-llama-3.1-gguf is RAG-instruct trained on top of a Llama-3.1 base model.
Benchmark Tests
Evaluated against the benchmark test: RAG-Instruct-Benchmark-Tester
1 Test Run (temperature=0.0, sample=False) with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.
--Accuracy Score: 94.0 correct out of 100
--Not Found Classification: 70.0%
--Boolean: 90.0%
--Math/Logic: 72.5%
--Complex Questions (1-5): 4 (Above Average - table-reading, causal)
--Summarization Quality (1-5): 4 (Above Average)
--Hallucinations: No hallucinations but a few instances of drawing on 'background' knowledge.
For test run results (and good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet" in this repo).
The inference accuracy tests were performed on this model (GGUF 4_K_M) not the original Pytorch, and it is possible that the original Pytorch may score higher, but we have chosen to use the quantized version as it is most representative of the likely use of the model for inference.
Please compare with dragon-llama2-7b or the most recent dragon-mistral-0.3.
Model Description
- Developed by: llmware
- Model type: Llama-8b-3.1-Base
- Language(s) (NLP): English
- License: Llama-3.1 Community License
- Finetuned from model: Llama-3.1-Base
Bias, Risks, and Limitations
Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.
How to Get Started with the Model
To pull the model via API:
from huggingface_hub import snapshot_download
snapshot_download("llmware/dragon-llama-3.1-gguf", local_dir="/path/on/your/machine/", local_dir_use_symlinks=False)
Load in your favorite GGUF inference engine, or try with llmware as follows:
from llmware.models import ModelCatalog
# to load the model and make a basic inference
model = ModelCatalog().load_model("llmware/dragon-llama-3.1-gguf", temperature=0.0, sample=False)
response = model.inference(query, add_context=text_sample)
Details on the prompt wrapper and other configurations are on the config.json file in the files repository.
Model Card Contact
Darren Oberst & llmware team
- Downloads last month
- 47
We're not able to determine the quantization variants.