Instructions to use RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf", filename="salt-asr_wav-uni_1_tts_wav-uni_1-12k.IQ3_M.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M # Run inference directly in the terminal: llama-cli -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M # Run inference directly in the terminal: llama-cli -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
Use Docker
docker model run hf.co/RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf with Ollama:
ollama run hf.co/RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
- Unsloth Studio
How to use RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf to start chatting
- Docker Model Runner
How to use RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf with Docker Model Runner:
docker model run hf.co/RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
- Lemonade
How to use RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf:Q4_K_M
Run and chat with the model
lemonade run user.Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf-Q4_K_M
List all available models
lemonade list
Run and chat with the model
lemonade run user.Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf-List all available models
lemonade listYAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Quantization made by Richard Erkhov.
salt-asr_wav-uni_1_tts_wav-uni_1-12k - GGUF
- Model creator: https://huggingface.co/Vikhrmodels/
- Original model: https://huggingface.co/Vikhrmodels/salt-asr_wav-uni_1_tts_wav-uni_1-12k/
Original model description:
English Version 🇬🇧
Model Performance Overview
Metrics:
- PESQ@200: Perceptual Evaluation of Speech Quality (higher = better).
- STOI@200: Short-Time Objective Intelligibility (closer to 1 = better).
- SI-SDR@200: Scale-Invariant Signal-to-Distortion Ratio (higher = better).
- SIM-O@200: Similarity to ground truth (higher = better).
| Model | PESQ@200 | STOI@200 | SI-SDR@200 | SIM-O@200 |
|---|---|---|---|---|
| Original (LibriSpeech) | 4.15 | 0.997 | 27.45 ±1.09 | — |
| Parler TTS Mini v1 | 1.29 ±0.49 | 0.15 ±0.12 | 25.0 ±2.9 | 0.88 ±0.03 |
| Fish Speech 1.5 | 1.26 ±0.38 | 0.17 ±0.12 | 25.0 ±3.2 | 0.91 ±0.02 |
| **Salt-ASR Wav-Uni 1-12k ** | 1.27 ±0.40 | 0.18 ±0.09 | 20.3 ±3.69 | 0.88 ±0.02 |
Our Solution
- Method: Extends a pre-trained LLM with audio tokens and fine-tunes on TTS and ASR tasks.
- Training:
- SpeechTokenizer (semantic + audio tokens) outperformed Encodec (loss explosions resolved with TF32 precision).
- Training time: 150 A100 GPU hours.
- Advantages: Unified LM loss for dual tasks, minimal training overhead.
Resources
- Code: GitHub Repo
- Inference Demo: Google Colab
- Reference Papers: Vitta, Valle
Русская Версия 🇷🇺
Сравнение моделей
Метрики:
- PESQ@200: Качество речи (чем выше, тем лучше).
- STOI@200: Разборчивость речи (ближе к 1 = лучше).
- SI-SDR@200: Соотношение сигнал-шум (выше = лучше).
- SIM-O@200: Сходство с эталоном (выше = лучше).
| Модель | PESQ@200 | STOI@200 | SI-SDR@200 | SIM-O@200 |
|---|---|---|---|---|
| Original (LibriSpeech) | 4.15 | 0.997 | 27.45 ±1.09 | — |
| Parler TTS Mini v1 | 1.25 ±0.49 | 0.15 ±0.12 | 25.0 ±2.9 | 0.88 ±0.03 |
| Fish Speech 1.5 | 1.26 ±0.38 | 0.17 ±0.12 | 25.0 ±3.2 | 0.91 ±0.02 |
| **Salt-ASR Wav-Uni 1-12k ** | 1.27 ±0.40 | 0.18 ±0.09 | 20.3 ±3.69 | 0.88 ±0.02 |
Наше решение
- Метод: Расширение словаря LLM аудиотокенами + дообучение на TTS и ASR.
- Обучение:
- SpeechTokenizer (семитические + аудиотокены) показал лучшие результаты, чем Encodec.
- Время обучения: 150 часов на A100.
- Преимущества: Единая функция потерь для двух задач, минимальные затраты.
Ресурсы
- Код: GitHub
- Демо: Google Colab
Примечание: Модель поддерживает генерацию коротких фраз на английском, немецком и французском.
- Downloads last month
- 501
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit
Pull the model
# Download Lemonade from https://lemonade-server.ai/lemonade pull RichardErkhov/Vikhrmodels_-_salt-asr_wav-uni_1_tts_wav-uni_1-12k-gguf: