mlx-community/nemotron-3.5-asr-streaming-0.6b

MLX conversion of nvidia/nemotron-3.5-asr-streaming-0.6b, NVIDIA's 600M-parameter cache-aware streaming FastConformer-RNNT ASR model. It uses language-ID prompt conditioning and covers 40 language-locales, with punctuation and capitalization in its output.

The weights in this repo are bfloat16 (full quality, the recommended default).

See the original model card for architecture, benchmarks, and intended use.

Install

This model requires a build of mlx-audio with Nemotron ASR support. That support is merged into main but is not yet in a PyPI release (the latest is 0.4.3), so for now install from GitHub:

pip install "git+https://github.com/Blaizzy/mlx-audio.git"

(Once the next release ships, pip install -U mlx-audio will work.)

Use

from mlx_audio.stt import load

model = load("mlx-community/nemotron-3.5-asr-streaming-0.6b")

# auto language detection (default)
print(model.generate("speech.wav").text)

# force a language via its prompt key (en-US, es-ES, zh-CN, fr-FR, ...)
print(model.generate("speech.wav", language="en-US").text)

CLI:

python -m mlx_audio.stt.generate --model mlx-community/nemotron-3.5-asr-streaming-0.6b --audio speech.wav --format txt

Pass att_context_size=[left, right] to generate(...) to select one of the trained look-ahead settings: [56,0], [56,3], [56,6], or [56,13]. The default, [56,13], gives the best offline accuracy.
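
The two options above (a forced language and a look-ahead setting) can be combined in a small wrapper. This is a minimal sketch, not part of mlx-audio: generate_kwargs and transcribe are hypothetical helper names, and it assumes language and att_context_size can be passed together in one generate call, which this card only shows separately. The helper rejects right-context values that are not in the trained list, so a typo fails early instead of reaching the model.

```python
from typing import Optional

# Look-ahead settings listed in this card, as [left, right].
TRAINED_CONTEXTS = ([56, 0], [56, 3], [56, 6], [56, 13])


def generate_kwargs(language: Optional[str] = None, right: int = 13) -> dict:
    """Build keyword arguments for model.generate (hypothetical helper)."""
    context = [56, right]
    if context not in TRAINED_CONTEXTS:
        allowed = [c[1] for c in TRAINED_CONTEXTS]
        raise ValueError(f"right context must be one of {allowed}, got {right}")
    kwargs = {"att_context_size": context}
    if language is not None:
        # Leaving the key out keeps the default auto language detection.
        kwargs["language"] = language
    return kwargs


def transcribe(path: str, language: Optional[str] = None, right: int = 13) -> str:
    """Transcribe one file; assumes both options are accepted in a single call."""
    # Imported lazily so generate_kwargs can be used without mlx-audio installed.
    from mlx_audio.stt import load

    model = load("mlx-community/nemotron-3.5-asr-streaming-0.6b")
    return model.generate(path, **generate_kwargs(language, right)).text
```

For example, transcribe("speech.wav", language="en-US", right=0) would request the [56,0] setting with English forced, while transcribe("speech.wav") keeps auto-detection and the default [56,13].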

Available MLX formats

License & attribution

Original model © NVIDIA Corporation, released under the NVIDIA Open Model License. This is a format conversion of those weights; the same license and terms apply. MLX conversion by @ARahim3 via mlx-audio.
