mlx-community/nemotron-3.5-asr-streaming-0.6b

MLX conversion of nvidia/nemotron-3.5-asr-streaming-0.6b, NVIDIA's 600M-parameter cache-aware streaming FastConformer-RNNT ASR model with language-ID prompt conditioning. It covers 40 language-locales and produces punctuation and capitalization.

The weights in this repo are bfloat16 (full quality, the recommended default).

See the original model card for architecture, benchmarks, and intended use.

Install

This model needs mlx-audio with Nemotron ASR support. That support is merged into main but is not yet in a PyPI release (the latest is 0.4.3), so for now install from GitHub:

pip install "git+https://github.com/Blaizzy/mlx-audio.git"

(Once the next release ships, pip install -U mlx-audio will work.)

Use

from mlx_audio.stt import load

model = load("mlx-community/nemotron-3.5-asr-streaming-0.6b")

# auto language detection (default)
print(model.generate("speech.wav").text)

# force a language via its prompt key (en-US, es-ES, zh-CN, fr-FR, ...)
print(model.generate("speech.wav", language="en-US").text)
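
When several recordings need transcribing, it is usually worth loading the model once and reusing it. The sketch below relies only on the generate(path, language=...) call and the .text attribute shown above; transcribe_all and the file names are hypothetical, not part of mlx-audio.

```python
from typing import Dict, Iterable, Optional


def transcribe_all(model, paths: Iterable[str], language: Optional[str] = None) -> Dict[str, str]:
    """Transcribe each path with one already-loaded model.

    `transcribe_all` is a hypothetical helper, not an mlx-audio function.
    It assumes only that model.generate(path, language=...) returns an
    object with a `.text` attribute, as in the snippet above.
    """
    # Leave `language` out entirely to keep the default auto detection.
    kwargs = {"language": language} if language else {}
    return {str(path): model.generate(str(path), **kwargs).text for path in paths}


def main() -> None:
    # Imported here so the helper above can be used without mlx-audio installed.
    from mlx_audio.stt import load

    model = load("mlx-community/nemotron-3.5-asr-streaming-0.6b")
    # "a.wav" and "b.wav" are placeholder file names.
    results = transcribe_all(model, ["a.wav", "b.wav"], language="en-US")
    for path, text in results.items():
        print(f"{path}: {text}")
```

Calling main() runs the batch. Passing the model in as an argument keeps the helper independent of how the model was loaded.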

CLI:

python -m mlx_audio.stt.generate --model mlx-community/nemotron-3.5-asr-streaming-0.6b --audio speech.wav --format txt

generate(..., att_context_size=[left, right]) selects one of the trained attention contexts, where right is the look-ahead: [56,0], [56,3], [56,6], or [56,13]. The default, [56,13], has the largest look-ahead and gives the best offline accuracy.
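
As a rough illustration of choosing among those settings, the sketch below picks the trained context with the largest look-ahead that fits a given budget and passes it to generate. pick_context, transcribe, and max_lookahead are hypothetical names introduced here. The budget is expressed in the same units as the right value, which this card does not specify.

```python
from typing import List

# The [left, right] contexts listed in this card.
TRAINED_CONTEXTS = [[56, 0], [56, 3], [56, 6], [56, 13]]


def pick_context(max_lookahead: int) -> List[int]:
    """Return the trained context with the largest right value <= max_lookahead.

    Hypothetical helper: mlx-audio itself only takes att_context_size.
    """
    candidates = [c for c in TRAINED_CONTEXTS if c[1] <= max_lookahead]
    if not candidates:
        raise ValueError(f"no trained context has look-ahead <= {max_lookahead}")
    # Return a copy so callers cannot mutate the shared table.
    return list(max(candidates, key=lambda c: c[1]))


def transcribe(path: str, max_lookahead: int = 13) -> str:
    # Imported here so pick_context can be used without mlx-audio installed.
    from mlx_audio.stt import load

    model = load("mlx-community/nemotron-3.5-asr-streaming-0.6b")
    return model.generate(path, att_context_size=pick_context(max_lookahead)).text
```

With the default budget of 13 this reduces to the card's default [56,13]. Smaller budgets fall back to the nearest trained setting rather than passing an untrained value.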

Available MLX formats

| Repo | Notes |
| --- | --- |
| `mlx-community/nemotron-3.5-asr-streaming-0.6b` | full quality (bf16) |
| `mlx-community/nemotron-3.5-asr-streaming-0.6b-8bit` | 8-bit, matches bf16 |
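
Switching formats should only require a different repo name. The sketch below assumes the 8-bit repo loads through the same load() call as the bf16 one, which this card does not state explicitly; repo_for and load_variant are hypothetical helpers.

```python
# Repo names taken from the table above.
REPOS = {
    "bf16": "mlx-community/nemotron-3.5-asr-streaming-0.6b",
    "8bit": "mlx-community/nemotron-3.5-asr-streaming-0.6b-8bit",
}


def repo_for(fmt: str = "bf16") -> str:
    """Map a format label to its repo name (hypothetical helper)."""
    if fmt not in REPOS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(REPOS)}")
    return REPOS[fmt]


def load_variant(fmt: str = "bf16"):
    # Assumption: both repos load through the same mlx_audio.stt.load call.
    from mlx_audio.stt import load

    return load(repo_for(fmt))
```

For example, load_variant("8bit").generate("speech.wav").text would transcribe with the 8-bit weights, provided that assumption holds.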

License & attribution

Original model © NVIDIA Corporation, released under the NVIDIA Open Model License. This is a format conversion of those weights; the same license and terms apply. MLX conversion by @ARahim3 via mlx-audio.
