NVIDIA-NeMo/Speech — ToolRadarHQ

NeMo Speech is the dedicated speech fork of NVIDIA's NeMo framework, covering automatic speech recognition (ASR) and text-to-speech (TTS) at the level of model training, not just inference calls. The score here reflects years of accumulated stars from teams that ship speech pipelines, and the codebase backs that up: you get distributed training, model parallelism, and a library of pretrained checkpoints that span multiple languages and acoustic conditions. What separates it from calling a hosted speech API is ownership: you own the model weights, you control the fine-tuning, and you can run it on your own GPU cluster without per-second billing. The tradeoff is that this is not a weekend integration. Spinning up a full NeMo training run requires real familiarity with ML infrastructure, so if you want a hosted ASR endpoint in an afternoon, look elsewhere. But if you are building a product where speech accuracy is a core differentiator, this is the stack to build on.

Best for: AI engineers or ML researchers building a production speech pipeline