
Conformer

Introducing our new AI model for speech recognition, with improved accuracy and robustness.

Conformer-2, our latest AI model for automatic speech recognition, is now available through our API. Trained on 1.1M hours of English audio data, Conformer-2 builds on the success of Conformer-1 with improvements in proper nouns, alphanumerics, and robustness to noise. The research behind Conformer-2 was inspired by DeepMind's Chinchilla paper and by recent advances in the large language model and generative AI space.

Maintaining parity with Conformer-1 in overall word error rate, Conformer-2 shows significant improvements in alphanumeric transcription (31.7%), proper noun error rate (6.8%), and robustness to noise (12.0%). These gains were made possible by scaling the training data to 1.1M hours of English audio and by model ensembling.
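
As an illustration only, assuming these figures are relative reductions in error rate (the announcement does not give absolute baseline values), the relationship between a baseline and an improved error rate can be computed as follows; the numbers below are hypothetical:

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage reduction in error rate relative to the baseline."""
    return (baseline_error - new_error) / baseline_error * 100

# Hypothetical example: dropping from a 10.0% to a 6.83% error rate
# corresponds to roughly a 31.7% relative improvement.
print(round(relative_improvement(10.0, 6.83), 1))  # -> 31.7
```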

Conformer-2 is also faster than Conformer-1, with latency in the inference pipeline reduced by up to 53.7%. Because training ran on in-house hardware, it completed approximately 1.6x faster than on comparable cloud providers. In addition, the API now includes a new parameter, `speech_threshold`, which lets users set a minimum threshold for the amount of speech an audio file must contain in order to be transcribed.
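
Below is a minimal sketch of how `speech_threshold` might be passed in a transcription request. It assumes the standard v2 transcript endpoint, an API-key `authorization` header, and a 0.0-1.0 range for the parameter; confirm the exact payload and accepted values in the API documentation:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT = "https://api.assemblyai.com/v2/transcript"  # assumed transcript endpoint

# Request a transcription, skipping files whose proportion of speech
# falls below the given threshold (assumed range: 0.0 to 1.0).
payload = {
    "audio_url": "https://example.com/audio.mp3",
    "speech_threshold": 0.5,
}

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"authorization": API_KEY},
)
print(response.json())
```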

With improved performance across domains and metrics, Conformer-2 delivers better proper noun handling, more accurate alphanumeric transcription, and stronger noise robustness. Users can try the API in the Playground or follow the documentation to get started, benefiting from the model ensembling and model/dataset scaling behind these results in their speech recognition applications.
