Common Voice

Common Voice is a crowdsourcing project started by Mozilla to create a free database for speech recognition software. The project is supported by volunteers who record sample sentences with a microphone and review recordings made by other users. The transcribed sentences are collected in a voice database released under the CC0 public domain dedication, which allows developers to use the database for voice-to-text applications without restrictions or costs.

Common Voice
Developer(s): Mozilla Foundation
Initial release: June 19, 2017
Repository: https://github.com/mozilla/voice-web
Available in: Multilingual
License: Creative Commons CC0
Website: commonvoice.mozilla.org

Aims

Common Voice aims to provide diverse voice samples. According to Mozilla's Katharina Borchert, many existing projects drew their data from public radio, or otherwise relied on datasets that underrepresented both women and people with pronounced accents.[1]

Voice database

The first dataset was released in November 2017. More than 20,000 users worldwide had recorded 500 hours of English sentences.[2]

In February 2019, the first batch of languages was released for use. It covered 18 languages, including widely spoken ones such as English, French, German and Mandarin Chinese, as well as less widely spoken ones such as Welsh and Kabyle. In total, the release comprised almost 1,400 hours of recorded voice data from more than 42,000 contributors.[3]

As of July 2020, the database has amassed 7,226 hours of voice recordings in 54 languages, of which 5,591 hours have been verified by volunteers.[4]

In May 2021, following its work to add Kinyarwanda, the project received a grant to add Kiswahili.[5]

References

This article is issued from Wikipedia. The text is licensed under Creative Commons Attribution-ShareAlike. Additional terms may apply for the media files.