Mozilla's open-source speech data project, Common Voice, now has 20,000 hours of content

April 30, 2022

Mozilla has released its latest Common Voice dataset, which for the first time contains 20,000 hours of content, almost double what it held a year ago.
Earlier this week, Mozilla revealed that its Common Voice dataset now contains more than 20,000 hours of content, almost double what it held a year ago. Anyone around the world can use the data to improve their speech recognition software. The latest English-language dataset comes in at a huge 71 GB, and more languages are supported than ever with the addition of Tigre, Taiwanese (Minnan), Meadow Mari, Bengali, Toki Pona, and Cantonese.