Belarusian Community Project SONORA Seeks People Who Can Help "Connect" the Belarusian Voice to Big Tech
In the "TTS in Belarusian" channel, an announcement appeared about the launch of SONORA. This is a Belarusian community project where participants are recording an open Belarusian dataset for TTS — "so that Belarusian sounds natural in modern AI services," writes the Telegram channel Dzik Pic.

What's the problem? According to the project's website, there are currently almost no high-quality Belarusian voice datasets recorded specifically for training modern TTS models. At the same time, Belarusian has thousands of homographs: words that are spelled identically but mean different things depending on where the stress falls. If a model places the stress incorrectly, both the pronunciation and the meaning come out wrong.
The second problem is phonetic correctness: consonant softness, 'ў', 'дз/дж', intonation, and rhythm:
"Without high-quality studio material, models repeat errors and sound less natural."
Similar initiatives already exist, such as Donar.by and the BexTTS model, but SONORA intends to build on their work and "bring it to a studio level."
To that end, the enthusiasts have launched a crowdfunding campaign to "organize professional studio recording of Belarusian speech using specially selected texts."
Researchers, enthusiasts, startups, and educational initiatives will be able to use the dataset.
In addition, the team plans partnerships with Google, OpenAI, and ElevenLabs "so that our dataset strengthens their solutions for Belarusians."
Specifically, SONORA is looking for warm intros / direct contacts at:
- Google;
- OpenAI;
- ElevenLabs;
- Speechify;
- Meta.
"If you know anyone at these companies and can make an intro, please message us privately, or send a request on our behalf yourself. The text of the request is here," the creators ask.
On the project's homepage, you can listen to how Belarusian currently sounds in speech technologies.
Working in Poland or Lithuania? Support "Nasha Niva" — it's completely free for you, and we will be able to do more for Belarus and Belarusian culture!
Comments
As for the claim that "there are almost no high-quality Belarusian voice datasets," that is simply a lie: on Common Voice alone, 1,800 hours from more than 8,000 people have been recorded and validated so far. Few languages have datasets this large that are freely available.