
My interest in offline TTS is actually entirely unrelated to the automation space:

I'm interested in Text to Speech for creative pursuits, such as video game voice dialogue and animated videos.

This is one of the reasons why the range & quantity of available voices is particularly important to me.

After all, you can't really have a scene set in a board room with nine characters[3] if you've only got three voices to go around. :)

I've actually been spending time this week on updating my "Dialogue Tool"[1] application (originally created to work with Larynx to help with narrative dialogue workflows such as voice "auditioning", intelligent caching & multiple voice recordings) to work with Piper.

Which is where I ran into the question of how to navigate/curate a collection of more than 900 voices.

The main approaches I'm using so far are:

(1) Random luck--just audition a bunch of different voices with your sample dialogue & see what you like.

(2) Curation/sorting based on quality-related meta-data from the original dataset.

(3) Generating a different dialogue line for each voice, one that includes its speaker number for identification purposes and (hopefully) isn't tedious to listen to across 900+ voices. :)
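
As a rough illustration of approach (3), here is a minimal Python sketch that gives each speaker number its own audition line. The template sentences and the `audition_lines` helper are hypothetical and are not taken from the Dialogue Tool; the speaker IDs are assumed to be plain integers.

```python
import random

# Hypothetical templates; any short sentence with an {n} slot would do.
TEMPLATES = [
    "Hello, I'm speaker number {n}, and I'd like to audition for the role.",
    "Speaker {n} here. Is this the board room?",
    "You're listening to voice {n}. Thanks for your time.",
    "This is number {n}. I was told there would be snacks.",
]

def audition_lines(speaker_ids, seed=0):
    """Return {speaker_id: line}, where each line names its speaker number."""
    # A fixed seed keeps the speaker-to-line mapping stable between runs,
    # which matters if the generated audio is cached.
    rng = random.Random(seed)
    return {sid: rng.choice(TEMPLATES).format(n=sid) for sid in speaker_ids}

# Small demo range; a real run would pass every speaker ID in the model.
lines = audition_lines(range(5))
for sid, line in lines.items():
    print(sid, line)
```

Varying the template is only there to make a long listening session less repetitive; the speaker number in each line is what makes a voice identifiable afterwards.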

I haven't quite finished/uploaded results from (3) yet but example output based on approaches (3) & (2) can be heard here: https://rancidbacon.gitlab.io/piper-tts-demos/

The recording has two sets of 10 voices which had the lowest Word Error Rate scores in the original dataset--which doesn't mean the resulting voice model is necessarily good but is at least a starting point for exploring.
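
Picking the lowest-WER speakers can be sketched in a few lines of Python. This assumes the per-speaker metadata has already been loaded into a list of dicts with hypothetical `speaker` and `wer` keys; the real dataset's field names and file format may differ.

```python
def lowest_wer(records, count=10):
    """Return the `count` records with the lowest word error rate."""
    # Skip speakers that have no WER score rather than guessing one.
    scored = [r for r in records if r.get("wer") is not None]
    return sorted(scored, key=lambda r: r["wer"])[:count]

# Made-up example values, for illustration only.
records = [
    {"speaker": 0, "wer": 0.12},
    {"speaker": 1, "wer": 0.03},
    {"speaker": 2, "wer": None},
    {"speaker": 3, "wer": 0.07},
]
best = lowest_wer(records, count=2)
print([r["speaker"] for r in best])  # [1, 3]
```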

I'd also like to explore more analysis-based approaches for grouping/curation (e.g. vocal characteristics such as "softer", "lower", "older") but as I'm not getting paid for this[2], that's likely a longer term thing.

A different approach which I've previously found really interesting is to use voices as a prompt for writing narrative dialogue. It really helps to hear the dialogue as you write it and the nuances of different voices can help spur ideas for where a conversation goes next...

[1] See: https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to... & https://gitlab.com/RancidBacon/larynx-dialogue/-/tree/featur...

[2] Am currently available/open to be though. :D

[3] Will try to upload some example audio of this scene because I found it pretty funny. :)



A shameless plug: my colleagues made a demo where you can create a virtually infinite number of voices with some control over how they sound: https://huggingface.co/spaces/Flux9665/ThisSpeakerDoesNotExi...


> 900+ voices

Where can I find all these voices? https://github.com/rhasspy/piper/releases/tag/v0.0.2 lists "only" ~50 files.


The libritts file has 900+ speakers inside it.



