Launch HN: Lyrebird (YC S17) – Create a digital copy of your voice
154 points by adbrebs on Sept 6, 2017 | 93 comments
Hi HN!

We are the co-founders of Lyrebird (https://lyrebird.ai/) and PhD students in AI at University of Montreal. We are building speech synthesis technologies to improve the way we communicate with computers. Right now, our key innovation is that we can copy someone's voice and make it say anything. The tech is still at an early stage, but we believe that it is eventually going to make possible a wide range of new applications such as:

- reading text messages aloud with the voice of the sender,

- reading audiobooks with the voice of your choice,

- giving a personalized digital voice to people who lost their voice due to a disease,

- allowing video game makers to have more customized dialogs generated on the fly, or avatars of their players,

- allowing movie makers to freeze the voice of their actors so that they can still use it if the actor ages or dies.

Yesterday we launched a beta version of our voice-cloning software: anyone can record one minute of audio and get a digital voice that sounds like them.

We know that many on HN are concerned about potential misuses surrounding these technologies and we share your concern. We write further on our ethical stance on this page: https://lyrebird.ai/ethics/.

Our blog post about the launch (https://lyrebird.ai/blog/create-your-voice-avatar) features the first video combining generated audio and generated elements of the video.

There was a thread about us on HN when we launched our website four months ago (https://news.ycombinator.com/item?id=14182262), but at that time no one could test our software yet and we did not really answer any of the community's questions. So this time we are ready for questions and would love some feedback!



This looks really great, congrats! Forgive me if I missed something, but I was wondering if you could clear up some confusion. From the terms: "Subject to the Biometric Data Agreement, you hereby grant to us a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive and fully sublicensable right (including any moral rights) and license to use, license, distribute, reproduce, modify, adapt, publicly perform, and publicly display Your Voice, Digital Voice..."

Just to be clear, the license of the voice/digital voice is revoked upon deletion of the recordings? I understand it is subject to the biometric agreement, but the words perpetual and irrevocable still worried me. Thanks!


Yes! This is what our lawyers suggested to protect ourselves.

We delete all the recordings when you click delete, so we can't recreate the voice anymore. However, this is still necessary in case we share some generated sentences in social media or so (like we're doing on twitter now).


> However, this is still necessary in case we share some generated sentences in social media or so (like we're doing on twitter now).

This is something that you should only do with the permission of the user who provided the voice. You don't need generalized permission to do that for every user, and given the nature of the technology, you shouldn't ask for such permission.


From the grandparent comment:

> This is what our lawyers suggested to protect ourselves.

Generally speaking, a lawyer's advice is going to be optimized for maximum protection in possibly unforeseeable circumstances, not for what might actually be needed or even reasonable to request of every user.

Generally speaking, companies aren't going to go out of their way to rein in their lawyer. Most people won't even read that fine print, unfortunately.


I don't believe you.


Sounds amazing! Just to add a use case: for many people, creating a decent voiceover is one of the big sticking points for producing YouTube videos or educational courses. If I could write a script and have software generate a decent enough voiceover, it would be amazing.

It's not even necessary to copy anyone's voice, as long as there's a selection of the most comprehensible and human-sounding ones.

Then, you could even automatically generate a slideshow presentation from a few illustrations and headlines, and that would make "rendering" articles into videos very fast and easy. I'm sure a lot of people would pay for such a service.

----

By the way, I recently encountered Deep Voice 2, a similar research project by Baidu:

http://research.baidu.com/deep-voice-2-multi-speaker-neural-...

Results are very impressive.


I made a joke video for work, featuring clips of Sir David Attenborough narrating a fake nature documentary I cut together from video I took at work.

It would have been an order of magnitude better if I could just generate arbitrary phrases in his voice.

(Or maybe not; maybe the constraint made the video better.)


I think the constraint makes the video funnier. Take, for example, dinoflask's edits of Blizzard's Jeff Kaplan:

https://www.youtube.com/watch?v=gXTrrTX7YuY


Thank you for sharing that! We had not thought about this specific use case yet. It's quite difficult to figure out which use cases are going to become the most popular.


Sounds like the 100 speakers they used were Irish or Scottish.


+1 for this use case!


While it's good that you have an ethics page: https://lyrebird.ai/ethics/, it only has two ethical guidelines:

* Spread awareness of this technology

* Your digital voice remains yours

I would feel a lot better about this if you also had explicit ethical boundaries, for example disallowing users impersonating someone else, e.g. Donald Trump, Barack Obama. "Your digital voice remains yours" sort of sounds like you won't use/share my digital voice with others, but doesn't directly address whether bad actors can maliciously impersonate someone who hasn't registered with Lyrebird.


To build on that a bit: my bank uses voice recognition as a security measure to authenticate me when I call in. Throw a bad actor into the mix and that becomes a security issue.


Reminds me of the classic "My voice is my passport. Verify me." Hopefully that is not the only second-factor option they provide?


Ha! Luckily it's opt-in/out and they do allow you to keep a PIN/code word as a backup.


This is a good point, it's true that we don't mention it explicitly in our ethics statement. But it is in our evaluation agreement:

> (v) create a false identity or otherwise impersonate a third party on or through the Services;

section 3.A: https://lyrebird.ai/terms/evaluation

Unfortunately this is something that we can not enforce automatically.

We ensure that people copy their own voice by asking them to read predefined sentences (we use speech recognition to check that the sentence is indeed corresponding to the text).
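As an illustration only, the check described above (comparing what speech recognition heard against the sentence the user was asked to read) could be sketched as follows. This is not Lyrebird's actual implementation: the function names, the word-level comparison, and the 0.85 threshold are all assumptions, and the transcript is assumed to come from whatever speech recognizer is in use.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase the text and drop punctuation so only the words are compared."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def matches_prompt(prompt: str, transcript: str, threshold: float = 0.85) -> bool:
    """Return True if the recognized transcript is close enough to the prompt.

    `threshold` is a hypothetical tolerance for recognition errors;
    a real system would need to tune it.
    """
    ratio = SequenceMatcher(None, normalize(prompt), normalize(transcript)).ratio()
    return ratio >= threshold


prompt = "The quick brown fox jumps over the lazy dog."
# A faithful reading passes; an unrelated recording is rejected.
print(matches_prompt(prompt, "the quick brown fox jumps over the lazy dog"))
print(matches_prompt(prompt, "hello this is some different sentence"))
```

Comparing word sequences rather than raw strings keeps the check tolerant of punctuation and casing differences while still rejecting a recording of some other text.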


Their ethics don't seem to be something they take seriously as the video they use to promote their own site is an impersonation itself.

Seems from right out of the gate, they are breaking their own ethical guidelines as a cheap promotional tactic. If they care that little about themselves and a former president of the United States, what do they care about your likeness.

It also doesn't help that you give them a universal perpetual license to do whatever they want (including selling your likeness for someone else's use) by uploading.

This just seems like a slimy team that put up an ethics page as a CYA.

I'm willing to eat my words if they had Barack Obama's consent to use his digitized voice for this, but it's highly doubtful, since there's also the coat and seal of the President of the United States on the flag in the background, which would be a massive ethical breach by a former President just to promote a silly little startup.


They have a giant banner at the bottom disclaiming it. I don't think that's unethical.


I'll agree. The ethics here seem very wishy-washy. It's not nearly as well-defined as, say, Twilio's Nine Values[0] or ThoughtWorks's Pillars[1], and those are generic values, not ethics. Hell, Google's "Don't be evil" is better than their two bullet points: 1) raise public awareness, 2) your voice is yours (except these Presidents), ???) Imagine if someone bad invented this first!

[0]: https://www.twilio.com/company/nine-values

[1]: https://martinfowler.com/bliki/ThreePillars.html


> Seems from right out of the gate, they are breaking their own ethical guidelines as a cheap promotional tactic. If they care that little about themselves and a former president of the United States, what do they care about your likeness.

We state in our blogpost that we make an exception for Obama/Trump in order to raise public awareness. Both of them are regularly used in Machine Learning benchmarks (for example [0] [1]). Note that we don't allow users to generate from Trump/Obama's voice.

Once again, we care a lot about these issues and that's why we only allow users to copy their own voice.

[0] http://www.washington.edu/news/2017/07/11/lip-syncing-obama-... [1] https://www.youtube.com/watch?v=ohmajJTcpNk

These issues are challenging and suggestions about how you think the technology should be introduced/regulated are very welcome.


It's still hypocritical and insulting to the reader's intelligence.

You could make Obama say anything. He could say something humorous, something that he's never said before. You would have just as impressive a demo if you had Obama say "I'm a little teapot short and stout..." and then used overlay text to promote yourself. You chose instead to make a video where he promotes your startup.

That is both hypocritical and immoral, using not only his personal likeness but also the seat of the Presidency of the United States.

This fast and loose way that Lyrebird treats their technology only makes me think that they don't really think about the massive negative potential of the technology and just want to get scale / profitability as fast as possible.


"I'm using my voice as my password".

Vanguard allows voice authentication (https://investor.vanguard.com/account-conveniences/voice-ver...) - and who knows who else will roll something similar out in the future. Yeah, it's really, really dumb, but it's happening in production now. I wouldn't use this product if I were you, but honestly you should also not use voice verification/authentication for anything.


Fidelity began verifying voice for telephone customer service a short while back. They recorded me during the call then at the end said they were going to use it to verify for future calls. No way to opt out.


Did they say something to that effect before the call started, or only told you at the end? Or did they just use the "this call may be monitored for quality assurance and training purposes" blanket?


I remember it too vaguely at this point, but something was mentioned in the beginning while I was waiting. I feel like it was worded along the lines of a promo, or I wouldn't have told the rep I wasn't interested multiple times. "Verifying is now easier and more secure with voice verification..."


Seems like a really useful piece of technology. As you said, it's got quite a few applications in the gaming, film, medical and messaging industries.

That said, am I the only one imagining this getting abused by people in those fields as well? Seems like a good way to avoid paying voice actors for future work. Just record the minimum 30 recordings, then use this software to create all their future dialogue.

This could lead to some interesting lawsuits over who a character's voice belongs to and whether a company has the right to use someone's voice recordings to get free work done on future projects. Like how during the production of Trail of the Pink Panther, Peter Sellers' widow sued the film's producers and studio over them using clips of him from deleted scenes in earlier films in the movie.


The innovation I'm waiting for is

>> reading audiobooks with the voice of your choice, AND the speed of my choice.


Something similar (albeit not in your own voice, but in a wide range of premade voices) at https://www.narro.co


Yes, definitely! This is also something we are working on.


I don't buy the "raising awareness" argument, ethically speaking. To do that, you could release demo files that show the capability without weaponizing it through easy access. It'd be great to increase awareness around our vulnerability to EMP attacks, but we don't need to publish specs and/or sell a working prototype to make that case.

This is just one of those areas where the negative implications, I believe, far outweigh the positive ones. Aside from the noble cause of helping the disabled, most of the use cases center around entertainment. As great as that may be, the likely application to fraud and the potential for a catastrophic misuse in matters of war and peace just dwarf any upside.


Will this technology be licensed for redistribution or only for online API use? I ask because in the video game scenario it would be great to have this in a library I could distribute instead of relying on the API to be available at all times.


The first version will only be an online API. I agree with you that we should eventually think about licensing it for offline/embedded redistribution.


Really fun stuff. I noticed that it seems to have problems starting sentences. Especially if I try to start a sentence with "hi,". Interesting nonetheless. This passage seems to be rendered fairly well: https://lyrebird.ai/g/LYoVuaZm

Also, https://lyrebird.ai/g/D3Fw328D


Unfortunately, for certain voices our model has difficulty generating the very beginning of the sentence. We hope to fix this problem soon.

Some other people shared their voices on twitter if you want to compare: https://twitter.com/LyrebirdAi


I got a good giggle out of that first one, thank you haha.


I guess I see a ton of upside here, but I also see that this could easily be abused and possibly become a tool to completely destroy someone's life. Imagine getting a phone call from your "partner" saying they cheated on you. I don't know how it would be usable (API?) and I do still detect a bit of artificialness to the voice, but as this gets better I worry about the downsides and the potential for harm from copying someone's voice.


Thank you for raising those concerns. We take those very seriously. You can read more about our ethical stance in this article: https://lyrebird.ai/ethics/

To recap:

- we want to start by raising public awareness about the technology and we did demos with the voices of Trump/Obama for that,

- your digital voice is yours, people can not use it without your authorization.


I just tried to sign up with a Hotmail email address and I got this error message: This email cannot be used to create an account. It might be due to your email domain name.

I realize Hotmail isn't the sexiest email provider these days but it's one of the more commonly used. Do you have a list of email domains you allow?


We accept hotmail. It might be because of some special characters. Do you use + ?


Even if they did use '+', that should still definitely be allowed. I immediately get turned off when a service actively disallows a '+' because then I start to wonder why they don't want me to be able to filter their messages in my inbox.

It's the only sane and easy (but obviously not bullet-proof) way of catching spammers out.
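For illustration, a deliberately permissive sign-up check that accepts '+' and digits in the local part might look like the sketch below. This is an assumption about how one could write such a check, not the validation Lyrebird actually uses; the pattern and the function name `looks_like_email` are hypothetical.

```python
import re

# Deliberately permissive: exactly one "@", a non-empty local part
# (which may contain "+", digits, dots, etc.), and a dotted domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(address: str) -> bool:
    """Return True if the address has a plausible user@domain.tld shape."""
    return EMAIL_RE.fullmatch(address) is not None


for candidate in ["user+lyrebird@example.com", "user123@hotmail.com", "no-at-sign"]:
    print(candidate, looks_like_email(candidate))
```

A loose shape check like this leaves the real verification to a confirmation email, which avoids rejecting valid addresses because of characters such as '+' or digits.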


Nope. Just letters and numbers. Same with my password.

I tried with my Gmail address and it worked fine. That address has no numbers in it. I used the same password. If you aren't prohibiting Hotmail addresses then it must be the numbers in the email address that are triggering some validation.

Regardless, I have access now. Looking forward to trying your product!


Don't reveal your powerlevel on HN, dude. Now you've not only reduced the search space for your Hotmail password, you've clued an attacker in that that's also your Gmail password!


I use Pass [0] to generate unique/random passwords for each site I sign into, I don't use the same password for all sites. I was just describing what I used for this instance (only characters and numbers), not what I do every time. I appreciate your concern though!

[0] https://www.passwordstore.org/


When the demo page was launched it seemed like Lyrebird was going to be an API. Will there still be an API?


Yes, definitely. We are starting a private beta at the moment.


Awesome! I signed up back then but haven't heard anything since. Is there anything else I can do to try out the beta?


Not yet. We are starting with a few developers/companies only and will expand it progressively.

What would be your use case?


My wife built an app that teaches people (foreigners) how to speak English. Based on the words in their flashcards, we generate dynamic sentences so that during practice their flashcards are rarely the same. For example, if I have (happy, sad, run, write) in my backpack, then a sample flashcard would show up as: "When I run, I will be sad".

I see the Lyrebird API being very helpful in helping my users practice listening skills and adding a level of creative fun! If we had 10-20 different voices, the flashcards would be read a little differently each time. Right now (since our flashcards are dynamic), our audio feels very monotone. We would love to help you beta test your API and work something out.


I'd be interested in this too. I volunteer for a charity that produces a weekly talking newspaper for the visually impaired in the UK (where such things are very popular).

Our current production process requires a group consisting of editors, readers and technicians to get together every Friday morning from 7am to record an hour or more of news which is then mastered onto CD, duplicated hundreds of times and mailed out by 11am.

We usually have four readers each week (from a pool of 30 or so) who take it in turns to read the items. Some readers are better than others and sometimes readers don't turn up. Sometimes there are interruptions or disturbances to recording such as another reader in the studio coughing, rustling of papers, etc.

If we had the ability to digitise the voices of our readers it would enable our new (in development) totally digital production and distribution system (podcasts, streaming, etc.) to be produced at any convenient time and to allow our listeners to choose their preferred reader's voice(s).

The studio software side is using FL/OSS software, with Ardour as the digital audio workstation attached to a Delta 1010 digital input system and an Evolution UC33e control surface.

Being able to program the pre-production phase to generate the audio recordings using favourite readers voices which are then fed to the (automated) studio mastering process would give us some amazing functionality and flexibility to produce programmes on-demand with no studio presence required.

The development experience and final package will be documented and published for other talking services to adopt and adapt.


I assume you guys know about VocalID, which got an NSF SBIR grant for giving mute people a voice (through similar means): https://www.vocalid.co/


This is incredible - recorded my voice and I'm blown away with the results.

One thing: I found that I was in such a hurry to record that I probably spoke faster than normal. It'd be nice if there was a way to tune a few parameters manually (tempo, pitch, etc).

If I ever lose my voice and have to have a TTS appliance speak for me, I'll be contacting you all to get my voice profile!

EDIT: For those interested, pretty impressive that it figured out the appropriate cadence for this: https://lyrebird.ai/g/v7MpYaUA


Thank you for the feedback!

> It'd be nice if there was a way to tune a few parameters manually (tempo, pitch, etc).

Yes we are currently exploring ways to control the generation: volume, pitch, tempo, speed but also intonation and emotion.


Emotion would be a nice one - my wife's first comment was that it sounded too bored.


This looks awesome. I commented on the original post about how exciting this is for worldbuilding (and creating realistic voices for fictional characters, with all the uses that come there).

Random question: it's said that people think their own voices sound weird when they hear recordings of themselves played back. Do you have a way to measure that phenomenon? Have you seen people complaining about the accuracy when in fact it was just that effect making people sound "weird" (to themselves)?


The reason for the phenomenon is that some large percentage of how you hear your own voice comes from bone conduction. In addition, the higher harmonics of your voice are more directional, which is to say "aimed away from your ears", and tend to be diminished when reflected back to you by the objects around you.

The end result of this is that your own voice, when recorded and played back to you, will generally sound less bassy and more harmonically rich than you expect it to.


Yes, it's actually quite interesting! It's a recurrent observation that we have inside the team with our own voices. Friends of the person usually better appreciate the quality of his/her digital voice. You can also observe these reactions to some extent on twitter: https://twitter.com/LyrebirdAi

Another interesting observation is the sentences that people generate for the first time with their digital voice...


This is only tangentially on topic, but is there an API or some engine that I can feed short sentences into and get high-quality generation back?

I have an RC controller radio that supports voice prompts, and I would like to add some short phrases that are missing, such as "air mode on", "throttle warning", etc.

Is there anything on par in quality with Google's/Siri's voice? Not the Google TTS, but the voice they use in Now.


Amazing - I can't wait to integrate this with our VR product. We previously used Amazon Polly attached to a chatbot: https://twitter.com/Alientrap/status/829032930626383873

First uses that come to mind are players adding themselves to a VR world - or maybe celebrities / public figures.


Congrats on the launch! The tech is amazing

Quick q's (purely out of curiosity):

1) > We are [...] PhD students in AI at University of Montreal

Are you doing the startup on the side/planning on going back to school?

2) I don't recall reading about you guys in articles about YC S17 demo days. What are reasons why some companies might not participate in demo day or remain off-the-record? In your case, you seem to have had a working product long before demo day


Thank you!

1) The PhD research and the startup are quite complementary in the end, so we hope we can continue doing both.

2) We didn't do demo day because we raised our seed round just before YC and did not want to raise again.


This is probably going to be great, but I just tested out voice generation with the bare minimum of 30 recordings, and it really fell flat. When I tried playback with an input, all it could produce was a high-pitched buzzing sound and then maybe 1/4 of the words I typed in, which sounded nothing like me.

Perhaps you should increase the minimum from 30 recordings to 100?


Hi! Thanks for testing it! For many voices it works well with only 30 recordings. For some, you need a bit more. It seems that quality of the audio (no background noise, clear and loud voice, lots of intonation) is what matters the most.


I am confused about the functionality. What is it that I will be able to do if I go through recording 30 sentences?


The owners of this service will be able to impersonate you at their whim. You're only populating their database for them.


You will be able to create a digital voice that sounds like you and generate any sentence from it.

And thanks, we are going to update the instructions to make them more clear.


Thanks. Such an explanation on the website would be helpful. BTW, the Trump/Obama tweets do not add value. Using political figures to define a technical service is a mismatch in this context. It also doesn't help in explaining what this service provides (people wouldn't expect that Trump and Obama have given you consent to use their voices). Just an opinion.


This is just a beta version. In the future, we expect to integrate the tech with some other apps.


This technology didn't work out for me. After I spent time and effort providing what it needs, the results I got back were terrible for the time invested. In any case, it's good to see such an attempt at advancing what may be possible in the future.


When I try to test my digital voice, after clicking "Generate," I get this error after about 10 seconds:

Something went wrong. Please try again!

I've tried about 5 times.

EDIT: I went back to the page a few minutes later, and the recordings were all there. So it looks like it works, but is giving a false error message.


Can you refresh and try again? Let me know if it works.


It's working now. Thanks. Very cool.

Small issue: Would be nice if you could delete recordings.


You can!


Sorry, I meant the test recordings, not the originals of my voice.


I have a YouTube channel (vimgirl), and before recording I have to write scripts for what I plan to say in the video. The digital voice doesn't seem to be working right now, but when it does it would cut down my screencast production time by at least half.


Cool stuff! Question from your FAQ:

> Q: Will I be able to copy another person's voice?

> A: Yes but only if you have the authorization of the person whose voice is being copied.

Perhaps you can unpack that answer a bit? What's the authorization process?


Sure, good question!

There will be two scenarios:

- you want to use the voice of someone that has a Lyrebird account: he or she has to give you their authorization.

- you want to use the voice of someone who does not have an account. We have specific contracts for that. Say you want to copy the voice of Morgan Freeman: the contract will be between him, you and Lyrebird. We will also probably explore alternative ways for that.


Did you get authorization to use the voices of public figures in your promotional materials? If not, how can users be sure that you will not arbitrarily use their voice profile for promotional materials or otherwise?


Hey, how does Lyrebird handle accents? I work in the education space, and due to the accent of people in my country, the content doesn't work well with a global audience.

Are you open for beta? I would like to try out your API on education content.


For now, it works better with an American English accent, but it is still able to adapt to other accents.

Our upcoming versions should be more robust to different accents and we also plan to extend it to other languages.


...make possible a wide range of new applications such as

- hacking voice-controlled interfaces

- generating fake news

FTFY

Don't @ me saying "sure, any technology can be used for good and bad, stop being a Luddite". Yeah, I know that, just messing with you.


Yes, this is a tricky subject! We have thought a lot about it and we think we are doing the right thing for society.

We write more about it here: https://lyrebird.ai/ethics/


This is exciting! I've been following you guys since at least May. How do you plan on getting the voices out of the uncanny valley?


This is going to be very tricky! There is no clear answer to that. We are putting a lot of effort into research, but our progress is quite difficult to predict.


I wonder how it would work using training data from one language in generating voice in another language.


Strangely, the resulting digital voice sounds Irish... but I am not.


Pressed the start recording button and nothing happened. iPhone 7, iOS 10.


Awesome idea. It was just a matter of time!


voice upload is not working :(


Thanks for pointing this out, this was reported by a few others. We are investigating it. For now, just refresh the page and it should work.


We've fixed the bug!


Getting failed upload after clicking validation... Chrome showing this in console: "VM291:1 POST https://lyrebird.ai/my/recordings/ 400 (Bad Request)"


Unfortunately still not working on iOS. Tried with both Safari and Chrome.


How about Adobe Voice? This seems to share a lot of the same breakthroughs as Adobe Voice.



