Voice Acting, Text to Speech, and AI

Otto26

Inconsistent · Joined: Mar 7, 2006 · Posts: 1,499
Before you reflexively blow up at me, please read the full post. There are a LOT of questions which don't currently have answers. I do make some decisions which you are, of course, free to vociferously disagree with and I do advocate for a position of limited scope.

I am interested in turning my original works of fiction into audiobooks. I have priced professional services and they would run into the thousands of dollars for my works. I'm going to publish them for free here on Literotica and I'm not spending that kind of money to give my work away. Likewise, I would not expect a voice actor here on Literotica to invest that kind of time into converting my works into audiobooks. I've asked but gotten no responses and don't expect to.

So that leaves me with some sort of text-to-speech (TTS) option. Which is where matters get complicated. Not that TTS is particularly complicated to implement. There are any number of free applications out there that will let you set up a TTS solution. You don't even have to install them on your computer; you can choose from a boatload of online tools. However, you quickly discover that the key ingredient in all of this is high-quality voices. Most of the free voices are downright robotic. You can tweak a few of the Microsoft voices to make them tolerable, but tolerable is the best you're going to get unless...

Let's talk voice cloning. This is where the discussion gets contentious. Just as you can choose from any number of free TTS tools, you can download free voice cloning tools or access voice cloning services. Some of those services use AI to improve the quality of the cloning, but not all of them. AI is a red herring. What's really happening here is copyright infringement. Except it's not. There are currently no legal protections offered to voice actors for the basic sound of their voices. Any piece a voice actor performs is protected by copyright, but that only protects the content of that specific piece of work. If I wanted to take copyrighted audio performances of a voice actor, let's say Scarlett Johansson, and clone her voice, I could legally do that. Before you jump down my throat, let me add that this is currently being challenged. Scarlett has taken legal action against a company that did exactly that. So the legal niceties are still being worked out, and there is proposed legislation that would close these gaps. Which the big media companies are fighting like hell to prevent, of course.

Some companies are addressing this issue in good faith by signing licensing agreements. James Earl Jones, for example, signed a licensing agreement with Respeecher that granted them the rights to use his voice in other projects. Yes, you could get a dubbed porn video with James Earl Jones playing the role of the pizza delivery guy. All this really does, however, is transfer the legal responsibility for suing companies from the actor to the licensing company. There are still no protections. And some companies have been perfectly willing to try to do the right thing and then do the wrong thing when that fails. See Ms. Johansson's dispute with the company that tried to license her voice and then cloned it when she refused their initial offer.

Which brings us to ethics and morality and practicality. Let's do practicality first. The AI startup that cloned Scarlett's voice made a business decision. They decided that they would make more money cloning her voice and using it than they would lose when the inevitable lawsuit followed. That's where a lot of businesses are landing right now. ElevenLabs, for instance, will let you use the cloned voices of Maya Angelou (who has suddenly realized she can't pay for the pizza), Jerry Garcia, or Deepak Chopra, to name a few. They do this because they know most people can't sue. If you are a voice actor, you really need to send a thank-you note to Scarlett Johansson.

Which means I wouldn't feel any guilt about taking advantage of these businesses. I could easily download the ElevenLabs free reader for my phone, upload one of my stories in PDF format, and port the output to another device which records the audio and saves it as an MP3. Another variation on this would be to use one of the free reader services online and do the same thing. It took me an hour to do both and I'm not particularly techy. But that would make me complicit.

Which brings us to ethics and morality. What's the right thing to do? Your answer will vary. I decided that I don't want to be complicit in what I view as theft. It may be legal to clone voices, but it isn't right. It might be morally okay to steal from thieves, but it's not legal. Which leaves me with the free sources. Many of which have EULAs or terms of service that prevent commercial use of their voices. I have decided that publishing my free content on a free access site isn't a commercial use. That's as close to the line as I'm willing to go. It means that I'll get some pretty robotic sounding voices reading my stories, but I'm okay with that. I don't want to see voice actors going away. And I don't think they will IN THE NEAR FUTURE. Voice actors bring human emotion and intelligence to their performances that AI can't currently match and is unlikely to match IN THE NEAR FUTURE.

But it's coming. Remember the villainous-ish ElevenLabs from earlier? They are already building a library of TTS audiobooks by allowing authors to use their service for free. I'm not taking advantage of this because of ethics (I don't want to be complicit) and practicality (pretty sure they're avoiding porn). But TTS to audiobook is going to be a real thing and take a huge bite out of the voice acting market that many actors rely upon to make ends meet. Doom. Gloom.

Now, I know a bunch of you are itching to write scathing replies to my decision to use TTS to create an audiobook. Hold that thought and let me really enrage you. That's not my intent, by the way. I'm not trolling you. But what follows is sure to be controversial. What can you do about this? Beyond sending Ms. Johansson a thank-you note, I mean. Lean into it. Clone your voice. Establish clear terms of service for using your voice that afford you some degree of legal protection. And start actively working with Literotica authors to convert existing content into TTS audiobooks that properly credit you and publish your terms of service. Protect yourself, enhance your profile, and turn the audio section of Literotica into the most popular section. Which you, as ground-floor innovators, would absolutely own. When you do this, please let me know. I'd kill to have some of you reading my works, even if I have to settle for 80% quality TTS rather than a 100% quality live performance. Am I going to pay you? Nope. Free site, not investing my money, blah, blah, blah, see above. Credit the hell out of you? Oh, yeah.

"But Literotica won't allow AI content!" True. Literotica is dead set against generative AI content. TTS is not generative AI. While Laurel and Manu get the last word on this and their decisions may change based on new information, I have already submitted a TTS story. It was explicitly labeled as a TTS story. And, because it was a test to see if Laurel and Manu would allow it, it was terrible. It was accepted. I asked that it be removed so I could resubmit it with a bunch of changes to improve the quality. TTS is not generative AI. It doesn't create content, it takes existing content and converts it to another form.

If this is food for thought then let the food fight begin. But, please, keep it civil.
 
My fear about AI and the tech you’re using is that being born will make a person a “brand”.

However, you made a key point when you said “voice artist” when it comes to copyright. The voice I use to produce audios for this site is not my everyday speaking voice. Part of it is a character, a character I created, voiced, and wrote. For example, Seth MacFarlane voices most of the characters on Family Guy. He writes, voices, and presents the product.

I would think that would allow it to fit under copyright law, but unfortunately, we will have to wait on the law to catch up with the technology in this case.

There have also been issues with corporations filing trademarks on sounds and videos that other creators make, to gain the financial benefit of the sound, etc. There is a lot of fuckery afoot in the online space right now when it comes to those laws.
 
Out of interest, where is the audio you submitted? I started to listen to the one audio file in your profile, thought it was way better than I expected then realized it was from 2008! I had a look through the new audio list and couldn't see anything that looked likely.

I'm conflicted about this. From what I've heard of music and imitators, it's possible to get some very good results with AI, but at the same time, I'd like to keep a very firm distinction between human voices and AI voices, and worry that if it becomes too easy the human voices will get swamped out (fine as long as we can filter).

I'll probably comment on this more once I've heard your audio.
 
You won't hear the audio until two weeks from now, soonest. I haven't resubmitted it yet because I've been mucking about with the various available options. But if you want to hear the available free voices you can use Microsoft Narrator.
 
Update. Never mind. They've taken to rejecting this type of work because they consider text-to-speech to be AI.
 
I have had an AI-generated audio story posted here for almost two years as a test of the technology at the time it was created.

When the controversy over AI rejections began here, I mentioned my audio story to Laurel in a PM and told her that I would have no issue with her taking it down. Yet it remains.

I have not posted any additional AI-generated stories here, only because most of what I write would be too long and the files too large if converted to audio. I do post them elsewhere, such as on Audiomack where they can be monetized. Earning a little money for them helps offset the small costs of creating them using the generic voices available on Amazon Polly. Here is a free sample if you're interested in listening.
 
I don't see that as AI. Text to speech has been around for years. Scientists have tried to achieve this since the 13th century, and in the 1950s they finally managed to do it. It's far older than AI. The only point at which AI comes into play is in voice generation. And even there, it's more a help than a hindrance. You have to work really, really hard to faithfully duplicate a voice with cloning tools. Most AIs, outside of movie studios, will take a sample and generate a voice that has little to do with the sample it was given. I could do it, given the resources, but I don't want to because that would be theft.

And your story is a good example. It's clearly text to speech. I could achieve something a little better with the commercial services, but it would still be clearly recognizable as TTS and would lack all the nuance humans bring to the table. I did enjoy your use of male and female voices to tell the story. I used the same approach in the story I submitted.
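For anyone wondering how the two-voice trick can be automated, here is a minimal sketch under a big simplifying assumption: one voice reads the narration and a second voice reads anything inside straight double quotes. It does not try to work out which character is speaking. `split_by_voice` and `Segment` are hypothetical names, and the voice names are placeholders (Joanna and Matthew happen to be Amazon Polly voice IDs, but any tool's voices would do).

```python
import re
from dataclasses import dataclass
from typing import List

# Placeholder voice names; swap in whatever voices your TTS tool offers.
NARRATOR_VOICE = "Joanna"
DIALOGUE_VOICE = "Matthew"

# Matches anything inside straight double quotes, capturing the inner text.
QUOTE_RE = re.compile(r'"([^"]*)"')


@dataclass
class Segment:
    voice: str
    text: str


def split_by_voice(paragraph: str) -> List[Segment]:
    """Split a paragraph into narration and quoted-dialogue segments, in order."""
    segments: List[Segment] = []
    pos = 0
    for match in QUOTE_RE.finditer(paragraph):
        # Narration between the previous quote (or the start) and this one.
        narration = paragraph[pos:match.start()].strip()
        if narration:
            segments.append(Segment(NARRATOR_VOICE, narration))
        # The quoted dialogue itself, without the quote marks.
        dialogue = match.group(1).strip()
        if dialogue:
            segments.append(Segment(DIALOGUE_VOICE, dialogue))
        pos = match.end()
    # Any narration left after the last quote.
    tail = paragraph[pos:].strip()
    if tail:
        segments.append(Segment(NARRATOR_VOICE, tail))
    return segments


for seg in split_by_voice('She smiled. "Hello there," she said.'):
    print(f"{seg.voice}: {seg.text}")
```

Each segment would then be sent to the TTS engine with its assigned voice, and the resulting clips stitched together in order.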
 
And your story is a good example. It's clearly text to speech.
That was basically my point in publishing it: an example of simple text-to-speech technology being used to convert an existing written story. I put hardly any time into manipulating the system to create a better-quality product, which can be accomplished with everything from punctuation changes in the text to tonal filters. It takes a lot of time and effort to get AI-generated text-to-speech output to be even moderately passable as human speech.

I took the time when converting my middle-grade adventure story to audio, and it turned out well enough to be marketable as an audiobook.
 
I've tried -- out of curiosity -- to produce TTS and have yet to find a tool that is convincing in its rendition. At best, the tools I've found are awful at intonation and emphasis. And even though some of them let you set a type of emotion, the emotion still comes out flat. Subtle things, like changing the speed of quoted text so it stands out, emphasizing specific words, or quoting other characters in an adapted version of the "speaker's" voice -- it still sounds like TTS. Yes, you can change that manually and clip together segments, but even that is a lot of work and still unconvincing. The uncanny valley is narrowing, but it still exists. And heaven forbid you add words from another language, mon amour, or it will butcher the pronunciation like a first-day language student! @BobbyBrant, you said you use Amazon Polly? I'll try that one and see if it's any better. Out of curiosity.
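The manual tweaks described above (slower dialogue, stressed words, switching language for a foreign phrase) are usually done with SSML, the W3C Speech Synthesis Markup Language, which Amazon Polly and Microsoft's speech services both accept in some form. Which tags a particular voice actually honors varies, so check the service's SSML docs. Below is a minimal sketch, assuming plain-text input where dialogue sits in straight double quotes, stressed words are marked `*like this*`, and foreign phrases are listed by hand. `to_ssml` and those markup conventions are hypothetical, not part of any TTS tool.

```python
import re
from typing import Dict, Optional
from xml.sax.saxutils import escape

QUOTE_RE = re.compile(r'"([^"]*)"')        # dialogue inside straight double quotes
EMPHASIS_RE = re.compile(r"\*([^*]+)\*")   # *word* marks a word to stress


def to_ssml(text: str,
            dialogue_rate: str = "90%",
            foreign_phrases: Optional[Dict[str, str]] = None) -> str:
    """Convert lightly marked-up story text into an SSML document.

    - Quoted dialogue is wrapped in <prosody rate=...> so it is read at a
      different speed ("90%" is Polly's percent-of-normal-speed form;
      other engines may expect a different value).
    - *word* becomes <emphasis level="strong">word</emphasis>.
    - Each phrase in foreign_phrases is wrapped in <lang xml:lang=...> so the
      engine can switch pronunciation, if the chosen voice supports it.
    """
    # Escape &, < and > first so story text can't break the XML.
    # escape() leaves double quotes alone, which the dialogue regex relies on.
    ssml = escape(text)

    # Handle dialogue before inserting any tags that contain quoted attributes.
    ssml = QUOTE_RE.sub(
        lambda m: f'<prosody rate="{dialogue_rate}">{m.group(1)}</prosody>', ssml)

    ssml = EMPHASIS_RE.sub(r'<emphasis level="strong">\1</emphasis>', ssml)

    for phrase, lang in (foreign_phrases or {}).items():
        safe = escape(phrase)
        ssml = ssml.replace(safe, f'<lang xml:lang="{lang}">{safe}</lang>')

    return f"<speak>{ssml}</speak>"


print(to_ssml('He *really* said "Goodbye, mon amour."',
              foreign_phrases={"mon amour": "fr-FR"}))
# <speak>He <emphasis level="strong">really</emphasis> said <prosody rate="90%">Goodbye, <lang xml:lang="fr-FR">mon amour</lang>.</prosody></speak>
```

Even with markup like this, it's still the engine doing the reading, so it won't fix flat emotion; it just gives you more levers than plain text does.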

But... at the end of the day, TTS has no heart.

All that said, I support @Laurel on the premise that TTS is "creative" -- how a performer chooses to present text in speech is performance art. Honestly... in other threads, I've seen authors bemoan their art being usurped by generative AI. Yet I've also seen authors write text, then use generative AI for their cover art -- or in this case, perhaps, audio. Personally, I believe that our integrity in protecting art for the flesh-and-blood creator applies to our colleagues in all the other creative professions.

Support the soul of the human.
 
Love this! Happy to at least try a voice-over for you. I've a deep but expressive tone, and I'm a smooth reader, so it'd be a good exercise for me too. A British accent.
 