Amazon Polly

BobbyBrandt

I have had several readers ask if I would consider posting my stories in audio versions. Their explanation is that they are visually impaired, and the standard text to speech engines do a poor job with inflections within dialog, and even worse with erotic descriptions and dialog related to sexual activities.

I have tried various "reader" applications in the past, but these are either cost-prohibitive or of similarly poor quality to the standard text to speech engines. I recently came across Amazon Polly and am now curious whether any other authors have explored this AI as a means to convert text to more human-sounding speech.

I did convert the first chapter of one of my stories (Heavy Traffic) as a test, and while the result isn't suitable for commercial purposes, it turned out much better than I expected. Does anyone else have experience with Amazon Polly and would they be willing to share their tips, tricks, or other feedback? For example, did the neural voices work better for you than the standard voices, did the system respect the pauses intended with punctuation such as commas, and did you use a single-voice story-telling approach rather than using different voices for the characters in the story?
 
Playing with Apple's screen reader recently: it pauses at commas but not at paragraph ends or quotation marks, etc. I keep wanting to do a version with commas strategically located to pace the screen reader.
 
Polly is way better than the average, and a tiny smidge better than the upper tiers of Google's neural voices, but both are still obviously not human in the end. I'd take Polly over most others, any day, but you really don't want to listen to it read anything lengthy.

It makes its own decisions about whether a comma should be a pause or not, and can still come away with a bit of a breathless pacing. The overall speed of the voice is consistent in a way that humans just... aren't. Long term, that's exhausting to listen to.

Polly is great to read a paragraph or two. Any longer than that, and you begin to tire the listener, whether they consciously recognise it or not. You can reduce some of that by varying the voices, combining multiple, but in the end, it isn't something that you can beat.

Source: TTS and STT are essential and daily tools for me.
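
For anyone who wants to control the pauses rather than leave them to the engine: Polly accepts SSML as well as plain text, and SSML has an explicit break tag. Below is a minimal sketch, not a definitive recipe, that wraps each paragraph in SSML and forces a pause after it. The function name `to_ssml` and the 700 ms default are my own assumptions for illustration; tune the pause by ear.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str, paragraph_pause_ms: int = 700) -> str:
    """Wrap plain text in SSML with an explicit pause after each paragraph.

    Paragraphs are assumed to be separated by a blank line.
    The 700 ms default is an arbitrary starting point, not a Polly default.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    parts = []
    for paragraph in paragraphs:
        # Escape &, < and > so story text cannot break the SSML markup.
        body = escape(paragraph)
        parts.append(f'<p>{body}</p><break time="{paragraph_pause_ms}ms"/>')
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml("First line, with a comma.\n\nTom & Ann left."))
```

The resulting string would be sent to Polly with the text type set to SSML instead of plain text. This only fixes paragraph pacing; it does nothing about the constant reading speed described above.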
 
As I mentioned, I did use Amazon Polly to convert one chapter (approximately 5,600 words). It created a 26+ minute MP3 file that is almost 13,000 KB. I sent it to one reader who had requested audio versions of my stories to get his opinion. He thought it was great, and wants to 'hear' the rest of the chapters as soon as I can convert them.
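
As a rough back-of-the-envelope check on those figures, here is a small sketch that turns them into a reading pace and an average bitrate. The inputs are the approximate numbers quoted above ("26+ minutes", "almost 13,000 KB"), and 1 KB is assumed to be 1,000 bytes, so treat the results as ballpark values only.

```python
def words_per_minute(words: float, minutes: float) -> float:
    """Average reading pace of the narration."""
    return words / minutes


def bitrate_kbps(size_kb: float, minutes: float) -> float:
    """Average bitrate in kilobits per second, assuming 1 KB = 1,000 bytes."""
    return size_kb * 8 / (minutes * 60)


if __name__ == "__main__":
    # Approximate figures from the post: 5,600 words, 26 minutes, 13,000 KB.
    print(f"pace: about {words_per_minute(5600, 26):.0f} words per minute")
    print(f"bitrate: about {bitrate_kbps(13000, 26):.1f} kbps")
```

That works out to roughly 215 words per minute, and a file size of about 500 KB per minute of audio.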

I am going to convert one of my more erotic stories soon to see how Polly handles that genre. I'm not going to post anything on Lit until I get feedback from a few beta listeners, because this audience in general is likely to be less tolerant of longer recorded stories, especially if done in only a 'story teller' format, with a single voice reading it.
 
Natural Reader is pretty good. Microsoft's text to speech, as accessed via the Edge web browser's Reader Mode, is good too. Text to speech is getting better, but still can't match a human narrator. However, if you edit the text to manually insert pauses and inflections and correct mispronunciations, they can be pretty good. Lots of work to get it right, though.

Carefully read the fine print on the usage agreements. Often there are restrictions and royalties required when using these text to speech things for anything other than your own private purposes.
 
That's what I use in my review process. The voices on the paid version are better than the free ones, but they still have an absolute non-human quality. The ability to tweak pronunciations phonetically is a boon that has helped me tremendously since I upgraded to the paid version. The whole point of listening to it read to you in review is to catch things that don't sound right, and when it pronounces a character name differently than you intended or mispronounces a common word like "pussy", that hampers the intended purpose. A couple of tweaks in the pronunciation editor, and those speed bumps vanish.
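
For Polly, a comparable tweak can be sketched with the SSML `<sub>` tag, which tells the engine to speak an alias in place of the written word. This is only an illustration under my own assumptions: the function name `apply_pronunciations`, the character name "Siobhan" and its respelling are all made up for the example, and this is not how Natural Reader's pronunciation editor works internally.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_pronunciations(text: str, aliases: dict) -> str:
    """Return SSML in which each listed word is spoken as its alias.

    `aliases` maps a written word (case-sensitive, whole word only)
    to the respelling the voice should read instead.
    """
    if not aliases:
        return "<speak>" + escape(text) + "</speak>"
    # Longest words first so a longer entry wins over a shorter prefix.
    words = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    out = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches.
        out.append(escape(text[pos:match.start()]))
        word = match.group(1)
        out.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "<speak>" + "".join(out) + "</speak>"


if __name__ == "__main__":
    print(apply_pronunciations("Siobhan smiled.", {"Siobhan": "shi-vawn"}))
```

Respellings are trial and error: listen to the output and adjust the alias until the voice says what you intended.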

I'd wager that most all of them have non-commercial and even private use clauses.
 
I've used both Natural Reader and Microsoft's text to speech applications and found them tedious and of poor quality, but suitable for reviewing my writing.

So far, I am impressed with the better quality of the results from the free version of Amazon Polly. The voices sound much more natural, and as I play around with how to get the desired pauses and inflections in the speech, the results get even better. I will explore options that might be offered in a paid subscription, such as additional voices and the ability to convert more than 3,000 characters at a time. That might make the investment worthwhile.
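
One way to work within the 3,000-character limit mentioned above is to split a chapter into chunks at sentence boundaries, convert each chunk separately, and join the audio afterwards. The sketch below only does the splitting; the function name `split_for_tts` is an assumption of mine, and the sentence detection is a simple punctuation rule that will occasionally split in the wrong place (after "Mr." for example).

```python
import re


def split_for_tts(text: str, limit: int = 3000) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence ends where possible. Whitespace between
    sentences (including paragraph breaks) is collapsed to one space.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut mid-sentence.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(split_for_tts("One. Two! Three?", limit=10), 1):
        print(i, chunk)
```

A 5,600-word chapter comes to well over 3,000 characters, so expect a dozen or so chunks per chapter with this approach. If you add SSML tags to a chunk, check how the service counts them against the limit before relying on the 3,000 figure.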
 
I predominantly use my tablet or phone to "read text to speech". I use the Cool Reader app and the Aldiko Classic app.
I don't know how you would convert the reading to an audio file, though, but they do a fairly good job of the conversion while you listen to them.
 