Let the Robots do the talking — Exploring TTS


Speaking has always been a big part of being a lawyer. You use your voice to make submissions in the highest courts of the land. Even in client meetings, you use your voice to persuade. Hell, when I write my emails, I imagine saying what I am writing to make sure it is in my voice.

So, suggesting that a synthesized voice can be useful is going to be controversial. You might think that a computer's voice is soulless and not interesting enough to hold its own against a lawyer. However, with advances led by smart assistants like Google Home and Siri, Text to Speech (TTS) is certainly worth exploring.

Why use robots?

Talking is really convenient: you just open your mouth and start (though some babies will disagree). However, working from home shows how difficult it can be to record and transmit good quality sound. Feedback and distortion are just some of the problems people regularly face when using basic equipment for online meetings. It's frustrating.

If you think better equipment would resolve this, it can get expensive very quickly. You might notice that several people are involved in producing your favourite podcast. You are going to need all sorts of equipment, like microphones and DAC mixers. Hire engineers? What does a mixer do, actually?

Furthermore, human performance is subject to various imperfections. The pitch or tone is not right here. Sometimes you lose concentration or get interrupted in the middle of your speech. All this means you may have to record something several times before you get a delivery you are happy with. If you aren't confident about your English, or would like to say something in another language, getting a computer to voice it can help.

So a synthesized voice can be cheap, fast and consistent. If the quality is good enough, you can focus on the script. For me, I am interested in improving the quality of my online training. Explaining stuff doesn't need Leonard Cohen-quality delivery. A plain voice is probably far less distracting anyway.

Experiments with TTS

I will take two major Text to Speech (TTS) solutions for a spin: Google Cloud and Mozilla's TTS (open source). The Python code for these experiments is in my GitHub repository.

houfu/TTS-experiments (GitHub)

Google Cloud

It's quite easy to try Google Cloud's TTS. A demo allows you to set the text and then convert it with a click of a button. If you want to know how it sounds, try it!

Text-to-Speech: Lifelike Speech Synthesis | Google Cloud. Turn text into natural-sounding speech in 220+ voices across 40+ languages and variants with an API powered by Google's machine learning technology.

To generate audio files, you're going to need a Google Cloud account and some programming knowledge. However, it's pretty straightforward, and I mostly copied from the quickstart. You can hear the first two paragraphs of this blog post here.
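To give a flavour of what that looks like, here is a minimal sketch along the lines of Google's Python quickstart. It is not the exact code from my repository. It assumes the `google-cloud-texttospeech` package is installed and that credentials are set up (for example through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable). The helper names `first_paragraphs` and `synthesize_to_mp3`, the `en-US` voice and the MP3 output are all illustrative choices of mine.

```python
def first_paragraphs(text: str, count: int = 2) -> str:
    """Return the first `count` non-empty paragraphs of a plain-text post."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n\n".join(paragraphs[:count])


def synthesize_to_mp3(text: str, out_path: str, language_code: str = "en-US") -> None:
    """Send `text` to Google Cloud Text-to-Speech and save the result as an MP3.

    Assumption: the language code and neutral voice are placeholders;
    pick whichever voice you prefer from Google's list.
    """
    # Imported here so the helper above works without the Google package.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code=language_code,
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    # The API returns the audio as raw bytes.
    with open(out_path, "wb") as audio_file:
        audio_file.write(response.audio_content)
```

With the post saved as plain text in a variable such as `post_text`, calling `synthesize_to_mp3(first_paragraphs(post_text), "intro.mp3")` would write the opening of the post to an audio file. Note that this makes a billable API call.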

Here's my personal view of Google Cloud's TTS:

Mozilla's TTS

Using Mozilla's TTS, you get much closer to the model-training side of text to speech. This includes training your own model, if you have roughly 24 hours of recordings of your voice to spare.

mozilla/TTS (GitHub): Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

However, for this experiment, we don't need that as we will use pre-trained models. Using Python's built-in subprocess module, we can run the command-line tool that comes with the package. This generates wave files. You can hear the first two paragraphs of this blog post here.
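The subprocess approach can be sketched roughly as below. This is a minimal sketch, not the exact code from my repository. It assumes the package installs a `tts` command that accepts `--text` and `--out_path`, as described in the project's README; the flags, and whether `--out_path` expects a file or a folder, may differ between releases, so check `tts --help` for your version. The function names are my own.

```python
import subprocess
from typing import List


def build_tts_command(text: str, out_path: str) -> List[str]:
    """Build the argument list for the `tts` command-line tool.

    Assumption: `--text` and `--out_path` are the flag names in your
    installed release; verify with `tts --help`.
    """
    return ["tts", "--text", text, "--out_path", out_path]


def synthesize(text: str, out_path: str) -> None:
    """Run the `tts` tool and raise an error if it exits unsuccessfully."""
    # Passing a list (not a shell string) avoids quoting problems
    # when the text contains apostrophes or quotation marks.
    subprocess.run(build_tts_command(text, out_path), check=True)
```

Keeping the command in its own function makes it easy to print and inspect before running it, which helps when the tool's options change between releases.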

Here's my personal view of Mozilla's TTS:

Conclusion

If you thought robots would replace lawyers in court, this isn't the post to persuade you. However, I think some use cases are certainly worth trying, such as online training courses. For these, Google Cloud is production-ready, so it gives you the most presentable results. Mozilla TTS is open source and definitely far more interesting, but it needs more time to develop. Do you think there are other ways to use TTS?

#tech #NaturalLanguageProcessing #OpenSource #Programming

Love.Law.Robots. – A blog by Ang Hou Fu