Love.Law.Robots. by Ang Hou Fu

TTS


Introduction

You must have worked hard to get here! We are almost at the end now.

Our journey took us from providing the user experience, through figuring out what should happen in the background, to interacting with an external service.

In this part, we ask docassemble to provide a file for the user to download.

Provisioning a File

When we left part 2, this was our result screen.

    event: final_screen
    question: |
      Download your sound file here.
    subquestion: |
      The audio on your text has been generated.
      
      You can preview it here too.
      
      <audio controls>
       <source type="audio/mpeg">
       Your browser does not support playing audio.
      </audio>
      
      Press `Back` above if you want to modify the settings and generate a new file,
      or click `Restart` below to begin a new request.
    buttons:
      - Exit: exit
      - Restart: restart

There are two places where you need an audio file.

  1. In the “question”, a link to the file is provided in “here” for download.
  2. The audio preview widget (the thing which you click to play) also needs a link to the file to function.

Luckily for us, docassemble provides a straightforward way to deal with files on the server. Simply stated, create a variable of type DAFile to hold a reference to the file, save the data to the file and then use it for those links in the results screen.

Let’s get started. Add this block to your main.yml file.

    ---
    objects:
      - generated: DAFile
    ---

This block creates an object called “generated”, which is a DAFile. Now your interview code can use “generated”.

Add the new line to the mandatory block we created in Part 3.

    mandatory: True
    code: |
      # The next line is new
      generated.initialize(filename="output.mp3")
      tts_task
      if tts_task.ready():
        final_screen
      else:
        waiting_screen

This code initialises “generated” by getting the docassemble server to provision it. If you use “generated” before initialising it, docassemble raises an error. 👻 (You will only get this error if you use the DAFile to create a file)

Now your background action needs access to “generated”. Pass it in a keyword parameter in the background action you created in Part 3.

    code: |
      tts_task = background_action(
        'bg_task', 
        text_to_synthesize=text_to_synthesize, 
        voice=voice, 
        speaking_rate=speaking_rate, 
        pitch=pitch,
        # This is the new keyword parameter
        file=generated  
      )

Now that your background action has the file, use it to save the audio content. Add the new lines below to bg_task that you also created in Part 3.

    event: bg_task
    code: |
      audio = get_text_to_speech(
        action_argument('text_to_synthesize'),
        action_argument('voice'),
        action_argument('speaking_rate'),
        action_argument('pitch'),
      )
      # The next three lines are new
      file_output = action_argument('file') 
      file_output.write(audio, binary=True) 
      file_output.commit() 
      background_response()

We assign the file to a new variable in the background task and then use it to write the audio (make sure it is written in binary mode, as MP3s are not text). After that, commit the file to save it on the server or in your external storage, depending on your configuration. (The methods above are from DAFile. You can read more details about what they do and other methods here.)
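To see why the binary flag matters, here is a minimal sketch that uses only Python's standard library, with an ordinary temporary file standing in for the DAFile. The save_audio helper and the sample bytes are hypothetical and are not part of docassemble. The point is only that audio bytes must be written in binary mode and should read back unchanged.

```python
import os
import tempfile


def save_audio(audio: bytes, path: str) -> int:
    """Write raw audio bytes to path in binary mode and return the byte count."""
    # "wb" writes the bytes untouched; a text-mode file would reject bytes.
    with open(path, "wb") as handle:
        return handle.write(audio)


# Hypothetical stand-in for the MP3 bytes returned by get_text_to_speech.
fake_audio = b"ID3\x03\x00\x00\x00fake-mp3-payload"

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "output.mp3")
    written = save_audio(fake_audio, target)
    # Read the file back to confirm nothing was altered on the way to disk.
    with open(target, "rb") as handle:
        round_trip = handle.read()
```

In the interview itself, the write and commit calls on the DAFile take care of this step, so you would not need a helper like this one.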

Now that the file is ready, we can plug it into our results screen. We provide URLs here so that your user can download the file from the browser. File paths would not work because they point to the server's file system, which the user's browser cannot reach. Modify two lines in the results screen block: the download link in the question, and the src attribute of the source tag in the audio preview.

    event: final_screen
    question: |
      Download your sound file **[here](${generated.url_for(attachment=True)}).**
    subquestion: |
      The audio on your text has been generated.
      
      You can preview it here too.
      
      <audio controls>
       <source src="${generated.url_for()}" type="audio/mpeg">
       Your browser does not support playing audio.
      </audio>
      
      Press `Back` above if you want to modify the settings and generate a new file,
      or click `Restart` below to begin a new request.
    buttons:
      - Exit: exit
      - Restart: restart

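To get the URL for a DAFile, use the url_for method. It gives you an address you can use for a download link or for the web browser's audio player. Passing attachment=True, as in the download link, asks the browser to download the file instead of opening it.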

Conclusion

Congratulations! You are now ready to run the interview. Give it a go and see if you can download the audio of a text you would like spoken. (If you are still at the Playground, you can click “Save and Run” to ensure your work is safe and test it around a bit.)

This Text to Speech docassemble interview is relatively straightforward to me. Nevertheless, its simplicity also showcases several functions which you may want to be familiar with. Hopefully, you now have an idea of dealing with external services. If you manage to hook up something interesting, please share it with the community!

Bonus: Trapping errors and alerting the users

The code so far is enough to provide users with hours of fun (hopefully not at your expense). However, there are edge cases which you should consider if you plan to make your interview more widely available.

Firstly, while it's pretty clear in this tutorial that you should have updated your Configuration so that this interview can find your service account, this doesn't always happen for others. Admins might have overlooked it.

Add this code as the first mandatory code block of main.yml (before the one we wrote in Part 3):

    mandatory: True
    code: |
      if get_config('google') is None or 'tts service account' not in get_config('google'):   
        if get_config('error notification email') is not None:
          send_email(to=get_config('error notification email'), 
            subject='docassemble-Google TTS raised an error', 
            body='You need to set service account credentials in your google configuration.' )
        else:
          log('docassemble-Google TTS raised an error -- You need to set service account credentials in your google configuration.')
          
        message('Error: No service account for Google TTS', 'Please contact your administrator.')

Take note that if you add more than one mandatory block, they are called in the order of their appearance in the interview file. So if you put this after the mandatory code block defining our processes, the process gets called before checking whether we should run this code in the first place.

This code block does a few things. First, it checks whether there is a “google” directive, and whether that directive contains a “tts service account” sub-directive. If it doesn't find any tts service account information, it checks whether the admin has set an error notification email in the Configuration. If so, the server emails that address to report the issue. If not, it writes the error to docassemble.log, one of the logs on the server. Either way, the user then sees an error message asking them to contact the administrator. (If the admin doesn't check their email or logs, I am unsure how we can help the admin.)
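The condition at the heart of that block can be tried out on its own. In this minimal sketch, plain dictionaries stand in for whatever get_config('google') would return on a real server. The missing_tts_credentials helper and the sample values are hypothetical.

```python
from typing import Optional


def missing_tts_credentials(google_config: Optional[dict]) -> bool:
    """Return True when the 'tts service account' entry cannot be found."""
    # Same test as in the mandatory block: no google directive at all,
    # or a google directive without the sub-directive we need.
    return google_config is None or "tts service account" not in google_config


# Plain values standing in for what get_config('google') might return.
no_google_directive = None
google_without_tts = {"api key": "placeholder"}
google_with_tts = {"tts service account": '{"type": "service_account"}'}

results = [
    missing_tts_credentials(no_google_directive),
    missing_tts_credentials(google_without_tts),
    missing_tts_credentials(google_with_tts),
]
```

Only the last case passes the check. Note that it passes even though the value is not a usable key, which is the limitation discussed next.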

This mandatory check before starting the interview is helpful to catch the most obvious error – no configuration. However, you can pass this check by putting nonsense in the “tts service account”. Google is not going to process this. There may be other errors, such as Google being offline.

Narrowing down every possible error will be very challenging. Instead, we will make one crucial check: that the code actually saved a file at the end of the process. Even if we can't tell the user what went wrong, at least we spare the user the confusion of finding out that there is no file to download.

First, let's write the code that makes the check. Add this new code block.

    event: file_check
    code: |
      path = generated.path()
      if not os.path.exists(path):
        if get_config('error notification email') is not None:
          send_email(to=get_config('error notification email'), 
            subject='docassemble-Google TTS raised an error', 
            body='No file was saved in this interview.' )
        else:
          log('docassemble-Google TTS raised an error -- No audio file was saved in this interview.')
        message('Error: No audio file was saved', 'We are not sure why. Please try again. If the problem persists, contact your administrator.')

This code checks whether the audio file (generated, a DAFile) is an actual file or an apparition. If it doesn't exist, the admin receives a message. The user is also alerted to the failure.
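The check in file_check only asks whether the path exists. A slightly stricter variant, sketched here with the standard library only, also treats an empty file as a failure. The audio_file_ready helper is hypothetical, and whether an empty file can actually occur in your setup is an assumption on my part.

```python
import os
import tempfile


def audio_file_ready(path: str) -> bool:
    """Return True if a non-empty regular file exists at path."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


with tempfile.TemporaryDirectory() as folder:
    missing = os.path.join(folder, "missing.mp3")
    empty = os.path.join(folder, "empty.mp3")
    filled = os.path.join(folder, "filled.mp3")

    # Create an empty file and a file with some placeholder content.
    open(empty, "wb").close()
    with open(filled, "wb") as handle:
        handle.write(b"not really audio")

    checks = (
        audio_file_ready(missing),
        audio_file_ready(empty),
        audio_file_ready(filled),
    )
```

If you wanted this behaviour in the interview, the helper could live in a module and be called with generated.path() in place of the os.path.exists test.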

We would need to add a need directive to our results screen so that the check is made before the final screen to download the file is shown.

    event: final_screen 
    need:  # Add this line
      - file_check  # Add this line
    question: |
      Download your sound file **[here](${generated.url_for(attachment=True)}).**
    subquestion: |
      The audio on your text has been generated.
      
      You can preview it here too.
      
      <audio controls>
       <source src="${generated.url_for()}" type="audio/mpeg">
       Your browser does not support playing audio.
      </audio>
      
      Press `Back` above if you want to modify the settings and generate a new file,
      or click `Restart` below to begin a new request.
    buttons:
      - Exit: exit
      - Restart: restart

We also need to import the os.path module from Python's standard library to check the file system. Add this new block near the top of our main.yml file.

    imports:
      - os.path

There you have it! The interview checks before you start whether there's a service account. It also checks before showing you the final screen whether your request succeeded and if an audio file is ready to download.

👈🏻 Go to the previous part.

☝🏻Return to the overview of this tutorial.

#tutorial #Python #Programming #docassemble #Google #TTS #LegalTech

Love.Law.Robots. – A blog by Ang Hou Fu

Introduction

So far, all our work is on our docassemble install, which has been quite a breeze. Now we come to the most critical part of this tutorial: working with an external service, Google Text to Speech. Different considerations come into play when working with others.

In this part, we will install a client library from Google. We will then configure the setup to interact with Google’s servers and write this code in a separate module, google_tts.py. At the end of this tutorial, your background action will be able to call the function get_text_to_speech and get the audio file from Google.

1. A quick word about APIs

The term “API” can be used loosely nowadays. Some people use it to describe how you can use a program or a software library. In this tutorial, an API refers to a connection between computer programs. Instead of a website or a desktop program, we’re using Python and docassemble to interact with Google Text to Speech. In some cases, like Google Text to Speech, an API is the only way to work with the program.

The two most common ways to work with an API over the internet are (1) using a client library or (2) calling a RESTful API directly. Each option has its pros and cons. In this tutorial, we are going with a client library that Google provides. This allows us to work with the API in a programming language we are familiar with, Python. RESTful APIs can have more support and features than a client library (especially if the programming language is not popular). Still, you’d need to know your way around requests or similar packages if you want to use them in Python.
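For comparison, here is a rough sketch of what the REST route involves, without sending anything over the network. The field names follow Google's REST format for the text:synthesize method as I understand it, so check them against the current API reference before relying on them. The helper names and the sample response are made up for illustration.

```python
import base64
import json


def build_synthesize_body(text, voice_name, speaking_rate=1.0, pitch=0.0):
    """Build the JSON body for a text:synthesize request (nothing is sent)."""
    return json.dumps({
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": voice_name},
        "audioConfig": {
            "audioEncoding": "MP3",
            "speakingRate": speaking_rate,
            "pitch": pitch,
        },
    })


def decode_audio(response_text):
    """Decode the base64 audioContent field of a JSON response into bytes."""
    return base64.b64decode(json.loads(response_text)["audioContent"])


body = json.loads(build_synthesize_body("Hello", "en-US-Wavenet-A"))

# A made-up response; a real one would carry the MP3 data in the same field.
sample_response = json.dumps(
    {"audioContent": base64.b64encode(b"mp3-bytes").decode("ascii")}
)
audio = decode_audio(sample_response)
```

On top of this, you would still have to handle authentication and the HTTP call yourself. The client library wraps those steps, which is the main reason this tutorial uses it.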

2. Install the Client Library in your docassemble

Before we can start using the client library, we need to ensure that it’s there in our docassemble install. Programming in Python can be very challenging because of issues like this:

source: https://imgs.xkcd.com/comics/python_environment.png

Luckily, you will not face this problem if you’re using Docker for your docassemble install (which most people do). Do this instead:

  1. Leave the Playground and go to another page called “Package Management”. (If you don’t see this page, you need to be either an admin or a developer)
  2. Under Install or update a package, specify google-cloud-texttospeech as the package to find on PyPI
  3. Click Update, and wait for the screen to show that the install is OK. (This takes time as there are quite a few dependencies to install)
  4. Verify that the google-cloud-texttospeech package has been installed by checking out the list of packages installed.

3. Set up a Text To Speech service account in docassemble

At this point, you should have obtained your Google Cloud Platform service account so that you can access the Text to Speech API. If you haven’t done so, please follow the instructions here. Take note that we will need your key information in JSON format. You don’t need to “Set your authentication environment variable” for this tutorial.

If you have not realised it yet, the key information in JSON format is a secret. While Google’s Text to Speech platform has a generous free tier, the service is not free. So, expect to pay Google if somebody with access to your account tries to read The Lord of the Rings trilogy. In line with best practices, secrets should be kept in a private and secure place, which is not your code repository. Please don’t include your service account details in your playground files!

Luckily, you can store this information in docassemble’s Configuration, which someone can’t access without an admin role and is not generally publicly available. Let’s do that by creating a directive google with a sub-directive of tts service account. Go to your configuration page and add these directives. Then fill out the information in JSON format you received from Google when you set up the service account.

In this example, the lines you will add to the Configuration should look like lines 118 to 131.

4. Putting it all together in the google_tts.py module

Now that our environment is set up, it’s time to create our get_text_to_speech function.

Head back to the Playground. Look for the dropdown titled “Folders”, click it, then select “Modules”.

Look for the editor and rename the file as google_tts.py. This is where you will enter the code to interact with Google Text to Speech. If you recall in part 3, we had left out a function named get_text_to_speech. We were also supposed to feed it with the answers we collected from the interviews we wrote in part 2. Let’s enter the signature of the function now.

    def get_text_to_speech(text_to_synthesize, voice, speaking_rate, pitch):
      # Enter more code here
      return

Since our task is to convert text to speech, we can follow the code in the example provided by Google.

A. Create the Google Text-to-Speech client

Using the Python client library, we can create a client to interact with Google’s service.

We need credentials to use the client to access the service. This is the secret you set up in step 3 above. It’s in docassemble’s configuration, under the directive google with a sub-directive of tts service account. Use docassemble’s get_config to look into your configuration and get the secret tts service account as a JSON.

With the secret to the service account, you can pass it to the class factory function and let it do the work.

    def get_text_to_speech(text_to_synthesize, voice, speaking_rate, pitch):
        from google.cloud import texttospeech
        import json
        from docassemble.base.util import get_config
    
        credential_info = json.loads(get_config('google').get('tts service account'), strict=False)
    
        client = texttospeech.TextToSpeechClient.from_service_account_info(credential_info)

Now that the client is ready with your service account details, let's get some audio.
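A side note on the strict=False argument passed to json.loads above. A service account key contains a long private key, and depending on how it was pasted into the Configuration, the string may contain literal line breaks, which strict JSON parsing rejects. That is my reading of why the flag is there. The sketch below, with a made-up key, only demonstrates the difference in behaviour.

```python
import json

# A made-up credential string with a literal line break inside a string value.
raw = '{"type": "service_account", "private_key": "line one\nline two"}'

try:
    json.loads(raw)
    strict_parse_ok = True
except json.JSONDecodeError:
    # Strict parsing does not allow control characters inside strings.
    strict_parse_ok = False

# strict=False tolerates control characters such as newlines inside strings.
credential_info = json.loads(raw, strict=False)
```

If your key is stored without literal line breaks, the flag makes no difference, so leaving it in is harmless.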

B. Specify some options and submit the request

The primary function to request Google to turn text into speech is synthesize_speech. The function needs a bunch of stuff — the text to convert, a set of voice options, and options for your audio file. Let’s create some with the answers to the questions in part 2. Add these lines of code to your function.

The text to synthesise:

    input_text = texttospeech.SynthesisInput(text=text_to_synthesize)

The voice options:

    voice = texttospeech.VoiceSelectionParams(
            language_code="en-US",
            name=voice,
        )

The audio options:

    audio_config = texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=speaking_rate,
            pitch=pitch,
        )

Note that we did not allow all the options to be customised by the user. You can go through the documentation yourself to figure out what options you need or don’t need to worry the user. If you think the user should have more options, you’re free to write your questions and modify the code.

Finally, submit the request and return the audio.

    response = client.synthesize_speech(
            request={"input": input_text, "voice": voice, "audio_config": audio_config}
        )
    
    return response.audio_content

Voilà! The client library can now call Google using your credentials and return your personalised result.
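For reference, here is how the fragments from sections A and B might fit together in google_tts.py. Treat it as a sketch assembled from the snippets above and not as a tested module. The only change is that the voice options are stored in a variable I have called voice_params, so that the voice argument is not overwritten.

```python
def get_text_to_speech(text_to_synthesize, voice, speaking_rate, pitch):
    """Ask Google Text-to-Speech for MP3 audio and return it as bytes."""
    # Imports stay inside the function, as in the snippets above.
    import json
    from google.cloud import texttospeech
    from docassemble.base.util import get_config

    # Read the service account secret stored in the Configuration.
    credential_info = json.loads(
        get_config('google').get('tts service account'), strict=False
    )
    client = texttospeech.TextToSpeechClient.from_service_account_info(credential_info)

    # The text to synthesise.
    input_text = texttospeech.SynthesisInput(text=text_to_synthesize)

    # The voice options (voice_params is my own name for this variable).
    voice_params = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name=voice,
    )

    # The audio options.
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        speaking_rate=speaking_rate,
        pitch=pitch,
    )

    # Submit the request and hand back the raw audio.
    response = client.synthesize_speech(
        request={"input": input_text, "voice": voice_params, "audio_config": audio_config}
    )
    return response.audio_content
```

Calling the function needs the google-cloud-texttospeech package and a valid service account in the Configuration, so it only runs on a docassemble server that is set up as described in sections 2 and 3.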

5. Let’s go back to our interview

Now that you have written your function, it’s time to let our interview know where to find it.

Go back to the playground, and add this new block in your main.yml file.

    ---
    modules:
      - .google_tts
    ---

This block tells the interview that some of our functions (specifically, the get_text_to_speech function) are found in the google_tts module.

Conclusion

At the end of this part, you have written your google_tts.py module and included it in your main.yml. You should also know how to install a Python package in docassemble and edit your Configuration.

Well, that leaves us with only one more thing to do. We’ve got our audio content; now we just need to get it to the user. How do we do that? What’s that? DAFile? Find out in the next part.

👉🏻 Go to the final part.

👈🏻 Go back to the previous part.

☝🏻 Check out the overview of this tutorial.

#tutorial #docassemble #LegalTech #Google #TTS #Programming #Python


Introduction

In Part 2, we managed to write a few questions and a result screen. Even with all that eye candy, you will notice that you can’t run this interview. There is no mandatory block, so docassemble does not know what it needs to do to run the interview. In this part, I will talk about the code block required to run the interview, forming the foundation of our next steps.

1. The backbone of this interview

In ordinary docassemble interviews, your endpoint is a template or a form. For this interview, the endpoint is a sound file. So let’s start with this code block. It tells the reader how the interview will run. Since it is a pretty important block, we should put it near the top of the interview file, maybe right under the meta block. (In this tutorial, the order of blocks does not affect how the interview will run. Or at least until you get to the bonus section.)

    mandatory: True
    code: |
      tts_task
      final_screen

If you recall, the user downloads the audio file in the final screen.

So this mandatory code block asks docassemble to “do” the tts_task and then show the final screen. Now we need to define tts_task, and your interview will be ready to run.

So what should tts_task be? The most straightforward answer is that it is the result of the API call to create the sound file. You can write a code block that gets and returns a sound file and assigns it to tts_task.

Well, don’t write that code block yet.

2. Introducing the background action

If you call an API on the other side of the internet, you should know that many things can happen along the way. For example, it takes time for your request to reach Google, then for Google to process it using its fancy robots, send the result back to your server, and then for your server to send it back to the user. In my experience, Google’s and docassemble’s latency is quite good, but it is still noticeable with large requests.

A user is not supposed to notice that a program lags when the interview runs well. If the user realises that the interview is stuck on the same page for too long, the user might get worried that the interview is broken. The truth is that we are waiting for the file to come back. Get back in your chair and wait for it!

To improve user experience, you should have a waiting screen where you tell the user to hold his horses. While this happens, the interview should work in the background. In this manner, your user is assured everything is well while your interview focuses on getting the file back from Google.

docassemble already provides a mechanism for the docassemble program to carry out background actions. It’s aptly called background_action().

Check out a sample of a background action by reading the first example block (“Return a value”) under the Background processes category. Modify our mandatory code block by following the mandatory code block in the example. It should look like this.

    mandatory: True
    code: |
      tts_task
      if tts_task.ready():
        final_screen
      else:
        waiting_screen

So now we tell docassemble to do the tts_task, our background task. Once it is ready, show the final screen. If the task is not ready, show the waiting screen.
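The same ready-or-wait pattern can be sketched outside docassemble with Python's standard library. This is only an analogy: a thread pool future stands in for the background task, its done() method plays the role of ready(), and the next_screen helper and slow_job function are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def next_screen(task) -> str:
    """Pick the screen to show, depending on whether the task has finished."""
    return "final_screen" if task.done() else "waiting_screen"


# The gate lets us control exactly when the pretend API call finishes.
gate = threading.Event()


def slow_job() -> bytes:
    """Stand-in for a slow call to an external service."""
    gate.wait(timeout=5)
    return b"audio"


with ThreadPoolExecutor(max_workers=1) as pool:
    task = pool.submit(slow_job)
    first_visit = next_screen(task)   # the job is still blocked on the gate

    gate.set()                        # let the job finish
    task.result(timeout=10)           # wait until it has really completed
    second_visit = next_screen(task)
```

Each page load in the interview is like one call to next_screen: the mandatory block runs again, and the user moves on only once the task reports that it is ready.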

3. Define the background action

Now that we have defined the interview flow, it’s time to do the background action. In the spirit of docassemble, we do this by defining tts_task.

The next code block defines the task. Adapt this example into our interview file as follows.

    code: |
      tts_task = background_action(
        'bg_task', 
        text_to_synthesize=text_to_synthesize, 
        voice=voice, 
        speaking_rate=speaking_rate, 
        pitch=pitch,
      )

So we have defined tts_task as a background action. The background action function has two kinds of arguments.

The first positional argument (“bg_task”) is the name of the code block that the background action should execute in the background.

The other keyword arguments are the information you need to pass to this background action, like text_to_synthesize, voice, etc. The answers you gave earlier in the interview will now be used for this background action. Because the mandatory block needs tts_task, and this block needs these variables to define it, docassemble will also ask the questions that set them before running this code block.

So why do you need to define all the variables in this way? Don’t forget that the background action is a separate process from the rest of your interview so they don’t share the same variables. To enable these processes to share their information, you pass on the variables from the main interview process to the background action.

4. Perform the background action

We have defined the background action. Now let’s code what happens inside the background action.

The background action is defined in an event called bg_task. Now add a new code block as follows:

    event: bg_task
    code: |
      audio = get_text_to_speech(
        action_argument('text_to_synthesize'),
        action_argument('voice'),
        action_argument('speaking_rate'),
        action_argument('pitch'),
      )
      background_response()

So in this code block, we say that the audio is obtained by calling a function named get_text_to_speech. For get_text_to_speech to produce an audio file, it requires the answers to the questions you asked the user earlier. As a background process, it gets access to the variables you defined earlier through the keywords of the background_action function by calling action_argument.

Once get_text_to_speech is completed, we call background_response(). Calling background_response is important for a background action as it tells docassemble that this is the endpoint for the background action. Make sure you don’t leave your background action without it.

5. Provide a waiting screen

Before we leave the example block for background processes, let’s add the question block that tells the user to wait for their audio file. Find the block which defines waiting_screen, and adapt it for your interview as follows.

    event: waiting_screen
    question: |
      Hang tight.
      Google is doing its magic.
    subquestion: |
      This screen will reload every
      few seconds until the file
      is available.
    reload: True

By adding reload: True to the block, you tell docassemble to refresh the screen every 10 seconds. This helps the user to believe that they only need to be patient and some “magic” is going on somewhere.

Conclusion

In the next part of the tutorial, we will dive into get_text_to_speech. (What else, right?) We will need to call Google’s Text-to-Speech API to do this. If you found it easy to follow the code blocks in this part of the tutorial, we will kick this up a notch — the next file we will be working on ends with a “.py”.

👉🏻 Go ahead to the next part

👈🏻 Go to the previous part

👈🏻 Check out the overview of this tutorial.

#tutorial #docassemble #Programming #Python #Google #TTS


Introduction

In Part 1, we talked about what we will do and the things you need to follow in this tutorial. Let’s get our hands dirty now!

We are going to get the groundwork done by creating four pages. The first page gets the text to be turned into speech. The second page chooses the voice which Google will use to generate the audio. The third page edits some attributes in the production of the audio. The last page is the results page to download the spoken text.

If you are familiar with docassemble, nothing here is exciting, so you can skip this part. If you’re very new to docassemble, this is a gentle way to introduce you to getting started.

1. All projects begin like this

Log in to your docassemble program and go to the Playground. We will be doing most of the work here.

If you’re working from a clean install, your screen probably looks like this.

The default new project in docassemble's Playground.

  1. Let’s change the name of the interview file from test.yml to main.yml.
  2. Delete all the default blocks/text in the interview file. We are going to replace it with the blocks for this project.

You will have a clean main.yml file at the end.

2. I never Meta an Interview like you

I like to start my interview file with a meta block that tells someone about this interview and who made it.

It’s not easy to remember what a meta block looks like every time. You can use the example blocks in the playground to insert template blocks and modify them.

The example blocks also link to the relevant part of the documentation for easy reference. (It’s the blue “View Documentation” button.)

You should also use the example blocks as much as possible when you’re new to docassemble and writing YAML files. If you keep using those example blocks, you will not forget to separate your blocks with --- and you will minimise errors about indents and lists. After some practice (and lots of mistakes), you should be familiar with the syntax of a YAML file.

So, even though the example blocks section is found below the fold, you should not leave home without it.

You can write anything you like in the meta block as it’s a reference for other users. The field title, for example, is shown as the name of the interview on the “Available Interviews” page.

For this project, this is the meta block I used.

    metadata:
      title: |
        Google TTS Interview
      short title: |
        Have Google read your text
      description: |
        This interview produces a sound file based 
        on the text input by the user and other options.
      revision_date: 2022-05-01

3. Let’s write some questions

This is probably the most visual part of the tutorial, so enjoy it!

An easy way to think about question blocks is that they represent a page in your interview. As long as docassemble can find question blocks that answer all the variables it needs to finish the interview, you can organise and write your question block as you prefer.

So, for example, you can add this text box block which asks you to provide the input text. You can find the example text box block under the Fields category. (Putting no label allows the block to appear as if only one variable is set in this question)

    question: |
      Tell me what text you would like Google to voice.
    fields:
      - no label: text_to_synthesize
        input type: area
      - note: |
          The limit is 5000 characters. (Short paragraphs should be fine)

You can also combine several questions on one page like this question for setting the audio options. Using the range slider example block under the Fields category, you can build this block.

    question: |
      Modify the way Google speaks your text.
    subquestion: |
      You can skip this if you don't need any modifications.
    fields:
      - Speaking pitch: pitch
        datatype: range
        min: -20.0
        max: 20.0
        default: 0
      - note: |
          20 means increase 20 semitones from the original pitch. 
          -20 means decrease 20 semitones from the original pitch. 
      - Speaking rate/speed: speaking_rate
        datatype: range
        min: 0.25
        max: 4.0
        default: 1.0
        step: 0.1
      - note: |
          1.0 is the normal native speed supported by the specific voice. 
          2.0 is twice as fast, and 0.5 is half as fast.

Notice that I have set constraints and defaults in this block based on the documentation of the various options. This will help the user avoid pesky and demoralising error messages from the external API by entering unacceptable values.
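The same limits can also be enforced on the Python side, as a second line of defence in case values arrive from somewhere other than the sliders. The sketch below reuses the ranges from the question blocks above (5000 characters, 0.25 to 4.0 for the rate, -20 to 20 for the pitch). The function names are hypothetical and not part of docassemble or Google's library.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Force value into the closed range [low, high]."""
    return max(low, min(high, value))


def normalise_options(text: str, speaking_rate, pitch) -> dict:
    """Check the text length and pull the audio options into range."""
    if not text or len(text) > 5000:
        raise ValueError("text must be between 1 and 5000 characters")
    return {
        "text": text,
        "speaking_rate": clamp(float(speaking_rate), 0.25, 4.0),
        "pitch": clamp(float(pitch), -20.0, 20.0),
    }


# Out-of-range values are pulled back to the nearest allowed value.
options = normalise_options("Hello there", speaking_rate=9, pitch=-30)
```

Clamping silently changes what the user asked for, so raising an error for out-of-range numbers would be an equally reasonable design choice.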

A common question for a newcomer is: how should I present a question to a user? You can use a list of choices like the one below. (Build this question using the Radio buttons example block under the Multiple Choice category.)

    question: |
      Choose the voice that Google will use.
    field: voice
    default: en-US-Wavenet-A
    choices:
      - en-US-Wavenet-A
      - en-US-Wavenet-B
      - en-US-Wavenet-C
      - en-US-Wavenet-D
      - en-US-Wavenet-E
      - en-US-Wavenet-F
      - en-US-Wavenet-G
      - en-US-Wavenet-H
      - en-US-Wavenet-I
      - en-US-Wavenet-J
    under: |
      You can preview the voices [here](<https://cloud.google.com/text-to-speech/docs/voices>).

An interesting side question: When do I use a slider or a text entry box?

It depends on the kind of information you want. If you input numbers, the field's datatype should be a number. If you’re making a choice, a list of options works better.

Honestly, it takes some experience to figure out what works best. Think about all the online forms you have experienced and what you liked or did not like. To gain experience quickly, you can experiment by trying different fields in docassemble and asking yourself whether it gets the job done.

4. The Result Screen

Now that you have asked all your questions, it’s time to give your user the answer.

The result screen is shown when Google’s API has processed the user’s request and sent over the mp3 file containing the synthesised speech. In the result screen, you will be able to download the file. It’s also helpful to allow the user to preview the sound file so that the user can go back and modify any options.

    event: final_screen
    question: |
      Download your sound file here.
    subquestion: |
      The audio on your text has been generated.
      
      You can preview it here too.
      
      <audio controls>
       <source type="audio/mpeg">
       Your browser does not support playing audio.
      </audio>
      
      Press `Back` above if you want to modify the settings and generate a new file,
      or click `Restart` below to begin a new request.
    buttons:
      - Exit: exit
      - Restart: restart

Note: This image shows the completed file with links on how to download it. The reference question block above does not contain any links.

You will notice that I used an HTML audio tag in the subquestion to provide my media previewer. Take note that you can use HTML tags in your markdown text if docassemble does not have an option that meets your needs. However, how your HTML hack behaves may vary from browser to browser, so test as much as possible and avoid complex HTML.

Preview: Let’s do some actual coding

If you followed this tutorial carefully, your main.yml will have a meta block, 3 question blocks and one results screen.

There are a few problems now:

  • You cannot run the interview. The main reason is that there’s no “mandatory” block, so docassemble does not know what it needs to execute to finish the job.
  • The results screen does not contain a link to download the file or any audio to preview.
  • We haven’t even asked Google to provide us with a sound file.

In the next part, we will go through the overall logic of the interview and do some actual coding. Once you are ready, head on over there!

👉🏻 Head to the next part.

👈🏻 Go back to the previous part.

☝🏻 Check out the overview of this tutorial.

#tutorial #docassemble #TTS #Google #Python #Programming

Author Portrait Love.Law.Robots. – A blog by Ang Hou Fu


Most people associate docassemble with assembling documents using guided interviews. That’s in the name, right? The program asks a few questions and out pops a completed form, contract or document. However, the documentation makes it quite clear that docassemble can do more:

Though the name emphasizes the document assembly feature, docassemble interviews do not need to assemble a document; they might submit an application, direct the user to other resources on the internet, store user input, interact with APIs, or simply provide the user with information.

In this post, let’s demonstrate how to use docassemble to call an API, get a response and provide it to a user. You can check out the completed code on GitHub. (NB: the git branch I would recommend for following this post is blog. I am actively using this package, so I may add new features to the main branch that I don’t discuss here.)

Problem Statement

I do a lot of internal training on various legal and compliance topics. I think I am a pretty all right speaker, but I have my limitations: I can’t give presentations 24/7, and my performance varies from session to session. Wouldn’t it be nice if I could give a presentation at any time, engagingly and consistently?

I could record my voice, but I did not like the result.

I decided to use a text-to-speech program instead, like the one provided by Google Cloud Platform. I created a computerised version of my speech in the presentation. My audience welcomed this version as it was more engaging than a plain PowerPoint presentation. Staff whose first language was not (Singapore) English also found the voice clear and understandable.

The original code was terminal-based. I detailed my early exploits in this blog post last year. The script was great for developing something fast. However, as more of my colleagues became interested in incorporating such speech in their presentations, I needed something more user-friendly.

I already have a docassemble installation at work, so it seemed convenient to build on that. The program would have to do the following:

  • Ask the user what text they want to transform into speech
  • Allow the user to modify some properties of the speech (speed, pitch etc.)
  • Call Google TTS API, grab the sound file and provide it to the user to download

Assumptions

To follow this tutorial, you will need the following:

  • A working docassemble install. You can start up an instance on your laptop by following these instructions.
  • A Google Cloud Platform (GCP) account with a service account enabled for Google TTS. You can follow Google’s instructions here to set one up.
  • Use the Playground provided in docassemble. If you'd like to use an IDE, you can, but I won’t be providing instructions for things like creating files that follow a docassemble package's directory structure.
  • Some basic knowledge about docassemble. I won’t be going through in detail how to write a block. If you can follow the Hello World example, you should have sufficient knowledge to follow this tutorial.

A Roadmap of this Tutorial

In the next part of this post, I talk about the thinking behind creating this interview and how I got the necessary information (off the web) to make it.

In Part 2, we get the groundwork done by creating four pages. This provides us with a visual idea of what happens in this interview.

In Part 3, I talk about docassemble's background action and why we should use it for this interview. Merging the visual requirements with code gives us a clearer picture of what we need to write.

In Part 4, we work with an external API by using a client library for Python. We install this client library in our docassemble's Python environment and write a Python module.

In Part 5, we finish the interview by coding the end product: an audio file in the guise of a DAFile. You can run the interview and get your text transformed into speech now! I also give some ideas of what else you might want to do in the project.

Part 1: Familiarise yourself with the requirements

To write a docassemble interview, it makes sense to develop it backwards. In a simple case, you would like docassemble to fill in a form. So you would get a form, figure out its requirements, and then write questions for each requirement.

An API call is not a contract or a form, but your process is the same.

Based on Google’s quickstart, this is the method in the Python library which synthesises speech.

    from google.cloud import texttospeech
    
    # Create the client (it picks up your service account credentials)
    client = texttospeech.TextToSpeechClient()
    
    # Set the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")
    
    # Build the voice request, select the language code ("en-US") and the ssml
    # voice gender ("neutral")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    
    # Select the type of audio file you want returned
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    
    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

From this example code, you need to provide the program with the input text (synthesis input), the voice, and audio configuration options to synthesise speech.

That looks pretty straightforward, so you might be tempted to dive into it immediately.

However, I would recommend going through the documents provided online.

  • docassemble provides some of the most helpful documentation, great for varying proficiency levels.
  • Google’s Text-to-Speech documentation is more typical of a product offered by a big tech company. Demos, use cases and guides help you get started quickly. You’re going to have to dig deep to find the reference for Python, which receives less love than the other programming languages.

Reading the documentation, especially if you want to use a third-party service, is vital to know what’s available and how to exploit it fully. For example, going through the docs is the best way to find out what docassemble is capable of and learn about existing features, such as transforming a Python list of strings into a human-readable list complete with an “and”.
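To show what that list feature saves you from writing, here is a rough stand-in in plain Python. docassemble ships its own helper for this (`comma_and_list`, as far as its documentation goes); the function below, `join_with_and`, is a hypothetical illustration and not docassemble's implementation.

```python
def join_with_and(items):
    """Join strings as 'a', 'a and b', or 'a, b, and c'."""
    items = [str(item) for item in items]
    if len(items) <= 2:
        # Zero, one or two items need no commas.
        return " and ".join(items)
    # Three or more: commas between items, with "and" before the last one.
    return ", ".join(items[:-1]) + ", and " + items[-1]
```

In an interview you would simply call the built-in helper instead of maintaining something like this yourself, which is exactly why the docs are worth reading first.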

You don’t have to follow the quickstart if it does not meet your use case. Going through the documentation, I figured out that I wanted to give the user a choice of which voice to use rather than letting Google select that for me. Furthermore, audio options like the speaking rate will be handy since non-native listeners may appreciate slower speech. Also, I don’t think I need the user to select a specific file format as mp3s should be fine.
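Working backwards from those decisions, the information the interview has to collect can be sketched as a small Python function. This is only an illustration: `build_tts_request` is a hypothetical name, and the dictionary keys are assumed to mirror the `name`, `speaking_rate` and `pitch` fields of the library's `VoiceSelectionParams` and `AudioConfig`. It makes no API call.

```python
def build_tts_request(text, voice_name="en-US-Wavenet-A",
                      speaking_rate=1.0, pitch=0.0):
    """Collect the user's choices into one dictionary (no API call is made)."""
    # Voice names such as "en-US-Wavenet-A" begin with the language code.
    language_code = "-".join(voice_name.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"language_code": language_code, "name": voice_name},
        "audio_config": {
            "audio_encoding": "MP3",  # fixed: the user does not choose a format
            "speaking_rate": speaking_rate,
            "pitch": pitch,
        },
    }
```

Each value that is not fixed here (the text, the voice, the speaking rate and the pitch) becomes something the interview has to ask the user about.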

Let’s move on!

This was a pretty short one. I hope I got you curious and excited about what comes next. Continue to the next part, where we get started on a project!

👉🏻 Head to the next part of this tutorial!

#tutorial #docassemble #Python #Programming #TTS #Google
