Generating text-to-speech

This article shows you how to use an AI Speaker to generate text-to-speech in your project. Text-to-speech is available in the following languages:

Available text-to-speech languages
English (US)    Finnish        Romanian                Slovak
Croatian        French (FR)    Malay                   Turkish
Czech           German         Norwegian               Spanish (US)
Danish          Hungarian      Polish                  Swedish
Dutch           Italian        Brazilian Portuguese

Before getting started

You can start writing your text at any time, but to generate the AI voice's audio, keep in mind:

  • You need to be connected to the internet.
  • You will first need to create and select an AI Speaker, use an existing one, or choose a stock voice.
  • If you are creating a custom AI Speaker, note that the AI Speaker consent statement must be recorded in English, even if you plan to generate non-English text-to-speech audio.
  • For optimal performance, keep paragraphs under 1800 characters. Shorter text generates speech faster, so breaking up longer sections is recommended. Learn more at AI Speaker usage guidelines.
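
If you draft long scripts outside Descript, you may want to break them up before pasting them in. The following is a minimal sketch of that idea, not a Descript feature or API: the function name `split_script` and the choice to break at sentence endings are assumptions made for illustration. Only the 1800-character figure comes from the guideline above.

```python
import re

MAX_CHARS = 1800  # paragraph length recommended in the guideline above


def split_script(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into paragraphs under max_chars, breaking at sentence ends.

    Hypothetical helper for illustration only. A single sentence that is
    itself longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: list[str] = []
    # Treat blank lines as existing paragraph breaks and keep them.
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        current = ""
        # Split after ".", "!" or "?" when followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if not current or len(candidate) < max_chars:
                current = candidate
            else:
                # Adding this sentence would reach the limit: start a new chunk.
                chunks.append(current)
                current = sentence
        if current:
            chunks.append(current)
    return chunks
```

Each string in the returned list can then be pasted into your project as its own paragraph.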

How to generate text-to-speech

  1. Open or create a project.
  2. Start writing your script in Write mode. Simply start typing on any blank line in your project or click Start writing. You'll know Write mode is enabled when a blue border appears around your script and "Write mode" is indicated at the top of the script editor. All text written in Write mode will be blue.
  3. Click Done writing to exit Write mode. You can re-enable Write mode later to make corrections, as long as the generated speech hasn't yet been converted to editable audio.
  4. Click Add speaker and choose an existing AI Speaker or an AI stock voice.
  5. And that’s it! A loading icon appears next to the text while the audio is generated, and a green check mark replaces it once generation is complete.

Need to generate AI speech for just a specific section of your text?

If you want to generate audio for only part of your transcript, you can select the specific text and press the @ key to assign a speaker. This will create text-to-speech audio just for that selected portion, leaving the rest untouched.

Converting AI Speech to Editable Audio

After generating text-to-speech using an AI Speaker, you might want to edit the audio further. Converting AI-generated speech into an audio layer allows you to:

  • Edit the audio in the Timeline.
  • Apply effects, trims, or crossfades.
  • Gain more precise control over playback and alignment.

For detailed instructions on converting AI speech to audio, refer to our guide: Converting AI Speech to Editable Audio in Descript.

How text-to-speech usage is calculated

Text-to-speech usage is calculated based on how many minutes of audio you have generated during your current monthly cycle. To check how many minutes of text-to-speech you have left and when your limit resets, see our guide: Usage Monitoring.
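
As a rough illustration of the arithmetic, the sketch below adds up the lengths of generated clips and subtracts the total from a monthly allowance. Everything in it is an assumption for illustration: the function name, the example allowance, and the absence of rounding are hypothetical, and the authoritative figures are the ones shown in Usage Monitoring.

```python
def minutes_remaining(clip_seconds: list[float], monthly_allowance_min: float) -> float:
    """Estimate text-to-speech minutes left in the current monthly cycle.

    clip_seconds: lengths, in seconds, of the audio generated this cycle.
    monthly_allowance_min: hypothetical plan allowance, in minutes.
    """
    used_min = sum(clip_seconds) / 60  # convert total seconds to minutes
    # Never report a negative balance.
    return max(monthly_allowance_min - used_min, 0.0)


# Example with made-up numbers: three clips against a 10-minute allowance.
remaining = minutes_remaining([90, 30, 60], monthly_allowance_min=10)
```

Here the three clips total 180 seconds, or 3 minutes, so 7 of the assumed 10 minutes would remain.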

Use Translate for bilingual audio, accents, or a completely different voice

Descript’s Translate feature lets you dub existing audio into another language while either preserving the original speaker’s voice or replacing it with a completely different one. This is the best option if you need a transcription in multiple languages, want to add an accent, or swap your voice for another.

  • Although Descript cannot yet produce bilingual audio or transcripts, you can use Translate to generate dubbed audio, translated captions, and transcripts in other languages.
  • If you want to apply an accent, Translate can create a version of your voice in another language while maintaining its unique characteristics.
  • If you need a completely different voice, you can use one of Descript's AI Stock speakers. Descript also offers Recommended voices (available only to Business and Enterprise users) within Translate. These voices are exclusively for dubbing and cannot be used for text-to-speech from scratch. They are specifically designed to sound more fluent and natural in their respective languages.