Troubleshooting and using AI Speakers

This page answers some of the most common questions and issues when using AI Speakers in Descript.

I'm not happy with my AI Speaker — how can I improve it?

If you're not happy with the results, create a new AI Speaker with a fresh recording. There's no way to add more voice samples or training data to an existing AI Speaker.

When re-recording:

  • Choose the tone and style you want — casual and friendly, formal and authoritative, or anything else.
  • Speak clearly and intentionally; your delivery during the consent script directly shapes how your AI Speaker will sound.
  • Use your usual microphone and recording setup.

Can I create different voice styles with my AI Speaker?

Descript doesn't support style variations within a single AI Speaker. However, you can create multiple AI Speakers using the same process described above. For example:

  • Create one AI Speaker with a casual, conversational tone.
  • Create another AI Speaker with a more formal, professional delivery.
  • Create additional AI Speakers with different emotional qualities or speaking styles.

Each AI Speaker will appear as a separate option in your speaker dropdown menu, allowing you to choose the most appropriate voice for different sections of your project.

AI Speaker is mispronouncing a word

Sometimes, an AI Speaker might mispronounce a word or phrase. We've created a guide with tips on getting the pronunciation right.

Why is the AI Speaker generating more words than I select?

That's normal! AI Speakers are designed to generate a little more speech on either side of your selection, which gives you editing flexibility in case words don't transition perfectly. You can use the trim tool to line things up and a cross-fade to blend them together.

When I use AI Speakers, black frames are inserted into the project

Text-to-speech and Regenerate currently do not work over sequences. When used on a sequence, the video will be removed. We're working on improvements (see more info).

In the meantime, try this workaround:

  1. Convert the AI voice clip into an audio layer.
  2. Select the AI audio clip and cut (Cmd + X or Ctrl + X).
  3. Use the Trim tool to restore the original audio/video in the Script track.
  4. Paste the AI audio clip as a layer above the original audio.
  5. Use the Blade tool to split the Script track, then mute the replaced portion in the Layer panel.
  6. Adjust as needed.

Audio isn't generating for the selected text

If you type text and no audio generates, try the following:

  • Check that speech generation is enabled for your speaker. If it isn't, create a new AI Speaker.
  • Duplicate the project and check again.
  • Create a new project to see if the issue persists.

The AI-generated speech is too fast or too slow

AI speech uses a predefined voice model, so cadence and pacing are consistent. Here are some tips that might help.

AI Speaker doesn't match my accent

The current model is based on US English pronunciation. We're exploring broader accent support in the future — upvote this on our feedback board. In the meantime, try using the Translate feature to create a version of your voice in another language.

Unable to adjust the clip speed of AI-generated speech

To adjust the speed of AI-generated clips, you must first click Convert to audio. Once converted, you can adjust speed using the selection toolbar.

The AI-generated speech has unexpected background noise or artifacts

Unexpected audio artifacts are usually caused by issues in the source media. Try to eliminate:

  • Static or sudden loud sounds
  • Background noise (appliances, traffic, music)
  • Excessive mouth noise or breathing

My AI Speaker isn't pronouncing my (non-English) language correctly

AI Speakers currently support English only. Support for additional languages is in development — share your feedback here.