
Creating a custom Overdub voice

Descript's Overdub uses AI text-to-speech technology to create an ultra-realistic clone of your voice, so you can type to create audio using your own voice or one of our stock voices. Overdub makes correcting your recordings as simple as editing text. You can even use Overdub to create voice overs or other audio content from scratch.

For now, Overdub is only available in English.

Creating your custom Overdub voice

Requirements for creating a custom Overdub voice

You will need:

  • At least 10 minutes of recorded speech; we recommend 30 to 180 minutes for high-quality results.
  • To submit the training session for review; the process should take 2-24 hours.
  • A Voice ID: the consent statement of the person whose voice is being submitted for training. You can record it when you submit the voice for training, or add it to the beginning of your training session.
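If your training audio is spread across several files, it can help to total their length before submitting. The following is a minimal sketch, not a Descript tool: it assumes the recordings are uncompressed WAV files in one folder, and the function names and folder layout are hypothetical. The thresholds are the 10-minute minimum and 30-minute recommendation stated above.

```python
import wave
from pathlib import Path

MIN_MINUTES = 10          # minimum amount of recorded speech
RECOMMENDED_MINUTES = 30  # lower end of the recommended range


def wav_minutes(path):
    """Return the duration of one WAV file in minutes."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def classify_total(total_minutes):
    """Compare a total duration against the thresholds above."""
    if total_minutes < MIN_MINUTES:
        return "too short"
    if total_minutes < RECOMMENDED_MINUTES:
        return "meets minimum"
    return "recommended"


def check_folder(folder):
    """Sum the durations of all .wav files in a folder (assumed layout)."""
    total = sum(wav_minutes(p) for p in sorted(Path(folder).glob("*.wav")))
    return total, classify_total(total)
```

For example, `check_folder("training_audio")` returns the total minutes and one of the three labels. The sketch measures file length only; silence and non-speech audio count toward the total here, so treat the result as an upper bound on usable speech.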


Tips for getting a high quality voice

  • Consistency — by far the best way to make your Overdub voice realistic, and to ensure it blends with the audio in your content, is to use training audio that closely matches the audio in the content you’ll be editing.

For best results, import audio from past recording sessions or copy and paste from your existing Descript projects. There is no limit to the number of Overdub voices you can create, so you can always create multiple versions of a voice for different contexts (e.g., outdoor, in-studio) and emotional states (e.g., excited, depressed).

  • Quality — the higher the quality of the audio you use for training, the more realistic your Overdub voice will be. For example, using an external mic and recording in a quiet room will produce a higher-quality Overdub voice than using a laptop microphone in a loud coffee shop.
  • Quantity — we can create an Overdub voice with as little as 10 minutes of sample audio, but we recommend submitting at least 30 minutes. Please see our reference guide for more information. Remember, you can always add existing or new audio to a training session and resubmit it for training.
  • If you are uploading multiple recordings to your training session, make sure that the audio quality and conditions (your mic, the amount of background noise, the way you speak) are similar between the uploaded files.
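One rough way to spot obviously mismatched uploads is to compare the basic format of each file. The sketch below is an illustration under stated assumptions, not part of Descript: it assumes WAV input, and the helper names are hypothetical. Matching formats do not prove that the mic, room, or speaking style match; this only catches differences in channel count, bit depth, or sample rate.

```python
import wave
from collections import defaultdict


def wav_format(path):
    """Return (channels, sample width in bytes, sample rate) for one WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def group_by_format(paths):
    """Group file paths by format; more than one group signals a mismatch."""
    groups = defaultdict(list)
    for path in paths:
        groups[wav_format(path)].append(str(path))
    return dict(groups)
```

If `group_by_format` returns more than one key, the files were recorded or exported with different settings and are worth a closer listen before you add them to the same training session.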

Creating a voice from recordings

  1. Select the Voices tab on the left side of the Drive View.
  2. From the Voices window, select + Create new voice.
  3. Name the voice and click Confirm.


  4. Add at least 10 minutes of sample audio to the training session's composition by:
    • Uploading an existing recording — drag and drop a file from your computer onto the Script Editor to add it to your composition.
    • Recording — record directly into Descript to provide an audio sample. Here’s a sample script you can use if you don’t have reference material.
    • Copying and pasting — copy the script content from an existing Descript project and paste it into your training session.
  5. Once you have added your training audio, click Submit training data in the top right corner of the editor.
  6. Record your Voice ID — press record and read the consent statement.

    Make sure the sample audio and the Voice ID are similar in audio quality. If the two differ significantly, your Overdub submission may be rejected.

  7. Stop the recording and click Submit.

Your Overdub voice has now been submitted for training; we’ll email you at the address associated with your Descript account once the voice is ready. Training can take 2-24 hours.
