Descript's Overdub uses AI text-to-speech technology to create an ultra-realistic clone of your voice, so you can create audio of your own voice just by typing. You can use Overdub to correct your audio without re-recording. You can even use Overdub to create voiceovers or other audio content from scratch.
Overdub is currently only available in English.
Creating your custom Overdub voice
Requirements
You will need:
- At least 10 minutes of recorded speech; we recommend 30 to 180 minutes for high-quality results.
- A Voice ID: a recorded consent statement from the person whose voice is being submitted for training (that is, you). You can record the Voice ID when you submit the voice for training, or add it to the beginning of your training session.
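Before you upload anything, you may want to confirm that your files add up to enough audio. This is not a Descript feature. It is a minimal Python sketch that assumes your training audio is in uncompressed PCM WAV files, because the standard-library `wave` module cannot read MP3 or other compressed formats. The function names are hypothetical. The sketch measures total file length, and that total includes silence and any other speakers. Treat the result as an upper bound on the amount of recorded speech.

```python
import wave

# Thresholds taken from the requirements above, in minutes.
MIN_MINUTES = 10
RECOMMENDED_MINUTES = 30


def wav_duration_seconds(path):
    """Return the duration of one uncompressed WAV file in seconds (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_training_audio(paths):
    """Sum the length of several WAV files and compare it with the thresholds.

    The total counts the whole file, silence included, so it overstates
    the amount of actual speech.
    """
    total_minutes = sum(wav_duration_seconds(p) for p in paths) / 60
    if total_minutes < MIN_MINUTES:
        status = "too short"
    elif total_minutes < RECOMMENDED_MINUTES:
        status = "meets minimum"
    else:
        status = "meets recommendation"
    return total_minutes, status
```

For example, `check_training_audio(["session1.wav", "session2.wav"])` returns the combined length in minutes along with a status string. The file names in that call are placeholders.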
Creating a voice from the Drive View
- Select the Voices tab on the left side of the Drive View.
- From the Overdub voices window, select + Create new voice.
- Name the voice and click Confirm.
- Add at least 10 minutes of sample audio to the training session's composition in any of these ways:
  - Upload an existing recording: drag and drop a file from your computer onto the Script Editor to add it to your composition.
  - Record directly into Descript to provide an audio sample. Here’s a sample script you can use if you don’t have reference material.
  - Copy and paste script content from an existing Descript project into your training session.
- Once you have added your training audio, click Submit training data in the top right corner of the editor.
- Record your Voice ID: press record and read the consent statement.
You will want the sample audio and Voice ID to be similar in quality. If the two differ significantly, your Overdub submission may be rejected.
- Stop the recording and click Submit.
Your Overdub voice has now been submitted for training; we’ll notify you at the email associated with your Descript account once the voice is ready. Training can take up to 24 hours.
Creating a voice from an existing Descript project
- Open a project and find a composition with at least 10 minutes of sample audio of your voice.
- Click on the speaker label, scroll to the bottom of the list, and select New voice.
- Name the Overdub voice and select Next.
- Select the input device from the dropdown menu.
- Press the red Record button and read the consent statement.
You will want the sample audio and Voice ID to be similar in quality. If the two differ significantly, your Overdub submission may be rejected.
- Stop the recording and click Submit.
Your Overdub voice has now been submitted for training; we’ll notify you at the email associated with your Descript account once the voice is ready. Training can take up to 24 hours.
Tips for getting a high-quality voice
- Consistency: by far the best way to make your Overdub voice realistic, and to ensure it blends with the rest of your content, is to train it on audio that closely matches the audio in the content you’ll be editing.
For best results, import audio from past recording sessions or copy-paste from your existing Descript projects. Keep in mind that there is no limit to the number of Overdub voices you can create. You can always create multiple versions of a voice for different contexts (e.g., outdoor, in-studio) and emotional states (e.g., excited, depressed).
- Quality: high-quality source audio leads to more realistic Overdub voices. For example, recording with an external mic in a quiet room will produce a higher-quality Overdub voice than using a laptop microphone in a loud coffee shop.
- Quantity: we can create an Overdub voice from as little as 10 minutes of sample audio, but we recommend submitting at least 30 minutes. Please see our reference guide for more information. Remember, you can always add more audio to a training session and resubmit it for training, whether you record it or upload it.
- If you are uploading multiple recordings to your training session, make sure the audio quality and recording conditions (your mic, the amount of background noise, and the way you speak) are similar across the uploaded files.
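You can make a rough check of your uploaded files before you submit them. Confirm that they share the same sample rate and channel count, and that their overall loudness is similar. The Python sketch below runs this check on 16-bit PCM WAV files. It is not a Descript feature. The function names and the 6 dB tolerance are arbitrary assumptions chosen for illustration. Overall loudness is also only a crude proxy. It cannot tell you whether the mic, the background noise, or your speaking style actually match, so you still need to listen to the files.

```python
import array
import math
import sys
import wave

LEVEL_TOLERANCE_DB = 6.0  # arbitrary threshold, chosen only for illustration


def describe_wav(path):
    """Return ((sample_rate, channels), RMS level in dBFS) for a 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: only 16-bit PCM is handled in this sketch")
        fmt = (wav.getframerate(), wav.getnchannels())
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian
    if not samples:
        return fmt, float("-inf")
    # Plain-Python RMS: fine for a sketch, slow for hours of audio.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    level = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
    return fmt, level


def find_inconsistencies(paths):
    """Compare every file with the first one; return a list of human-readable problems."""
    reference_fmt, reference_level = describe_wav(paths[0])
    problems = []
    for path in paths[1:]:
        fmt, level = describe_wav(path)
        if fmt != reference_fmt:
            problems.append(f"{path}: format {fmt} differs from {reference_fmt}")
        elif abs(level - reference_level) > LEVEL_TOLERANCE_DB:
            problems.append(
                f"{path}: level {level:.1f} dBFS vs {reference_level:.1f} dBFS"
            )
    return problems
```

The sketch compares each file against the first file rather than against an average, which keeps it short. An empty result means no large mismatches were detected. It does not mean the recordings were made under the same conditions.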