For now, Overdub is only available in English.
Descript's Overdub uses AI text-to-speech technology to create an ultra-realistic clone of your voice, so you can type to create audio using your own voice or one of our stock voices. Overdub makes correcting your recordings as simple as editing text. You can even use Overdub to create voiceovers or other audio content from scratch.
Creating your custom Overdub voice
Requirements for creating a custom Overdub voice
You will need:
- At least 10 minutes of recorded speech; we recommend 30 to 180 minutes for high quality results.
- To submit your training session for review, which takes 2 to 24 hours.
- A Voice ID: a recorded consent statement from the person whose voice is being submitted for training. You can record it when you submit the voice for training, or add it to the beginning of your training session.
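If you want to check whether a set of recordings meets the 10-minute minimum (and the 30-minute recommendation) before importing it, a short script can total their durations. This is a minimal sketch, not a Descript feature: it assumes your recordings are uncompressed WAV files, since Python's standard-library `wave` module cannot read MP3 or other compressed formats, and the function names are hypothetical.

```python
import sys
import wave

MIN_MINUTES = 10.0          # minimum stated in the requirements above
RECOMMENDED_MINUTES = 30.0  # lower end of the recommended 30 to 180 minutes


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds (frames / frames per second)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_training_audio(paths):
    """Sum the durations of all files and classify the total against the thresholds."""
    total_minutes = sum(wav_duration_seconds(p) for p in paths) / 60.0
    if total_minutes < MIN_MINUTES:
        status = "below minimum"
    elif total_minutes < RECOMMENDED_MINUTES:
        status = "meets minimum"
    else:
        status = "recommended"
    return total_minutes, status


if __name__ == "__main__":
    # Hypothetical usage: python check_duration.py session1.wav session2.wav
    minutes, status = check_training_audio(sys.argv[1:])
    print(f"{minutes:.1f} minutes of audio: {status}")
```

The script only measures length; it says nothing about audio quality, which the tips below cover.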
Tips for getting a high quality voice
- Consistency: by far the best way to make your Overdub voice realistic, and to ensure it blends with the audio in your content, is to use training audio that closely matches the audio in the content you’ll be editing.
For best results, import audio from past recording sessions or copy and paste from your existing Descript projects. There is no limit to the number of Overdub voices you can create, so you can make multiple versions of a voice for different contexts (e.g., outdoor, in-studio) and emotional states (e.g., excited, depressed).
- Quality: the higher the quality of your training audio, the more realistic your Overdub voice. For example, using an external mic and recording in a quiet room will produce a higher quality Overdub voice than using a laptop microphone in a loud coffee shop.
- Quantity: we can create an Overdub voice with as little as 10 minutes of sample audio, but we recommend submitting at least 30 minutes. Please see our reference guide for more information. Remember, you can always add more recorded or imported audio to a training session and resubmit it for training.
- If you are uploading multiple recordings to your training session, make sure that the audio quality and conditions (your mic, the amount of background noise, the way you speak) are similar across the uploaded files.
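Before uploading several recordings, you can automatically catch one kind of mismatch: files recorded with different technical settings. The sketch below, again assuming WAV files and using hypothetical helper names, flags any file whose channel count, sample width, or sample rate differs from the first file. Matching settings do not guarantee matching conditions (mic, background noise, speaking style), so it is still worth listening to each file.

```python
import sys
import wave


def wav_format(path):
    """Return (channels, sample width in bytes, sample rate in Hz) for a WAV file."""
    with wave.open(str(path), "rb") as wf:
        return (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())


def mismatched_files(paths):
    """Return the files whose format differs from the first file in the list."""
    if not paths:
        return []
    reference = wav_format(paths[0])
    return [p for p in paths[1:] if wav_format(p) != reference]


if __name__ == "__main__":
    # Hypothetical usage: python check_formats.py take1.wav take2.wav take3.wav
    for path in mismatched_files(sys.argv[1:]):
        print(f"{path} differs from {sys.argv[1]}: {wav_format(path)}")
```

Using the first file as the reference keeps the check simple; if most of your files share one format, reorder the list so a representative file comes first.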
Creating a voice from recordings
- Select the Overdub tab on the left side of the Drive View.
- From the Overdub voices window, select Create new voice.
- Name the voice and click Confirm.
- Add at least 10 minutes of sample audio to the training session's composition by:
  - Uploading an existing recording: drag and drop a file from your computer onto the Script Editor to add it to your composition.
  - Recording: record directly into Descript to provide an audio sample. Here’s a sample script you can use if you don’t have reference material.
  - Copying and pasting the script content from an existing Descript project into your training session.
- Once you have added your training audio, click Submit training data in the top right corner of the editor.
- Record your Voice ID: press record and read the consent statement.
  Make sure the sample audio and the Voice ID are similar in audio quality. If the two differ significantly, your Overdub submission may be rejected.
- Stop the recording and click Submit.
Your Overdub voice has now been submitted for training. We’ll email you at the address associated with your Descript account once the voice is ready. Training can take 2 to 24 hours.
Using Overdub in a composition
1. Assign a voice to a speaker label
Click a speaker label and select one of the suggested options, which include your Overdub voice (if you’ve created one) and our stock voices.
To further manage your speaker labels and Overdub voices, click a speaker label and select ; this will open the Manage speakers menu where you can edit labels, create new ones, and link Overdub voices.
2. Add Overdub to your composition
Once you’ve assigned an Overdub voice to a speaker label, there are two ways to use Overdub:
- Editing existing audio:
Descript may automatically replace the words surrounding your selection with Overdub audio. This happens when the Overdub audio you’re creating runs into the surrounding audio, that is, when there is no clear gap between the words. You can always adjust the crossover point in the timeline.
- Generating new audio content: enable the writing tool and start typing. Your Overdub audio should begin generating after a few moments.
If you do not have an Overdub voice linked to your speaker label, typing in Write mode will generate placeholder text.