AI Speaker usage guidelines

For now, AI Speakers are only available in English.

  • You can use AI Speakers in Descript with any subscription, but the limits vary by plan:
    • Free and Creator plan users can use AI Speakers with a 1,000-word vocabulary. This limitation does not apply to stock voices.
    • Pro and Enterprise users have no vocabulary restrictions when using AI Speakers.
  • If you have multiple drives, create your project in the Pro drive, or move it there, to access the Pro-level AI Speaker vocabulary.
  • You can generate up to 1,000 voice clips per month. A voice clip is any of the following:
    • Any string of typed characters (250 characters or fewer) followed by a pause in typing of 2 seconds or longer (for example, if you're typing a sentence but break for a moment to gather your thoughts, scratch your face, or adjust your ascot).
    • Any sentence (250 characters or fewer) that ends with sentence-ending punctuation (period, question mark, or exclamation point).
    • A string of 250 characters or fewer that contains no sentence-ending punctuation.
  • For text-to-speech generation, paragraphs can be up to 1,800 characters long. Shorter paragraphs generate faster, so we recommend breaking up long paragraphs.
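
If you prepare scripts outside Descript, the limits above can be checked ahead of time. The following is a rough sketch in Python, not part of Descript and not how Descript itself counts clips: the function names are made up for illustration, the clip count treats each sentence as one clip per 250 characters, and the typing-pause rule is ignored because it cannot be inferred from finished text.

```python
import re

MAX_CLIP_CHARS = 250         # per-clip limit from the guidelines above
MAX_PARAGRAPH_CHARS = 1800   # per-paragraph limit from the guidelines above


def _sentences(text: str) -> list[str]:
    """Split text after a period, question mark, or exclamation point."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def estimate_voice_clips(text: str) -> int:
    """Estimate clips: one per sentence, plus one per extra 250 characters.

    This is an assumption-based approximation, not Descript's actual counter.
    """
    # -(-n // d) is integer ceiling division.
    return sum(-(-len(s) // MAX_CLIP_CHARS) for s in _sentences(text))


def split_paragraph(text: str, limit: int = MAX_PARAGRAPH_CHARS) -> list[str]:
    """Greedily pack whole sentences into paragraphs of at most `limit` characters.

    A single sentence longer than `limit` is kept whole, so it can still
    exceed the limit and would need to be shortened by hand.
    """
    paragraphs, current = [], ""
    for sentence in _sentences(text):
        candidate = f"{current} {sentence}".strip()
        if not current or len(candidate) <= limit:
            current = candidate
        else:
            paragraphs.append(current)
            current = sentence
    if current:
        paragraphs.append(current)
    return paragraphs
```

For example, `estimate_voice_clips("Hello there. How are you? Fine!")` counts three clips under these assumptions, and `split_paragraph` returns pieces you can paste in as separate paragraphs.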

1,000 voice clips are more than enough for most use cases. If you think your use case will require a higher limit, please contact us to discuss options.

