Generative image and video models in Descript

Descript lets you choose which AI model powers your generated images and videos. Different models are better at different things: some work faster, others create more polished results, and some handle complex instructions better. This article explains the available generative media models in plain language, so you can quickly decide, “Which model should I use?” without needing to be an AI expert.

Availability & usage

Model availability depends on your plan. Some premium models use more AI Credits than others. You can switch models anytime; choosing a new model only affects generations made after the switch.

Where to pick a model

Choose a model for each type of generative media in your App Settings, under the General tab, in the Advanced section.

Clicking this link will take you directly to your App Settings in Descript

To change the model for a single image or video generation without changing your App Settings, select a model in the AI tool you're using for that generation.

Descript's in-tool model picker

Changing the model in this menu does not change your App Settings.

A note about model descriptions

The model descriptions below are our good-faith effort to characterize each model's strengths based on provider documentation and community feedback. Treat them as general guidance rather than precise specifications.

Image models

| Model | Overview | Provider | Model release date |
| --- | --- | --- | --- |
| Nano Banana (default) | Versatile all-purpose model for complex prompts and precise editing. Balances speed and quality across various tasks. | Google DeepMind | November 2025 |
| Flux 2 Pro | Balances speed and quality with enhanced text rendering and image restyling capabilities. Suitable for most image generation needs. | Black Forest Labs (BFL) | November 2025 |
| Qwen | Works well for readable text over artistic imagery. Supports Chinese, Japanese, and English with accurate multi-line layouts. | Alibaba / Qwen team | August 2025 |
| Nano Banana Pro | Renders lighting and textures with greater detail for both artistic and photorealistic prompts. | Google DeepMind | November 2025 |
| Flux Kontext [pro] | Maintains visual consistency across edits and generations. Balances quality and cost for projects requiring visual coherence. | Black Forest Labs (BFL) | May 2025 |
| GPT Image 1 | Designed for complex editing with nuanced instructions. | OpenAI | April 2025 |
| Flux [dev] | Fast, budget-friendly model for quick generation. Excels at creative, fantasy-style imagery and concept work. Uses fewer credits for rapid iteration. | Black Forest Labs (BFL) | August 2024 |

Video models

| Model | Overview | Provider | Model release date |
| --- | --- | --- | --- |
| Pixverse v5 (default) | Well-balanced for most video generation needs. Offers smooth motion, stable camera work, and good lighting with balanced speed, quality, and cost. | PixVerse AI | August 2025 |
| Kling O1 | Generates high-resolution, cinematic visuals up to 10 seconds long. Strong for visual quality and production value. | Kling AI (Kuaishou) | December 2025 |
| Veo 3.1 | High-end model with integrated audio. Features enhanced realism, synchronized dialogue and sound effects, and cinematic output. | Google DeepMind | October 2025 |
| Veo 3.1 [fast] | Faster version of Veo 3.1 with similar visual quality. Cost-effective for iteration. | Google DeepMind | October 2025 |
| Kling Video 2.5 Turbo Pro | Optimized for character animation and motion fluidity with fast rendering. | Kling AI (Kuaishou) | September 2025 |
| Sora 2 | Advanced cinematic model with exceptional realism, accurate physics simulation, and synchronized audio. Produces consistent multi-shot sequences. Note: Cannot accept image inputs with human faces, but can generate humans from scratch. | OpenAI | September 2025 |
| Hailuo 02 | Physics-focused model excelling at realistic movement and character consistency. Handles complex actions with high fidelity. | MiniMax | June 2025 |
| Wan v2.2 [turbo] | Speed-optimized for rapid experimentation and concept validation. Efficient resource usage ideal for early-stage work. | Alibaba | July 2025 |

Avatar generation models

| Model | Overview | Provider | Model release date |
| --- | --- | --- | --- |
| Hedra Character 2 (default) | Reliable general-purpose talking-head model. Good baseline for most projects where you want stable, straightforward delivery. 480p resolution. Maximum generation length of 12 minutes. | Hedra | October 2024 |
| Kling Avatar v2 | Balanced 720p avatar model for everyday use. Good mix of realism, motion, and stability for most production work. Supports additional attitude prompting. | Kling AI (Kuaishou) | December 2025 |
| Kling Avatar v2 Pro | High-fidelity 1080p avatar model. Strong choice for photorealism and expressive motion. Supports additional attitude prompting. | Kling AI (Kuaishou) | December 2025 |

FAQ

Which model should I use?

It depends! The tables above can help you match your goal (speed vs. polish vs. complexity) to an available model. You can always choose a different model in your App Settings and regenerate your content to see what works best for you.

Do premium models use more AI Credits?

Often, yes. Higher-end models and longer videos usually consume more credits. If you’re exploring or drafting, try a “fast/lean” model first; switch to a premium model when you’re ready to finalize. Learn more about how to track and understand your Media Minutes and AI Credits.

If I change models, will my previous generations update?

No. Model changes only affect new generations. Existing images/videos stay as-is.

Why don’t I see a specific model?

Availability can vary by plan. If a model doesn't appear in your App Settings, check your plan details.

I'm seeing an error message. What does that mean?

These errors come directly from our model providers and can occur for a few different reasons. See Troubleshooting AI image and video generation errors.