Generative image and video models in Descript

Descript lets you choose which AI model powers your generated images and videos. Different models are better at different things: some work faster, others create more polished results, and some handle complex instructions better. This article explains the available generative media models in plain language, so you can quickly decide, “Which model should I use?” without needing to be an AI expert.

This article covers

  • Availability & usage
  • Where to pick a model
  • Image models at a glance
  • Video models at a glance
  • FAQ

Availability & usage

Model availability depends on your plan. Some premium models use more AI Credits than others. You can switch models anytime; choosing a new model only affects generations made after the switch.

Where to pick a model

Choose a model for each type of generative media in your App Settings, under the General tab, in the Advanced section.

If you only need to change the model for a single image or video generation, you can pick one directly in the associated AI tool at the time you generate, without modifying your account-wide settings.

Changing the model in this menu does not change your App Settings.

Image models at a glance

Flux 2 Pro (default)
Overview: Balances speed and quality, with improved text rendering and restyling
Strengths:
  • Enhanced text rendering
  • Improved restyling capabilities
  • High-quality output
Trade-offs: Slightly slower generation time than Flux Kontext

Flux Kontext [pro]
Overview: Well-balanced model that handles most image generation tasks with consistent, high-quality results
Strengths:
  • Context-aware edits
  • Maintains visual consistency
  • Balances speed and quality
Trade-offs: Uses more credits than Flux [dev]

Flux [dev]
Overview: Fast, budget-friendly option that excels at creative and fantasy-style imagery
Strengths:
  • Quick generation
  • Good for fantasy & illustration
  • Low cost
Trade-offs: Optimized for speed over fine details

GPT Image 1
Overview: Advanced model designed for complex editing tasks and working with multiple image inputs
Strengths:
  • Follows nuanced instructions
  • Text + multi-image inputs (via Underlord)
  • Strong editing capabilities
Trade-offs: Prioritizes capability over speed and cost

Qwen
Overview: Specialized model engineered for text rendering and multilingual content
Strengths:
  • Text in images
  • Chinese, Japanese, and English support
  • Multi-line layouts
Trade-offs: Focused on text rather than artistic imagery

Nano Banana
Overview: Versatile model that provides reliable results across a wide range of image tasks
Strengths:
  • Speed-to-quality balance
  • Precise image editing
  • Great at complex prompts
Trade-offs: Other models may be better at highly specialized tasks

Nano Banana Pro
Overview: Advanced model offering enhanced performance for creative and technical image tasks
Strengths:
  • Higher fidelity and realism
  • Improved lighting and texture rendering
  • Handles artistic and photorealistic prompts equally well
Trade-offs: May be slower than standard Nano Banana

Video models at a glance

Pixverse v5 (default)
Overview: Well-balanced model for most video generation needs with reliable quality and performance
Strengths:
  • Smooth motion & stable camera
  • Good lighting & atmosphere
  • Speed–quality–cost balance
Trade-offs: Specialized models are better for advanced physics or premium cinematic quality

Hailuo 02
Overview: Physics-focused model that excels at realistic movement, character consistency, and complex actions
Strengths:
  • Exceptional physics simulation
  • Character consistency
  • Complex movements & acrobatics
Trade-offs: Prioritizes realism over speed and cost efficiency

Kling 01
Overview: Next-generation cinematic model with extended video length and enhanced visual quality
Strengths:
  • Generates up to 10 seconds of video
  • Latest cinematic rendering technology
  • High-resolution output
Trade-offs: Higher credit usage per second of video

Kling Video 2.5 Turbo Pro
Overview: Premium model engineered for professional-quality cinematic video with exceptional detail and motion
Strengths:
  • Movie-quality output
  • Exceptional motion fluidity
  • Strong character animation
  • High-resolution 1080p output
Trade-offs: Longer generation times and higher credit usage

Veo 3.1
Overview: Top-tier model combining the highest visual quality with integrated audio for complete video production
Strengths:
  • High-fidelity video
  • Synchronized dialogue & sound
  • Professional cinematic output
Trade-offs: Designed for final production rather than rapid iteration

Veo 3.1 [fast]
Overview: Streamlined version of Veo optimized for faster turnaround while maintaining strong visual quality
Strengths:
  • Much faster than Veo 3.1
  • Similar visual quality
  • Cost-effective iteration
Trade-offs: Some advanced features are simplified for speed

Wan v2.2 [turbo]
Overview: Speed-optimized model for rapid experimentation and quick concept validation
Strengths:
  • Very fast generation
  • Efficient resource usage
  • Good for concept testing
Trade-offs: Optimized for speed and experimentation over final production quality

Sora 2
Overview: Advanced cinematic model that delivers exceptional realism with realistic physics, synchronized audio, and consistent multi-shot sequences
Strengths:
  • Realistic human faces and expressions
  • Accurate physics simulation
  • Synchronized dialogue and sound effects
  • Multi-shot consistency
Trade-offs: Cannot accept image inputs that contain human faces (it can still generate humans from scratch); premium model with higher credit usage

FAQ

Which model should I use?

It depends! The at-a-glance comparisons above can help you match your goal (speed vs. polish vs. complexity) to an available model. You can always choose a different model in your App Settings and re-create your content to see what works best for you.

Do premium models use more AI Credits?

Often, yes. Higher-end models and longer videos usually consume more credits. If you’re exploring or drafting, try a “fast/lean” model first; switch to a premium model when you’re ready to finalize. Learn more about how to track and understand your Media Minutes and AI Credits.

If I change models, will my previous generations update?

No. Model changes only affect new generations. Existing images/videos stay as-is.

Why don’t I see a specific model?

Availability can vary by plan. If a model isn't available from your App Settings, check your plan details.

I'm seeing an error message. What does it mean?

These errors come directly from our model providers and can occur for a few different reasons. See Troubleshooting AI image and video generation errors.