Supported Video Models

Chat Video Pro supports multiple state-of-the-art AI video generation models, each optimized for different use cases.

Model Comparison

| Model | Provider | Duration | Resolution | Audio | Best For |
| --- | --- | --- | --- | --- | --- |
| Sora 2 | OpenAI | 4-12s | 720p-1080p | No | Long clips, cinematic quality |
| Sora 2 Pro | OpenAI | 4-12s | 1080p-4K | No | Highest quality, longer clips |
| Veo 3.1 | Google | 4-8s | 720p-1080p | Yes | Dialogue, speaking characters |
| Veo 3.1 Fast | Google | 4-8s | 720p-1080p | Yes | Quick iterations with audio |
| Kling 2.6 Pro | Kuaishou | 5-10s | 720p | Yes | Longer clips with audio |
| Kling 2.1 Pro | Kuaishou | 5-10s | 720p | No | Complex camera movement |
| Kling O1 | Kuaishou | 5-10s | 720p-1080p | No | Best prompt understanding |
| Hailuo 2.3 | MiniMax | 4-8s | 720p-1080p | No | Action, sports, dynamic scenes |
| WAN 2.5 | Alibaba | 4-8s | 1080p-4K | No | High resolution output |
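The comparison above can also be expressed as data, which makes it easy to shortlist models for a given clip. The following is a minimal sketch, not part of Chat Video Pro: the `VideoModel` class and the `candidates` helper are hypothetical names, and the audio flags are taken from the "Models with Audio Generation" section of this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    min_seconds: int
    max_seconds: int
    resolution: str
    audio: bool  # assumption: True only for models this page lists as audio-capable


# Values copied from the comparison table on this page.
MODELS = [
    VideoModel("Sora 2", 4, 12, "720p-1080p", False),
    VideoModel("Sora 2 Pro", 4, 12, "1080p-4K", False),
    VideoModel("Veo 3.1", 4, 8, "720p-1080p", True),
    VideoModel("Veo 3.1 Fast", 4, 8, "720p-1080p", True),
    VideoModel("Kling 2.6 Pro", 5, 10, "720p", True),
    VideoModel("Kling 2.1 Pro", 5, 10, "720p", False),
    VideoModel("Kling O1", 5, 10, "720p-1080p", False),
    VideoModel("Hailuo 2.3", 4, 8, "720p-1080p", False),
    VideoModel("WAN 2.5", 4, 8, "1080p-4K", False),
]


def candidates(seconds: int, need_audio: bool = False) -> list[str]:
    """Return the names of models whose duration range covers `seconds`."""
    return [
        m.name
        for m in MODELS
        if m.min_seconds <= seconds <= m.max_seconds
        and (m.audio or not need_audio)
    ]


print(candidates(12))                   # only the Sora models reach 12s
print(candidates(10, need_audio=True))  # audio at 10s
```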

Quick Selection Guide

Choose Sora 2 if:

  • ✅ You need clips longer than 8 seconds (up to 12s)

  • ✅ You want the highest cinematic quality

  • ✅ You don't need audio generation

  • ✅ Your budget allows for a premium model

Choose Veo 3.1 if:

  • ✅ Your scene has dialogue or speaking characters

  • ✅ You need synchronized audio and lip-sync

  • ✅ You want 8-second clips with audio

  • ✅ You need character consistency (with Reference mode)

Choose Kling 2.6 Pro if:

  • ✅ You need audio generation (5-10 seconds)

  • ✅ You want longer clips than Veo (up to 10s)

  • ✅ You need good quality with audio

Choose Kling 2.1 Pro if:

  • ✅ You need complex camera movement (360°, FPV, orbit)

  • ✅ You want cinematic camera work

  • ✅ You don't need audio

Choose Kling O1 if:

  • ✅ You have a complex or detailed prompt

  • ✅ You want the best prompt understanding

  • ✅ You need reference mode with 2-7 images

Choose Hailuo 2.3 if:

  • ✅ Your scene has action, sports, or stunts

  • ✅ You need dynamic, fast-paced movement

  • ✅ You want smooth motion and fluid animation

Choose WAN 2.5 if:

  • ✅ You need native 1080p or 4K output

  • ✅ Resolution is more important than duration

  • ✅ You want a high-quality source for upscaling
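The selection guide can be read as a simple decision procedure. The sketch below is one possible encoding: the function name `suggest_model`, the order in which the rules are checked, and the fallback to Sora 2 are all assumptions made for illustration, not behavior of Chat Video Pro.

```python
def suggest_model(
    *,
    dialogue: bool = False,
    audio: bool = False,
    seconds: int = 8,
    complex_camera: bool = False,
    action: bool = False,
    high_res: bool = False,
    detailed_prompt: bool = False,
) -> str:
    """Suggest a model name using the Quick Selection Guide criteria.

    The priority order below is an assumption: audio needs are checked
    first because only a few models support audio at all.
    """
    if dialogue:
        return "Veo 3.1"
    if audio:
        # Veo 3.1 tops out at 8s; Kling 2.6 Pro covers audio up to 10s.
        return "Veo 3.1" if seconds <= 8 else "Kling 2.6 Pro"
    if seconds > 10:
        # Only the Sora models go up to 12s.
        return "Sora 2"
    if complex_camera:
        return "Kling 2.1 Pro"
    if action:
        return "Hailuo 2.3"
    if high_res:
        return "WAN 2.5"
    if detailed_prompt:
        return "Kling O1"
    return "Sora 2"  # assumed default for general cinematic shots


print(suggest_model(audio=True, seconds=10))
print(suggest_model(action=True))
```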

Models with Audio Generation

Veo 3.1 & Veo 3.1 Fast

Audio Capabilities:

  • ✅ Native audio generation

  • ✅ Excellent lip-sync

  • ✅ Sound effects

  • ✅ Music generation

Best For:

  • Dialogue scenes

  • Speaking characters

  • Music videos

  • Scenes requiring synchronized audio

Limitations:

  • 8-second maximum duration

  • Audio quality varies by prompt

Kling 2.6 Pro

Audio Capabilities:

  • ✅ Native audio generation

  • ✅ Good lip-sync

  • ✅ Sound effects

Best For:

  • Longer clips with audio (up to 10s)

  • Scenes that need audio but are not dialogue-critical

Limitations:

  • Audio quality may be less precise than Veo

Models Without Audio

For models without audio generation:

  • Add audio in post-production (for example, in Premiere Pro)

  • Use music libraries

  • Record voiceover separately

  • Import existing audio tracks

Specialized Models

Transition Models

Kling O1 Transition

  • Optimized for image-to-image transitions

  • Requires exactly 2 images

  • Creates smooth morphing between frames

Kling 2.5 Transition

  • Alternative transition model

  • Good for frame-to-frame transitions

Veo 3.1 First/Last

  • Transition mode with audio support

  • Can generate sound for transitions

Reference Mode Models

Veo 3.1 Reference

  • Maintains character/product consistency

  • Supports 1-3 reference images

  • 8-second duration (locked)

  • Audio generation supported

Kling O1 Reference

  • Advanced reference mode

  • Supports 2-7 reference images

  • 5s or 10s duration options

  • Uses @Image syntax in prompts
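Because each transition and reference mode accepts a different number of images, it can help to check the count before submitting a job. This is a minimal sketch using the limits stated on this page; the dictionary and the `image_count_ok` helper are hypothetical and not part of any official API.

```python
# (minimum, maximum) number of input images, as stated on this page.
IMAGE_LIMITS = {
    "Kling O1 Transition": (2, 2),  # requires exactly 2 images
    "Veo 3.1 Reference": (1, 3),
    "Kling O1 Reference": (2, 7),
}


def image_count_ok(mode: str, count: int) -> bool:
    """Return True if `count` images is within the documented range for `mode`."""
    low, high = IMAGE_LIMITS[mode]
    return low <= count <= high


print(image_count_ok("Veo 3.1 Reference", 4))   # False: above the 3-image limit
print(image_count_ok("Kling O1 Reference", 4))  # True
```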

Model-Specific Tips

Sora 2 / Sora 2 Pro

Prompt Tips:

  • Use cinematic language

  • Describe camera movement clearly

  • Mention style and mood

  • Longer prompts work well

Best Practices:

  • Great for establishing shots

  • Excellent for B-roll

  • Use for longer narrative sequences

Veo 3.1

Prompt Tips:

  • Include dialogue in quotes: "Hello, welcome to..."

  • Describe the speaking character clearly

  • Mention audio needs: "with background music"

  • Use Reference mode for character consistency

Best Practices:

  • Perfect for talking head videos

  • Great for product demonstrations with voiceover

  • Use for music video concepts
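The Veo 3.1 prompt tips can be combined into a small prompt template. The helper below is only an illustrative sketch: `build_veo_prompt` is a hypothetical function, and the "says:" phrasing is an assumed wording rather than required Veo syntax. It describes the scene, names the speaker, puts the dialogue in quotes, and optionally mentions audio needs.

```python
from typing import Optional


def build_veo_prompt(
    scene: str,
    speaker: str,
    line: str,
    audio_note: Optional[str] = None,
) -> str:
    """Assemble a prompt with quoted dialogue and an optional audio note."""
    sentences = [
        scene.strip(),
        f'{speaker.strip()} says: "{line.strip()}"',  # dialogue in quotes
    ]
    if audio_note:
        sentences.append(audio_note.strip())
    # Add a period to any sentence that lacks closing punctuation.
    closed = [
        s if s.endswith((".", "!", "?", '"')) else s + "."
        for s in sentences
    ]
    return " ".join(closed)


print(
    build_veo_prompt(
        "A sunlit kitchen",
        "A chef in a white apron",
        "Hello, welcome to the show!",
        "Soft background music plays",
    )
)
```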

Kling Models

Prompt Tips:

  • Emphasize camera movement: "360° orbit", "FPV drone shot"

  • Describe complex motion clearly

  • Use cinematic terminology

  • O1 models understand detailed prompts

Best Practices:

  • Kling 2.1 Pro for dynamic camera work

  • Kling O1 for complex scenes

  • Kling 2.6 Pro for audio needs

Hailuo 2.3

Prompt Tips:

  • Focus on action and movement

  • Describe dynamic elements

  • Emphasize motion and flow

  • Great for sports and action scenes

Best Practices:

  • Sports highlights

  • Action sequences

  • Dynamic B-roll

  • Fast-paced content

Cost Considerations

Approximate costs per second of video (these vary by model and resolution):

  • Sora 2: ~$0.05-0.15/second

  • Veo 3.1: ~$0.04-0.12/second

  • Kling models: ~$0.03-0.10/second

  • Hailuo 2.3: ~$0.02-0.08/second

  • WAN 2.5: ~$0.04-0.12/second

Note: Check fal.ai/pricing for current rates.
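To budget a project, multiply the per-second range by the total seconds you plan to generate. The sketch below uses the approximate figures listed above, which may be out of date; the `estimate_cost` helper and the dictionary keys are hypothetical.

```python
# Approximate (low, high) USD per second, copied from the list above.
# These are rough figures; check current pricing before relying on them.
RATE_RANGE_USD_PER_SECOND = {
    "Sora 2": (0.05, 0.15),
    "Veo 3.1": (0.04, 0.12),
    "Kling": (0.03, 0.10),
    "Hailuo 2.3": (0.02, 0.08),
    "WAN 2.5": (0.04, 0.12),
}


def estimate_cost(model: str, seconds: int, clips: int = 1) -> tuple[float, float]:
    """Return an approximate (low, high) cost in USD for `clips` clips of `seconds` each."""
    low, high = RATE_RANGE_USD_PER_SECOND[model]
    total_seconds = seconds * clips
    return (round(low * total_seconds, 2), round(high * total_seconds, 2))


low, high = estimate_cost("Veo 3.1", seconds=8)
print(f"One 8s Veo 3.1 clip: ${low:.2f}-${high:.2f}")
```

For example, one 8-second Veo 3.1 clip falls between 8 × $0.04 = $0.32 and 8 × $0.12 = $0.96 at these rates.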

Switching Models

You can switch models at any point:

  1. After generation: Click "Regenerate" and select a different model

  2. Before generation: Change the model in the model selector

  3. Compare results: Generate the same prompt with different models

Troubleshooting

"Model not available"

  • Check that your fal.ai account has sufficient credits

  • Verify the model is supported in your region

  • Some models may have temporary availability issues

"Audio not generating"

  • Ensure you selected a model with audio support (Veo 3.1, Veo 3.1 Fast, or Kling 2.6 Pro)

  • Check that the "Audio" toggle is enabled

  • Verify your prompt mentions audio needs

"Duration limit reached"

  • Each model has maximum duration limits

  • Use Sora 2 for longer clips (up to 12s)

  • Consider splitting into multiple clips
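When a sequence is longer than a model's maximum, one way to plan the split is to use as few clips as possible and keep their lengths roughly equal. The sketch below assumes the model accepts any whole-second duration within its range, which may not hold for every mode (Kling O1 Reference, for instance, offers only 5s or 10s); `split_into_clips` is a hypothetical helper.

```python
import math


def split_into_clips(total_seconds: int, min_clip: int, max_clip: int) -> list[int]:
    """Split `total_seconds` into the fewest clips, each within [min_clip, max_clip]."""
    if total_seconds < min_clip:
        raise ValueError("Sequence is shorter than the model's minimum duration")
    count = math.ceil(total_seconds / max_clip)
    base, extra = divmod(total_seconds, count)
    # The first `extra` clips get one additional second each.
    lengths = [base + 1] * extra + [base] * (count - extra)
    if lengths[-1] < min_clip:
        raise ValueError("Cannot split evenly within the model's duration range")
    return lengths


print(split_into_clips(30, 4, 12))  # Sora 2 range: 4-12s
print(split_into_clips(20, 4, 8))   # Veo 3.1 range: 4-8s
```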


Next: Learn about Text-to-Video generation workflows.
