Supported Video Models

Chat Video Pro supports multiple state-of-the-art AI video generation models, each optimized for different use cases.

Model Comparison

| Model | Provider | Duration | Resolution | Audio | Best For |
| --- | --- | --- | --- | --- | --- |
| Sora 2 | OpenAI | 4-12s | 720p | No | Long clips, cinematic quality |
| Sora 2 Pro | OpenAI | 4-12s | 720p-1080p | No | Highest quality, longer clips |
| Veo 3.1 | Google | 4-8s | 720p-4K | Yes | Dialogue, speaking characters |
| Veo 3.1 Fast | Google | 4-8s | 720p-4K | Yes | Quick iterations with audio |
| Grok | xAI | 4-15s | 480p-720p | No | Fast generation, unique aspect ratios |
| Kling 3.0 Pro | Kuaishou | 3-15s | 720p | Yes | Highest V3 quality, element support |
| Kling 3.0 Standard | Kuaishou | 3-15s | 720p | Yes | Fast V3 generation with audio |
| Kling O3 Pro | Kuaishou | 3-15s | 720p | Yes | Advanced motion, scene understanding |
| Kling O3 Standard | Kuaishou | 3-15s | 720p | Yes | Fast O3 generation with audio |
| Hailuo 2.3 | MiniMax | 4-8s | 720p-1080p | No | Action, sports, dynamic scenes |
| WAN 2.5 | Alibaba | 5-10s | 720p-1080p | No | High-resolution output |
| WAN 2.6 | Alibaba | 5-15s | 720p-1080p | No | Longer clips, new aspect ratios |
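
To make the table easier to query, its duration ranges and audio support can be encoded as data. This is an illustrative sketch only: `ModelSpec`, `MODELS`, and `candidate_models` are hypothetical names, not part of Chat Video Pro or the fal.ai API. The audio flags follow this page's audio sections (Veo 3.1 and Kling 3.0/O3), and the sketch assumes every whole-second length inside a model's range is accepted, which may not hold for every model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    min_seconds: int
    max_seconds: int
    audio: bool  # True only for Veo 3.1 and Kling 3.0/O3, per this page


# Hypothetical catalogue transcribed from the Model Comparison table.
MODELS = [
    ModelSpec("Sora 2", 4, 12, False),
    ModelSpec("Sora 2 Pro", 4, 12, False),
    ModelSpec("Veo 3.1", 4, 8, True),
    ModelSpec("Veo 3.1 Fast", 4, 8, True),
    ModelSpec("Grok", 4, 15, False),
    ModelSpec("Kling 3.0 Pro", 3, 15, True),
    ModelSpec("Kling 3.0 Standard", 3, 15, True),
    ModelSpec("Kling O3 Pro", 3, 15, True),
    ModelSpec("Kling O3 Standard", 3, 15, True),
    ModelSpec("Hailuo 2.3", 4, 8, False),
    ModelSpec("WAN 2.5", 5, 10, False),
    ModelSpec("WAN 2.6", 5, 15, False),
]


def candidate_models(seconds: int, need_audio: bool = False) -> list[str]:
    """Return the models whose duration range covers `seconds`.

    When `need_audio` is True, models without native audio are skipped.
    Assumes any whole-second length inside a model's range is accepted.
    """
    return [
        m.name
        for m in MODELS
        if m.min_seconds <= seconds <= m.max_seconds
        and (m.audio or not need_audio)
    ]
```

For example, `candidate_models(14, need_audio=True)` returns only the four Kling 3.0 and O3 models, since they are the only audio-capable models that reach 14 seconds.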

Quick Selection Guide

Choose Sora 2 if:

  • ✅ You need clips longer than 8 seconds (up to 12s)

  • ✅ You want the highest cinematic quality

  • ✅ You don't need audio generation

  • ✅ Your budget allows for a premium model

Choose Veo 3.1 if:

  • ✅ Your scene has dialogue or speaking characters

  • ✅ You need synchronized audio and lip-sync

  • ✅ You want 8-second clips with audio

  • ✅ You need character consistency (with Reference mode)

Choose Kling 3.0 Pro if:

  • ✅ You need the highest V3 quality with audio support (3-15s)

  • ✅ You want element support for character consistency

  • ✅ You need a universal model (T2V, I2V, and Transition in one)

Choose Kling 3.0 Standard if:

  • ✅ You need fast V3 generation with good quality and audio

  • ✅ You want cinematic camera work at lower cost

  • ✅ You need quick iterations

Choose Kling O3 Pro if:

  • ✅ You need the best O3 quality with advanced motion understanding

  • ✅ You want 3-15 second clips with audio

  • ✅ You need superior scene understanding and motion quality

Choose Kling O3 Standard if:

  • ✅ You need fast O3 generation with good quality and audio

  • ✅ You want advanced motion at lower cost

  • ✅ You need quick iterations with O3 quality

Choose Kling O3 Reference if:

  • ✅ You need character consistency with reference images

  • ✅ You want up to 7 reference images with 3-15s duration

  • ✅ You need reference mode with audio support

Choose Hailuo 2.3 if:

  • ✅ Your scene has action, sports, or stunts

  • ✅ You need dynamic, fast-paced movement

  • ✅ You want smooth motion and fluid animation

Choose WAN 2.5 if:

  • ✅ You need native 1080p output

  • ✅ Resolution is more important than duration

  • ✅ You want a high-quality source for upscaling

Choose WAN 2.6 if:

  • ✅ You need clips up to 15 seconds (tied with Grok and the Kling 3.0/O3 models for the longest duration)

  • ✅ You want 4:3 or 3:4 aspect ratios (unique to WAN 2.6)

  • ✅ You need longer clips than WAN 2.5 allows (10 seconds maximum)

Choose Grok if:

  • ✅ You need fast video generation (quick turnaround)

  • ✅ You want unique mobile aspect ratios (19.5:9, 9:19.5, 20:9, 9:20)

  • ✅ You need panoramic formats (2:1, 1:2)

  • ✅ You're working on mobile-first content

  • ✅ Speed is more important than maximum resolution

Models with Audio Generation

Veo 3.1 & Veo 3.1 Fast

Audio Capabilities:

  • ✅ Native audio generation

  • ✅ Excellent lip-sync

  • ✅ Sound effects

  • ✅ Music generation

Best For:

  • Dialogue scenes

  • Speaking characters

  • Music videos

  • Scenes requiring synchronized audio

Limitations:

  • 8-second maximum duration

  • Audio quality varies by prompt

Kling 3.0 & O3 (All Models)

Applies to: Kling 3.0 (V3) Pro/Standard, O3 Pro/Standard, O3 Transition, and O3 Reference

Audio Capabilities:

  • ✅ Native audio generation (all Kling 3.0 and O3 models)

  • ✅ Good lip-sync

  • ✅ Sound effects

Best For:

  • Longer clips with audio (3-15 seconds)

  • Scenes needing audio with flexible duration

Limitations:

  • Audio quality may be less precise than Veo for dialogue

Models Without Audio

For models without audio generation:

  • Add audio in post-production (e.g., in Premiere Pro)

  • Use music libraries

  • Record voiceover separately

  • Import existing audio tracks

Specialized Models

Video Extension Models

VEO Extend (+7s)

  • Extends any video by 7 seconds

  • Maximum total duration: 30 seconds

  • Input video must be 23 seconds or less

  • Output resolution: 720p (fixed)

  • Text prompt is optional

  • Best for: Extending existing videos seamlessly
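
The 23-second input limit follows directly from the 30-second cap: 23 + 7 = 30. A minimal sketch of that check, using a hypothetical helper `extended_length` (not part of the app or any real API):

```python
EXTEND_SECONDS = 7       # each VEO Extend pass adds 7 seconds
MAX_TOTAL_SECONDS = 30   # cap on the extended video's length
MAX_INPUT_SECONDS = MAX_TOTAL_SECONDS - EXTEND_SECONDS  # 23 seconds


def extended_length(input_seconds: float) -> float:
    """Return the video length after one VEO Extend pass (hypothetical helper)."""
    if input_seconds <= 0:
        raise ValueError("input video must have a positive duration")
    if input_seconds > MAX_INPUT_SECONDS:
        raise ValueError(
            f"input is {input_seconds}s; VEO Extend needs {MAX_INPUT_SECONDS}s or less"
        )
    return input_seconds + EXTEND_SECONDS
```

An 8-second Veo 3.1 clip, for instance, comes out at 15 seconds, while a 24-second input is rejected because the result would exceed 30 seconds.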

Transition Models

Kling O3 Transition (Pro/Standard)

  • Dedicated transition model with superior morphing quality

  • Attach exactly 2 images for start and end frames

  • 3-15 second duration with audio support

  • Pro/Standard quality toggle

  • Best quality transitions with character consistency

Kling 3.0 Pro (Transition Mode)

  • Attach 2 images to switch to transition mode automatically

  • 3-15 second duration with audio support

  • Universal model (also works for T2V and I2V)

Kling 3.0 Standard (Transition Mode)

  • Same as Pro but faster generation

  • 3-15 second duration with audio support

  • Good for quick iterations

Veo 3.1 First/Last

  • Transition mode with audio support

  • 4-8 second duration

  • Can generate sound for transitions

Note: Kling 3.0 models are universal and work for T2V (0 images), I2V (1 image), and Transition (2 images). Kling O3 Transition is a dedicated transition model that appears when you attach 2 images.
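
The image-count rule in the note above amounts to a simple lookup. A hypothetical sketch (`kling_mode` and `KLING_MODES` are illustrative names, not app or API functions); treating any other image count as unsupported is an assumption.

```python
# Hypothetical mapping from attached image count to the mode a universal
# Kling 3.0 model runs in, per the note above.
KLING_MODES = {0: "T2V", 1: "I2V", 2: "Transition"}


def kling_mode(image_count: int) -> str:
    """Return the Kling 3.0 mode for 0, 1, or 2 attached images."""
    try:
        return KLING_MODES[image_count]
    except KeyError:
        # Assumption: counts other than 0-2 are not supported by this helper.
        raise ValueError(
            f"Kling 3.0 modes cover 0-2 images, got {image_count}"
        ) from None
```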

Reference Mode Models

Veo 3.1 Reference

  • Maintains character/product consistency

  • Supports 2-4 reference images

  • 8-second duration (locked)

  • Aspect ratio: 16:9 or 9:16 only (auto-determined from images)

  • Audio generation supported

  • Resolution: 720p-1080p

Kling O3 Reference (Pro/Standard)

  • Advanced reference mode with latest Kling technology

  • Supports 1-7 reference images

  • 3-15 second duration (flexible, not fixed)

  • Pro/Standard quality toggle

  • Audio generation supported

  • Also supports transition mode when given start and end frames

  • Uses @Element syntax for character references

  • Improved character consistency over O1
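
The two Reference models accept different numbers of reference images, so a quick pre-flight check can save a failed generation. This is a hypothetical sketch: `REFERENCE_IMAGE_LIMITS` and `reference_count_ok` are illustrative names, and the ranges are copied from the lists above.

```python
# Inclusive reference image ranges, copied from the lists above.
REFERENCE_IMAGE_LIMITS = {
    "Veo 3.1 Reference": (2, 4),
    "Kling O3 Reference": (1, 7),
}


def reference_count_ok(model: str, image_count: int) -> bool:
    """Return True if `image_count` is within the model's reference image range."""
    low, high = REFERENCE_IMAGE_LIMITS[model]
    return low <= image_count <= high
```

A single reference image, for instance, passes for Kling O3 Reference but not for Veo 3.1 Reference.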

Video-to-Video Models

Kling O3 VFX (Pro/Standard)

  • AI-powered VFX effects and character replacement

  • Input: 1 video (3-10 seconds, max 200MB) + 0-4 optional reference images

  • Pro/Standard quality toggle

  • Can keep original audio

  • Best for: VFX effects, character replacement, video transformations

Kling O3 Multi-Cam

  • Generate new camera angles and shots from a reference video

  • Input: 1 video (3-10 seconds, max 200MB)

  • 3-15 second output duration

  • Aspect ratio: auto, 16:9, 9:16, 1:1

  • Can keep original audio

  • Best for: Reverse angles, wide shots, alternative perspectives

Kling Motion Control

  • Transfer movements from a reference video to a character image

  • Input: 1 video (1-30 seconds, max 200MB) + 1 character image (required)

  • Best for: Motion transfer, character animation from reference
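
The three Kling video-input models accept different clip lengths but share the 200MB size cap. A hypothetical pre-upload check: `VideoInputLimits` and `input_problems` are illustrative names, the limits are copied from the lists above, and treating "200MB" as 200 megabytes as measured by the caller is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VideoInputLimits:
    min_seconds: float
    max_seconds: float
    max_megabytes: float = 200  # "max 200MB" on this page


# Input video limits copied from the lists above.
VIDEO_INPUT_LIMITS = {
    "Kling O3 VFX": VideoInputLimits(3, 10),
    "Kling O3 Multi-Cam": VideoInputLimits(3, 10),
    "Kling Motion Control": VideoInputLimits(1, 30),
}


def input_problems(model: str, seconds: float, megabytes: float) -> list[str]:
    """List the ways an input video breaks the model's limits (empty if it fits)."""
    limits = VIDEO_INPUT_LIMITS[model]
    problems = []
    if not limits.min_seconds <= seconds <= limits.max_seconds:
        problems.append(
            f"duration {seconds}s is outside {limits.min_seconds}-{limits.max_seconds}s"
        )
    if megabytes > limits.max_megabytes:
        problems.append(f"file is {megabytes}MB; the limit is {limits.max_megabytes}MB")
    return problems
```

A 12-second clip, for example, is fine for Kling Motion Control but too long for Kling O3 VFX or Multi-Cam.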

Model-Specific Tips

Sora 2 / Sora 2 Pro

Prompt Tips:

  • Use cinematic language

  • Describe camera movement clearly

  • Mention style and mood

  • Longer prompts work well

Best Practices:

  • Great for establishing shots

  • Excellent for B-roll

  • Use for longer narrative sequences

Veo 3.1

Prompt Tips:

  • Include dialogue in quotes: "Hello, welcome to..."

  • Describe the speaking character clearly

  • Mention audio needs, e.g., "with background music"

  • Use Reference mode for character consistency

Best Practices:

  • Perfect for talking head videos

  • Great for product demonstrations with voiceover

  • Use for music video concepts

Kling 3.0 & O3 Models

Prompt Tips:

  • Emphasize camera movement: "360° orbit", "FPV drone shot"

  • Describe complex motion clearly

  • Use cinematic terminology

  • Kling 3.0 and O3 models handle detailed prompts well

Best Practices:

  • Kling 3.0 Pro for highest V3 quality with audio and element support

  • Kling O3 Pro for best O3 quality with advanced motion understanding

  • Kling 3.0/O3 Standard for fast iterations

  • Use the duration slider to set a length between 3 and 15 seconds

  • Enable audio for dialogue or ambient sound needs

Hailuo 2.3

Prompt Tips:

  • Focus on action and movement

  • Describe dynamic elements

  • Emphasize motion and flow

  • Great for sports and action scenes

Best Practices:

  • Sports highlights

  • Action sequences

  • Dynamic B-roll

  • Fast-paced content

Grok (xAI)

Prompt Tips:

  • Natural language works well

  • Describe motion and mood clearly

  • Works great with simple, direct prompts

  • Supports image-to-video and video-to-video

Best Practices:

  • Mobile-first content (use unique mobile aspect ratios)

  • Quick iterations and previews

  • Social media shorts

  • Panoramic landscape content (2:1 ratio)

  • Fast turnaround projects

Unique Features:

  • Text-to-Video, Image-to-Video, and Video-to-Video modes

  • Unique aspect ratios: 2:1, 1:2, 20:9, 19.5:9, 9:19.5, 9:20

  • Duration range: 4-15 seconds

  • Resolution: 480p or 720p

Cost Considerations

Approximate costs per second of video (varies by model and resolution):

  • Sora 2: ~$0.05-0.15/second

  • Veo 3.1: ~$0.04-0.12/second

  • Grok: ~$0.03-0.08/second

  • Kling models: ~$0.03-0.10/second

  • Hailuo 2.3: ~$0.02-0.08/second

  • WAN 2.5: ~$0.04-0.12/second

Note: Check fal.ai/pricing for current rates.
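
Because pricing is quoted per second, a clip's cost is roughly its duration multiplied by the per-second rate: a 10-second Sora 2 clip at ~$0.05-0.15/second comes to roughly $0.50-1.50. The sketch below encodes the approximate ranges listed above (the `"Kling"` key stands for all Kling models). The figures are estimates and `estimate_cost` is a hypothetical helper, so check fal.ai for actual prices.

```python
from __future__ import annotations

# Approximate (low, high) USD per second, copied from the list above.
# "Kling" stands for all Kling models. Check fal.ai for current rates.
PRICE_PER_SECOND = {
    "Sora 2": (0.05, 0.15),
    "Veo 3.1": (0.04, 0.12),
    "Grok": (0.03, 0.08),
    "Kling": (0.03, 0.10),
    "Hailuo 2.3": (0.02, 0.08),
    "WAN 2.5": (0.04, 0.12),
}


def estimate_cost(model: str, seconds: float) -> tuple[float, float]:
    """Return a rough (low, high) USD cost range for a clip of `seconds` length."""
    low, high = PRICE_PER_SECOND[model]
    return round(low * seconds, 2), round(high * seconds, 2)
```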

Switching Models

You can easily switch models:

  1. After generation: Click "Regenerate" and select a different model

  2. Before generation: Change the model in the model selector

  3. Compare results: Generate the same prompt with different models

Troubleshooting

"Model not available"

  • Check that your fal.ai account has sufficient credits

  • Verify the model is supported in your region

  • Some models may have temporary availability issues

"Audio not generating"

  • Ensure that you selected a model with audio support (Veo 3.1, Kling 3.0, or Kling O3)

  • Check that the "Audio" toggle is enabled

  • Verify that your prompt describes the audio you need

"Duration limit reached"

  • Each model has maximum duration limits

  • Use Grok, Kling 3.0/O3, or WAN 2.6 for clips up to 15 seconds, or Sora 2 for clips up to 12 seconds

  • Consider splitting into multiple clips
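
Splitting a long sequence comes down to using the fewest clips that stay under the model's maximum, then spreading the seconds evenly across them. A minimal sketch, assuming whole-second durations and that the model accepts any length between its minimum and maximum (some models may only offer specific lengths); `split_duration` is a hypothetical helper.

```python
from __future__ import annotations

import math


def split_duration(total_seconds: int, min_clip: int, max_clip: int) -> list[int]:
    """Split a sequence into the fewest clips, each within [min_clip, max_clip].

    The clip count is the smallest that respects max_clip; the seconds are
    then spread as evenly as possible so no clip falls far below the others.
    """
    if total_seconds < min_clip:
        raise ValueError(f"{total_seconds}s is shorter than the {min_clip}s minimum")
    count = math.ceil(total_seconds / max_clip)
    base, extra = divmod(total_seconds, count)
    # `extra` clips get one additional second; the rest get `base` seconds.
    clips = [base + 1] * extra + [base] * (count - extra)
    if clips[-1] < min_clip:
        raise ValueError("cannot split evenly within this model's duration range")
    return clips
```

For a 30-second sequence with Sora 2's 4-12s range, `split_duration(30, 4, 12)` gives three 10-second clips; with Veo 3.1's 4-8s range, `split_duration(20, 4, 8)` gives `[7, 7, 6]`.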


Next: Learn about Text-to-Video generation workflows.
