Supported Video Models
Chat Video Pro supports multiple state-of-the-art AI video generation models, each optimized for different use cases.
Model Comparison
Model | Provider | Duration | Resolution | Audio | Best For
Sora 2 | OpenAI | 4-12s | 720p | ❌ | Long clips, cinematic quality
Sora 2 Pro | OpenAI | 4-12s | 720p-1080p | ❌ | Highest quality, longer clips
Veo 3.1 | | 4-8s | 720p-4K | ✅ | Dialogue, speaking characters
Veo 3.1 Fast | | 4-8s | 720p-4K | ✅ | Quick iterations with audio
Grok | xAI | 4-15s | 480p-720p | ❌ | Fast generation, unique aspect ratios
Kling 3.0 Pro | Kuaishou | 3-15s | 720p | ✅ | Highest V3 quality, element support
Kling 3.0 Standard | Kuaishou | 3-15s | 720p | ✅ | Fast V3 generation with audio
Kling O3 Pro | Kuaishou | 3-15s | 720p | ✅ | Advanced motion, scene understanding
Kling O3 Standard | Kuaishou | 3-15s | 720p | ✅ | Fast O3 generation with audio
Hailuo 2.3 | Hailuo | 4-8s | 720p-1080p | ❌ | Action, sports, dynamic scenes
WAN 2.5 | WAN | 5-10s | 720p-1080p | ❌ | High resolution output
WAN 2.6 | WAN | 5-15s | 720p-1080p | ❌ | Longer clips, new aspect ratios
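If you script your own shortlisting, the duration and audio columns of the table can be encoded as plain data and filtered. The sketch below is a minimal illustration; the class and function names (VideoModel, models_for) are hypothetical and not part of Chat Video Pro or any provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    min_seconds: int
    max_seconds: int
    audio: bool


# Duration and audio columns transcribed from the comparison table.
MODELS = [
    VideoModel("Sora 2", 4, 12, False),
    VideoModel("Sora 2 Pro", 4, 12, False),
    VideoModel("Veo 3.1", 4, 8, True),
    VideoModel("Veo 3.1 Fast", 4, 8, True),
    VideoModel("Grok", 4, 15, False),
    VideoModel("Kling 3.0 Pro", 3, 15, True),
    VideoModel("Kling 3.0 Standard", 3, 15, True),
    VideoModel("Kling O3 Pro", 3, 15, True),
    VideoModel("Kling O3 Standard", 3, 15, True),
    VideoModel("Hailuo 2.3", 4, 8, False),
    VideoModel("WAN 2.5", 5, 10, False),
    VideoModel("WAN 2.6", 5, 15, False),
]


def models_for(seconds: int, need_audio: bool = False) -> list[str]:
    """Names of models whose duration range covers `seconds`,
    keeping only audio-capable models when `need_audio` is True."""
    return [
        m.name
        for m in MODELS
        if m.min_seconds <= seconds <= m.max_seconds and (m.audio or not need_audio)
    ]


print(models_for(12, need_audio=True))
# ['Kling 3.0 Pro', 'Kling 3.0 Standard', 'Kling O3 Pro', 'Kling O3 Standard']
```

Keeping the table as data means a single edit updates every check that depends on it when a model's limits change.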
Quick Selection Guide
Choose Sora 2 if:
✅ You need clips longer than 8 seconds (up to 12s)
✅ You want the highest cinematic quality
✅ You don't need audio generation
✅ Your budget allows for a premium model
Choose Veo 3.1 if:
✅ Your scene has dialogue or speaking characters
✅ You need synchronized audio and lip-sync
✅ You want audio in clips of up to 8 seconds
✅ You need character consistency (with Reference mode)
Choose Kling 3.0 Pro if:
✅ You need the highest V3 quality with audio support (3-15s)
✅ You want element support for character consistency
✅ You need a universal model (text-to-video, image-to-video, and transition in one)
Choose Kling 3.0 Standard if:
✅ You need fast V3 generation with good quality and audio
✅ You want cinematic camera work at lower cost
✅ You need quick iterations
Choose Kling O3 Pro if:
✅ You need the best O3 quality with advanced motion understanding
✅ You want clips from 3-15 seconds with audio
✅ You need superior scene understanding and motion quality
Choose Kling O3 Standard if:
✅ You need fast O3 generation with good quality and audio
✅ You want advanced motion at lower cost
✅ You need quick iterations with O3 quality
Choose Kling O3 Reference if:
✅ You need character consistency with reference images
✅ You want up to 7 reference images with 3-15s duration
✅ You need reference mode with audio support
Choose Hailuo 2.3 if:
✅ Your scene has action, sports, or stunts
✅ You need dynamic, fast-paced movement
✅ You want smooth motion and fluid animation
Choose WAN 2.5 if:
✅ You need native 1080p output
✅ Resolution is more important than duration
✅ You want a high-quality source for upscaling
Choose WAN 2.6 if:
✅ You need clips up to 15 seconds (the longest option, shared with Grok and the Kling 3.0/O3 models)
✅ You want 4:3 or 3:4 aspect ratios (unique to WAN 2.6)
✅ You need longer clips than WAN 2.5 allows (5-15s instead of 5-10s)
Choose Grok if:
✅ You need fast video generation (quick turnaround)
✅ You want unique mobile aspect ratios (19.5:9, 9:19.5, 20:9, 9:20)
✅ You need panoramic formats (2:1, 1:2)
✅ You're working on mobile-first content
✅ Speed is more important than maximum resolution
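The guide above boils down to a handful of rules of thumb. The following is a deliberately simplified sketch of those rules as a function; suggest_model and its flags are hypothetical, it covers only some of the criteria, and it is not an official recommendation engine.

```python
def suggest_model(seconds: int, dialogue: bool = False, action: bool = False,
                  native_1080p: bool = False) -> str:
    """Rough picker distilled from the Quick Selection Guide (illustrative only)."""
    if not 3 <= seconds <= 15:
        raise ValueError("no listed model covers this duration (3-15s)")
    if seconds < 4:
        return "Kling 3.0 Pro"       # only the Kling 3.0/O3 models go down to 3s
    if dialogue:
        # Veo 3.1 has the strongest lip-sync but stops at 8 seconds.
        return "Veo 3.1" if seconds <= 8 else "Kling 3.0 Pro"
    if action and seconds <= 8:
        return "Hailuo 2.3"          # action, sports, dynamic scenes (4-8s)
    if native_1080p and 5 <= seconds <= 10:
        return "WAN 2.5"             # native 1080p output (5-10s)
    if seconds <= 12:
        return "Sora 2"              # cinematic quality up to 12s
    return "WAN 2.6"                 # 13-15s without audio


print(suggest_model(6, dialogue=True))   # Veo 3.1
print(suggest_model(12, dialogue=True))  # Kling 3.0 Pro
```

Rule order matters: duration constraints are checked before preferences, so the function never suggests a model whose range excludes the requested length.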
Models with Audio Generation
Veo 3.1 & Veo 3.1 Fast
Audio Capabilities:
✅ Native audio generation
✅ Excellent lip-sync
✅ Sound effects
✅ Music generation
Best For:
Dialogue scenes
Speaking characters
Music videos
Scenes requiring synchronized audio
Limitations:
8-second maximum duration
Audio quality varies by prompt
Kling 3.0 & O3 (All Models)
Covers Kling 3.0 (V3) Pro/Standard, O3 Pro/Standard, O3 Transition, and O3 Reference.
Audio Capabilities:
✅ Native audio generation (all Kling 3.0 and O3 models)
✅ Good lip-sync
✅ Sound effects
Best For:
Longer clips with audio (3-15 seconds)
Scenes needing audio with flexible duration
Limitations:
Audio quality may be less precise than Veo for dialogue
Models Without Audio
For models without audio generation:
Add audio in post-production (e.g., Premiere Pro)
Use music libraries
Record voiceover separately
Import existing audio tracks
Specialized Models
Video-to-Video Models
Veo Extend (+7s)
Extends any video by 7 seconds
Maximum total duration: 30 seconds
Input video must be 23 seconds or less
Output resolution: 720p (fixed)
Text prompt optional
Best for: Extending existing videos seamlessly
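The 23-second input limit and the 30-second cap fit together: 23 + 7 = 30. Assuming extensions can be chained (the 30-second total implies this but the page does not state it outright), the sketch below lists the clip length after each allowed pass. The function name is hypothetical.

```python
EXTEND_SECONDS = 7       # each Veo Extend pass adds 7 seconds
MAX_INPUT_SECONDS = 23   # the input video must be 23 seconds or less
MAX_TOTAL_SECONDS = 30   # overall cap; 23 + 7 = 30, so every allowed pass stays within it


def extension_lengths(start_seconds: int) -> list[int]:
    """Return the video length after each Veo Extend pass that is still allowed.

    Assumption: passes can be chained, with each output fed back in as the next input.
    """
    lengths = []
    current = start_seconds
    while current <= MAX_INPUT_SECONDS:
        current += EXTEND_SECONDS
        lengths.append(current)
    return lengths


print(extension_lengths(8))   # [15, 22, 29]
```

An 8-second clip can therefore be extended three times; after the third pass it is 29 seconds, which exceeds the 23-second input limit.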
Transition Models
Kling O3 Transition (Pro/Standard)
Dedicated transition model with superior morphing quality
Attach exactly 2 images for start and end frames
3-15 second duration with audio support
Pro/Standard quality toggle
Best quality transitions with character consistency
Kling 3.0 Pro (Transition Mode)
Attaching 2 images switches it to transition mode automatically
3-15 second duration with audio support
Universal model (also works for T2V and I2V)
Kling 3.0 Standard (Transition Mode)
Same as Pro, with faster generation
3-15 second duration with audio support
Good for quick iterations
Veo 3.1 First/Last
Transition mode with audio support
4-8 second duration
Can generate sound for transitions
Note: Kling 3.0 models are universal and work for text-to-video (T2V, 0 images), image-to-video (I2V, 1 image), and transition (2 images). Kling O3 Transition is a dedicated transition model that appears when you attach 2 images.
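The image-count rule in the note above can be expressed as a lookup. This minimal sketch covers only the 0, 1, and 2 image cases the note describes (element support may accept other inputs); both function names are hypothetical.

```python
# Kling 3.0 infers its mode from the number of attached images.
KLING_3_MODES = {
    0: "text-to-video",
    1: "image-to-video",
    2: "transition",
}


def kling_3_mode(num_images: int) -> str:
    """Return the Kling 3.0 mode implied by the attached image count."""
    if num_images not in KLING_3_MODES:
        raise ValueError(f"expected 0, 1, or 2 images, got {num_images}")
    return KLING_3_MODES[num_images]


def o3_transition_offered(num_images: int) -> bool:
    """Kling O3 Transition appears only when exactly 2 images are attached."""
    return num_images == 2


print(kling_3_mode(1))            # image-to-video
print(o3_transition_offered(2))   # True
```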
Reference Mode Models
Veo 3.1 Reference
Maintains character/product consistency
Supports 2-4 reference images
8-second duration (locked)
Aspect ratio: 16:9 or 9:16 only (auto-determined from images)
Audio generation supported
Resolution: 720p-1080p
Kling O3 Reference (Pro/Standard)
Advanced reference mode with latest Kling technology
Supports 1-7 reference images
3-15 second duration (flexible, not fixed)
Pro/Standard quality toggle
Audio generation supported
Also supports transition mode when given start and end frames
Uses @Element syntax for character references
Improved character consistency compared with Kling O1
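The two reference modes have different image and duration limits. The sketch below checks a request against the limits listed above before you submit it. It assumes whole-second durations, and check_reference_request is a hypothetical helper, not a Chat Video Pro function.

```python
# (min images, max images, min seconds, max seconds), transcribed from the lists above.
REFERENCE_LIMITS = {
    "Veo 3.1 Reference": (2, 4, 8, 8),     # duration locked at 8 seconds
    "Kling O3 Reference": (1, 7, 3, 15),   # flexible 3-15 seconds
}


def check_reference_request(model: str, images: int, seconds: int) -> list[str]:
    """Return the problems with a reference-mode request (an empty list means none)."""
    min_img, max_img, min_s, max_s = REFERENCE_LIMITS[model]
    problems = []
    if not min_img <= images <= max_img:
        problems.append(f"{model} takes {min_img}-{max_img} reference images, got {images}")
    if not min_s <= seconds <= max_s:
        allowed = f"{min_s}s" if min_s == max_s else f"{min_s}-{max_s}s"
        problems.append(f"{model} duration must be {allowed}, got {seconds}s")
    return problems


print(check_reference_request("Kling O3 Reference", 7, 15))   # []
```

For example, `check_reference_request("Veo 3.1 Reference", 5, 10)` reports two problems: too many images, and a duration other than the locked 8 seconds.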
Kling Video-to-Video Models
Kling O3 VFX (Pro/Standard)
AI-powered VFX effects and character replacement
Input: 1 video (3-10 seconds, max 200MB) + 0-4 optional reference images
Pro/Standard quality toggle
Can keep original audio
Best for: VFX effects, character replacement, video transformations
Kling O3 Multi-Cam
Generate new camera angles and shots from reference video
Input: 1 video (3-10 seconds, max 200MB)
3-15 second output duration
Aspect ratio: auto, 16:9, 9:16, 1:1
Can keep original audio
Best for: Reverse angles, wide shots, alternative perspectives
Kling Motion Control
Transfer movements from a reference video to a character image
Input: 1 video (1-30 seconds, max 200MB) + 1 character image (required)
Best for: Motion transfer, character animation from reference
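Uploads that exceed these limits fail, so it can help to check a clip locally first. This is a minimal sketch under stated assumptions: "200MB" is treated as 200 × 1024 × 1024 bytes (the service may use decimal megabytes), Multi-Cam is assumed to take no images because none are listed, and all names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: "200MB" read as 200 MiB; the service may count decimal megabytes instead.
MAX_VIDEO_BYTES = 200 * 1024 * 1024


@dataclass(frozen=True)
class VideoInputLimits:
    min_seconds: float
    max_seconds: float
    min_images: int
    max_images: int


# Limits transcribed from the list above. Multi-Cam lists no image input,
# so this sketch allows none.
LIMITS = {
    "Kling O3 VFX": VideoInputLimits(3, 10, 0, 4),
    "Kling O3 Multi-Cam": VideoInputLimits(3, 10, 0, 0),
    "Kling Motion Control": VideoInputLimits(1, 30, 1, 1),
}


def check_video_input(model: str, seconds: float, size_bytes: int, images: int) -> list[str]:
    """Return the problems with a video-to-video input (an empty list means it looks valid)."""
    lim = LIMITS[model]
    problems = []
    if not lim.min_seconds <= seconds <= lim.max_seconds:
        problems.append(f"video must be {lim.min_seconds}-{lim.max_seconds}s, got {seconds}s")
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("video exceeds 200MB")
    if not lim.min_images <= images <= lim.max_images:
        problems.append(f"expected {lim.min_images}-{lim.max_images} images, got {images}")
    return problems


print(check_video_input("Kling O3 VFX", 5, 50_000_000, 2))   # []
```

A Motion Control request without the required character image, e.g. `check_video_input("Kling Motion Control", 12, 80_000_000, 0)`, returns one problem.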
Model-Specific Tips
Sora 2 / Sora 2 Pro
Prompt Tips:
Use cinematic language
Describe camera movement clearly
Mention style and mood
Longer prompts work well
Best Practices:
Great for establishing shots
Excellent for B-roll
Use for longer narrative sequences
Veo 3.1
Prompt Tips:
Include dialogue in quotes: "Hello, welcome to..."
Describe the speaking character clearly
Mention audio needs, e.g., "with background music"
Reference mode for character consistency
Best Practices:
Perfect for talking head videos
Great for product demonstrations with voiceover
Use for music video concepts
Kling 3.0 & O3 Models
Prompt Tips:
Emphasize camera movement: "360° orbit", "FPV drone shot"
Describe complex motion clearly
Use cinematic terminology
O3 and 3.0 models handle detailed prompts well
Best Practices:
Kling 3.0 Pro for highest V3 quality with audio and element support
Kling O3 Pro for best O3 quality with advanced motion understanding
Kling 3.0/O3 Standard for fast iterations
Use the 3-15 second duration slider to set clip length
Enable audio for dialogue or ambient sound needs
Hailuo 2.3
Prompt Tips:
Focus on action and movement
Describe dynamic elements
Emphasize motion and flow
Great for sports and action scenes
Best Practices:
Sports highlights
Action sequences
Dynamic B-roll
Fast-paced content
Grok (xAI)
Prompt Tips:
Natural language works well
Describe motion and mood clearly
Works great with simple, direct prompts
Supports image-to-video and video-to-video
Best Practices:
Mobile-first content (use unique mobile aspect ratios)
Quick iterations and previews
Social media shorts
Panoramic landscape content (2:1 ratio)
Fast turnaround projects
Unique Features:
Text-to-Video, Image-to-Video, and Video-to-Video modes
Unique aspect ratios: 2:1, 1:2, 20:9, 19.5:9, 9:19.5, 9:20
Duration range: 4-15 seconds
Resolution: 480p or 720p
Cost Considerations
Approximate costs per second of video (varies by model and resolution):
Sora 2: ~$0.05-0.15/second
Veo 3.1: ~$0.04-0.12/second
Grok: ~$0.03-0.08/second
Kling models: ~$0.03-0.10/second
Hailuo 2.3: ~$0.02-0.08/second
WAN 2.5: ~$0.04-0.12/second
Note: Check fal.ai/pricing for current rates.
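To budget a clip, multiply its length by the model's per-second range. The sketch below uses the approximate rates listed above as placeholders. They will drift from the live prices on fal.ai/pricing, and estimate_cost is a hypothetical helper.

```python
# Approximate USD-per-second ranges transcribed from the list above.
# Treat these as placeholders; check fal.ai/pricing for current rates.
RATE_RANGES = {
    "Sora 2": (0.05, 0.15),
    "Veo 3.1": (0.04, 0.12),
    "Grok": (0.03, 0.08),
    "Kling": (0.03, 0.10),
    "Hailuo 2.3": (0.02, 0.08),
    "WAN 2.5": (0.04, 0.12),
}


def estimate_cost(model: str, seconds: float) -> tuple[float, float]:
    """Return a (low, high) USD estimate for one clip of the given length."""
    low, high = RATE_RANGES[model]
    return round(low * seconds, 2), round(high * seconds, 2)


print(estimate_cost("Sora 2", 10))   # (0.5, 1.5)
print(estimate_cost("Veo 3.1", 8))   # (0.32, 0.96)
```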
Switching Models
You can easily switch models:
After generation: Click "Regenerate" and select a different model
Before generation: Change the model in the model selector
Compare results: Generate the same prompt with different models
Troubleshooting
"Model not available"
Check that your fal.ai account has sufficient credits
Verify the model is supported in your region
Some models may have temporary availability issues
"Audio not generating"
Ensure that you selected a model with audio support (Veo 3.1, Kling 3.0, or Kling O3)
Check that the "Audio" toggle is enabled
Verify your prompt mentions audio needs
"Duration limit reached"
Each model has maximum duration limits
Use WAN 2.6, Grok, or a Kling 3.0/O3 model for the longest clips (up to 15s)
Consider splitting into multiple clips
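When a sequence is longer than any single model allows, you can split it into the fewest clips that fit the model's range, with lengths kept as even as possible. The sketch assumes whole-second durations. Some modes accept only fixed lengths (Veo 3.1 Reference is locked at 8 seconds), so confirm the result against the duration selector. The function name is hypothetical.

```python
import math


def split_into_clips(total_seconds: int, max_seconds: int, min_seconds: int) -> list[int]:
    """Split a target length into the fewest near-equal clips within a model's limits."""
    if total_seconds < min_seconds:
        raise ValueError("target is shorter than the model's minimum clip length")
    count = math.ceil(total_seconds / max_seconds)   # fewest clips that respect the cap
    base, extra = divmod(total_seconds, count)
    clips = [base + 1] * extra + [base] * (count - extra)
    if clips[-1] < min_seconds:
        # An even split is the best case for this clip count, and more clips
        # would only make each one shorter, so no valid split exists.
        raise ValueError("cannot split within the model's limits")
    return clips


print(split_into_clips(40, 15, 3))   # [14, 13, 13], e.g. for a Kling 3.0/O3 model
print(split_into_clips(30, 12, 4))   # [10, 10, 10], e.g. for Sora 2
```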
Next: Learn about Text-to-Video generation workflows.