Supported Video Models
Chat Video Pro supports multiple state-of-the-art AI video generation models, each optimized for different use cases.
Model Comparison
| Model | Provider | Duration | Resolution | Audio | Best for |
| --- | --- | --- | --- | --- | --- |
| Sora 2 | OpenAI | 4-12s | 720p-1080p | ❌ | Long clips, cinematic quality |
| Sora 2 Pro | OpenAI | 4-12s | 1080p-4K | ❌ | Highest quality, longer clips |
| Veo 3.1 | Google | 4-8s | 720p-1080p | ✅ | Dialogue, speaking characters |
| Veo 3.1 Fast | Google | 4-8s | 720p-1080p | ✅ | Quick iterations with audio |
| Kling 2.6 Pro | Kuaishou | 5-10s | 720p | ✅ | Longer clips with audio |
| Kling 2.1 Pro | Kuaishou | 5-10s | 720p | ❌ | Complex camera movement |
| Kling O1 | Kuaishou | 5-10s | 720p-1080p | ❌ | Best prompt understanding |
| Hailuo 2.3 | Hailuo | 4-8s | 720p-1080p | ❌ | Action, sports, dynamic scenes |
| WAN 2.5 | WAN | 4-8s | 1080p-4K | ❌ | High resolution output |
Quick Selection Guide
Choose Sora 2 if:
✅ You need clips longer than 8 seconds (up to 12s)
✅ You want the highest cinematic quality
✅ You don't need audio generation
✅ Your budget allows for a premium model
Choose Veo 3.1 if:
✅ Your scene has dialogue or speaking characters
✅ You need synchronized audio and lip-sync
✅ You want 8-second clips with audio
✅ You need character consistency (with Reference mode)
Choose Kling 2.6 Pro if:
✅ You need audio generation (5-10 seconds)
✅ You want longer clips than Veo (up to 10s)
✅ You need good quality with audio
Choose Kling 2.1 Pro if:
✅ You need complex camera movement (360°, FPV, orbit)
✅ You want cinematic camera work
✅ You don't need audio
Choose Kling O1 if:
✅ You have a complex or detailed prompt
✅ You want the best prompt understanding
✅ You need reference mode with 2-7 images
Choose Hailuo 2.3 if:
✅ Your scene has action, sports, or stunts
✅ You need dynamic, fast-paced movement
✅ You want smooth motion and fluid animation
Choose WAN 2.5 if:
✅ You need native 1080p or 4K output
✅ Resolution is more important than duration
✅ You want a high-quality source for upscaling
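The selection rules above can be sketched as a small lookup. This is a minimal illustration, not part of Chat Video Pro: the function name `suggest_model`, its flags, and the order in which the rules are checked are all assumptions. The duration limits come from the comparison table.

```python
# Maximum clip length per model, in seconds, taken from the comparison table.
MAX_DURATION_S = {
    "Sora 2": 12,
    "Veo 3.1": 8,
    "Kling 2.6 Pro": 10,
    "Kling 2.1 Pro": 10,
    "Kling O1": 10,
    "Hailuo 2.3": 8,
    "WAN 2.5": 8,
}


def suggest_model(duration_s, audio=False, complex_camera=False,
                  complex_prompt=False, action=False, high_resolution=False):
    """Return the first model that matches the stated needs and duration.

    Hypothetical helper: the priority order below is an assumption.
    """
    if audio:
        # Only these models generate audio; Veo is preferred for lip-sync.
        candidates = ["Veo 3.1", "Kling 2.6 Pro"]
    else:
        candidates = []
        if complex_camera:
            candidates.append("Kling 2.1 Pro")
        if complex_prompt:
            candidates.append("Kling O1")
        if action:
            candidates.append("Hailuo 2.3")
        if high_resolution:
            candidates.append("WAN 2.5")
        candidates.append("Sora 2")  # fallback: longest clips, no audio

    for name in candidates:
        if duration_s <= MAX_DURATION_S[name]:
            return name
    raise ValueError(f"no listed model fits {duration_s}s with these needs")
```

For example, `suggest_model(10, audio=True)` falls through to Kling 2.6 Pro because Veo 3.1 stops at 8 seconds, and a 9-second action clip falls back to Sora 2 because Hailuo 2.3 also stops at 8 seconds.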
Models with Audio Generation
Veo 3.1 & Veo 3.1 Fast
Audio Capabilities:
✅ Native audio generation
✅ Excellent lip-sync
✅ Sound effects
✅ Music generation
Best For:
Dialogue scenes
Speaking characters
Music videos
Scenes requiring synchronized audio
Limitations:
8-second maximum duration
Audio quality varies by prompt
Kling 2.6 Pro
Audio Capabilities:
✅ Native audio generation
✅ Good lip-sync
✅ Sound effects
Best For:
Longer clips with audio (up to 10s)
Scenes needing audio but not dialogue-critical
Limitations:
Audio quality may be less precise than Veo
Models Without Audio
For models without audio generation:
Add audio in post-production (for example, in Premiere Pro)
Use music libraries
Record voiceover separately
Import existing audio tracks
Specialized Models
Transition Models
Kling O1 Transition
Optimized for image-to-image transitions
Requires exactly 2 images
Creates smooth morphing between frames
Kling 2.5 Transition
Alternative transition model
Good for frame-to-frame transitions
Veo 3.1 First/Last
Transition mode with audio support
Can generate sound for transitions
Reference Mode Models
Veo 3.1 Reference
Maintains character/product consistency
Supports 1-3 reference images
8-second duration (locked)
Audio generation supported
Kling O1 Reference
Advanced reference mode
Supports 2-7 reference images
5s or 10s duration options
Uses @Image syntax in prompts
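The image-count and duration limits for the two reference modes can be checked before submitting a job. The sketch below is illustrative only: `validate_reference_request` is a hypothetical helper, not an API of Chat Video Pro or fal.ai, and it encodes just the limits listed above.

```python
# Limits as listed above: allowed image-count range and allowed durations (seconds).
REFERENCE_LIMITS = {
    "Veo 3.1 Reference": {"images": (1, 3), "durations": (8,)},
    "Kling O1 Reference": {"images": (2, 7), "durations": (5, 10)},
}


def validate_reference_request(model, image_count, duration_s):
    """Return a list of problems; an empty list means the request looks valid."""
    limits = REFERENCE_LIMITS.get(model)
    if limits is None:
        return [f"unknown reference model: {model}"]

    problems = []
    low, high = limits["images"]
    if not low <= image_count <= high:
        problems.append(
            f"{model} expects {low}-{high} reference images, got {image_count}"
        )
    if duration_s not in limits["durations"]:
        allowed = " or ".join(f"{d}s" for d in limits["durations"])
        problems.append(f"{model} supports {allowed}, got {duration_s}s")
    return problems
```

Returning a list rather than raising on the first failure lets a caller show every problem at once.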
Model-Specific Tips
Sora 2 / Sora 2 Pro
Prompt Tips:
Use cinematic language
Describe camera movement clearly
Mention style and mood
Longer prompts work well
Best Practices:
Great for establishing shots
Excellent for B-roll
Use for longer narrative sequences
Veo 3.1
Prompt Tips:
Include dialogue in quotes: "Hello, welcome to..."
Describe the speaking character clearly
Mention audio needs: "with background music"
Use Reference mode for character consistency
Best Practices:
Perfect for talking head videos
Great for product demonstrations with voiceover
Use for music video concepts
Kling Models
Prompt Tips:
Emphasize camera movement: "360° orbit", "FPV drone shot"
Describe complex motion clearly
Use cinematic terminology
Kling O1 models handle long, detailed prompts well
Best Practices:
Kling 2.1 Pro for dynamic camera work
Kling O1 for complex scenes
Kling 2.6 Pro for audio needs
Hailuo 2.3
Prompt Tips:
Focus on action and movement
Describe dynamic elements
Emphasize motion and flow
Great for sports and action scenes
Best Practices:
Sports highlights
Action sequences
Dynamic B-roll
Fast-paced content
Cost Considerations
Approximate cost per second of video (varies by model and resolution):
Sora 2: ~$0.05-0.15/second
Veo 3.1: ~$0.04-0.12/second
Kling models: ~$0.03-0.10/second
Hailuo 2.3: ~$0.02-0.08/second
WAN 2.5: ~$0.04-0.12/second
Note: Check fal.ai/pricing for current rates.
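To turn the per-second ranges into a per-clip estimate, multiply both ends of the range by the clip duration. The sketch below uses only the approximate figures listed above, which may be out of date. `estimate_cost` is a hypothetical helper name, and "Kling" stands for all Kling models.

```python
# Approximate (low, high) USD per second, copied from the list above.
RATE_USD_PER_SECOND = {
    "Sora 2": (0.05, 0.15),
    "Veo 3.1": (0.04, 0.12),
    "Kling": (0.03, 0.10),
    "Hailuo 2.3": (0.02, 0.08),
    "WAN 2.5": (0.04, 0.12),
}


def estimate_cost(model, duration_s):
    """Return the (low, high) estimated cost in USD for one clip."""
    low, high = RATE_USD_PER_SECOND[model]
    return (round(low * duration_s, 2), round(high * duration_s, 2))


# An 8-second Veo 3.1 clip: 8 x 0.04 = 0.32 and 8 x 0.12 = 0.96
print(estimate_cost("Veo 3.1", 8))  # (0.32, 0.96)
```

Comparing the same prompt across three models multiplies this estimate accordingly, so it can help to iterate on a cheaper model first.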
Switching Models
You can easily switch models:
After generation: Click "Regenerate" and select a different model
Before generation: Change the model in the model selector
Compare results: Generate the same prompt with different models
Troubleshooting
"Model not available"
Check that your fal.ai account has sufficient credits
Verify the model is supported in your region
Some models may have temporary availability issues
"Audio not generating"
Ensure you selected a model with audio support (Veo 3.1, Veo 3.1 Fast, or Kling 2.6 Pro)
Check that the "Audio" toggle is enabled
Verify your prompt mentions audio needs
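The three checks above can be expressed as a simple preflight routine. This is a rough sketch under stated assumptions: `audio_preflight` is a hypothetical helper, and the keyword list used to guess whether a prompt mentions audio is an assumption, not how Chat Video Pro inspects prompts.

```python
# Models listed in this guide as supporting audio generation.
AUDIO_MODELS = {"Veo 3.1", "Veo 3.1 Fast", "Kling 2.6 Pro"}

# Assumed hint words; quoted dialogue in the prompt also counts as an audio cue.
AUDIO_HINTS = ("audio", "music", "sound", "says", "dialogue", "voice")


def audio_preflight(model, audio_toggle, prompt):
    """Return a list of likely reasons why audio would not be generated."""
    problems = []
    if model not in AUDIO_MODELS:
        problems.append(f"{model} does not generate audio")
    if not audio_toggle:
        problems.append("the Audio toggle is disabled")
    text = prompt.lower()
    if '"' not in prompt and not any(hint in text for hint in AUDIO_HINTS):
        problems.append("the prompt does not mention audio, music, or dialogue")
    return problems
```

An empty result does not guarantee audio output; it only rules out the three causes listed here.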
"Duration limit reached"
Each model has maximum duration limits
Use Sora 2 for longer clips (up to 12s)
Consider splitting into multiple clips
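One way to plan a sequence longer than a model's limit is to divide the total duration evenly across the fewest clips that fit. The sketch below assumes whole-second durations. `split_into_clips` is a hypothetical helper, and the minimum and maximum values come from the comparison table.

```python
import math


def split_into_clips(total_s, max_clip_s, min_clip_s):
    """Split total_s into the fewest near-equal clips within [min_clip_s, max_clip_s]."""
    if total_s < min_clip_s:
        raise ValueError(f"{total_s}s is shorter than the {min_clip_s}s minimum")

    clip_count = math.ceil(total_s / max_clip_s)
    base, extra = divmod(total_s, clip_count)
    if base < min_clip_s:
        raise ValueError("cannot split evenly within the model's duration range")

    # The first `extra` clips get one more second so the lengths add up exactly.
    return [base + 1] * extra + [base] * (clip_count - extra)


# A 26-second sequence on Sora 2 (4-12s per clip)
print(split_into_clips(26, max_clip_s=12, min_clip_s=4))  # [9, 9, 8]
```

Splitting evenly avoids a very short final clip, which could otherwise fall below the model's minimum duration.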
Next: Learn about Text-to-Video generation workflows.