Supported Video Models
Chat Video Pro supports multiple state-of-the-art AI video generation models, each optimized for different use cases.
Model Comparison
| Model | Provider | Duration | Resolution | Audio | Best For |
| --- | --- | --- | --- | --- | --- |
| Sora 2 | OpenAI | 4-12s | 720p | ❌ | Long clips, cinematic quality |
| Sora 2 Pro | OpenAI | 4-12s | 720p-1080p | ❌ | Highest quality, longer clips |
| Veo 3.1 | Google | 4-8s | 720p-4K | ✅ | Dialogue, speaking characters |
| Veo 3.1 Fast | Google | 4-8s | 720p-4K | ✅ | Quick iterations with audio |
| Veo 3.1 Lite | Google | 4-8s | 720p-1080p | ❌ | Budget Veo: text, image, and transition modes; no generated audio |
| Grok | xAI | 4-15s | 480p-720p | ❌ | Fast generation, unique aspect ratios |
| Kling 3.0 Pro | Kuaishou | 3-15s | 720p-4K | ✅ | Highest V3 quality, element support |
| Kling 3.0 Standard | Kuaishou | 3-15s | 720p-4K | ✅ | Fast V3 generation with audio |
| Kling O3 Pro | Kuaishou | 3-15s | 720p-4K | ✅ | Advanced motion, scene understanding |
| Kling O3 Standard | Kuaishou | 3-15s | 720p-4K | ✅ | Fast O3 generation with audio |
| Hailuo 2.3 | Hailuo | 4-8s | 720p-1080p | ❌ | Action, sports, dynamic scenes |
| Seedance 2 | ByteDance | 4-15s | 480p-1080p | ✅ | Natural motion, cinematic with audio |
| Seedance 2 Fast | ByteDance | 4-15s | 480p-1080p | ✅ | Quick iterations with audio |
| Wan 2.7 | Wan | 2-15s | 720p-1080p | ❌ | High resolution, reference mode, video edit |
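If you plan shots in a script or spreadsheet, the comparison above can be treated as data. The following is a minimal Python sketch, not part of Chat Video Pro: `ModelSpec`, `MODELS`, and `find_models` are hypothetical names, and only a subset of the rows is transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    min_seconds: int
    max_seconds: int
    audio: bool


# Illustrative subset of the comparison rows (durations and audio support only).
MODELS = [
    ModelSpec("Sora 2", 4, 12, False),
    ModelSpec("Veo 3.1", 4, 8, True),
    ModelSpec("Grok", 4, 15, False),
    ModelSpec("Kling 3.0 Pro", 3, 15, True),
    ModelSpec("Hailuo 2.3", 4, 8, False),
    ModelSpec("Seedance 2", 4, 15, True),
    ModelSpec("Wan 2.7", 2, 15, False),
]


def find_models(seconds: int, need_audio: bool = False) -> list[str]:
    """Return the names of models whose duration range covers `seconds`."""
    return [
        m.name
        for m in MODELS
        if m.min_seconds <= seconds <= m.max_seconds
        and (m.audio or not need_audio)
    ]
```

For example, `find_models(12, need_audio=True)` narrows this subset to the Kling and Seedance entries, since the Veo 3.1 range stops at 8 seconds.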
Quick Selection Guide
Choose Sora 2 if:
✅ You need clips longer than 8 seconds (up to 12s)
✅ You want the highest cinematic quality
✅ You don't need audio generation
✅ Your budget allows for a premium model
Choose Veo 3.1 if:
✅ Your scene has dialogue or speaking characters
✅ You need synchronized audio and lip-sync
✅ You want an 8-second clip with audio
✅ You need character consistency (with Reference mode)
Choose Veo 3.1 Lite if:
✅ You want the lowest-cost Veo option for everyday drafts and iterations
✅ You need text-to-video, image-to-video, or transition mode (two images / first–last frame) in one model
✅ 720p or 1080p is enough (Lite does not offer 4K)
✅ You will add dialogue, music, or SFX in Premiere — Lite does not generate audio
Choose Kling 3.0 Pro if:
✅ You need the highest V3 quality with audio support (3-15s)
✅ You want element support for character consistency
✅ You need a universal model (T2V, I2V, and Transition in one)
Choose Kling 3.0 Standard if:
✅ You need fast V3 generation with good quality and audio
✅ You want cinematic camera work at lower cost
✅ You need quick iterations
Choose Kling O3 Pro if:
✅ You need the best O3 quality with advanced motion understanding
✅ You want clips from 3-15 seconds with audio
✅ You need superior scene understanding and motion quality
Choose Kling O3 Standard if:
✅ You need fast O3 generation with good quality and audio
✅ You want advanced motion at lower cost
✅ You need quick iterations with O3 quality
Choose Kling O3 Reference if:
✅ You need character consistency with reference images
✅ You want up to 7 reference images with 3-15s duration
✅ You need reference mode with audio support
Choose Hailuo 2.3 if:
✅ Your scene has action, sports, or stunts
✅ You need dynamic, fast-paced movement
✅ You want smooth motion and fluid animation
Choose Seedance 2 if:
✅ You need natural, stable motion with cinematic quality
✅ You want native audio generation (default ON)
✅ You need up to 1080p native resolution with audio
✅ You need ultra-wide 21:9 aspect ratio support
✅ You want image-to-video with start+end frames for transitions
✅ You need reference mode with up to 9 images, 3 audio files, and audio generation
❌ Not recommended for: Human faces, close-up character shots, or dialogue-driven scenes — use Kling 3.0 Pro instead for realistic human subjects
Choose Wan 2.7 if:
✅ You need native 1080p output at up to 15 seconds
✅ You want reference mode with up to 9 images
✅ You need interpolation between two images (start+end)
✅ You want instruction-based video editing (V2V)
✅ You need flexible aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4)
Choose Grok if:
✅ You need fast video generation (quick turnaround)
✅ You want unique mobile aspect ratios (19.5:9, 9:19.5, 20:9, 9:20)
✅ You need panoramic formats (2:1, 1:2)
✅ You're working on mobile-first content
✅ Speed is more important than maximum resolution
Models with Audio Generation
Veo 3.1 & Veo 3.1 Fast
Audio Capabilities:
✅ Native audio generation
✅ Excellent lip-sync
✅ Sound effects
✅ Music generation
Best For:
Dialogue scenes
Speaking characters
Music videos
Scenes requiring synchronized audio
Limitations:
8-second maximum duration
Audio quality varies by prompt
Veo 3.1 Lite uses the same duration options (4s / 6s / 8s) for text, image, and transition workflows but does not generate audio — add sound in post.
Kling 3.0 & O3 (All Models)
Audio Capabilities:
✅ Native audio generation (all Kling 3.0 and O3 models)
✅ Good lip-sync
✅ Sound effects
Best For:
Longer clips with audio (3-15 seconds)
Scenes needing audio with flexible duration
Includes V3 Pro/Standard, O3 Pro/Standard, O3 Transition, and O3 Reference
Limitations:
Audio quality may be less precise than Veo for dialogue
Seedance 2 & Seedance 2 Fast
Audio Capabilities:
✅ Native audio generation (enabled by default)
✅ Ambient sound effects
✅ Scene-appropriate audio
Best For:
Cinematic scenes with natural sound
Clips up to 15 seconds with audio
Reference mode with audio (up to 9 images)
Limitations:
Audio style less dialogue-focused than Veo
Models Without Audio
For models without audio generation:
Add audio in post-production (Premiere Pro)
Use music libraries
Record voiceover separately
Import existing audio tracks
Specialized Models
Video-to-Video Models
VEO Extend (+7s)
Extends any video by 7 seconds
Maximum total duration: 30 seconds
Input video must be 23 seconds or less
Output resolution: 720p (fixed)
Text prompt is optional
Best for: Extending existing videos seamlessly
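The limits above fit together arithmetically: a 23-second input plus a 7-second extension gives the 30-second maximum. Here is a small Python sketch of that check. The function names are hypothetical, and `max_extensions` assumes an already extended clip can be extended again, which this guide does not state.

```python
EXTEND_SECONDS = 7       # each extension adds 7 seconds
MAX_INPUT_SECONDS = 23   # input video must be 23 seconds or less
MAX_TOTAL_SECONDS = 30   # maximum total duration


def extended_duration(input_seconds: float) -> float:
    """Return the clip length after one extension, or raise if the input is too long."""
    if input_seconds <= 0:
        raise ValueError("input duration must be positive")
    if input_seconds > MAX_INPUT_SECONDS:
        raise ValueError(
            f"input is {input_seconds}s; the limit is {MAX_INPUT_SECONDS}s "
            f"so the result stays within {MAX_TOTAL_SECONDS}s"
        )
    return input_seconds + EXTEND_SECONDS


def max_extensions(input_seconds: float) -> int:
    """Count how many extensions fit (assumes extended clips can be re-extended)."""
    count = 0
    current = input_seconds
    while 0 < current <= MAX_INPUT_SECONDS:
        current += EXTEND_SECONDS
        count += 1
    return count
```

Under that chaining assumption, an 8-second clip could grow to 15, 22, and then 29 seconds before the input limit stops further extension.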
Transition Models
Kling O3 Transition (Pro/Standard)
Dedicated transition model with superior morphing quality
Attach exactly 2 images for start and end frames
3-15 second duration with audio support
Pro/Standard quality toggle
Best quality transitions with character consistency
Kling 3.0 Pro (Transition Mode)
Attach 2 images for automatic transition mode
3-15 second duration with audio support
Universal model (also works for T2V and I2V)
Kling 3.0 Standard (Transition Mode)
Same as Pro but faster generation
3-15 second duration with audio support
Good for quick iterations
Veo 3.1 First/Last
Transition mode with audio support
4-8 second duration
Can generate sound for transitions
Veo 3.1 Lite (first–last / transition)
Same two-image transition workflow; no generated audio
720p or 1080p only
Seedance 2 (Transition Mode)
Attach 2 images for start and end frames
4-15 second duration with audio support
Also available as Seedance 2 Fast for quicker iterations
Wan 2.7 (Interpolation Mode)
Attach 2 images for start and end frames
Supports interpolation between two keyframes
2-15 second duration, 720p-1080p
Note: Kling 3.0 models are universal - they work for T2V (0 images), I2V (1 image), and Transition (2 images). Kling O3 Transition is a dedicated transition model that appears when you attach 2 images.
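The rule in the note can be written as a simple lookup from the number of attached images to the mode. This Python sketch is illustrative only; `kling_v3_mode` is a hypothetical name, and rejecting any other image count is an assumption rather than documented behavior.

```python
def kling_v3_mode(image_count: int) -> str:
    """Map the number of attached images to the Kling 3.0 mode named in the note."""
    modes = {
        0: "text-to-video",    # T2V: no images attached
        1: "image-to-video",   # I2V: one image attached
        2: "transition",       # start and end frames attached
    }
    if image_count not in modes:
        # Assumption: counts other than 0, 1, or 2 are treated as invalid here.
        raise ValueError(f"unsupported image count: {image_count}")
    return modes[image_count]
```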
Reference Mode Models
Veo 3.1 Reference
Maintains character/product consistency
Supports 2-4 reference images
8-second duration (locked)
Aspect ratio: 16:9 or 9:16 only (auto-determined from images)
Audio generation supported
Resolution: 720p-1080p
Kling O3 Reference (Pro/Standard)
Advanced reference mode with latest Kling technology
Supports 1-7 reference images
3-15 second duration (flexible, not fixed)
Pro/Standard quality toggle
Audio generation supported
Also supports transition mode when given start and end frames
Uses @Element syntax for character references
Improved character consistency over O1
Seedance 2 Reference
Maintains character/subject consistency with reference images
Supports up to 9 reference images and up to 3 audio reference files
4-15 second duration with audio generation support
Also available as Seedance 2 Fast Reference
Resolution: 480p-1080p
Wan 2.7 Reference
Reference mode with flexible image count
Supports 1-9 reference images
2-10 second duration
Resolution: 720p-1080p
No audio generation
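The reference image counts listed above differ by model, so it can help to check a batch before attaching it. This is a minimal Python sketch with hypothetical names. The minimum of 1 for Seedance 2 Reference is an assumption, because this guide only states "up to 9".

```python
# (minimum, maximum) reference images per model, taken from the lists above.
REFERENCE_IMAGE_LIMITS = {
    "Veo 3.1 Reference": (2, 4),
    "Kling O3 Reference": (1, 7),
    "Seedance 2 Reference": (1, 9),  # "up to 9" in the guide; minimum of 1 is assumed
    "Wan 2.7 Reference": (1, 9),
}


def reference_count_ok(model: str, image_count: int) -> bool:
    """Return True if `image_count` is within the listed range for `model`."""
    low, high = REFERENCE_IMAGE_LIMITS[model]
    return low <= image_count <= high
```

For instance, five images pass the check for Kling O3 Reference but not for Veo 3.1 Reference.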
Video-to-Video Models
Kling O3 VFX (Pro/Standard)
AI-powered VFX effects and character replacement
Input: 1 video (3-10 seconds, max 200MB) + 0-4 optional reference images
Pro/Standard quality toggle
Can keep original audio
Best for: VFX effects, character replacement, video transformations
Kling O3 Multi-Cam
Generate new camera angles and shots from reference video
Input: 1 video (3-10 seconds, max 200MB)
3-15 second output duration
Aspect ratio: auto, 16:9, 9:16, 1:1
Can keep original audio
Best for: Reverse angles, wide shots, alternative perspectives
Kling Motion Control
Transfer movements from a reference video to a character image
Input: 1 video (1-30 seconds, max 200MB) + 1 character image (required)
Best for: Motion transfer, character animation from reference
Wan 2.7 Edit
Instruction-based video-to-video editing
Input: 1 video + text instruction describing the edit
Optional reference image for style guidance
2-10 second duration, 720p-1080p
Best for: Modifying existing footage with natural language instructions
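The Kling input limits above (duration, file size, image count) can be checked before you upload. The following Python sketch uses hypothetical names throughout. Treating Multi-Cam as accepting no images is an assumption, since this guide lists only a video input for it, and the size is passed in megabytes to avoid guessing how the 200MB limit is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class V2VLimits:
    min_seconds: float
    max_seconds: float
    max_megabytes: float
    min_images: int
    max_images: int


V2V_LIMITS = {
    "Kling O3 VFX": V2VLimits(3, 10, 200, 0, 4),
    "Kling O3 Multi-Cam": V2VLimits(3, 10, 200, 0, 0),   # no images: assumption
    "Kling Motion Control": V2VLimits(1, 30, 200, 1, 1),
}


def input_problems(model: str, seconds: float, megabytes: float, images: int = 0) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input fits."""
    limits = V2V_LIMITS[model]
    problems = []
    if not limits.min_seconds <= seconds <= limits.max_seconds:
        problems.append(
            f"video must be {limits.min_seconds}-{limits.max_seconds}s, got {seconds}s"
        )
    if megabytes > limits.max_megabytes:
        problems.append(f"video must be at most {limits.max_megabytes}MB, got {megabytes}MB")
    if not limits.min_images <= images <= limits.max_images:
        problems.append(
            f"expected {limits.min_images}-{limits.max_images} images, got {images}"
        )
    return problems
```

For example, `input_problems("Kling Motion Control", 12, 50)` reports one problem, the missing character image.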
Model-Specific Tips
Sora 2 / Sora 2 Pro
Prompt Tips:
Use cinematic language
Describe camera movement clearly
Mention style and mood
Longer prompts work well
Best Practices:
Great for establishing shots
Excellent for B-roll
Use for longer narrative sequences
Veo 3.1
Prompt Tips:
Include dialogue in quotes: "Hello, welcome to..."
Describe the speaking character clearly
Mention audio needs: "with background music."
Reference mode for character consistency
Best Practices:
Perfect for talking head videos
Great for product demonstrations with voiceover
Use for music video concepts
Veo 3.1 Lite
Prompt Tips:
Same framing as Veo 3.1 for visuals — describe subject, motion, camera, and style clearly
Do not rely on spoken dialogue in the generated clip; plan voiceover or subtitles in Premiere if needed
Best Practices:
Ideal for fast previews, B-roll concepts, and transitions when cost matters more than onboard audio
Use transition mode with two images for first-to-last-frame motion between keyframes
Kling 3.0 & O3 Models
Prompt Tips:
Emphasize camera movement: "360° orbit", "FPV drone shot"
Describe complex motion clearly
Use cinematic terminology
O3 and 3.0 models understand detailed prompts
Best Practices:
Kling 3.0 Pro for highest V3 quality with audio and element support
Kling O3 Pro for best O3 quality with advanced motion understanding
Kling 3.0/O3 Standard for fast iterations
Use 3-15 second duration slider for flexible duration
Enable audio for dialogue or ambient sound needs
Hailuo 2.3
Prompt Tips:
Focus on action and movement
Describe dynamic elements
Emphasize motion and flow
Great for sports and action scenes
Best Practices:
Sports highlights
Action sequences
Dynamic B-roll
Fast-paced content
Grok (xAI)
Prompt Tips:
Natural language works well
Describe motion and mood clearly
Works great with simple, direct prompts
Supports image-to-video and video-to-video
Best Practices:
Mobile-first content (use unique mobile aspect ratios)
Quick iterations and previews
Social media shorts
Panoramic landscape content (2:1 ratio)
Fast turnaround projects
Unique Features:
Text-to-Video, Image-to-Video, and Video-to-Video modes
Unique aspect ratios: 2:1, 1:2, 20:9, 19.5:9, 9:19.5, 9:20
Duration range: 4-15 seconds
Resolution: 480p or 720p
Seedance 2
Prompt Tips:
Describe subject, environment, and motion clearly
Audio is generated by default — mention specific sounds if desired
Use cinematic language for best visual results
Reference mode: provide consistent subject images for character continuity
Best Practices:
Cinematic scenes with ambient audio
Natural motion and stable subjects
Ultra-wide 21:9 content for letterbox presentations
Reference mode for maintaining subject consistency across clips
Wan 2.7
Prompt Tips:
Describe motion, camera, and scene composition in detail
For video edit mode, write clear instructions: "change the sky to sunset"
Reference mode: attach up to 9 images for strongest consistency
Interpolation: attach exactly 2 images for smooth keyframe transitions
Best Practices:
High-resolution 1080p output when clarity matters
Longer clips up to 15 seconds (T2V and I2V)
Instruction-based video editing for modifying existing footage
Reference mode with large image sets for maximum consistency
Cost Considerations
Approximate costs per second of video (varies by model and resolution):
Sora 2: ~$0.05-0.15/second
Veo 3.1: ~$0.04-0.12/second
Veo 3.1 Lite: typically lower per-second cost than Veo 3.1 / Fast (check fal.ai/pricing)
Grok: ~$0.03-0.08/second
Kling models: ~$0.03-0.10/second
Hailuo 2.3: ~$0.02-0.08/second
Seedance 2: check fal.ai/pricing for current rates
Wan 2.7: check fal.ai/pricing for current rates
Note: Check fal.ai/pricing for current rates.
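To turn the approximate per-second figures above into a rough budget, multiply both ends of the range by the clip length. This Python sketch is for planning only: `RATES` and `estimate_cost` are hypothetical names, the numbers are the approximate ranges from this page, and models without a listed range are left out.

```python
# Approximate (low, high) USD per second, copied from the list above.
RATES = {
    "Sora 2": (0.05, 0.15),
    "Veo 3.1": (0.04, 0.12),
    "Grok": (0.03, 0.08),
    "Kling": (0.03, 0.10),
    "Hailuo 2.3": (0.02, 0.08),
}


def estimate_cost(model: str, seconds: float) -> tuple[float, float]:
    """Return a rough (low, high) cost in USD for a clip of `seconds` length."""
    low, high = RATES[model]
    return (round(low * seconds, 2), round(high * seconds, 2))
```

An 8-second Veo 3.1 clip would land somewhere between roughly $0.32 and $0.96 on these figures. Actual charges depend on resolution and current pricing.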
Switching Models
You can easily switch models:
After generation: Click "Regenerate" and select a different model
Before generation: Change the model in the model selector
Compare results: Generate the same prompt with different models
Troubleshooting
"Model not available"
Check that your fal.ai account has sufficient credits
Verify the model is supported in your region
Some models may have temporary availability issues
"Audio not generating"
Make sure you selected a model with audio support: Veo 3.1, Veo 3.1 Fast, Seedance 2, Kling 3.0, or Kling O3 (Veo 3.1 Lite and Wan 2.7 do not generate audio)
Check that the "Audio" toggle is enabled
Verify your prompt mentions audio needs
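The three audio checks can be run as a quick checklist. This Python sketch is hypothetical throughout: `AUDIO_MODELS` lists the audio-capable models named in this guide, and the keyword test is a crude stand-in for "the prompt mentions audio", not how the models interpret prompts.

```python
# Models described in this guide as generating audio.
AUDIO_MODELS = {
    "Veo 3.1", "Veo 3.1 Fast",
    "Kling 3.0 Pro", "Kling 3.0 Standard",
    "Kling O3 Pro", "Kling O3 Standard",
    "Seedance 2", "Seedance 2 Fast",
}

# Rough keyword list; a real prompt may describe sound in other words.
AUDIO_KEYWORDS = ("audio", "music", "sound", "dialogue", "says", "voice")


def audio_checklist(model: str, audio_toggle_on: bool, prompt: str) -> list[str]:
    """Return the reasons audio might be missing; an empty list means all checks pass."""
    issues = []
    if model not in AUDIO_MODELS:
        issues.append(f"{model} does not generate audio; pick an audio-capable model")
    if not audio_toggle_on:
        issues.append('the "Audio" toggle is off')
    if not any(word in prompt.lower() for word in AUDIO_KEYWORDS):
        issues.append("the prompt does not mention audio")
    return issues
```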
"Duration limit reached"
Each model has maximum duration limits
Use a model with a higher limit: Sora 2 reaches 12s, and Grok, Kling, Seedance 2, and Wan 2.7 reach 15s
Consider splitting into multiple clips
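When a sequence is longer than any single model allows, splitting it evenly keeps each clip inside the model's range. The Python sketch below shows one way to do that. `split_duration` is a hypothetical helper, and it assumes any whole-second length within the range can be selected, which this guide does not confirm for every model.

```python
import math


def split_duration(total_seconds: int, min_clip: int, max_clip: int) -> list[int]:
    """Split `total_seconds` into the fewest clips, each within [min_clip, max_clip]."""
    if total_seconds <= 0 or min_clip <= 0 or max_clip < min_clip:
        raise ValueError("durations must be positive and min_clip <= max_clip")
    clips = math.ceil(total_seconds / max_clip)   # fewest clips that can cover the total
    base, extra = divmod(total_seconds, clips)    # spread the remainder one second at a time
    if base < min_clip:
        raise ValueError(f"{total_seconds}s cannot be split into clips of at least {min_clip}s")
    return [base + 1] * extra + [base] * (clips - extra)
```

With a 4-12 second range, a 30-second sequence becomes three 10-second clips, and a 13-second sequence becomes a 7-second and a 6-second clip.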
Next: Learn about Text-to-Video generation workflows.