# Supported Video Models

### Model Comparison

<table><thead><tr><th width="128">Model</th><th width="119">Provider</th><th width="103">Duration</th><th width="134">Resolution</th><th width="90">Audio</th><th>Best For</th></tr></thead><tbody><tr><td><strong>Sora 2</strong></td><td>OpenAI</td><td>4-12s</td><td>720p</td><td>❌</td><td>Long clips, cinematic quality</td></tr><tr><td><strong>Sora 2 Pro</strong></td><td>OpenAI</td><td>4-12s</td><td>720p-1080p</td><td>❌</td><td>Highest quality, longer clips</td></tr><tr><td><strong>Veo 3.1</strong></td><td>Google</td><td>4-8s</td><td>720p-4K</td><td>✅</td><td>Dialogue, speaking characters</td></tr><tr><td><strong>Veo 3.1 Fast</strong></td><td>Google</td><td>4-8s</td><td>720p-4K</td><td>✅</td><td>Quick iterations with audio</td></tr><tr><td><strong>Veo 3.1 Lite</strong></td><td>Google</td><td>4-8s</td><td>720p-1080p</td><td>❌</td><td>Budget Veo: text, image, and transition modes; no generated audio</td></tr><tr><td><strong>Grok</strong></td><td>xAI</td><td>4-15s</td><td>480p-720p</td><td>❌</td><td>Fast generation, unique aspect ratios</td></tr><tr><td><strong>Kling 3.0 Pro</strong></td><td>Kuaishou</td><td>3-15s</td><td>720p-4K</td><td>✅</td><td>Highest V3 quality, element support</td></tr><tr><td><strong>Kling 3.0 Standard</strong></td><td>Kuaishou</td><td>3-15s</td><td>720p-4K</td><td>✅</td><td>Fast V3 generation with audio</td></tr><tr><td><strong>Kling O3 Pro</strong></td><td>Kuaishou</td><td>3-15s</td><td>720p-4K</td><td>✅</td><td>Advanced motion, scene understanding</td></tr><tr><td><strong>Kling O3 Standard</strong></td><td>Kuaishou</td><td>3-15s</td><td>720p-4K</td><td>✅</td><td>Fast O3 generation with audio</td></tr><tr><td><strong>Hailuo 2.3</strong></td><td>Hailuo</td><td>4-8s</td><td>720p-1080p</td><td>❌</td><td>Action, sports, dynamic scenes</td></tr><tr><td><strong>Seedance 2</strong></td><td>ByteDance</td><td>4-15s</td><td>480p-1080p</td><td>✅</td><td>Natural motion, cinematic with audio</td></tr><tr><td><strong>Seedance 2 Fast</strong></td><td>ByteDance</td><td>4-15s</td><td>480p-1080p</td><td>✅</td><td>Quick iterations with audio</td></tr><tr><td><strong>Wan 2.7</strong></td><td>Wan</td><td>2-15s</td><td>720p-1080p</td><td>❌</td><td>High resolution, reference mode, video edit</td></tr></tbody></table>
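If you want to shortlist models programmatically, the comparison table can be encoded as plain data and filtered. The sketch below is illustrative only: the `VideoModel` class and `candidates` function are hypothetical names, not part of any product API, and the list covers only a subset of the rows above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    min_seconds: int
    max_seconds: int
    audio: bool


# Values transcribed from the comparison table (subset of models).
MODELS = [
    VideoModel("Sora 2", 4, 12, False),
    VideoModel("Veo 3.1", 4, 8, True),
    VideoModel("Veo 3.1 Lite", 4, 8, False),
    VideoModel("Grok", 4, 15, False),
    VideoModel("Kling 3.0 Pro", 3, 15, True),
    VideoModel("Hailuo 2.3", 4, 8, False),
    VideoModel("Seedance 2", 4, 15, True),
    VideoModel("Wan 2.7", 2, 15, False),
]


def candidates(seconds: int, need_audio: bool) -> list[str]:
    """Return models whose duration range covers `seconds` and that
    generate audio whenever `need_audio` is True."""
    return [
        m.name
        for m in MODELS
        if m.min_seconds <= seconds <= m.max_seconds
        and (m.audio or not need_audio)
    ]


print(candidates(10, need_audio=True))  # ['Kling 3.0 Pro', 'Seedance 2']
```

A 10-second clip with audio rules out Veo 3.1 (8-second maximum) and every model without audio, which leaves the Kling and Seedance families.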

### Quick Selection Guide

#### Choose Sora 2 if:

* ✅ You need clips longer than 8 seconds (up to 12s)
* ✅ You want the highest cinematic quality
* ✅ You don't need audio generation
* ✅ Your budget allows for a premium model

#### Choose Veo 3.1 if:

* ✅ Your scene has dialogue or speaking characters
* ✅ You need synchronized audio and lip-sync
* ✅ You want a clip of up to 8 seconds with audio
* ✅ You need character consistency (with Reference mode)

#### Choose Veo 3.1 Lite if:

* ✅ You want the lowest-cost **Veo** option for everyday drafts and iterations
* ✅ You need **text-to-video**, **image-to-video**, or **transition mode** (two images / first–last frame) in one model
* ✅ **720p or 1080p** is enough (Lite does not offer 4K)
* ✅ You will add dialogue, music, or SFX in Premiere — Lite does **not** generate audio

#### Choose Kling 3.0 Pro if:

* ✅ You need the highest V3 quality with audio support (3-15s)
* ✅ You want element support for character consistency
* ✅ You need a universal model (T2V, I2V, and Transition in one)

#### Choose Kling 3.0 Standard if:

* ✅ You need fast V3 generation with good quality and audio
* ✅ You want cinematic camera work at lower cost
* ✅ You need quick iterations

#### Choose Kling O3 Pro if:

* ✅ You need the best O3 quality with advanced motion understanding
* ✅ You want clips from 3-15 seconds with audio
* ✅ You need superior scene understanding and motion quality

#### Choose Kling O3 Standard if:

* ✅ You need fast O3 generation with good quality and audio
* ✅ You want advanced motion at lower cost
* ✅ You need quick iterations with O3 quality

#### Choose Kling O3 Reference if:

* ✅ You need character consistency with reference images
* ✅ You want up to 7 reference images with 3-15s duration
* ✅ You need reference mode with audio support

#### Choose Hailuo 2.3 if:

* ✅ Your scene has action, sports, or stunts
* ✅ You need dynamic, fast-paced movement
* ✅ You want smooth motion and fluid animation

#### Choose Seedance 2 if:

* ✅ You need natural, stable motion with cinematic quality
* ✅ You want native audio generation (default ON)
* ✅ You need up to 1080p native resolution with audio
* ✅ You need ultra-wide 21:9 aspect ratio support
* ✅ You want image-to-video with start+end frames for transitions
* ✅ You need reference mode with up to 9 images, 3 audio files, and audio generation
* ❌ **Not recommended for:** Human faces, close-up character shots, or dialogue-driven scenes — use Kling 3.0 Pro instead for realistic human subjects

#### Choose Wan 2.7 if:

* ✅ You need native 1080p output at up to 15 seconds
* ✅ You want reference mode with up to 9 images
* ✅ You need interpolation between two images (start+end)
* ✅ You want instruction-based video editing (V2V)
* ✅ You need flexible aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4)

#### Choose Grok if:

* ✅ You need fast video generation (quick turnaround)
* ✅ You want unique mobile aspect ratios (19.5:9, 9:19.5, 20:9, 9:20)
* ✅ You need panoramic formats (2:1, 1:2)
* ✅ You're working on mobile-first content
* ✅ Speed is more important than maximum resolution

### Models with Audio Generation

#### Veo 3.1 & Veo 3.1 Fast

**Audio Capabilities:**

* ✅ Native audio generation
* ✅ Excellent lip-sync
* ✅ Sound effects
* ✅ Music generation

**Best For:**

* Dialogue scenes
* Speaking characters
* Music videos
* Scenes requiring synchronized audio

**Limitations:**

* 8-second maximum duration
* Audio quality varies by prompt

**Veo 3.1 Lite** uses the same duration options (4s / 6s / 8s) for text, image, and transition workflows but **does not** generate audio — add sound in post.

#### Kling 3.0 & O3 (All Models)

**Audio Capabilities:**

* ✅ Native audio generation (all Kling 3.0 and O3 models)
* ✅ Good lip-sync
* ✅ Sound effects

**Best For:**

* Longer clips with audio (3-15 seconds)
* Scenes needing audio with flexible duration
* Any Kling variant: V3 Pro/Standard, O3 Pro/Standard, O3 Transition, and O3 Reference all generate audio

**Limitations:**

* Audio quality may be less precise than Veo for dialogue

#### Seedance 2 & Seedance 2 Fast

**Audio Capabilities:**

* ✅ Native audio generation (enabled by default)
* ✅ Ambient sound effects
* ✅ Scene-appropriate audio

**Best For:**

* Cinematic scenes with natural sound
* Clips up to 15 seconds with audio
* Reference mode with audio (up to 9 images)

**Limitations:**

* Audio style less dialogue-focused than Veo

### Models Without Audio

For models without audio generation:

* Add audio in post-production (Premiere Pro)
* Use music libraries
* Record voiceover separately
* Import existing audio tracks

### Specialized Models

#### Video-to-Video Models

**Veo Extend (+7s)**

* Extends any video by 7 seconds
* Maximum total duration: 30 seconds
* Input video must be 23 seconds or less
* Output resolution: 720p (fixed)
* Text prompt is optional
* Best for: Extending existing videos seamlessly
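The extension limits above fit together arithmetically: a 23-second input plus 7 seconds reaches the 30-second cap. The sketch below checks those limits. The function names are hypothetical, and the idea of chaining several extensions on the same clip is an assumption that the list above does not confirm.

```python
EXTEND_SECONDS = 7      # each extension adds 7 seconds
MAX_INPUT_SECONDS = 23  # input video must be 23 seconds or less
MAX_TOTAL_SECONDS = 30  # maximum total duration after extending


def extended_duration(input_seconds: float) -> float:
    """Return the clip length after one extension, or raise if the
    input is too long to extend."""
    if input_seconds <= 0:
        raise ValueError("input duration must be positive")
    if input_seconds > MAX_INPUT_SECONDS:
        raise ValueError(
            f"input is {input_seconds}s; the limit is {MAX_INPUT_SECONDS}s"
        )
    return input_seconds + EXTEND_SECONDS


def remaining_extensions(input_seconds: float) -> int:
    """Count how many +7s extensions fit under the 30-second cap.
    Assumes extensions can be chained, which is not confirmed above."""
    count = 0
    while 0 < input_seconds <= MAX_INPUT_SECONDS:
        input_seconds += EXTEND_SECONDS
        count += 1
    return count


print(extended_duration(8))      # 15
print(remaining_extensions(8))   # 3  (8 -> 15 -> 22 -> 29)
```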

#### Transition Models

**Kling O3 Transition (Pro/Standard)**

* Dedicated transition model with superior morphing quality
* Attach exactly 2 images for start and end frames
* 3-15 second duration with audio support
* Pro/Standard quality toggle
* Best quality transitions with character consistency

**Kling 3.0 Pro (Transition Mode)**

* Attach 2 images for automatic transition mode
* 3-15 second duration with audio support
* Universal model (also works for T2V and I2V)

**Kling 3.0 Standard (Transition Mode)**

* Same as Pro but faster generation
* 3-15 second duration with audio support
* Good for quick iterations

**Veo 3.1 First/Last**

* Transition mode with audio support
* 4-8 second duration
* Can generate sound for transitions

**Veo 3.1 Lite** (first–last / transition)

* Same two-image transition workflow; **no** generated audio
* 720p or 1080p only

**Seedance 2 (Transition Mode)**

* Attach 2 images for start and end frames
* 4-15 second duration with audio support
* Also available as Seedance 2 Fast for quicker iterations

**Wan 2.7 (Interpolation Mode)**

* Attach 2 images for start and end frames
* Supports interpolation between two keyframes
* 2-15 second duration, 720p-1080p

> **Note:** Kling 3.0 models are **universal**: they work for T2V (0 images), I2V (1 image), and Transition (2 images). Kling O3 Transition is a **dedicated** transition model that appears when you attach 2 images.
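The image-count rule for Kling 3.0 models can be written as a simple lookup. This is a minimal sketch with a hypothetical function name; it mirrors the rule in the note and does not describe how the application itself is implemented.

```python
def kling_v3_mode(image_count: int) -> str:
    """Map the number of attached images to the generation mode
    that a universal Kling 3.0 model would use."""
    modes = {
        0: "text-to-video",
        1: "image-to-video",
        2: "transition",
    }
    if image_count not in modes:
        raise ValueError("Kling 3.0 models accept 0, 1, or 2 images")
    return modes[image_count]


print(kling_v3_mode(2))  # transition
```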

#### Reference Mode Models

**Veo 3.1 Reference**

* Maintains character/product consistency
* Supports 2-4 reference images
* 8-second duration (locked)
* Aspect ratio: 16:9 or 9:16 only (auto-determined from images)
* Audio generation supported
* Resolution: 720p-1080p

**Kling O3 Reference (Pro/Standard)**

* Advanced reference mode with latest Kling technology
* Supports 1-7 reference images
* 3-15 second duration (flexible, not fixed)
* Pro/Standard quality toggle
* Audio generation supported
* Also supports transition mode when given start and end frames
* Uses @Element syntax for character references
* Improved character consistency over O1

**Seedance 2 Reference**

* Maintains character/subject consistency with reference images
* Supports up to 9 reference images and up to 3 audio reference files
* 4-15 second duration with audio generation support
* Also available as Seedance 2 Fast Reference
* Resolution: 480p-1080p

**Wan 2.7 Reference**

* Reference mode with flexible image count
* Supports 1-9 reference images
* 2-10 second duration
* Resolution: 720p-1080p
* No audio generation
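To see which reference models accept a given number of images, the limits listed above can be compared in one place. The names below are hypothetical, and the minimum of 1 image for Seedance 2 Reference is an assumption, since the list only states "up to 9".

```python
# (minimum, maximum) reference images per model, from the lists above.
REFERENCE_IMAGE_LIMITS = {
    "Veo 3.1 Reference": (2, 4),
    "Kling O3 Reference": (1, 7),
    "Seedance 2 Reference": (1, 9),  # minimum of 1 is an assumption
    "Wan 2.7 Reference": (1, 9),
}


def models_accepting(image_count: int) -> list[str]:
    """Return the reference models whose image limits include `image_count`."""
    return [
        name
        for name, (low, high) in REFERENCE_IMAGE_LIMITS.items()
        if low <= image_count <= high
    ]


print(models_accepting(8))  # ['Seedance 2 Reference', 'Wan 2.7 Reference']
```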

#### Video-to-Video Models

**Kling O3 VFX (Pro/Standard)**

* AI-powered VFX effects and character replacement
* Input: 1 video (3-10 seconds, max 200MB) + 0-4 optional reference images
* Pro/Standard quality toggle
* Can keep original audio
* Best for: VFX effects, character replacement, video transformations

**Kling O3 Multi-Cam**

* Generate new camera angles and shots from reference video
* Input: 1 video (3-10 seconds, max 200MB)
* 3-15 second output duration
* Aspect ratio: auto, 16:9, 9:16, 1:1
* Can keep original audio
* Best for: Reverse angles, wide shots, alternative perspectives

**Kling Motion Control**

* Transfer movements from a reference video to a character image
* Input: 1 video (1-30 seconds, max 200MB) + 1 character image (required)
* Best for: Motion transfer, character animation from reference

**Wan 2.7 Edit**

* Instruction-based video-to-video editing
* Input: 1 video + text instruction describing the edit
* Optional reference image for style guidance
* 2-10 second duration, 720p-1080p
* Best for: Modifying existing footage with natural language instructions
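Checking an input video against the Kling limits before uploading can save a failed attempt. The sketch below uses hypothetical names and covers only the three Kling models, because the input limits for Wan 2.7 Edit are not stated explicitly above.

```python
# (min seconds, max seconds, max size in MB) for the input video.
V2V_INPUT_LIMITS = {
    "Kling O3 VFX": (3, 10, 200),
    "Kling O3 Multi-Cam": (3, 10, 200),
    "Kling Motion Control": (1, 30, 200),
}


def input_problems(model: str, seconds: float, size_mb: float) -> list[str]:
    """Return a list of human-readable problems; empty means the input fits."""
    min_s, max_s, max_mb = V2V_INPUT_LIMITS[model]
    problems = []
    if not min_s <= seconds <= max_s:
        problems.append(f"duration must be {min_s}-{max_s}s, got {seconds}s")
    if size_mb > max_mb:
        problems.append(f"file must be at most {max_mb}MB, got {size_mb}MB")
    return problems


print(input_problems("Kling O3 VFX", 12, 250))
print(input_problems("Kling Motion Control", 12, 150))  # []
```

A 12-second clip is too long for Kling O3 VFX but fits Kling Motion Control, which accepts inputs of up to 30 seconds.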

### Model-Specific Tips

#### Sora 2 / Sora 2 Pro

**Prompt Tips:**

* Use cinematic language
* Describe camera movement clearly
* Mention style and mood
* Longer prompts work well

**Best Practices:**

* Great for establishing shots
* Excellent for B-roll
* Use for longer narrative sequences

#### Veo 3.1

**Prompt Tips:**

* Include dialogue in quotes: "Hello, welcome to..."
* Describe the speaking character clearly
* Mention audio needs: "with background music."
* Reference mode for character consistency

**Best Practices:**

* Perfect for talking head videos
* Great for product demonstrations with voiceover
* Use for music video concepts

#### Veo 3.1 Lite

**Prompt Tips:**

* Same framing as Veo 3.1 for visuals — describe subject, motion, camera, and style clearly
* Do not rely on spoken dialogue in the generated clip; plan voiceover or subtitles in Premiere if needed

**Best Practices:**

* Ideal for fast previews, B-roll concepts, and transitions when cost matters more than onboard audio
* Use **transition mode** with two images for first-to-last-frame motion between keyframes

#### Kling 3.0 & O3 Models

**Prompt Tips:**

* Emphasize camera movement: "360° orbit", "FPV drone shot"
* Describe complex motion clearly
* Use cinematic terminology
* O3 and 3.0 models understand detailed prompts

**Best Practices:**

* Kling 3.0 Pro for highest V3 quality with audio and element support
* Kling O3 Pro for best O3 quality with advanced motion understanding
* Kling 3.0/O3 Standard for fast iterations
* Use 3-15 second duration slider for flexible duration
* Enable audio for dialogue or ambient sound needs

#### Hailuo 2.3

**Prompt Tips:**

* Focus on action and movement
* Describe dynamic elements
* Emphasize motion and flow
* Great for sports and action scenes

**Best Practices:**

* Sports highlights
* Action sequences
* Dynamic B-roll
* Fast-paced content

#### Grok (xAI)

**Prompt Tips:**

* Natural language works well
* Describe motion and mood clearly
* Works great with simple, direct prompts
* Supports image-to-video and video-to-video

**Best Practices:**

* Mobile-first content (use unique mobile aspect ratios)
* Quick iterations and previews
* Social media shorts
* Panoramic landscape content (2:1 ratio)
* Fast turnaround projects

**Unique Features:**

* Text-to-Video, Image-to-Video, and Video-to-Video modes
* Unique aspect ratios: 2:1, 1:2, 20:9, 19.5:9, 9:19.5, 9:20
* Duration range: 4-15 seconds
* Resolution: 480p or 720p

#### Seedance 2

**Prompt Tips:**

* Describe subject, environment, and motion clearly
* Audio is generated by default — mention specific sounds if desired
* Use cinematic language for best visual results
* Reference mode: provide consistent subject images for character continuity

**Best Practices:**

* Cinematic scenes with ambient audio
* Natural motion and stable subjects
* Ultra-wide 21:9 content for letterbox presentations
* Reference mode for maintaining subject consistency across clips

#### Wan 2.7

**Prompt Tips:**

* Describe motion, camera, and scene composition in detail
* For video edit mode, write clear instructions: "change the sky to sunset"
* Reference mode: attach up to 9 images for strongest consistency
* Interpolation: attach exactly 2 images for smooth keyframe transitions

**Best Practices:**

* High-resolution 1080p output when clarity matters
* Longer clips up to 15 seconds (T2V and I2V)
* Instruction-based video editing for modifying existing footage
* Reference mode with large image sets for maximum consistency

### Cost Considerations

Approximate costs per second of video (varies by model and resolution):

* **Sora 2:** \~$0.05-0.15/second
* **Veo 3.1:** \~$0.04-0.12/second
* **Veo 3.1 Lite:** typically lower per-second cost than Veo 3.1 / Fast (check [fal.ai/pricing](https://fal.ai/pricing))
* **Grok:** \~$0.03-0.08/second
* **Kling models:** \~$0.03-0.10/second
* **Hailuo 2.3:** \~$0.02-0.08/second
* **Seedance 2:** check [fal.ai/pricing](https://fal.ai/pricing) for current rates
* **Wan 2.7:** check [fal.ai/pricing](https://fal.ai/pricing) for current rates

**Note:** Check [fal.ai/pricing](https://fal.ai/pricing) for current rates.
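For a rough budget, multiply the per-second range by the clip length. The sketch below uses only the approximate figures listed above, so treat its output as an estimate rather than a quote. `estimate_cost` is a hypothetical helper, and models without a listed range are omitted.

```python
# Approximate (low, high) USD per second, copied from the list above.
COST_PER_SECOND_USD = {
    "Sora 2": (0.05, 0.15),
    "Veo 3.1": (0.04, 0.12),
    "Grok": (0.03, 0.08),
    "Kling": (0.03, 0.10),
    "Hailuo 2.3": (0.02, 0.08),
}


def estimate_cost(model: str, seconds: float) -> tuple[float, float]:
    """Return the (low, high) estimated cost in USD for one clip."""
    low, high = COST_PER_SECOND_USD[model]
    return (round(low * seconds, 2), round(high * seconds, 2))


print(estimate_cost("Veo 3.1", 8))   # (0.32, 0.96)
print(estimate_cost("Sora 2", 12))   # (0.6, 1.8)
```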

### Switching Models

You can easily switch models:

1. **After generation:** Click "Regenerate" and select a different model
2. **Before generation:** Change the model in the model selector
3. **Compare results:** Generate the same prompt with different models

### Troubleshooting

#### "Model not available"

* Check that your fal.ai account has sufficient credits
* Verify the model is supported in your region
* Some models may have temporary availability issues

#### "Audio not generating"

* Ensure that you selected a model with audio support (Veo 3.1, Veo 3.1 Fast, Seedance 2, Kling 3.0, or Kling O3 — not **Veo 3.1 Lite** or **Wan 2.7**)
* Check that the "Audio" toggle is enabled
* Verify your prompt mentions audio needs

#### "Duration limit reached"

* Each model has maximum duration limits
* Use a model with a higher limit: Sora 2 allows up to 12s; Kling 3.0, Kling O3, Grok, Seedance 2, and Wan 2.7 allow up to 15s
* Consider splitting into multiple clips
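When a scene is longer than a model's maximum, one way to plan the split is to use the fewest clips possible and keep them close to equal in length. The sketch below is a hypothetical helper, not a product feature. It ignores minimum durations and fixed duration options (for example 4s / 6s / 8s on Veo 3.1 Lite), so you may need to round the results to a value the model accepts.

```python
import math


def split_into_clips(total_seconds: int, max_clip_seconds: int) -> list[int]:
    """Split a target length into the fewest whole-second clips, none
    longer than `max_clip_seconds`, with lengths as even as possible."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_clip_seconds)
    base, extra = divmod(total_seconds, count)
    # The first `extra` clips get one additional second each.
    return [base + 1] * extra + [base] * (count - extra)


print(split_into_clips(20, 12))  # [10, 10]
print(split_into_clips(20, 8))   # [7, 7, 6]
```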

***

**Next:** Learn about [Text-to-Video generation](/features/video-generation/text-to-video.md) workflows.


