Reference Mode

Generate videos with consistent characters using 1-9 reference images. Reference Mode is well suited to maintaining character appearance across multiple shots, creating talking head videos, and ensuring visual consistency.

Attaching exactly 2 images and seeing Transition Mode? By default, 2 uploaded images trigger Transition Mode (start frame → end frame morph). To use them as character reference photos instead, select a dedicated reference model from the model picker: Kling O3 Reference, Seedance 2 Reference, or Wan 2.7 Reference. These models treat multiple images as reference material, not as start/end frames. See Transition Mode if you actually want a morph.

How It Works

  1. Enable Generate Media - Toggle the Generate Media button in the composer

  2. Attach reference images - Upload 1-9 images of the same character/subject (count depends on model)

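  3. Reference Mode activates - The system detects the reference images and switches to a compatible reference model (exactly 2 images default to Transition Mode unless you pick a dedicated reference model)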

  4. Select reference model - Veo 3.1 Reference, Kling O3, Wan 2.7 Reference, or Seedance 2 Reference (auto-selected)

  5. Describe the video - Write a prompt describing the scene and action

  6. Configure settings - Set duration (varies by model: 8s for Veo, 3-15s for Kling O3, 2-10s for Wan, 4-15s for Seedance)

  7. Generate - Click send to create a video with a consistent character

Automatic Activation

Reference Mode activates when:

  • 1-9 images attached - Reference images detected (minimum count varies by model)

  • No videos attached - Reference mode is for generation, not editing

  • Model auto-switches - Compatible reference models appear

  • Element tags - Using @Element tags also triggers reference mode
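The activation rules above, together with the 2-image Transition Mode default described at the top of this page, can be sketched as a small decision function. This is only an illustration of the documented behavior, not the product's actual logic: `resolve_mode`, its parameters, the returned labels, and the order in which the checks run are all assumptions.

```python
from typing import Optional

# Dedicated reference models named in the note at the top of this page.
DEDICATED_REFERENCE_MODELS = {
    "Kling O3 Reference",
    "Seedance 2 Reference",
    "Wan 2.7 Reference",
}


def resolve_mode(
    image_count: int,
    video_count: int = 0,
    has_element_tags: bool = False,
    selected_model: Optional[str] = None,
) -> str:
    """Return "reference", "transition", or "other" (hypothetical labels)."""
    # Reference Mode is for generation, so an attached video rules it out.
    if video_count > 0:
        return "other"
    # @Element tags trigger Reference Mode on their own.
    if has_element_tags:
        return "reference"
    # Outside the documented 1-9 image range, Reference Mode does not apply.
    if not 1 <= image_count <= 9:
        return "other"
    # Exactly 2 images default to Transition Mode unless a dedicated
    # reference model was picked explicitly.
    if image_count == 2 and selected_model not in DEDICATED_REFERENCE_MODELS:
        return "transition"
    return "reference"


print(resolve_mode(2))                                      # transition
print(resolve_mode(2, selected_model="Wan 2.7 Reference"))  # reference
```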

Supported Models

| Model | Reference Images | Duration | Audio | Best For |
| --- | --- | --- | --- | --- |
| Veo 3.1 Reference | 2-4 images | 8s (fixed) | Yes | Character consistency with audio |
| Kling O3 | 2-7 images | 3-15s | Yes | Up to 7 reference images, audio support, best quality |
| Wan 2.7 Reference | 1-9 images | 2-10s | No | Most reference images, high resolution |
| Seedance 2 Reference | Up to 9 images | 4-15s | Yes | Reference with native audio |
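If you want to check which models fit a given set of references before opening the model picker, the supported-model limits can be written down as data. The sketch below is illustrative only: `ModelSpec`, `MODEL_SPECS`, and `models_for` are hypothetical names, and the minimum of 1 image for Seedance 2 Reference is an assumption, since this page only states "up to 9".

```python
from typing import Dict, List, NamedTuple


class ModelSpec(NamedTuple):
    min_images: int
    max_images: int
    min_seconds: int
    max_seconds: int
    audio: bool


# Limits copied from the Supported Models section of this page.
MODEL_SPECS: Dict[str, ModelSpec] = {
    "Veo 3.1 Reference": ModelSpec(2, 4, 8, 8, True),
    # The model-specific section lists 1-7 images for Kling O3.
    "Kling O3": ModelSpec(2, 7, 3, 15, True),
    "Wan 2.7 Reference": ModelSpec(1, 9, 2, 10, False),
    # Assumption: the minimum image count is not stated on this page.
    "Seedance 2 Reference": ModelSpec(1, 9, 4, 15, True),
}


def models_for(image_count: int, need_audio: bool = False) -> List[str]:
    """List models that accept this many images (and audio, if required)."""
    return [
        name
        for name, spec in MODEL_SPECS.items()
        if spec.min_images <= image_count <= spec.max_images
        and (spec.audio or not need_audio)
    ]


print(models_for(8, need_audio=True))  # ['Seedance 2 Reference']
```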

Reference Image Guidelines

Image Requirements

What makes good reference images:

  1. Same subject - All images should show the same person/character

  2. Clear face/features - Good visibility of key characteristics

  3. Varied angles - Different views help the model understand the subject

  4. Consistent lighting - Similar lighting conditions work best

  5. High quality - Clear, well-lit images produce better results

Image Count Strategy

1-3 images:

  • Minimum for reference mode (1 image minimum with Wan 2.7 Reference)

  • Works well for simple characters

  • Faster processing

4-7 images:

  • Maximum consistency with Kling O3

  • Better for complex characters

  • More reference points for the model

Up to 9 images (Wan 2.7 Reference / Seedance 2 Reference):

  • Highest number of reference inputs

  • Best for complex subjects requiring many angles

  • Wan 2.7 Reference supports 1-9 images; Seedance 2 Reference supports up to 9

What to Include

Good Reference Images:

  • Front-facing portrait

  • Side profile

  • 3/4 angle

  • Different expressions

  • Various lighting conditions

  • Different outfits (same person)

Poor Reference Images:

  • Different people

  • Unclear/blurry faces

  • Extreme angles

  • Very different styles

  • Inconsistent subject

Writing Reference Mode Prompts

What to Describe

Focus on the scene and action, not the character (reference images handle that):

  1. Setting - Where the scene takes place

  2. Action - What the character is doing

  3. Camera movement - How it's shot

  4. Style - Visual aesthetic and mood

  5. Dialogue (Veo only) - What the character says
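If you assemble prompts programmatically, the five elements above can be combined with a small helper. This is a minimal sketch, not part of the product: `build_prompt` and its parameters are hypothetical, and the `saying: '...'` phrasing for dialogue is just one plausible wording. Dialogue is only included for Veo, matching the list above.

```python
def build_prompt(
    action: str,
    setting: str = "",
    camera: str = "",
    style: str = "",
    dialogue: str = "",
    model: str = "",
) -> str:
    """Combine scene elements into one prompt; appearance is left to the images."""
    # Action and setting read as one clause, e.g. "walks through a city".
    scene = " ".join(s.strip() for s in (action, setting) if s.strip())
    parts = [scene, camera.strip(), style.strip()]
    # The list above marks dialogue as Veo only, so drop it for other models.
    if dialogue.strip() and model.startswith("Veo"):
        parts.append(f"saying: '{dialogue.strip()}'")
    return ", ".join(p for p in parts if p) + "."


print(build_prompt(
    "The character walks",
    setting="through a futuristic city",
    camera="slow tracking shot",
    style="cinematic style",
))
# The character walks through a futuristic city, slow tracking shot, cinematic style.
```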

Good Reference Mode Prompts

With Action:

With Dialogue (Veo):

With Camera Movement:

Bad Reference Mode Prompts

Describes Character:

(Reference images already provide this)

Too Vague:

(Not specific enough)

Missing Context:

(Needs setting, style, camera work)

Use Cases

Talking Head Videos

Example:

  • 3-4 reference images of the speaker

  • Prompt: "The speaker addresses the camera with confidence, explaining key concepts in a professional setting with natural lighting."

  • Model: Veo 3.1 Reference (for audio)

  • Result: Consistent talking head video with audio

Character Consistency

Example:

  • 4-5 reference images of a character

  • Prompt: "The character walks through a futuristic city, looking around with curiosity, cinematic style, neon lighting."

  • Model: Kling O3 (for quality)

  • Result: Character maintains appearance across shots

Product Demonstrations

Example:

  • 2-3 reference images of a presenter

  • Prompt: "The presenter demonstrates a product, showing features with enthusiasm, bright studio lighting, professional setting."

  • Model: Veo 3.1 Reference (for audio)

  • Result: Consistent presenter across multiple shots

Narrative Scenes

Example:

  • 5-7 reference images of the main character

  • Prompt: "The character enters a mysterious room, the camera follows their gaze, cinematic style, moody lighting, suspenseful atmosphere."

  • Model: Kling O3 (for quality and longer duration)

  • Result: Consistent character in narrative scene

Model-Specific Behavior

Veo 3.1 Reference

Characteristics:

  • Reference images: 2-4

  • Duration: 8 seconds (fixed)

  • Aspect ratio: 16:9 or 9:16 only (auto-determined from reference images)

  • Audio: Supported (dialogue generation)

  • Resolution: 720p-1080p

Best for:

  • Talking head videos

  • When you need audio/dialogue

  • Character consistency with speech

  • Professional presentations

Limitations:

  • Fixed 8-second duration

  • Maximum 4 reference images

  • Aspect ratio limited to 16:9 or 9:16 (auto-determined from images)

Kling O3

Characteristics:

  • Reference images: 1-7

  • Duration: 3-15 seconds (variable, Pro/Standard toggle)

  • Aspect ratio: 16:9, 9:16, or 1:1

  • Audio: Supported (native audio generation)

  • Resolution: 720p

Best for:

  • Maximum character consistency with audio

  • Flexible duration (3-15 seconds)

  • More reference images (up to 7)

  • High-quality results with sound

Limitations:

  • Fixed aspect ratios (no auto)

Wan 2.7 Reference

Characteristics:

  • Reference images: 1-9

  • Duration: 2-10 seconds (variable)

  • Aspect ratio: 16:9, 9:16, 1:1, 4:3, or 3:4

  • Audio: Not supported

  • Resolution: 720p-1080p

Best for:

  • Maximum number of reference images (up to 9)

  • High resolution output

  • When you need the most reference angles

  • Flexible aspect ratios

Limitations:

  • No audio generation

  • Shorter maximum duration (10s) than Kling O3 and Seedance 2 Reference (15s)

Seedance 2 Reference

Characteristics:

  • Reference images: Up to 9

  • Audio reference files: Up to 3 (attach audio to influence generated sound)

  • Duration: 4-15 seconds (variable)

  • Aspect ratio: Auto-detected from input images

  • Audio: Supported (native audio generation)

  • Resolution: 480p-1080p

Best for:

  • Reference videos that need a native soundtrack

  • Longer reference outputs (up to 15s)

  • When you want audio generated automatically

  • When providing reference audio to guide the sound design

Limitations:

  • Audio may need refinement in post
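The per-model duration and aspect ratio limits listed above lend themselves to a quick pre-flight check. The following is a sketch under stated assumptions: `LIMITS` and `validate_settings` are hypothetical names, and models whose aspect ratio is auto-detected (Seedance 2 Reference) simply skip the ratio check.

```python
from typing import Dict, List, Optional, Set, Tuple

# (min seconds, max seconds, selectable aspect ratios; None means auto-detected)
LIMITS: Dict[str, Tuple[int, int, Optional[Set[str]]]] = {
    "Veo 3.1 Reference": (8, 8, {"16:9", "9:16"}),
    "Kling O3": (3, 15, {"16:9", "9:16", "1:1"}),
    "Wan 2.7 Reference": (2, 10, {"16:9", "9:16", "1:1", "4:3", "3:4"}),
    "Seedance 2 Reference": (4, 15, None),
}


def validate_settings(
    model: str, duration_s: int, aspect_ratio: Optional[str] = None
) -> List[str]:
    """Return a list of problems; an empty list means the settings fit."""
    if model not in LIMITS:
        return [f"{model} is not a known reference model"]
    lo, hi, ratios = LIMITS[model]
    problems = []
    if not lo <= duration_s <= hi:
        problems.append(f"duration {duration_s}s is outside {lo}-{hi}s")
    # Only check the ratio when one was requested and the model offers a choice.
    if aspect_ratio is not None and ratios is not None and aspect_ratio not in ratios:
        problems.append(f"aspect ratio {aspect_ratio} is not offered")
    return problems


print(validate_settings("Wan 2.7 Reference", 12, "4:3"))
# ['duration 12s is outside 2-10s']
```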

Tips for Best Results

  1. Use varied reference images - Different angles and expressions help

  2. Keep images consistent - Same person, similar quality

  3. Describe the scene, not the character - Reference images handle appearance

  4. Use the appropriate model - Veo for dialogue, Kling O3 for quality with up to 7 references, Wan 2.7 Reference or Seedance 2 Reference for up to 9 references

  5. Match aspect ratios - Reference images should have similar ratios

  6. Be specific about action - Describe what the character does

  7. Include camera movement - Helps create dynamic videos

  8. Test with different image counts - Find what works best for your character
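Tip 5 (matching aspect ratios) is easy to check before uploading if you know each image's pixel dimensions. This is a rough sketch only: `ratios_match` is a hypothetical helper, and the 5% tolerance is an arbitrary assumption rather than a documented threshold.

```python
from typing import Sequence, Tuple


def ratios_match(sizes: Sequence[Tuple[int, int]], tolerance: float = 0.05) -> bool:
    """True when all (width, height) pairs have similar aspect ratios."""
    if len(sizes) < 2:
        return True  # nothing to compare
    ratios = [width / height for width, height in sizes]
    # Compare the spread of ratios against a relative tolerance (assumed 5%).
    return max(ratios) - min(ratios) <= tolerance * min(ratios)


print(ratios_match([(1920, 1080), (1280, 720)]))   # True
print(ratios_match([(1920, 1080), (1080, 1920)]))  # False
```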

Common Workflows

Quick Talking Head

  1. Prepare 2-3 reference images

  2. Attach images

  3. Select Veo 3.1 Reference (auto-selected)

  4. Prompt: "Character speaks to camera: '[dialogue]', professional setting"

  5. Enable audio

  6. Generate (8 seconds)

High-Quality Character Video

  1. Prepare 5-7 reference images

  2. Attach images

  3. Select Kling O3 (auto-selected)

  4. Detailed prompt with scene, action, and style

  5. Generate (3-15 seconds)

Consistent Character Series

  1. Prepare 4-5 reference images

  2. Create multiple videos with different prompts

  3. Character appearance stays consistent

  4. Use for series or multiple shots

Reference with Audio (Seedance 2)

  1. Prepare up to 9 reference images

  2. Attach images

  3. Select Seedance 2 Reference

  4. Describe the scene, action, and desired audio

  5. Audio generates automatically

  6. Generate (4-15 seconds)
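The Consistent Character Series workflow comes down to reusing one fixed set of reference images across several prompts. The sketch below models that idea with plain data. `ShotRequest` and `plan_series` are hypothetical names used for planning only, not a product API, and the file names are placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ShotRequest:
    """One planned generation (hypothetical structure, not a product API)."""
    model: str
    reference_images: Tuple[str, ...]
    prompt: str
    duration_s: int


def plan_series(
    model: str,
    reference_images: Sequence[str],
    prompts: Sequence[str],
    duration_s: int,
) -> List[ShotRequest]:
    """Pair every prompt with the same reference images so the character stays consistent."""
    images = tuple(reference_images)
    return [ShotRequest(model, images, prompt, duration_s) for prompt in prompts]


shots = plan_series(
    "Kling O3",
    ["front.png", "side.png", "three_quarter.png", "smiling.png"],
    [
        "The character walks through a futuristic city, cinematic style.",
        "The character sits in a quiet cafe, slow push-in, warm lighting.",
    ],
    duration_s=5,
)
print(len(shots))  # 2
```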

Troubleshooting

"Reference Mode not activating."

Solutions:

  • Ensure you have 1-9 images attached

  • Check that images are properly uploaded

  • Remove any attached videos (reference mode is for generation)

  • Verify you're in Generate Media mode

"Character doesn't look consistent."

Solutions:

  • Use more reference images (4-7 for best results)

  • Ensure all images show the same person

  • Use varied angles and expressions

  • Try Kling O3 for better consistency

  • Check image quality (clear, well-lit)

"Model doesn't support reference mode."

Solutions:

  • Switch to Veo 3.1 Reference, Kling O3, Wan 2.7 Reference, or Seedance 2 Reference

  • Check Supported Video Models

  • Reference mode requires one of these specific models; most other models don't support it

"Duration is wrong."

Solutions:

  • Veo 3.1 Reference: Fixed at 8 seconds (cannot change)

  • Kling O3: 3-15 seconds (variable, set in settings)

  • Wan 2.7 Reference: 2-10 seconds (variable, set in settings)

  • Seedance 2 Reference: 4-15 seconds (variable, set in settings)

  • Reference mode has limited duration control

  • Plan your content for the available duration
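When planning content for the available duration, it can help to snap a requested length to the nearest value a model allows. This is a minimal sketch using the ranges listed above; `DURATION_RANGES` and `clamp_duration` are hypothetical names, and the product itself may handle out-of-range values differently.

```python
from typing import Dict, Tuple

# Duration ranges in seconds, as listed in the solutions above.
DURATION_RANGES: Dict[str, Tuple[int, int]] = {
    "Veo 3.1 Reference": (8, 8),
    "Kling O3": (3, 15),
    "Wan 2.7 Reference": (2, 10),
    "Seedance 2 Reference": (4, 15),
}


def clamp_duration(model: str, requested_s: int) -> int:
    """Return the closest duration the model supports."""
    lo, hi = DURATION_RANGES[model]
    return max(lo, min(hi, requested_s))


print(clamp_duration("Veo 3.1 Reference", 12))  # 8
print(clamp_duration("Wan 2.7 Reference", 15))  # 10
```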


Next: Learn about Image Generation for creating still images.
