Reference Mode

Generate videos with consistent characters using 2-7 reference images. Reference Mode is suited to maintaining a character's appearance across multiple shots, creating talking head videos, and keeping visuals consistent.

How It Works

  1. Enable Generate Media - Toggle the Generate Media button in the composer

  2. Attach 2-7 images - Upload reference images of the same character/subject

  3. Reference Mode activates - The system detects multiple images and switches to a reference model

  4. Select reference model - Veo 3.1 Reference or Kling O1 (auto-selected)

  5. Describe the video - Write a prompt describing the scene and action

  6. Configure settings - Duration is fixed at 8s for Veo 3.1 Reference and adjustable from 5 to 10s for Kling O1

  7. Generate - Click send to create a video with a consistent character

Automatic Activation

Reference Mode activates when:

  • 2-7 images attached - Multiple reference images detected

  • No videos attached - Reference mode is for generation, not editing

  • Model auto-switches - On activation, compatible reference models appear

  • Element tags - Using @Element tags also triggers reference mode
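The activation conditions above can be read as a simple predicate. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes that an attached video always blocks Reference Mode and that @Element tags trigger it regardless of image count, which the list above does not state explicitly.

```python
def reference_mode_active(num_images: int, num_videos: int,
                          uses_element_tags: bool = False) -> bool:
    """Return True when the documented activation conditions hold.

    Assumption: any attached video blocks Reference Mode, and @Element
    tags count as a trigger on their own.
    """
    if num_videos > 0:
        # Reference Mode is for generation, not editing.
        return False
    # Either 2-7 reference images are attached, or @Element tags are used.
    return 2 <= num_images <= 7 or uses_element_tags


print(reference_mode_active(3, 0))        # three images, no video
print(reference_mode_active(3, 1))        # a video is attached
print(reference_mode_active(0, 0, True))  # @Element tags only
```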

Supported Models

| Model | Reference Images | Duration | Audio | Best For |
| --- | --- | --- | --- | --- |
| Veo 3.1 Reference | 2-4 images | 8s (fixed) | Supported | Character consistency with audio |
| Kling O1 | 2-7 images | 5-10s | Not supported | Maximum reference images, best quality |
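As a rough way to reason about which model fits a given job, the limits above can be encoded as data and filtered. This is a sketch, not the product's selection logic: `ReferenceModel`, `MODELS`, and `compatible_models` are hypothetical names, and the audio values come from the Model-Specific Behavior section of this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceModel:
    name: str
    min_images: int
    max_images: int
    audio: bool


# Limits as documented on this page.
MODELS = [
    ReferenceModel("Veo 3.1 Reference", min_images=2, max_images=4, audio=True),
    ReferenceModel("Kling O1", min_images=2, max_images=7, audio=False),
]


def compatible_models(num_images: int, need_audio: bool = False) -> list[str]:
    """List models that accept this image count and, if required, audio."""
    return [
        m.name
        for m in MODELS
        if m.min_images <= num_images <= m.max_images
        and (m.audio or not need_audio)
    ]


print(compatible_models(3))                   # both models accept 3 images
print(compatible_models(6))                   # only Kling O1 accepts 6
print(compatible_models(6, need_audio=True))  # nothing offers audio with 6 images
```

The last call returns an empty list, which reflects the trade-off in the table: audio is only available with four or fewer reference images.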

Reference Image Guidelines

Image Requirements

What makes good reference images:

  1. Same subject - All images should show the same person/character

  2. Clear face/features - Good visibility of key characteristics

  3. Varied angles - Different views help the model understand the subject

  4. Consistent lighting - Similar lighting conditions work best

  5. High quality - Clear, well-lit images produce better results

Image Count Strategy

2-3 images:

  • Minimum for reference mode

  • Works well for simple characters

  • Faster processing

4-7 images (5 or more require Kling O1):

  • Maximum consistency

  • Better for complex characters

  • More reference points for the model

What to Include

Good Reference Images:

  • Front-facing portrait

  • Side profile

  • 3/4 angle

  • Different expressions

  • Various lighting conditions

  • Different outfits (same person)

Poor Reference Images:

  • Different people

  • Unclear/blurry faces

  • Extreme angles

  • Very different styles

  • Inconsistent subject

Writing Reference Mode Prompts

What to Describe

Focus on the scene and action, not the character (reference images handle that):

  1. Setting - Where the scene takes place

  2. Action - What the character is doing

  3. Camera movement - How it's shot

  4. Style - Visual aesthetic and mood

  5. Dialogue (Veo only) - What the character says
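One way to keep prompts focused on scene and action is to assemble them from the parts listed above. The helper below is a hypothetical sketch: the comma-separated layout and the quoted-dialogue format are assumptions modelled on the example prompts on this page, not a required syntax.

```python
def build_prompt(action: str, setting: str = "", camera: str = "",
                 style: str = "", dialogue: str = "") -> str:
    """Join scene components into one prompt; the character is not described."""
    if not action.strip():
        raise ValueError("an action is required")
    lead = action.strip()
    if dialogue.strip():
        # Dialogue only applies to Veo 3.1 Reference, which supports audio.
        lead = f"{lead}: '{dialogue.strip()}'"
    extras = [part.strip() for part in (setting, camera, style) if part.strip()]
    return ", ".join([lead] + extras)


print(build_prompt(
    "The character walks through a futuristic city",
    camera="camera follows from behind",
    style="cinematic style, neon lighting",
))
```

Empty components are skipped, so the same helper works for short prompts and fully specified ones.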

Good Reference Mode Prompts

With Action:

With Dialogue (Veo):

With Camera Movement:

Bad Reference Mode Prompts

Describes Character:

(Reference images already provide this)

Too Vague:

(Not specific enough)

Missing Context:

(Needs setting, style, camera work)

Use Cases

Talking Head Videos

Example:

  • 3-4 reference images of the speaker

  • Prompt: "The speaker addresses the camera with confidence, explaining key concepts, professional setting, natural lighting"

  • Model: Veo 3.1 Reference (for audio)

  • Result: Consistent talking head video with audio

Character Consistency

Example:

  • 4-5 reference images of a character

  • Prompt: "The character walks through a futuristic city, looking around with curiosity, cinematic style, neon lighting"

  • Model: Kling O1 (for quality)

  • Result: Character maintains appearance across shots

Product Demonstrations

Example:

  • 2-3 reference images of a presenter

  • Prompt: "The presenter demonstrates a product, showing features with enthusiasm, bright studio lighting, professional setting"

  • Model: Veo 3.1 Reference (for audio)

  • Result: Consistent presenter across multiple shots

Narrative Scenes

Example:

  • 5-7 reference images of the main character

  • Prompt: "The character enters a mysterious room, camera follows their gaze, cinematic style, moody lighting, suspenseful atmosphere"

  • Model: Kling O1 (for quality and longer duration)

  • Result: Consistent character in narrative scene

Model-Specific Behavior

Veo 3.1 Reference

Characteristics:

  • Reference images: 2-4

  • Duration: 8 seconds (fixed)

  • Aspect ratio: Auto (from reference images)

  • Audio: Supported (dialogue generation)

  • Resolution: 720p-1080p

Best for:

  • Talking head videos

  • When you need audio/dialogue

  • Character consistency with speech

  • Professional presentations

Limitations:

  • Fixed 8-second duration

  • Maximum 4 reference images

  • Aspect ratio auto-determined

Kling O1

Characteristics:

  • Reference images: 2-7

  • Duration: 5-10 seconds (variable)

  • Aspect ratio: 16:9, 9:16, or 1:1

  • Audio: Not supported

  • Resolution: 720p-1080p

Best for:

  • Maximum character consistency

  • Longer videos (up to 10s)

  • More reference images (up to 7)

  • High-quality results

Limitations:

  • No audio generation

  • Fixed aspect ratios (no auto)
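The per-model constraints above can be checked before generating. The following is a minimal sketch with hypothetical names (`validate_settings` and its parameters are not part of any documented API). It assumes Kling O1 accepts any duration between 5 and 10 seconds; the page gives the range but not the allowed step.

```python
VEO = "Veo 3.1 Reference"
KLING = "Kling O1"
KLING_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def validate_settings(model: str, duration_s: int,
                      aspect_ratio: str = "auto", audio: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the model."""
    problems = []
    if model == VEO:
        if duration_s != 8:
            problems.append("Veo 3.1 Reference duration is fixed at 8 seconds")
        if aspect_ratio != "auto":
            problems.append("Veo 3.1 Reference takes its aspect ratio from the reference images")
    elif model == KLING:
        if not 5 <= duration_s <= 10:
            problems.append("Kling O1 duration must be between 5 and 10 seconds")
        if aspect_ratio not in KLING_ASPECT_RATIOS:
            problems.append("Kling O1 aspect ratio must be 16:9, 9:16, or 1:1")
        if audio:
            problems.append("Kling O1 does not generate audio")
    else:
        problems.append(f"{model} is not a reference model covered here")
    return problems


print(validate_settings(VEO, 8, audio=True))     # no problems expected
print(validate_settings(KLING, 12, "auto", True))  # three problems expected
```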

Tips for Best Results

  1. Use varied reference images - Different angles and expressions help

  2. Keep images consistent - Same person, similar quality

  3. Describe the scene, not the character - Reference images handle appearance

  4. Use the appropriate model - Veo for audio, Kling for more references

  5. Match aspect ratios - Reference images should have similar ratios

  6. Be specific about action - Describe what the character does

  7. Include camera movement - Helps create dynamic videos

  8. Test with different image counts - Find what works best for your character
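Tip 5 (matching aspect ratios) is easy to check before uploading. The sketch below compares the width-to-height ratios of a set of images. The function name and the 10% tolerance are assumptions chosen for illustration; the page does not define how similar the ratios need to be.

```python
def aspect_ratios_match(sizes: list[tuple[int, int]], tolerance: float = 0.1) -> bool:
    """Return True if all (width, height) pairs have similar aspect ratios.

    `tolerance` is the allowed relative spread between the widest and
    narrowest ratio (0.1 means within 10%); this threshold is an assumption.
    """
    if not sizes:
        raise ValueError("at least one image size is required")
    ratios = [width / height for width, height in sizes]
    return max(ratios) / min(ratios) - 1 <= tolerance


# Two landscape images with the same 16:9 ratio.
print(aspect_ratios_match([(1920, 1080), (1280, 720)]))
# A landscape image mixed with a portrait image.
print(aspect_ratios_match([(1920, 1080), (1080, 1920)]))
```

Image dimensions would need to be read with whatever tool you already use; the helper only takes plain width and height pairs.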

Common Workflows

Quick Talking Head

  1. Prepare 2-3 reference images

  2. Attach images

  3. Select Veo 3.1 Reference (auto-selected)

  4. Prompt: "Character speaks to camera: '[dialogue]', professional setting"

  5. Enable audio

  6. Generate (8 seconds)

High-Quality Character Video

  1. Prepare 5-7 reference images

  2. Attach images

  3. Select Kling O1 (auto-selected)

  4. Detailed prompt with scene, action, and style

  5. Generate (5-10 seconds)

Consistent Character Series

  1. Prepare 4-5 reference images

  2. Create multiple videos with different prompts

  3. Character appearance stays consistent

  4. Use for series or multiple shots
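The series workflow amounts to reusing one set of reference images with several prompts. The sketch below only illustrates that bookkeeping: the request dictionaries, their field names, and the file names are hypothetical, since this page describes a composer UI and does not document a programmatic API.

```python
def build_series(reference_images: list[str], prompts: list[str],
                 model: str = "Kling O1") -> list[dict]:
    """Pair every prompt with the same reference images (hypothetical payload shape)."""
    if not 2 <= len(reference_images) <= 7:
        raise ValueError("Reference Mode needs 2-7 reference images")
    return [
        {"model": model, "reference_images": list(reference_images), "prompt": prompt}
        for prompt in prompts
    ]


shots = build_series(
    ["front.png", "profile.png", "three_quarter.png", "smiling.png"],
    [
        "The character walks through a futuristic city, cinematic style, neon lighting",
        "The character enters a mysterious room, camera follows their gaze, moody lighting",
    ],
)
for shot in shots:
    print(shot["prompt"], "->", len(shot["reference_images"]), "references")
```

Keeping the reference set identical across shots is what lets the character's appearance stay consistent from one video to the next.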

Troubleshooting

"Reference Mode not activating"

Solutions:

  • Ensure you have 2-7 images attached

  • Check that images are properly uploaded

  • Remove any attached videos (reference mode is for generation)

  • Verify you're in Generate Media mode

"Character doesn't look consistent"

Solutions:

  • Use more reference images (4-7 for best results)

  • Ensure all images show the same person

  • Use varied angles and expressions

  • Try Kling O1 for better consistency

  • Check image quality (clear, well-lit)

"Model doesn't support reference mode"

Solutions:

  • Switch to Veo 3.1 Reference or Kling O1

  • See Supported Video Models for the full list

  • Most models don't support Reference Mode; it requires one of the reference models above

"Duration is wrong"

Solutions:

  • Veo 3.1 Reference: Fixed at 8 seconds (cannot change)

  • Kling O1: 5-10 seconds (variable, set in settings)

  • Reference mode has limited duration control

  • Plan your content for the available duration


Next: Learn about Image Generation for creating still images.
