Reference Mode
Generate videos with consistent characters using 1-9 reference images. Reference Mode is well suited to maintaining character appearance across multiple shots, creating talking head videos, and keeping visuals consistent.
Attaching exactly 2 images and seeing Transition Mode? By default, 2 uploaded images trigger Transition Mode (start frame → end frame morph). To use them as character reference photos instead, select a dedicated reference model from the model picker: Kling O3 Reference, Seedance 2 Reference, or Wan 2.7 Reference. These models treat multiple images as reference material, not as start/end frames. See Transition Mode if you actually want a morph.
How It Works
Enable Generate Media - Toggle the Generate Media button in the composer
Attach reference images - Upload 1-9 images of the same character/subject (count depends on model)
Reference Mode activates - The system detects the reference images and switches to a reference model
Select reference model - Veo 3.1 Reference, Kling O3, Wan 2.7 Reference, or Seedance 2 Reference (auto-selected)
Describe the video - Write a prompt describing the scene and action
Configure settings - Set duration (varies by model: 8s for Veo, 3-15s for Kling O3, 2-10s for Wan, 4-15s for Seedance)
Generate - Click send to create a video with a consistent character
Automatic Activation
Reference Mode activates when:
1-9 images attached - Reference images detected (minimum count varies by model)
No videos attached - Reference mode is for generation, not editing
Model auto-switches - Compatible reference models appear
Element tags - Using @Element tags also triggers reference mode
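The activation rules above, combined with the 2-image Transition Mode default described earlier, can be read as a small decision function. The sketch below is only an illustration of that reading: the function name, the mode labels, and the order in which the rules are checked are assumptions, not the product's actual logic, and it ignores per-model minimum image counts.

```python
from typing import Optional

# Models the guide names as the way to opt out of Transition Mode
# when exactly 2 images are attached.
TWO_IMAGE_REFERENCE_MODELS = {
    "Kling O3 Reference",
    "Seedance 2 Reference",
    "Wan 2.7 Reference",
}


def resolve_mode(image_count: int,
                 video_count: int = 0,
                 selected_model: Optional[str] = None,
                 uses_element_tags: bool = False) -> str:
    """Return "reference", "transition", or "other" (hypothetical labels)."""
    # Reference Mode is for generation, so any attached video rules it out.
    if video_count > 0:
        return "other"
    # @Element tags trigger Reference Mode on their own.
    if uses_element_tags:
        return "reference"
    # Outside the 1-9 image range there is nothing to reference.
    if not 1 <= image_count <= 9:
        return "other"
    # Exactly 2 images default to Transition Mode unless a dedicated
    # reference model was picked explicitly.
    if image_count == 2 and selected_model not in TWO_IMAGE_REFERENCE_MODELS:
        return "transition"
    return "reference"


print(resolve_mode(2))                                      # default morph
print(resolve_mode(2, selected_model="Wan 2.7 Reference"))  # reference photos
```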
Supported Models
| Model | Reference images | Duration | Audio | Notes |
| --- | --- | --- | --- | --- |
| Veo 3.1 Reference | 2-4 images | 8s (fixed) | ✅ | Character consistency with audio |
| Kling O3 | 2-7 images | 3-15s | ✅ | Up to 7 reference images, audio support, best quality |
| Wan 2.7 Reference | 1-9 images | 2-10s | ❌ | Most reference images, high resolution |
| Seedance 2 Reference | Up to 9 images | 4-15s | ✅ | Reference with native audio |
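To see which models fit a given set of attachments, the limits above can be encoded as data and filtered. This is a rough sketch, not a product API: `ModelSpec` and `candidate_models` are hypothetical names, the Kling O3 minimum is taken as 2 from the list above, and the Seedance 2 Reference minimum of 1 is an assumption because the guide only states "up to 9".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelSpec:
    min_images: int
    max_images: int
    min_seconds: int
    max_seconds: int
    audio: bool


# Values copied from the Supported Models list.
SPECS = {
    "Veo 3.1 Reference": ModelSpec(2, 4, 8, 8, True),
    "Kling O3": ModelSpec(2, 7, 3, 15, True),
    "Wan 2.7 Reference": ModelSpec(1, 9, 2, 10, False),
    # Minimum of 1 is an assumption; the guide only says "up to 9".
    "Seedance 2 Reference": ModelSpec(1, 9, 4, 15, True),
}


def candidate_models(image_count: int, need_audio: bool = False) -> List[str]:
    """List models whose image range covers image_count (and audio if needed)."""
    return [
        name
        for name, spec in SPECS.items()
        if spec.min_images <= image_count <= spec.max_images
        and (spec.audio or not need_audio)
    ]


print(candidate_models(3, need_audio=True))
print(candidate_models(8, need_audio=True))
```

With 8 images and audio required, only Seedance 2 Reference remains, which matches the guidance that Wan 2.7 Reference accepts the most images but has no audio.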
Reference Image Guidelines
Image Requirements
What makes good reference images:
Same subject - All images should show the same person/character
Clear face/features - Good visibility of key characteristics
Varied angles - Different views help the model understand the subject
Consistent lighting - Similar lighting conditions work best
High quality - Clear, well-lit images produce better results
Image Count Strategy
1-3 images:
Minimum for reference mode (1 image minimum with Wan 2.7 Reference)
Works well for simple characters
Faster processing
4-7 images:
Maximum consistency with Kling O3
Better for complex characters
More reference points for the model
Up to 9 images (Wan 2.7 Reference / Seedance 2 Reference):
Highest number of reference inputs
Best for complex subjects requiring many angles
Wan 2.7 Reference supports 1-9 images; Seedance 2 Reference supports up to 9
What to Include
✅ Good Reference Images:
Front-facing portrait
Side profile
3/4 angle
Different expressions
Similar lighting across images
Different outfits (same person)
❌ Poor Reference Images:
Different people
Unclear/blurry faces
Extreme angles
Very different styles
Inconsistent subject
Writing Reference Mode Prompts
What to Describe
Focus on the scene and action, not the character (reference images handle that):
Setting - Where the scene takes place
Action - What the character is doing
Camera movement - How it's shot
Style - Visual aesthetic and mood
Dialogue (Veo only) - What the character says
Good Reference Mode Prompts
✅ With Action:
✅ With Dialogue (Veo):
✅ With Camera Movement:
Bad Reference Mode Prompts
❌ Describes Character:
(Reference images already provide this)
❌ Too Vague:
(Not specific enough)
❌ Missing Context:
(Needs setting, style, camera work)
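One way to make sure a prompt covers the elements listed under What to Describe is to assemble it from named parts. The helper below is a hypothetical sketch: `build_prompt` and the way dialogue is appended are assumptions for illustration, and the sample wording is only an example, not an official prompt template.

```python
from typing import Optional


def build_prompt(action: str,
                 setting: str,
                 camera: str,
                 style: str,
                 dialogue: Optional[str] = None) -> str:
    """Join scene elements into one prompt; the character itself is not described."""
    parts = [action, setting, camera, style]
    # Skip empty parts so a missing element does not leave a stray comma.
    prompt = ", ".join(p.strip() for p in parts if p and p.strip())
    if dialogue:
        # Quoting the line is an assumed convention for dialogue prompts.
        prompt += f'. The character says: "{dialogue.strip()}"'
    return prompt


print(build_prompt(
    action="The character walks through a futuristic city",
    setting="rain-soaked street at night",
    camera="slow tracking shot",
    style="cinematic style, neon lighting",
))
```

Keeping appearance details out of the arguments reflects the advice above: the reference images carry the character, and the prompt carries the scene.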
Use Cases
Talking Head Videos
Example:
3-4 reference images of the speaker
Prompt: "The speaker addresses the camera with confidence, explaining key concepts in a professional setting with natural lighting."
Model: Veo 3.1 Reference (for audio)
Result: Consistent talking head video with audio
Character Consistency
Example:
4-5 reference images of a character
Prompt: "The character walks through a futuristic city, looking around with curiosity, cinematic style, neon lighting."
Model: Kling O3 (for quality)
Result: Character maintains appearance across shots
Product Demonstrations
Example:
2-3 reference images of a presenter
Prompt: "The presenter demonstrates a product, showing features with enthusiasm, bright studio lighting, professional setting."
Model: Veo 3.1 Reference (for audio)
Result: Consistent presenter across multiple shots
Narrative Scenes
Example:
5-7 reference images of the main character
Prompt: "The character enters a mysterious room, the camera follows their gaze, cinematic style, moody lighting, suspenseful atmosphere."
Model: Kling O3 (for quality and longer duration)
Result: Consistent character in narrative scene
Model-Specific Behavior
Veo 3.1 Reference
Characteristics:
Reference images: 2-4
Duration: 8 seconds (fixed)
Aspect ratio: 16:9 or 9:16 only (auto-determined from reference images)
Audio: Supported (dialogue generation)
Resolution: 720p-1080p
Best for:
Talking head videos
When you need audio/dialogue
Character consistency with speech
Professional presentations
Limitations:
Fixed 8-second duration
Maximum 4 reference images
Aspect ratio limited to 16:9 or 9:16 (auto-determined from images)
Kling O3
Characteristics:
Reference images: 2-7
Duration: 3-15 seconds (variable, Pro/Standard toggle)
Aspect ratio: 16:9, 9:16, or 1:1
Audio: Supported (native audio generation)
Resolution: 720p
Best for:
Maximum character consistency with audio
Flexible duration (3-15 seconds)
More reference images (up to 7)
High-quality results with sound
Limitations:
Fixed aspect ratios (no auto)
Wan 2.7 Reference
Characteristics:
Reference images: 1-9
Duration: 2-10 seconds (variable)
Aspect ratio: 16:9, 9:16, 1:1, 4:3, or 3:4
Audio: Not supported
Resolution: 720p-1080p
Best for:
Maximum number of reference images (up to 9)
High resolution output
When you need the most reference angles
Flexible aspect ratios
Limitations:
No audio generation
Shorter maximum duration (10s) than Kling O3 and Seedance 2 Reference (15s)
Seedance 2 Reference
Characteristics:
Reference images: Up to 9
Audio reference files: Up to 3 (attach audio to influence generated sound)
Duration: 4-15 seconds (variable)
Aspect ratio: Auto-detected from input images
Audio: Supported (native audio generation)
Resolution: 480p-1080p
Best for:
Reference videos that need a native soundtrack
Longer reference outputs (up to 15s)
When you want audio generated automatically
When providing reference audio to guide the sound design
Limitations:
Audio may need refinement in post
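Because each model accepts a different set of aspect ratios, it can help to check which supported ratio is closest to your reference images before generating. The sketch below is an assumption-laden illustration: `nearest_ratio` is a hypothetical helper, comparing ratios on a log scale is a design choice made here, and nothing in this guide states how the automatic detection for Veo 3.1 Reference or Seedance 2 Reference actually works. Seedance 2 Reference is left out because no ratio list is given for it.

```python
import math

# Aspect ratios listed per model in this section.
SUPPORTED_RATIOS = {
    "Veo 3.1 Reference": ["16:9", "9:16"],
    "Kling O3": ["16:9", "9:16", "1:1"],
    "Wan 2.7 Reference": ["16:9", "9:16", "1:1", "4:3", "3:4"],
}


def nearest_ratio(model: str, width: int, height: int) -> str:
    """Return the supported ratio closest to width:height for the given model."""
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = label.split(":")
        # Log scale treats "twice as wide" and "twice as tall" symmetrically.
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_RATIOS[model], key=distance)


print(nearest_ratio("Wan 2.7 Reference", 1080, 1350))  # portrait 4:5 photo
print(nearest_ratio("Veo 3.1 Reference", 1920, 1080))
```

For a 1080x1350 portrait photo, the closest Wan 2.7 Reference option is 3:4, while Veo 3.1 Reference can only offer 9:16, so some cropping or padding should be expected there.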
Tips for Best Results
Use varied reference images - Different angles and expressions help
Keep images consistent - Same person, similar quality
Describe the scene, not the character - Reference images handle appearance
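Use the appropriate model - Veo 3.1 Reference for dialogue, Kling O3 for consistency with audio, Wan 2.7 Reference or Seedance 2 Reference for up to 9 reference images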
Match aspect ratios - Reference images should have similar ratios
Be specific about action - Describe what the character does
Include camera movement - Helps create dynamic videos
Test with different image counts - Find what works best for your character
Common Workflows
Quick Talking Head
Prepare 2-3 reference images
Attach images
Select Veo 3.1 Reference (auto-selected)
Prompt: "Character speaks to camera: '[dialogue]', professional setting"
Enable audio
Generate (8 seconds)
High-Quality Character Video
Prepare 5-7 reference images
Attach images
Select Kling O3 (auto-selected)
Detailed prompt with scene, action, and style
Generate (3-15 seconds)
Consistent Character Series
Prepare 4-5 reference images
Create multiple videos with different prompts
Character appearance stays consistent
Use for series or multiple shots
Reference with Audio (Seedance 2)
Prepare up to 9 reference images
Attach images
Select Seedance 2 Reference
Describe the scene, action, and desired audio
Audio generates automatically
Generate (4-15 seconds)
Troubleshooting
"Reference Mode not activating."
Solutions:
Ensure you have 1-9 images attached (with exactly 2 images, select a reference model from the model picker to avoid Transition Mode)
Check that images are properly uploaded
Remove any attached videos (reference mode is for generation)
Verify you're in Generate Media mode
"Character doesn't look consistent."
Solutions:
Use more reference images (4-7 for best results)
Ensure all images show the same person
Use varied angles and expressions
Try Kling O3 for better consistency
Check image quality (clear, well-lit)
"Model doesn't support reference mode."
Solutions:
Switch to Veo 3.1 Reference, Kling O3, Wan 2.7 Reference, or Seedance 2 Reference
Check Supported Video Models
Reference mode requires one of these specific models; most other models don't support it
"Duration is wrong."
Solutions:
Veo 3.1 Reference: Fixed at 8 seconds (cannot change)
Kling O3: 3-15 seconds (variable, set in settings)
Wan 2.7 Reference: 2-10 seconds (variable, set in settings)
Seedance 2 Reference: 4-15 seconds (variable, set in settings)
Duration can only be set within each model's supported range
Plan your content for the available duration
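When planning shot lengths, the per-model ranges above can be applied mechanically. This is a minimal sketch under stated assumptions: `clamp_duration` is a hypothetical helper, durations are treated as whole seconds, and whether the product itself clamps or rejects out-of-range values is not stated in this guide.

```python
# Duration ranges in seconds, as listed in the solutions above.
DURATION_RANGES = {
    "Veo 3.1 Reference": (8, 8),      # fixed
    "Kling O3": (3, 15),
    "Wan 2.7 Reference": (2, 10),
    "Seedance 2 Reference": (4, 15),
}


def clamp_duration(model: str, requested_seconds: int) -> int:
    """Return the closest duration the model supports."""
    low, high = DURATION_RANGES[model]
    return max(low, min(high, requested_seconds))


for model in DURATION_RANGES:
    print(model, clamp_duration(model, 12))
```

A 12-second request stays at 12 for Kling O3 and Seedance 2 Reference, drops to 10 for Wan 2.7 Reference, and becomes 8 for Veo 3.1 Reference.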
Next: Learn about Image Generation for creating still images.