Reference Mode
Generate videos with consistent characters using 2-7 reference images. Perfect for maintaining character appearance across multiple shots, creating talking head videos, or ensuring visual consistency.
How It Works
Enable Generate Media - Toggle the Generate Media button in the composer
Attach 2-7 images - Upload reference images of the same character/subject
Reference Mode activates - The system detects multiple images and switches to a reference model
Select reference model - Veo 3.1 Reference or Kling O1 (auto-selected)
Describe the video - Write a prompt describing the scene and action
Configure settings - Duration is fixed at 8s for Veo 3.1 Reference; for Kling O1, choose 5-10s
Generate - Click send to create a video with a consistent character
Automatic Activation
Reference Mode activates when:
2-7 images attached - Multiple reference images detected
No videos attached - Reference mode is for generation, not editing
Model auto-switches - Once these conditions are met, compatible reference models appear
Element tags - Using @Element tags also triggers reference mode
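The activation conditions above can be sketched as a small check. This is an illustration only, not the product's actual logic: the function name and parameters are hypothetical, and it assumes that an attached video blocks Reference Mode even when an @Element tag is present, which the conditions above do not state explicitly.

```python
def reference_mode_active(image_count: int,
                          video_count: int = 0,
                          has_element_tags: bool = False) -> bool:
    """Hypothetical check mirroring the documented activation conditions."""
    # Reference Mode is for generation, not editing, so any video rules it out.
    # (Assumption: this also applies when @Element tags are used.)
    if video_count > 0:
        return False
    # Either 2-7 attached images or an @Element tag triggers Reference Mode.
    return 2 <= image_count <= 7 or has_element_tags
```

For example, `reference_mode_active(3)` is true, while `reference_mode_active(1)` and `reference_mode_active(3, video_count=1)` are false.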
Supported Models
Model | Reference images | Duration | Audio | Best for
Veo 3.1 Reference | 2-4 images | 8s (fixed) | ✅ | Character consistency with audio
Kling O1 | 2-7 images | 5-10s | ❌ | Maximum reference images, best quality
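As a rough sketch of how these limits narrow the choice of model, the lookup below encodes the image ranges and audio support listed above. The `ReferenceModel` class and `compatible_models` function are hypothetical names for illustration; they are not part of any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceModel:
    name: str
    min_images: int
    max_images: int
    supports_audio: bool


# Values copied from the Supported Models listing.
MODELS = (
    ReferenceModel("Veo 3.1 Reference", 2, 4, True),
    ReferenceModel("Kling O1", 2, 7, False),
)


def compatible_models(image_count: int, needs_audio: bool = False) -> list[str]:
    """Return the names of models that accept this many reference images."""
    return [
        m.name
        for m in MODELS
        if m.min_images <= image_count <= m.max_images
        and (m.supports_audio or not needs_audio)
    ]
```

With 3 images both models qualify; with 6 images only Kling O1 does; and 6 images with audio required yields an empty list, since audio needs Veo 3.1 Reference and its limit is 4 images.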
Reference Image Guidelines
Image Requirements
What makes good reference images:
Same subject - All images should show the same person/character
Clear face/features - Good visibility of key characteristics
Varied angles - Different views help the model understand the subject
Consistent lighting - Similar lighting conditions work best
High quality - Clear, well-lit images produce better results
Image Count Strategy
2-3 images:
Minimum for reference mode
Works well for simple characters
Faster processing
4-7 images (5 or more require Kling O1):
Maximum consistency
Better for complex characters
More reference points for the model
What to Include
✅ Good Reference Images:
Front-facing portrait
Side profile
3/4 angle
Different expressions
Slightly varied lighting (subject still clear and well lit)
Different outfits (same person)
❌ Poor Reference Images:
Different people
Unclear/blurry faces
Extreme angles
Very different styles
Inconsistent subject
Writing Reference Mode Prompts
What to Describe
Focus on the scene and action, not the character (reference images handle that):
Setting - Where the scene takes place
Action - What the character is doing
Camera movement - How it's shot
Style - Visual aesthetic and mood
Dialogue (Veo only) - What the character says
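One way to keep prompts focused on these elements is to assemble them from separate fields. The helper below is a hypothetical sketch, not a required format: the `build_prompt` name, the comma-separated layout, and the quoting of dialogue after the action (modelled on the Quick Talking Head prompt later in this page) are all assumptions.

```python
def build_prompt(action: str,
                 setting: str = "",
                 camera: str = "",
                 style: str = "",
                 dialogue: str = "",
                 model: str = "Kling O1") -> str:
    """Join scene elements into one prompt; the character itself is not described."""
    if dialogue:
        # Dialogue is only generated by Veo 3.1 Reference.
        if model != "Veo 3.1 Reference":
            raise ValueError("Dialogue requires Veo 3.1 Reference")
        action = f"{action}: '{dialogue}'"
    parts = [action, setting, camera, style]
    # Skip empty fields so the prompt has no stray commas.
    return ", ".join(p.strip() for p in parts if p.strip())
```

For instance, `build_prompt("Character speaks to camera", setting="professional setting", dialogue="Welcome back", model="Veo 3.1 Reference")` produces `Character speaks to camera: 'Welcome back', professional setting`.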
Good Reference Mode Prompts
✅ With Action:
✅ With Dialogue (Veo):
✅ With Camera Movement:
Bad Reference Mode Prompts
❌ Describes Character:
(Reference images already provide this)
❌ Too Vague:
(Not specific enough)
❌ Missing Context:
(Needs setting, style, camera work)
Use Cases
Talking Head Videos
Example:
3-4 reference images of the speaker
Prompt: "The speaker addresses the camera with confidence, explaining key concepts, professional setting, natural lighting"
Model: Veo 3.1 Reference (for audio)
Result: Consistent talking head video with audio
Character Consistency
Example:
4-5 reference images of a character
Prompt: "The character walks through a futuristic city, looking around with curiosity, cinematic style, neon lighting"
Model: Kling O1 (for quality)
Result: Character maintains appearance across shots
Product Demonstrations
Example:
2-3 reference images of a presenter
Prompt: "The presenter demonstrates a product, showing features with enthusiasm, bright studio lighting, professional setting"
Model: Veo 3.1 Reference (for audio)
Result: Consistent presenter across multiple shots
Narrative Scenes
Example:
5-7 reference images of the main character
Prompt: "The character enters a mysterious room, camera follows their gaze, cinematic style, moody lighting, suspenseful atmosphere"
Model: Kling O1 (for quality and longer duration)
Result: Consistent character in narrative scene
Model-Specific Behavior
Veo 3.1 Reference
Characteristics:
Reference images: 2-4
Duration: 8 seconds (fixed)
Aspect ratio: Auto (from reference images)
Audio: Supported (dialogue generation)
Resolution: 720p-1080p
Best for:
Talking head videos
When you need audio/dialogue
Character consistency with speech
Professional presentations
Limitations:
Fixed 8-second duration
Maximum 4 reference images
Aspect ratio auto-determined
Kling O1
Characteristics:
Reference images: 2-7
Duration: 5-10 seconds (variable)
Aspect ratio: 16:9, 9:16, or 1:1
Audio: Not supported
Resolution: 720p-1080p
Best for:
Maximum character consistency
Longer videos (up to 10s)
More reference images (up to 7)
High-quality results
Limitations:
No audio generation
Fixed aspect ratios (no auto)
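The per-model limits above can be checked before generating. The sketch below is illustrative only: the names are hypothetical, the string "auto" stands in for Veo's automatically determined aspect ratio, and it assumes Kling O1 accepts any duration from 5 to 10 seconds (the page gives only the range).

```python
# Limits copied from the model characteristics above.
DURATION_RANGE = {"Veo 3.1 Reference": (8, 8), "Kling O1": (5, 10)}
ASPECT_RATIOS = {
    "Veo 3.1 Reference": {"auto"},  # "auto" is a placeholder label
    "Kling O1": {"16:9", "9:16", "1:1"},
}


def settings_problems(model: str, duration_s: int, aspect_ratio: str) -> list[str]:
    """Return a list of human-readable problems; empty means the settings fit."""
    problems = []
    low, high = DURATION_RANGE[model]
    if not low <= duration_s <= high:
        problems.append(f"{model} supports {low}-{high}s, got {duration_s}s")
    if aspect_ratio not in ASPECT_RATIOS[model]:
        allowed = ", ".join(sorted(ASPECT_RATIOS[model]))
        problems.append(f"{model} supports aspect ratios: {allowed}")
    return problems
```

An empty result means the requested duration and aspect ratio fall within the documented limits for that model.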
Tips for Best Results
Use varied reference images - Different angles and expressions help
Keep images consistent - Same person, similar quality
Describe the scene, not the character - Reference images handle appearance
Use the appropriate model - Veo for audio, Kling for more references
Match aspect ratios - Reference images should have similar ratios
Be specific about action - Describe what the character does
Include camera movement - Helps create dynamic videos
Test with different image counts - Find what works best for your character
Common Workflows
Quick Talking Head
Prepare 2-3 reference images
Attach images
Select Veo 3.1 Reference (auto-selected)
Prompt: "Character speaks to camera: '[dialogue]', professional setting"
Enable audio
Generate (8 seconds)
High-Quality Character Video
Prepare 5-7 reference images
Attach images
Select Kling O1 (auto-selected)
Detailed prompt with scene, action, and style
Generate (5-10 seconds)
Consistent Character Series
Prepare 4-5 reference images
Create multiple videos with different prompts
Character appearance stays consistent
Use for series or multiple shots
Troubleshooting
"Reference Mode not activating"
Solutions:
Ensure you have 2-7 images attached
Check that images are properly uploaded
Remove any attached videos (reference mode is for generation)
Verify you're in Generate Media mode
"Character doesn't look consistent"
Solutions:
Use more reference images (4-7 for best results; more than 4 requires Kling O1)
Ensure all images show the same person
Use varied angles and expressions
Try Kling O1 for better consistency
Check image quality (clear, well-lit)
"Model doesn't support reference mode"
Solutions:
Switch to Veo 3.1 Reference or Kling O1
Check the Supported Video Models page for the current list
Reference mode works only with specific models; most video models don't support it
"Duration is wrong"
Solutions:
Veo 3.1 Reference: Fixed at 8 seconds (cannot change)
Kling O1: 5-10 seconds (variable, set in settings)
Reference mode has limited duration control
Plan your content for the available duration
Next: Learn about Image Generation for creating still images.