# Reference Mode

{% hint style="info" %}
**Attaching exactly 2 images and seeing Transition Mode?** By default, 2 uploaded images trigger **Transition Mode** (start frame → end frame morph). To use them as **character reference photos** instead, select a dedicated reference model from the model picker: **Kling O3 Reference**, **Seedance 2 Reference**, or **Wan 2.7 Reference**. These models treat multiple images as reference material, not as start/end frames. See [Transition Mode](/features/video-generation/transition-mode.md) if you actually want a morph.
{% endhint %}

### How It Works

1. **Enable Generate Media** - Toggle the Generate Media button in the composer
2. **Attach reference images** - Upload 1-9 images of the same character/subject (count depends on model)
3. **Reference Mode activates** - The system detects the reference images and switches to a reference model
4. **Select reference model** - Veo 3.1 Reference, Kling O3, Wan 2.7 Reference, or Seedance 2 Reference (one is auto-selected; change it in the model picker if needed)
5. **Describe the video** - Write a prompt describing the scene and action
6. **Configure settings** - Set duration (varies by model: 8s for Veo, 3-15s for Kling O3, 2-10s for Wan, 4-15s for Seedance)
7. **Generate** - Click send to create a video with a consistent character

### Automatic Activation

Reference Mode activates when:

* **1-9 images attached** - Reference images detected (minimum count varies by model)
* **No videos attached** - Reference mode is for generation, not editing
* **Model auto-switches** - Once Reference Mode is active, compatible reference models appear in the model picker
* **Element tags** - Using @Element tags also triggers reference mode
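
The activation conditions above can be read as a small decision rule. The following is a minimal sketch only: `reference_mode_active` and its parameters are hypothetical names, not part of the product, and it assumes that an attached video always prevents activation, even when @Element tags are present. It also does not model the default where exactly 2 images trigger Transition Mode.

```python
def reference_mode_active(image_count: int, video_count: int,
                          has_element_tags: bool = False) -> bool:
    """Hypothetical check mirroring the activation conditions listed above."""
    # Assumption: any attached video rules out Reference Mode,
    # because the mode is for generation, not editing.
    if video_count > 0:
        return False
    # @Element tags trigger Reference Mode on their own.
    if has_element_tags:
        return True
    # Otherwise 1-9 attached images are required.
    return 1 <= image_count <= 9


print(reference_mode_active(image_count=3, video_count=0))  # True
print(reference_mode_active(image_count=3, video_count=1))  # False
print(reference_mode_active(0, 0, has_element_tags=True))   # True
```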

### Supported Models

<table><thead><tr><th width="124">Model</th><th width="177">Reference Images</th><th width="113">Duration</th><th width="93">Audio</th><th>Best For</th></tr></thead><tbody><tr><td><strong>Veo 3.1 Reference</strong></td><td>2-4 images</td><td>8s (fixed)</td><td>✅</td><td>Character consistency with audio</td></tr><tr><td><strong>Kling O3</strong></td><td>2-7 images</td><td>3-15s</td><td>✅</td><td>Up to 7 reference images, audio support, best quality</td></tr><tr><td><strong>Wan 2.7 Reference</strong></td><td>1-9 images</td><td>2-10s</td><td>❌</td><td>Most reference images, high resolution</td></tr><tr><td><strong>Seedance 2 Reference</strong></td><td>Up to 9 images</td><td>4-15s</td><td>✅</td><td>Reference with native audio</td></tr></tbody></table>
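
If you plan generations in a script or spreadsheet, the limits in this table can be encoded as data and checked before you upload anything. The sketch below is illustrative only: `ReferenceModel`, `MODELS`, and `check_request` are hypothetical names, not a product API. The numbers are copied from the table, except that a minimum of 1 image for Seedance 2 Reference is an assumption ("Up to 9 images" states no lower bound).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceModel:
    name: str
    min_images: int
    max_images: int
    min_seconds: int
    max_seconds: int
    audio: bool


# Limits copied from the Supported Models table.
# Assumption: Seedance 2 Reference accepts a single image.
MODELS = {
    m.name: m
    for m in (
        ReferenceModel("Veo 3.1 Reference", 2, 4, 8, 8, True),
        ReferenceModel("Kling O3", 2, 7, 3, 15, True),
        ReferenceModel("Wan 2.7 Reference", 1, 9, 2, 10, False),
        ReferenceModel("Seedance 2 Reference", 1, 9, 4, 15, True),
    )
}


def check_request(model_name: str, image_count: int, seconds: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits the table."""
    model = MODELS[model_name]
    problems = []
    if not model.min_images <= image_count <= model.max_images:
        problems.append(
            f"{model.name} accepts {model.min_images}-{model.max_images} images, got {image_count}"
        )
    if not model.min_seconds <= seconds <= model.max_seconds:
        problems.append(
            f"{model.name} supports {model.min_seconds}-{model.max_seconds}s, got {seconds}s"
        )
    return problems


print(check_request("Kling O3", image_count=5, seconds=10))
# []
print(check_request("Veo 3.1 Reference", image_count=5, seconds=8))
# ['Veo 3.1 Reference accepts 2-4 images, got 5']
```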

### Reference Image Guidelines

#### Image Requirements

**What makes good reference images:**

1. **Same subject** - All images should show the same person/character
2. **Clear face/features** - Good visibility of key characteristics
3. **Varied angles** - Different views help the model understand the subject
4. **Consistent lighting** - Similar lighting conditions work best
5. **High quality** - Clear, well-lit images produce better results

#### Image Count Strategy

**1-3 images:**

* Smallest sets that Reference Mode accepts (Wan 2.7 Reference works with a single image)
* Works well for simple characters
* Faster processing

**4-7 images:**

* Maximum consistency with Kling O3
* Better for complex characters
* More reference points for the model

**Up to 9 images (Wan 2.7 Reference / Seedance 2 Reference):**

* Highest number of reference inputs
* Best for complex subjects requiring many angles
* Wan 2.7 Reference supports 1-9 images; Seedance 2 Reference supports up to 9

#### What to Include

✅ **Good Reference Images:**

* Front-facing portrait
* Side profile
* 3/4 angle
* Different expressions
* Various lighting conditions
* Different outfits (same person)

❌ **Poor Reference Images:**

* Different people
* Unclear/blurry faces
* Extreme angles
* Very different styles
* Inconsistent subject

### Writing Reference Mode Prompts

#### What to Describe

Focus on **the scene and action**, not the character (reference images handle that):

1. **Setting** - Where the scene takes place
2. **Action** - What the character is doing
3. **Camera movement** - How it's shot
4. **Style** - Visual aesthetic and mood
5. **Dialogue** (Veo only) - What the character says
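
If you assemble prompts programmatically, the five elements above can be joined into a single string. This is a minimal sketch: `build_prompt` is a hypothetical helper, not something the product provides, and the comma-separated layout simply imitates the example prompts on this page.

```python
from typing import Optional


def build_prompt(action: str, setting: str = "", camera: str = "",
                 style: str = "", dialogue: Optional[str] = None) -> str:
    """Join scene elements into one comma-separated prompt (hypothetical helper)."""
    # Attach dialogue to the action, e.g. 'speaks to the camera: "..."'.
    lead = f'{action}: "{dialogue}"' if dialogue else action
    parts = [lead, setting, camera, style]
    # Skip empty elements so the prompt has no stray commas.
    return ", ".join(p.strip() for p in parts if p.strip())


print(build_prompt(
    action="The character walks confidently through a modern office",
    setting="professional setting",
    camera="slow push-in",
    style="cinematic style, natural lighting",
))
```

The helper deliberately takes no description of the character, since the reference images supply appearance.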

#### Good Reference Mode Prompts

✅ **With Action:**

```
The character walks confidently through a modern office, greeting 
colleagues with a warm smile, professional setting, natural lighting
```

✅ **With Dialogue (Veo):**

```
The character speaks directly to the camera: "Welcome to our presentation. 
Today we'll explore innovative solutions." Professional setting, 
confident delivery
```

✅ **With Camera Movement:**

```
Slow push-in on the character as they explain a concept, cinematic 
style, soft lighting, engaging and professional atmosphere
```

#### Bad Reference Mode Prompts

❌ **Describes Character:**

```
A person with brown hair and blue eyes
```

*(Reference images already provide this)*

❌ **Too Vague:**

```
The character does something
```

*(Not specific enough)*

❌ **Missing Context:**

```
Person talking
```

*(Needs setting, style, camera work)*

### Use Cases

#### Talking Head Videos

**Example:**

* 3-4 reference images of the speaker
* Prompt: "The speaker addresses the camera with confidence, explaining key concepts in a professional setting with natural lighting."
* Model: Veo 3.1 Reference (for audio)
* Result: Consistent talking head video with audio

#### Character Consistency

**Example:**

* 4-5 reference images of a character
* Prompt: "The character walks through a futuristic city, looking around with curiosity, cinematic style, neon lighting."
* Model: Kling O3 (for quality)
* Result: Character maintains appearance across shots

#### Product Demonstrations

**Example:**

* 2-3 reference images of a presenter
* Prompt: "The presenter demonstrates a product, showing features with enthusiasm, bright studio lighting, professional setting."
* Model: Veo 3.1 Reference (for audio)
* Result: Consistent presenter across multiple shots

#### Narrative Scenes

**Example:**

* 5-7 reference images of the main character
* Prompt: "The character enters a mysterious room, the camera follows their gaze, cinematic style, moody lighting, suspenseful atmosphere."
* Model: Kling O3 (for quality and longer duration)
* Result: Consistent character in narrative scene

### Model-Specific Behavior

#### Veo 3.1 Reference

**Characteristics:**

* Reference images: 2-4
* Duration: 8 seconds (fixed)
* Aspect ratio: 16:9 or 9:16 only (auto-determined from reference images)
* Audio: Supported (dialogue generation)
* Resolution: 720p-1080p

**Best for:**

* Talking head videos
* When you need audio/dialogue
* Character consistency with speech
* Professional presentations

**Limitations:**

* Fixed 8-second duration
* Maximum 4 reference images
* Aspect ratio limited to 16:9 or 9:16 (auto-determined from images)

#### Kling O3

**Characteristics:**

* Reference images: 2-7
* Duration: 3-15 seconds (variable, Pro/Standard toggle)
* Aspect ratio: 16:9, 9:16, or 1:1
* Audio: Supported (native audio generation)
* Resolution: 720p

**Best for:**

* Maximum character consistency with audio
* Flexible duration (3-15 seconds)
* More reference images than Veo 3.1 Reference (up to 7)
* High-quality results with sound

**Limitations:**

* Fixed aspect ratios (no auto)

#### Wan 2.7 Reference

**Characteristics:**

* Reference images: 1-9
* Duration: 2-10 seconds (variable)
* Aspect ratio: 16:9, 9:16, 1:1, 4:3, or 3:4
* Audio: Not supported
* Resolution: 720p-1080p

**Best for:**

* Maximum number of reference images (up to 9)
* High resolution output
* When you need the most reference angles
* Flexible aspect ratios

**Limitations:**

* No audio generation
* Shorter maximum duration (10s) than Kling O3 and Seedance 2 Reference (15s)

#### Seedance 2 Reference

**Characteristics:**

* Reference images: Up to 9
* Audio reference files: Up to 3 (attach audio to influence generated sound)
* Duration: 4-15 seconds (variable)
* Aspect ratio: Auto-detected from input images
* Audio: Supported (native audio generation)
* Resolution: 480p-1080p

**Best for:**

* Reference videos that need a native soundtrack
* Longer reference outputs (up to 15s)
* When you want audio generated automatically
* When providing reference audio to guide the sound design

**Limitations:**

* Audio may need refinement in post

### Tips for Best Results

1. **Use varied reference images** - Different angles and expressions help
2. **Keep images consistent** - Same person, similar quality
3. **Describe the scene, not the character** - Reference images handle appearance
4. **Use the appropriate model** - Veo for audio, Kling for more references (up to 7), Wan or Seedance for up to 9
5. **Match aspect ratios** - Reference images should have similar ratios
6. **Be specific about action** - Describe what the character does
7. **Include camera movement** - Helps create dynamic videos
8. **Test with different image counts** - Find what works best for your character
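
Tip 4 can be made concrete by filtering the supported models against what a shot needs. The sketch below is illustrative only: `candidate_models` is a hypothetical function, the limits are copied from the Supported Models table, and a minimum of 1 image for Seedance 2 Reference is an assumption.

```python
# (name, min_images, max_images, min_seconds, max_seconds, audio)
# Copied from the Supported Models table; Seedance's minimum of 1 image is assumed.
MODELS = [
    ("Veo 3.1 Reference", 2, 4, 8, 8, True),
    ("Kling O3", 2, 7, 3, 15, True),
    ("Wan 2.7 Reference", 1, 9, 2, 10, False),
    ("Seedance 2 Reference", 1, 9, 4, 15, True),
]


def candidate_models(image_count: int, seconds: int,
                     needs_audio: bool = False) -> list[str]:
    """List the models whose documented limits cover the request."""
    return [
        name
        for name, min_img, max_img, min_s, max_s, audio in MODELS
        if min_img <= image_count <= max_img
        and min_s <= seconds <= max_s
        and (audio or not needs_audio)
    ]


print(candidate_models(image_count=6, seconds=12, needs_audio=True))
# ['Kling O3', 'Seedance 2 Reference']
print(candidate_models(image_count=9, seconds=10))
# ['Wan 2.7 Reference', 'Seedance 2 Reference']
```

When several models qualify, the "Best For" notes in the model sections above are the tie-breaker; the filter only rules out models that cannot satisfy the request.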

### Common Workflows

#### Quick Talking Head

1. Prepare 2-3 reference images
2. Attach images
3. Select Veo 3.1 Reference (auto-selected)
4. Prompt: "Character speaks to camera: '\[dialogue]', professional setting"
5. Enable audio
6. Generate (8 seconds)

#### High-Quality Character Video

1. Prepare 5-7 reference images
2. Attach images
3. Select Kling O3 (auto-selected)
4. Detailed prompt with scene, action, and style
5. Generate (3-15 seconds)

#### Consistent Character Series

1. Prepare 4-5 reference images
2. Create multiple videos with different prompts
3. Character appearance stays consistent
4. Use for series or multiple shots

#### Reference with Audio (Seedance 2)

1. Prepare up to 9 reference images
2. Attach images
3. Select Seedance 2 Reference
4. Describe the scene, action, and desired audio
5. Generate (4-15 seconds)
6. Audio is generated automatically with the video

### Troubleshooting

#### "Reference Mode not activating."

**Solutions:**

* Ensure you have 1-9 images attached
* Check that images are properly uploaded
* Remove any attached videos (reference mode is for generation)
* Verify you're in Generate Media mode

#### "Character doesn't look consistent."

**Solutions:**

* Use more reference images (4-7 for best results)
* Ensure all images show the same person
* Use varied angles and expressions
* Try Kling O3 for better consistency
* Check image quality (clear, well-lit)

#### "Model doesn't support reference mode."

**Solutions:**

* Switch to Veo 3.1 Reference, Kling O3, Wan 2.7 Reference, or Seedance 2 Reference
* Check Supported Video Models
* Reference mode works only with these specific models; most other models don't support it

#### "Duration is wrong."

**Solutions:**

* Veo 3.1 Reference: Fixed at 8 seconds (cannot change)
* Kling O3: 3-15 seconds (variable, set in settings)
* Wan 2.7 Reference: 2-10 seconds (variable, set in settings)
* Seedance 2 Reference: 4-15 seconds (variable, set in settings)
* Duration control is limited to each model's range, so plan your content for the available duration

***

**Next:** Learn about [Image Generation](/features/image-generation.md) for creating still images.


