# Story Scribe

<figure><img src="/files/3tdBK0vkNZeccnwVQ9hO" alt=""><figcaption></figcaption></figure>

If you only ever transcribe English with Premiere's built-in panel, Story Scribe gives you three big upgrades: **more languages**, **better word-boundary precision** (so Story Cutter's cuts land cleaner), and a **private offline option** that never touches the cloud.

***

### What Story Scribe is designed to do

Story Scribe exists for one job: produce the **most accurate possible transcript** of your dialogue so that downstream Story Cutter edits land on exact word boundaries instead of guessing in the middle of a sentence.

* **Word-level timestamps.** Every word in the transcript carries its own start/end time. When Story Cutter trims a soundbite, the cut snaps to the boundary between two words — not somewhere in the middle.
* **Multilingual.** 25+ languages out of the box (English, German, Spanish, French, Italian, Portuguese, Dutch, Polish, Swedish, Norwegian, Danish, Finnish, Greek, Czech, Ukrainian, Russian, Turkish, Arabic, Hebrew, Hindi, Vietnamese, Indonesian, Korean, Japanese, Mandarin, Cantonese, Thai). Premiere's native transcription panel is English-heavy — Story Scribe is built for the rest.
* **Speaker labels (diarization).** When the audio has multiple speakers, Story Scribe assigns each line to a "Speaker 1 / Speaker 2 / …" label that you can rename inline before handing off.
* **Three engines.** Pick the right tradeoff between speed, accuracy, cost, and privacy on a per-job basis — no global setting to change.
* **Story Cutter handoff.** One click ships the finished transcript into a new chat with Story Cutter, with the transcript already attached and the sequence linked.
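
To make the word-boundary idea concrete, here is a minimal Python sketch of how a cut point could snap to the nearest word edge. The `Word` shape and the `snap_to_word_boundary` helper are illustrative assumptions, not Story Scribe's actual data model or Story Cutter's real algorithm.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds on the timeline
    end: float


def snap_to_word_boundary(words: list[Word], t: float) -> float:
    """Return the word start or end time closest to t (words must be non-empty)."""
    boundaries = sorted({w.start for w in words} | {w.end for w in words})
    i = bisect_left(boundaries, t)
    # Only the boundary just before t and the one at or after t can be nearest.
    candidates = boundaries[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda b: abs(b - t))


words = [
    Word("We", 0.00, 0.18),
    Word("shipped", 0.22, 0.61),
    Word("it", 0.65, 0.80),
]
print(snap_to_word_boundary(words, 0.40))  # 0.22, the start of "shipped"
```

With sentence-level timing only, the same lookup would have just two boundaries per sentence to choose from, which is why cuts feel looser.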

{% hint style="info" %}
**Story Scribe is the front end for Story Cutter.** If you already have a transcript that Premiere produced (or you got it from somewhere else), Story Scribe can also **convert** it into the format Story Cutter needs — see Source 4: Upload existing transcript below.
{% endhint %}

***

### Getting Started

#### Step 1: Open Story Scribe

From the Studio launchpad, click the **Story Scribe** card (audio department, violet accent). You'll land on the **Setup** screen with four controls:

1. **Attach…** — pick a source (timeline, in/out range, audio file, or existing transcript)
2. **Language** — tell the engine what language to expect, or let it auto-detect
3. **Engine** — pick between FAST, STUDIO, or PRIVATE
4. **Transcribe / Convert & Import** — kick it off

The **Transcribe** button only enables when you've picked a source AND your chosen engine supports the language you picked. (More on language compatibility under Choosing an engine and Picking a language below.)

***

<figure><img src="/files/w60xK0jeX99ysjsGPS8r" alt=""><figcaption></figcaption></figure>

### Source options

You have four ways to feed audio into Story Scribe. Each one is tuned for a different real-world workflow.

#### Source 1: Active timeline

> **Best for:** The full pass — you want a transcript of everything in your current sequence.

* Make sure the sequence you want to transcribe is the **active** one in Premiere (i.e. the one with focus in the Program monitor).
* Click **Attach…** → **Active timeline**.
* Story Scribe will export the audio of the entire active sequence to a temp WAV, then send it to your chosen engine.

This is the default workflow. Use this for podcasts, interviews, vlogs, and any project where you want the cutter to have the whole conversation to choose from.

#### Source 2: In/Out range

> **Best for:** Long sequences where you only want to work with one section.

* In Premiere, scrub to the start of the section you want and press **`I`** to set the in-mark.
* Scrub to the end and press **`O`** to set the out-mark.
* Come back to the Story Scribe panel — within \~1 second the **In/Out range** option will light up in the **Attach…** menu, showing the approximate duration (e.g. `≈4m 30s`).
* Click **Attach…** → **In/Out range**.

{% hint style="success" %}
Story Scribe **respects the in-mark offset**. If your in-mark is at `00:05:00`, the transcript's first word will be timestamped at `00:05:00` (not `00:00:00`) — so when Story Cutter places cuts, they land in the right spot on your full timeline, not a phantom 5-minutes-earlier position.
{% endhint %}
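
The offset rule in the hint above boils down to one addition. The sketch below assumes the engine returns times relative to the start of the exported range; the function names are hypothetical and only illustrate the arithmetic.

```python
def to_timeline_seconds(range_relative_s: float, in_mark_s: float) -> float:
    """Shift a time measured from the in-mark onto the full timeline."""
    return in_mark_s + range_relative_s


def format_timecode(seconds: float) -> str:
    """Format whole seconds as HH:MM:SS (frames are ignored in this sketch)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


in_mark_s = 5 * 60  # in-mark at 00:05:00
print(format_timecode(to_timeline_seconds(0.0, in_mark_s)))   # 00:05:00
print(format_timecode(to_timeline_seconds(12.0, in_mark_s)))  # 00:05:12
```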

If the **In/Out range** option is greyed out, it means Premiere doesn't currently report both marks set on the active sequence. Re-set them with `I` and `O` — the panel polls once a second and re-checks immediately when it regains focus.

#### Source 3: Upload audio file

> **Best for:** Raw camera audio, podcast stems, voice memos, anything that lives as a file on disk rather than on a Premiere timeline.

* Click **Attach…** → **Upload audio file**.
* Pick an **MP3**, **WAV**, **M4A**, or **MP4** from disk.
* Story Scribe sends the file directly to the engine — no AME export needed.

Useful when the speakers you care about live in raw camera files you haven't even imported into Premiere yet, or when you want a transcript of a podcast you're about to cut up.

#### Source 4: Upload existing transcript

> **Best for:** You already have a transcript and just want to use it in Story Cutter.

* Click **Attach…** → **Upload existing transcript**.
* Pick an **SRT**, **VTT**, or **JSON** file from disk.
* Story Scribe normalizes it into the canonical format Story Cutter expects, then drops you straight into the **Review** screen.
* The button label changes from **Transcribe** to **Convert & Import** to make this clear — no engine call is made, no cost is incurred, and the engine selector is ignored.

Supports:

* **SRT** — standard subtitle/caption format
* **VTT** — WebVTT format
* **JSON** — Premiere Pro's exported transcript JSON
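
If you're curious what "normalizing" a subtitle file involves, here is a rough Python sketch that turns SRT text into a list of timed segments. The output shape (`start`, `end`, `text`) is an assumption for illustration only; the canonical format Story Cutter expects is not documented on this page.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def parse_timestamp(stamp: str) -> float:
    """Convert 'HH:MM:SS,mmm' into seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"Unrecognised timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text: str) -> list[dict]:
    """Split SRT text into cue blocks and keep the timing plus the caption text."""
    segments = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # not a cue block
        start, end = lines[timing].split("-->")
        segments.append({
            "start": parse_timestamp(start),
            "end": parse_timestamp(end.split()[0]),
            "text": " ".join(line.strip() for line in lines[timing + 1:]),
        })
    return segments


sample = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,250
Welcome back.
"""
print(parse_srt(sample))
```

Each cue carries one start and one end time for the whole caption, so a transcript built this way can only ever be sentence-precise.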

{% hint style="info" %}
**Why does my Premiere transcript work but lose precision?** Premiere exports sentence-level timing only. Story Scribe will accept it and pass it through, but Story Cutter will warn that cuts will snap to sentence boundaries instead of word boundaries — they may feel slightly loose. For tight cuts, re-transcribe with Scribe v2 to get word timing back. (See Precision badge below.)
{% endhint %}

***

### Choosing an engine

Story Scribe ships with three engines. The cards are colour-coded by the trade-off they represent.

<table><thead><tr><th width="157">Engine</th><th width="168">Best for</th><th width="124">Speed</th><th width="158">Cost (est.)</th><th>Privacy</th></tr></thead><tbody><tr><td><strong>Fal Whisper v3</strong></td><td>Single speaker, clean audio</td><td>Very fast</td><td><strong>~$0.10 / hr</strong></td><td>Cloud (Fal.ai)</td></tr><tr><td><strong>Fal Scribe v2</strong> ⭐</td><td>Multiple speakers, accents, noisy audio</td><td>Fast</td><td><strong>~$0.48 / hr</strong></td><td>Cloud (Fal.ai)</td></tr><tr><td><strong>Local Whisper</strong></td><td>Sensitive content, offline work</td><td><strong>Hardware-dependent</strong></td><td><strong>Free</strong></td><td>100% on your machine</td></tr></tbody></table>

#### Fal Whisper v3 — FAST

The cheapest cloud option. Sends your audio to Fal.ai's hosted Whisper v3 endpoint. Excellent for clean single-speaker audio (one person at the mic, no overlapping voices, low background noise).

**Use it when:** you've got a podcast monologue, a single-presenter tutorial, a voiceover record, or any "one clean voice" job and you want a fast, cheap turnaround.

**Don't use it when:** your audio has multiple speakers talking over each other, heavy accents, or background music — accuracy can dip noticeably compared to Scribe v2.

**Language note:** Fal Whisper v3 doesn't currently support Cantonese (`yue`). If you pick Cantonese in the language dropdown, this card is auto-locked and the panel routes you to Scribe v2 or Local Whisper.

#### Fal Scribe v2 — STUDIO (recommended)

The accuracy default. This is the engine most people should pick for most jobs — it's what we recommend for serious production work.

**Use it when:** you have multiple speakers, accents, code-switching, music in the background, or you just want the most accurate result available. It's also the engine the Local Whisper hardware-warning modal will offer to switch you to for long clips.

**Cost example:** a 60-minute interview runs about **$0.48** total. A 4-hour podcast is about **$1.92**.
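
The arithmetic behind those figures is a simple hourly rate. The sketch below uses the estimated rates from the engine table; the engine keys are hypothetical identifiers, and real billing may differ from these estimates.

```python
# Estimated USD per hour of audio, taken from the engine table above.
RATES_USD_PER_HOUR = {
    "fal-whisper-v3": 0.10,
    "fal-scribe-v2": 0.48,
    "local-whisper": 0.00,
}


def estimate_cost_usd(engine: str, duration_minutes: float) -> float:
    """Estimate the cost of one job, rounded to the nearest cent."""
    return round(RATES_USD_PER_HOUR[engine] * duration_minutes / 60, 2)


print(estimate_cost_usd("fal-scribe-v2", 60))   # 0.48
print(estimate_cost_usd("fal-scribe-v2", 240))  # 1.92
print(estimate_cost_usd("fal-whisper-v3", 60))  # 0.1
```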

{% hint style="success" %}
**Default to Scribe v2 unless you have a specific reason not to.** The accuracy gap over Fal Whisper v3 is real for multi-speaker and noisy audio, and the cost difference is rarely meaningful for occasional jobs.
{% endhint %}

#### Local Whisper — PRIVATE

Runs `whisper.cpp` on your own machine. Audio never leaves your computer — there's no cloud upload, no API key, no usage tracking. Zero cost per minute.

{% hint style="warning" %}
**Windows only for now.** Local Whisper is available on Windows 10/11 (x64). macOS support isn't shipped yet — if you're on a Mac, the card will be locked with a "Windows only" badge and the panel will steer you to one of the cloud engines.
{% endhint %}

{% hint style="warning" %}
**Speed depends entirely on your hardware.** Local Whisper runs on your CPU. A 60-minute clip can take anywhere from **5 to 30 minutes** depending on your processor: a modern multi-core desktop will be near the fast end, an older laptop or a small low-power machine near the slow end. The panel won't freeze while it runs, so you can keep editing in Premiere, but if you're on a deadline and don't already know how fast your machine is, the cloud engines are a safer bet.
{% endhint %}

**First-run setup.** The first time you pick Local Whisper, Story Scribe shows an inline tile that downloads the `whisper.cpp` binary and the language model. The download is a one-time hit (a few hundred MB, depending on the model size); after that, Local Whisper runs fully offline. Downloads can be resumed if interrupted.

**Long-clip pre-flight warning.** If you pick Local Whisper for a clip 10 minutes or longer, Story Scribe shows a modal that estimates the time, shows you the equivalent cost on Scribe v2 (usually a few cents), and offers a one-click switch. You can dismiss the modal with **Don't show again** once you know your machine's profile.
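
As a rough illustration of the numbers that modal weighs up, the sketch below scales the 5 to 30 minute range quoted for a 60-minute clip linearly with clip length. Linear scaling, the constant names, and the `preflight` function are all assumptions; the real estimate may be computed differently.

```python
LONG_CLIP_THRESHOLD_MIN = 10         # modal appears at 10 minutes or longer
LOCAL_MIN_PER_AUDIO_HOUR = (5, 30)   # range quoted for a 60-minute clip
SCRIBE_V2_USD_PER_HOUR = 0.48        # estimated rate from the engine table


def preflight(duration_min: float, suppressed: bool = False):
    """Return None when no modal is needed, else the figures it would show."""
    if suppressed or duration_min < LONG_CLIP_THRESHOLD_MIN:
        return None
    hours = duration_min / 60
    fast, slow = LOCAL_MIN_PER_AUDIO_HOUR
    return {
        "local_minutes": (fast * hours, slow * hours),
        "scribe_v2_usd": round(SCRIBE_V2_USD_PER_HOUR * hours, 2),
    }


print(preflight(5))   # None: clip is under the threshold
print(preflight(90))  # {'local_minutes': (7.5, 45.0), 'scribe_v2_usd': 0.72}
```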

***

### Picking a language

The language dropdown above the engine selector controls how the engine interprets the audio.

* **Auto-detect** — let the engine figure it out from the first few seconds. Good for unknown source material; not recommended if you already know the language because manual selection is always more reliable.
* **Manual selection** — pick from the 25+ language list. Always preferred when you know the language up front.

When you change the language, the engine cards reactively re-evaluate:

* If an engine doesn't support the language you picked, its card locks with a **🔒 Not available for \[Language]** badge.
* The most common case: switching to **Cantonese** locks **Fal Whisper v3** — use **Scribe v2** or **Local Whisper** instead.
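
The locking rules can be pictured as a small lookup. This Python sketch encodes only the two restrictions documented on this page (Cantonese on Fal Whisper v3, and Local Whisper being Windows only); the engine keys, platform strings, and function names are hypothetical, and the real support matrix may contain more entries.

```python
from typing import Optional

# Only the language gap documented on this page.
UNSUPPORTED_LANGUAGES = {"fal-whisper-v3": {"yue"}}


def lock_reason(engine: str, language: str, platform: str) -> Optional[str]:
    """Return the badge text for a locked card, or None if the card is usable."""
    if engine == "local-whisper" and platform != "windows":
        return "Windows only"
    if language in UNSUPPORTED_LANGUAGES.get(engine, set()):
        return f"Not available for {language}"
    return None


def can_transcribe(has_source: bool, engine: str, language: str, platform: str) -> bool:
    """Transcribe needs both a source and an engine that is not locked."""
    return has_source and lock_reason(engine, language, platform) is None


for engine in ("fal-whisper-v3", "fal-scribe-v2", "local-whisper"):
    print(engine, lock_reason(engine, "yue", "windows"))
```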

***

<figure><img src="/files/HLSZLT6VwNFbRmfUb4DS" alt=""><figcaption></figcaption></figure>

### The Transcribing screen

Once you click **Transcribe**, you'll see the live transcription HUD with:

* **A halo waveform animation** breathing in violet
* **A progress percent** with a live sub-stage label ("Encoding in Media Encoder…", "Transcribing chunk 2 of 5…", etc.)
* **An elapsed counter** showing how long the run has been going

You can cancel at any time with the **Cancel** button. If anything fails, the panel shows the error and offers a retry.

{% hint style="info" %}
**Long clips are chunked automatically.** For audio over 20 minutes, cloud engines split the file into 20-minute chunks, transcribe each in parallel, and stitch the results back together with correct timestamps. You'll see the chunk progress in the live HUD ("chunk 3 of 7…").
{% endhint %}
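
For the curious, chunk-and-stitch can be sketched in a few lines of Python. This assumes each chunk comes back with word times measured from the start of that chunk; the function names and the word dictionary shape are illustrative, not the actual implementation.

```python
CHUNK_SECONDS = 20 * 60  # 20-minute chunks, as described above


def plan_chunks(duration_s: float, chunk_s: float = CHUNK_SECONDS) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) ranges."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def stitch(chunk_results: list[tuple[float, list[dict]]]) -> list[dict]:
    """Merge per-chunk words, shifting each word by its chunk's start time."""
    merged = []
    # Parallel jobs may finish out of order, so sort by chunk start first.
    for chunk_start, words in sorted(chunk_results, key=lambda r: r[0]):
        for word in words:
            merged.append({
                **word,
                "start": word["start"] + chunk_start,
                "end": word["end"] + chunk_start,
            })
    return merged


print(plan_chunks(50 * 60))  # three chunks: 20 min, 20 min, 10 min
```

A 50-minute file therefore shows up in the HUD as "chunk 1 of 3" through "chunk 3 of 3".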

***

<figure><img src="/files/2wWIZCwdyh4ohHVF5iiG" alt=""><figcaption></figcaption></figure>

### The Review screen

When transcription finishes, you land on the **Review** screen with the full transcript displayed line-by-line.

#### The precision badge

In the header you'll see one of two pills:

* **Word-precise** (green) — every word has its own timestamp. Story Cutter has full precision to land cuts on exact word boundaries.
* **Sentence-precise** (amber) — only sentence-level timing is available. Cuts will snap to the start/end of each sentence — usually fine, but cuts may feel slightly loose if the AI wants to trim mid-sentence.

The label also tells you whether speakers were detected (`· Speakers detected`) or whether it was treated as a single speaker (`· Single speaker`).
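
One plausible way to derive that pill from a transcript is sketched below. The segment shape (`words`, `speaker`) and the rule that more than one distinct speaker label means "Speakers detected" are assumptions made for illustration.

```python
def precision_badge(segments: list[dict]) -> str:
    """Build the header label from per-segment word timing and speaker labels."""
    # Word-precise only if every segment carries its own word list.
    word_precise = bool(segments) and all(seg.get("words") for seg in segments)
    speakers = {seg["speaker"] for seg in segments if seg.get("speaker")}
    precision = "Word-precise" if word_precise else "Sentence-precise"
    speaker_note = "Speakers detected" if len(speakers) > 1 else "Single speaker"
    return f"{precision} · {speaker_note}"


interview = [
    {"speaker": "Speaker 1", "text": "Hi.", "words": [{"text": "Hi.", "start": 0.0, "end": 0.3}]},
    {"speaker": "Speaker 2", "text": "Hello.", "words": [{"text": "Hello.", "start": 0.5, "end": 0.9}]},
]
imported_srt = [{"text": "Hi.", "start": 0.0, "end": 0.3}]

print(precision_badge(interview))     # Word-precise · Speakers detected
print(precision_badge(imported_srt))  # Sentence-precise · Single speaker
```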

#### Click-to-jump

Click any **word** or **segment** in the transcript and Premiere's playhead snaps to that exact timestamp and **starts playing**. This is the same behaviour Story Cutter uses for timestamp links, and it gives you instant audible confirmation that the click landed where you expected. A small **"Snapping…"** pill briefly appears while the jump is in flight.

#### Renaming speakers

If diarization gave you `Speaker 1`, `Speaker 2`, etc., click any speaker label inline to rename it. The new name propagates to every line that speaker said, and it persists when you save the JSON or hand off to Story Cutter.
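
Conceptually, a rename is a find-and-replace on the speaker field of every line. The sketch below assumes each transcript line is a dictionary with a `speaker` key, which is a hypothetical shape used only for illustration.

```python
def rename_speaker(segments: list[dict], old: str, new: str) -> list[dict]:
    """Return a copy of the transcript with one speaker label replaced everywhere."""
    return [
        {**seg, "speaker": new} if seg.get("speaker") == old else seg
        for seg in segments
    ]


transcript = [
    {"speaker": "Speaker 1", "text": "Thanks for joining."},
    {"speaker": "Speaker 2", "text": "Happy to be here."},
    {"speaker": "Speaker 1", "text": "Let's start."},
]
renamed = rename_speaker(transcript, "Speaker 1", "Sarah")
print([seg["speaker"] for seg in renamed])  # ['Sarah', 'Speaker 2', 'Sarah']
```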

#### The three actions

* **Save as JSON.** Writes the transcript to disk in the canonical Premiere transcript JSON format. Use this when you want a sidecar file you can re-use later.
* **Re-run.** Goes back to Setup with your source still attached. Useful if an uploaded transcript gave you a sentence-precise result and you want to re-transcribe from the original audio with Scribe v2 for word precision.
* **Use in Story Cutter** ✨ (the primary action). Saves the transcript, opens a new chat with the **Story Cutter Assistant** conversation starter, and attaches the transcript automatically. You're one message away from a rough cut.

{% hint style="warning" %}
**Sentence-precise transcripts trigger a confirmation modal.** When you click **Use in Story Cutter** on a sentence-precise transcript, Story Scribe asks you to confirm — and offers a one-click re-run with Scribe v2 to upgrade to word precision. You can dismiss this if you know what you're doing.
{% endhint %}

***

### Tips for getting the best results

#### Audio quality matters more than engine choice

The single biggest factor in transcript accuracy is the **audio that goes in**. A clean lapel mic on Fal Whisper v3 will beat a noisy room mic on Scribe v2 every time. Before reaching for a better engine, ask:

* Is the speaker close to the mic?
* Is the background noise low?
* Are there overlapping speakers? (Even diarization struggles with cross-talk.)
* Is there music underneath the dialogue?

If the answer to any of those is concerning, pick **Scribe v2** — it's more forgiving.

#### Set the language manually when you know it

Auto-detect is convenient but can occasionally pick the wrong language from the first few seconds (especially if the speaker starts with a non-native word, music plays first, or there's a long silence). When you know the language, **just pick it from the dropdown**: accuracy is always at least as good and usually better.

#### Use in/out marks to scope long sequences

If you only need to transcribe a 5-minute section of a 90-minute timeline, **set in/out marks** instead of transcribing the whole sequence. You'll get the result faster, pay less (cloud engines) or wait less (Local Whisper), and avoid clutter in the review screen.

#### For sensitive content, go local

Interview footage of a source who needs anonymity, medical content, legal depositions, anything under NDA — Local Whisper sends nothing to a cloud. The audio is read from disk, processed by `whisper.cpp` on your CPU, and the transcript is written back to disk. No network call is made.

#### Default to Scribe v2, drop down to Fal Whisper v3 only when it's truly a single clean speaker

Scribe v2's accuracy advantage is biggest on:

* Multiple speakers (interviews, panels, conversations)
* Heavy accents
* Background music or noise
* Code-switching between languages

If your material has none of those things — a solo voiceover record, a single-presenter tutorial recorded in a treated room — Fal Whisper v3 will save you about 5× on cost with negligible accuracy loss.

#### Don't re-transcribe what you can convert

If you've already got an `SRT`, `VTT`, or Premiere transcript JSON sitting on disk, **use Upload existing transcript** instead of re-transcribing the audio. It's free, it's instant, and the only downside is sentence-level precision (which you can upgrade later with Re-run if Story Cutter needs the word boundaries).

#### Rename speakers before handoff

Story Cutter uses the speaker labels you see in the Review screen. Renaming `Speaker 1` → `Sarah` before clicking **Use in Story Cutter** means Story Cutter's output will reference `Sarah` everywhere — much easier to scan than `Speaker 1`. Two seconds well spent.

#### Don't move clips on the source timeline after transcribing

Just like Story Cutter, Story Scribe locks timestamps to clip positions. If you transcribe a sequence, then re-arrange clips, then send the transcript to Story Cutter — the cuts will land on the **old** positions. Either transcribe last, or re-transcribe after any timeline reorganisation.

***

### Troubleshooting

**"In/Out range" stays greyed out**

Premiere doesn't currently report both an in-mark AND an out-mark on the active sequence. Re-press `I` and `O`, then click out of and back into the Story Scribe panel. The poller will pick up the marks within 1 second.

**Local Whisper card is locked with "Windows only"**

You're on macOS. Use Scribe v2 (recommended) or Fal Whisper v3. Both are cloud-based and run on any platform.

**Engine card is locked with "Not available for \[Language]"**

The language you picked isn't in that engine's supported set. The most common case is Cantonese on Fal Whisper v3; switch to Scribe v2 or Local Whisper.

**"Sentence-precise" badge after a re-run from audio**

This usually means the audio path actually returned word timestamps, but the diarization step couldn't subdivide cleanly. Try re-running with Scribe v2 if you weren't already on it.

**Transcript looks correct but Story Cutter cuts feel loose**

Check the precision badge. If it's amber (sentence-precise), Story Cutter doesn't have word-level boundaries to snap to. Re-run with Scribe v2 from the audio source.

**Long clip on Local Whisper feels frozen**

It isn't frozen; `whisper.cpp` is just slow on CPU. The panel won't show per-second progress like the cloud engines because `whisper.cpp` reports progress less granularly. Switch to Scribe v2 if you need faster turnaround.

***

### See also

* [Story Cutter Assistant](/conversation-starters/story-cutter-assistant.md) — the downstream tool that uses Story Scribe's transcripts to build rough cuts.
* [How to cut videos faster with AI-assisted story editing in Premiere Pro](/workflows/how-to-cut-videos-faster-with-ai-assisted-story-editing-in-premiere-pro.md) — end-to-end workflow combining Story Scribe + Story Cutter.

