Story Scribe

Story Scribe is the transcription stage that feeds Story Cutter. It produces a transcript with word-level timestamps and speaker labels.

If you only ever transcribe English with Premiere's built-in panel, Story Scribe gives you three big upgrades: more languages, better word-boundary precision (so Story Cutter's cuts land cleaner), and a private offline option that never touches the cloud.


What Story Scribe is designed to do

Story Scribe exists for one job: produce the most accurate possible transcript of your dialogue so that downstream Story Cutter edits land on exact word boundaries instead of guessing in the middle of a sentence.

  • Word-level timestamps. Every word in the transcript carries its own start/end time. When Story Cutter trims a soundbite, the cut snaps to the boundary between two words — not somewhere in the middle.

  • Multilingual. 25+ languages out of the box (English, German, Spanish, French, Italian, Portuguese, Dutch, Polish, Swedish, Norwegian, Danish, Finnish, Greek, Czech, Ukrainian, Russian, Turkish, Arabic, Hebrew, Hindi, Vietnamese, Indonesian, Korean, Japanese, Mandarin, Cantonese, Thai). Premiere's native transcription panel is English-heavy — Story Scribe is built for the rest.

  • Speaker labels (diarization). When the audio has multiple speakers, Story Scribe assigns each line to a "Speaker 1 / Speaker 2 / …" label that you can rename inline before handing off.

  • Three engines. Pick the right tradeoff between speed, accuracy, cost, and privacy on a per-job basis — no global setting to change.

  • Story Cutter handoff. One click ships the finished transcript into a new chat with Story Cutter, with the transcript already attached and the sequence linked.
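To make the word-boundary idea concrete, here is a minimal sketch of snapping a requested cut time to the nearest word boundary. The `Word` shape and the function names are illustrative assumptions, not Story Scribe's or Story Cutter's actual data model.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record: text plus start/end times in seconds.
    text: str
    start: float
    end: float


def word_boundaries(words):
    """Collect every time at which a word starts or ends, sorted and de-duplicated."""
    return sorted({t for w in words for t in (w.start, w.end)})


def snap_to_boundary(t, words):
    """Return the word boundary closest to time t, or t itself if there are no words."""
    bounds = word_boundaries(words)
    if not bounds:
        return t
    i = bisect_left(bounds, t)
    # Only the boundary just before t and the one at/after t can be the closest.
    candidates = bounds[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda b: abs(b - t))


words = [Word("Thanks", 0.00, 0.42), Word("for", 0.42, 0.58), Word("coming", 0.58, 1.05)]
print(snap_to_boundary(0.45, words))  # lands on the gap between "Thanks" and "for"
```

With sentence-level timing only, the list of boundaries would contain just sentence starts and ends, which is why cuts feel looser in that case.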

Story Scribe is the front end for Story Cutter. If you already have a transcript that Premiere produced (or you got it from somewhere else), Story Scribe can also convert it into the format Story Cutter needs — see Source 4: Upload existing transcript below.


Getting Started

Step 1: Open Story Scribe

From the Studio launchpad, click the Story Scribe card (audio department, violet accent). You'll land on the Setup screen with four controls:

  1. Attach… — pick a source (timeline, in/out range, audio file, or existing transcript)

  2. Language — tell the engine what language to expect, or let it auto-detect

  3. Engine — pick between FAST, STUDIO, or PRIVATE

  4. Transcribe / Convert & Import — kick it off

The Transcribe button only enables when you've picked a source AND your chosen engine supports the language you picked. (More on language compatibility in the Engines section.)


Source options

You have four ways to feed audio into Story Scribe. Each one is tuned for a different real-world workflow.

Source 1: Active timeline

Best for: The full pass — you want a transcript of everything in your current sequence.

  • Make sure the sequence you want to transcribe is the active one in Premiere (i.e. the one with focus in the Program monitor).

  • Click Attach…, then choose Active timeline.

  • Story Scribe will export the audio of the entire active sequence to a temp WAV, then send it to your chosen engine.

This is the default workflow. Use this for podcasts, interviews, vlogs, and any project where you want the cutter to have the whole conversation to choose from.

Source 2: In/Out range

Best for: Long sequences where you only want to work with one section.

  • In Premiere, scrub to the start of the section you want and press I to set the in-mark.

  • Scrub to the end and press O to set the out-mark.

  • Come back to the Story Scribe panel — within ~1 second the In/Out range option will light up in the Attach… menu, showing the approximate duration (e.g. ≈4m 30s).

  • Click Attach…, then choose In/Out range.

If the In/Out range option is greyed out, it means Premiere doesn't currently report both marks set on the active sequence. Re-set them with I and O — the panel polls once a second and re-checks immediately when it regains focus.
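As a rough illustration of the rule described above, the sketch below enables the option only when both marks are set and formats the approximate duration. The function name and the idea of passing marks as seconds are assumptions for illustration; the panel's real polling code is not shown in this guide.

```python
def in_out_option(in_mark_s, out_mark_s):
    """Return the duration label for the In/Out range option, or None if it should stay greyed out.

    Both arguments are hypothetical: mark positions in seconds, or None when a mark is not set.
    """
    if in_mark_s is None or out_mark_s is None or out_mark_s <= in_mark_s:
        return None
    total = int(round(out_mark_s - in_mark_s))
    minutes, seconds = divmod(total, 60)
    return f"≈{minutes}m {seconds}s"


print(in_out_option(60.0, 330.0))  # both marks set, 270 seconds apart
print(in_out_option(60.0, None))   # out-mark missing, option stays greyed out
```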

Source 3: Upload audio file

Best for: Raw camera audio, podcast stems, voice memos, anything that lives as a file on disk rather than on a Premiere timeline.

  • Click Attach…, then choose Upload audio file.

  • Pick an MP3, WAV, M4A, or MP4 from disk.

  • Story Scribe sends the file directly to the engine, so no Adobe Media Encoder (AME) export is needed.

Useful when the speakers you care about live in raw camera files you haven't even imported into Premiere yet, or when you want a transcript of a podcast you're about to cut up.

Source 4: Upload existing transcript

Best for: You already have a transcript and just want to use it in Story Cutter.

  • Click Attach…, then choose Upload existing transcript.

  • Pick an SRT, VTT, or JSON file from disk.

  • Story Scribe normalizes it into the canonical format Story Cutter expects, then drops you straight into the Review screen.

  • The button label changes from Transcribe to Convert & Import to make this clear — no engine call is made, no cost is incurred, and the engine selector is ignored.

Supports:

  • SRT — standard subtitle/caption format

  • VTT — WebVTT format

  • JSON — Premiere Pro's exported transcript JSON

Why does my Premiere transcript work but lose precision?

Premiere exports sentence-level timing only. Story Scribe will accept it and pass it through, but Story Cutter will warn that cuts will snap to sentence boundaries instead of word boundaries, so they may feel slightly loose. For tight cuts, re-transcribe with Scribe v2 to get word timing back. (See The precision badge below.)
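For a sense of what normalizing a subtitle file involves, here is a minimal sketch that parses SRT text into sentence-level segments. The output shape (`start`, `end`, `text`, empty `words`) is a hypothetical stand-in; the canonical format Story Cutter expects is not documented here.

```python
import re

# Matches HH:MM:SS,mmm (SRT) and HH:MM:SS.mmm (VTT-style) timestamps.
STAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert one timestamp string to seconds as a float."""
    match = STAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Turn SRT text into a list of sentence-level segments (assumed target shape)."""
    segments = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # skip blocks that have no timing line
        start_raw, end_raw = lines[timing].split("-->")
        segments.append({
            "start": to_seconds(start_raw),
            "end": to_seconds(end_raw.split()[0]),
            "text": " ".join(lines[timing + 1:]),
            "words": [],  # SRT carries no per-word timing
        })
    return segments


sample = """1
00:00:01,000 --> 00:00:03,500
Thanks for coming in today.

2
00:00:04,000 --> 00:00:06,250
Happy to be here.
"""
for seg in parse_srt(sample):
    print(seg["start"], seg["end"], seg["text"])
```

The empty `words` list is the point of the example: a subtitle file only knows where each caption starts and ends, so nothing finer than sentence-level timing can be recovered from it.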


Choosing an engine

Story Scribe ships with three engines. The cards are colour-coded by the trade-off they represent.

Engine         | Best for                                | Speed              | Cost (est.) | Privacy
---------------|-----------------------------------------|--------------------|-------------|---------------------
Fal Whisper v3 | Single speaker, clean audio             | Very fast          | ~$0.10 / hr | Cloud (Fal.ai)
Fal Scribe v2  | Multiple speakers, accents, noisy audio | Fast               | ~$0.48 / hr | Cloud (Fal.ai)
Local Whisper  | Sensitive content, offline work         | Hardware-dependent | Free        | 100% on your machine

Fal Whisper v3 — FAST

The cheapest cloud option. Sends your audio to Fal.ai's hosted Whisper v3 endpoint. Excellent for clean single-speaker audio (one person at the mic, no overlapping voices, low background noise).

Use it when: you've got a podcast monologue, a single-presenter tutorial, a voiceover record, or any "one clean voice" job and you want a fast, cheap turnaround.

Don't use it when: your audio has multiple speakers talking over each other, heavy accents, or background music — accuracy can dip noticeably compared to Scribe v2.

Language note: Fal Whisper v3 doesn't currently support Cantonese (yue). If you pick Cantonese in the language dropdown, this card is auto-locked and the panel routes you to Scribe v2 or Local Whisper.

Fal Scribe v2 — STUDIO

The accuracy default. This is the engine most people should pick for most jobs, and it's what we recommend for serious production work.

Use it when: you have multiple speakers, accents, code-switching, music in the background, or you just want the most accurate result available. It's also the engine the Local Whisper hardware-warning modal will offer to switch you to for long clips.

Cost example: a 60-minute interview runs about $0.48 total. A 4-hour podcast is about $1.92.
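The arithmetic behind these figures is just the hourly rate multiplied by the duration. A minimal sketch, using the estimated rates from the engine table (the dictionary and function names are illustrative, and actual billing may differ):

```python
# Estimated rates in USD per hour of audio, taken from the engine table above.
RATES_USD_PER_HOUR = {
    "Fal Whisper v3": 0.10,
    "Fal Scribe v2": 0.48,
    "Local Whisper": 0.0,
}


def estimate_cost(engine, duration_minutes):
    """Estimated cost in USD for transcribing duration_minutes of audio."""
    return round(RATES_USD_PER_HOUR[engine] * duration_minutes / 60, 2)


print(estimate_cost("Fal Scribe v2", 60))    # 60-minute interview
print(estimate_cost("Fal Scribe v2", 240))   # 4-hour podcast
print(estimate_cost("Fal Whisper v3", 240))  # same podcast on the FAST engine
```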

Local Whisper — PRIVATE

Runs whisper.cpp on your own machine. Audio never leaves your computer — there's no cloud upload, no API key, no usage tracking. Zero cost per minute.

First-run setup. The first time you pick Local Whisper, Story Scribe shows an inline tile that downloads the whisper.cpp binary and the language model. The download is a one-time hit (a few hundred MB, depending on the model size); after that it runs offline forever. Downloads can be resumed if interrupted.

Long-clip pre-flight warning. If you pick Local Whisper for a clip 10 minutes or longer, Story Scribe shows a modal that estimates the time, shows you the equivalent cost on Scribe v2 (usually a few cents), and offers a one-click switch. You can dismiss the modal with Don't show again once you know your machine's profile.
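The decision described above can be sketched as a small check. Everything here is an assumption for illustration (the function name, the returned dictionary, the `dismissed_forever` flag standing in for Don't show again); the time estimate is left out because it depends on your hardware.

```python
SCRIBE_V2_USD_PER_HOUR = 0.48  # estimated rate from the engine table
LONG_CLIP_MINUTES = 10         # threshold stated in the paragraph above


def preflight_warning(engine, duration_minutes, dismissed_forever=False):
    """Return what the warning modal would offer, or None if no modal is shown."""
    if engine != "Local Whisper" or dismissed_forever:
        return None
    if duration_minutes < LONG_CLIP_MINUTES:
        return None
    cost = round(SCRIBE_V2_USD_PER_HOUR * duration_minutes / 60, 2)
    return {"equivalent_scribe_v2_usd": cost, "offer_switch_to": "Fal Scribe v2"}


print(preflight_warning("Local Whisper", 45))  # long clip: modal with Scribe v2 cost
print(preflight_warning("Local Whisper", 5))   # short clip: no modal
```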


Picking a language

The language dropdown above the engine selector controls how the engine interprets the audio.

  • Auto-detect — let the engine figure it out from the first few seconds. Good for unknown source material; not recommended if you already know the language because manual selection is always more reliable.

  • Manual selection — pick from the 25+ language list. Always preferred when you know the language up front.

When you change the language, the engine cards reactively re-evaluate:

  • If an engine doesn't support the language you picked, its card locks with a 🔒 Not available for [Language] badge.

  • The most common case: switching to Cantonese locks Fal Whisper v3 — use Scribe v2 or Local Whisper instead.
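The locking rule, together with the Transcribe button rule from Getting Started, can be sketched as a lookup. The only incompatibility this guide states is Cantonese (yue) on Fal Whisper v3, so that is the only entry below; the table, the `"auto"` code, and the function names are hypothetical.

```python
# Language codes each engine cannot handle. Only the Cantonese entry comes
# from this guide; the real compatibility table may contain more.
UNSUPPORTED = {
    "Fal Whisper v3": {"yue"},
}


def card_badge(engine, language_code, language_name):
    """Return the lock badge text for an engine card, or None if the card is available."""
    if language_code in UNSUPPORTED.get(engine, set()):
        return f"🔒 Not available for {language_name}"
    return None


def can_transcribe(source_attached, engine, language_code):
    """Transcribe enables only with a source attached AND a compatible engine."""
    supported = language_code not in UNSUPPORTED.get(engine, set())
    return bool(source_attached) and supported


print(card_badge("Fal Whisper v3", "yue", "Cantonese"))
print(can_transcribe(True, "Fal Scribe v2", "yue"))
```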


The Transcribing screen

Once you click Transcribe, you'll see the live transcription HUD with:

  • A halo waveform animation breathing in violet

  • A progress percent with a live sub-stage label ("Encoding in Media Encoder…", "Transcribing chunk 2 of 5…", etc.)

  • An elapsed counter showing how long the run has been going

You can cancel at any time with the Cancel button. If anything fails, the panel shows the error and offers a retry.

Long clips are chunked automatically. For audio over 20 minutes, cloud engines split the file into 20-minute chunks, transcribe each in parallel, and stitch the results back together with correct timestamps. You'll see the chunk progress in the live HUD ("chunk 3 of 7…").
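A minimal sketch of the chunk-and-stitch idea, assuming each chunk comes back with timestamps relative to its own start. The function names and word dictionaries are illustrative; the real pipeline may handle chunk boundaries differently.

```python
CHUNK_SECONDS = 20 * 60  # 20-minute chunks, as described above


def plan_chunks(duration_s, chunk_s=CHUNK_SECONDS):
    """Split a clip into consecutive (start, end) ranges of at most chunk_s seconds."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def stitch(chunk_results):
    """Merge per-chunk words into one timeline.

    chunk_results is a list of (chunk_start_s, words) pairs. Because chunks run
    in parallel they may finish in any order, so sort by chunk start first,
    then shift each word by its chunk's offset.
    """
    merged = []
    for offset, words in sorted(chunk_results, key=lambda r: r[0]):
        for w in words:
            merged.append({**w, "start": w["start"] + offset, "end": w["end"] + offset})
    return merged


print(len(plan_chunks(130 * 60)))  # a 130-minute clip needs 7 chunks
results = [
    (1200.0, [{"text": "world", "start": 0.5, "end": 0.75}]),  # second chunk finished first
    (0.0, [{"text": "hello", "start": 1.0, "end": 1.5}]),
]
print(stitch(results))
```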


The Review screen

When transcription finishes, you land on the Review screen with the full transcript displayed line-by-line.

The precision badge

In the header you'll see one of two pills:

  • Word-precise (green) — every word has its own timestamp. Story Cutter has full precision to land cuts on exact word boundaries.

  • Sentence-precise (amber) — only sentence-level timing is available. Cuts will snap to the start/end of each sentence — usually fine, but cuts may feel slightly loose if the AI wants to trim mid-sentence.

The label also tells you whether speakers were detected (· Speakers detected) or whether it was treated as a single speaker (· Single speaker).
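One plausible way to derive the badge, sketched under assumptions: each segment is a dictionary with an optional `words` list and an optional `speaker` label, and "Speakers detected" means more than one distinct label was found. The actual rule the panel uses is not documented here.

```python
def precision_badge(segments):
    """Build the header label from a list of transcript segments (hypothetical shape)."""
    # Word-precise only if there is at least one segment and every segment has words.
    word_precise = bool(segments) and all(seg.get("words") for seg in segments)
    speakers = {seg["speaker"] for seg in segments if seg.get("speaker")}
    precision = "Word-precise" if word_precise else "Sentence-precise"
    speaker_note = "Speakers detected" if len(speakers) > 1 else "Single speaker"
    return f"{precision} · {speaker_note}"


interview = [
    {"speaker": "Speaker 1", "text": "Hi.", "words": [{"text": "Hi.", "start": 0.0, "end": 0.3}]},
    {"speaker": "Speaker 2", "text": "Hello.", "words": [{"text": "Hello.", "start": 0.5, "end": 0.9}]},
]
imported_srt = [{"text": "Hi.", "words": []}]

print(precision_badge(interview))
print(precision_badge(imported_srt))
```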

Click-to-jump

Click any word or segment in the transcript and Premiere's playhead snaps to that exact timestamp and starts playing. This is the same behaviour Story Cutter uses for timestamp links, and it gives you instant audible confirmation that the click landed where you expected. A small "Snapping…" pill briefly appears while the jump is in flight.

Renaming speakers

If diarization gave you Speaker 1, Speaker 2, etc., click any speaker label inline to rename it. The new name propagates to every line that speaker said, and it persists when you save the JSON or hand off to Story Cutter.
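Propagating a rename is a simple pass over the transcript. A minimal sketch, assuming segments are dictionaries with a `speaker` key (a hypothetical shape, not the saved JSON schema):

```python
def rename_speaker(segments, old, new):
    """Return a new segment list with every line spoken by `old` relabelled as `new`."""
    return [
        {**seg, "speaker": new} if seg.get("speaker") == old else seg
        for seg in segments
    ]


transcript = [
    {"speaker": "Speaker 1", "text": "Thanks for coming in."},
    {"speaker": "Speaker 2", "text": "Happy to be here."},
    {"speaker": "Speaker 1", "text": "Let's get started."},
]
renamed = rename_speaker(transcript, "Speaker 1", "Sarah")
print([seg["speaker"] for seg in renamed])
```

Returning a new list rather than editing in place keeps the original labels available, which would make an undo straightforward.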

The three actions

  • Save as JSON — write the transcript to disk in the canonical Premiere transcript JSON format. Use this when you want a sidecar file you can re-use later.

  • Re-run — go back to Setup with your source still attached. Useful if an uploaded transcript gave you a sentence-precise result and you want to re-transcribe from the original audio with Scribe v2 for word precision.

  • Use in Story Cutter ✨ — the primary action. Saves the transcript, opens a new chat with the Story Cutter Assistant conversation starter, and attaches the transcript automatically. You're one message away from a rough cut.


Tips for getting the best results

Audio quality matters more than engine choice

The single biggest factor in transcript accuracy is the audio that goes in. A clean lapel mic on Fal Whisper v3 will usually beat a noisy room mic on Scribe v2. Before reaching for a better engine, ask:

  • Is the speaker close to the mic?

  • Is the background noise low?

  • Are there overlapping speakers? (Even diarization struggles with cross-talk.)

  • Is there music underneath the dialogue?

If the answer to any of those is concerning, pick Scribe v2 — it's more forgiving.

Set the language manually when you know it

Auto-detect is convenient but can occasionally pick the wrong language from the first few seconds (especially if the speaker starts with a non-native word, music plays first, or there's a long silence). When you know the language, just pick it from the dropdown. Accuracy should be at least as good and is usually better.

Use in/out marks to scope long sequences

If you only need to transcribe a 5-minute section of a 90-minute timeline, set in/out marks instead of transcribing the whole sequence. You'll get the result faster, pay less (cloud engines) or wait less (Local Whisper), and avoid clutter in the review screen.

For sensitive content, go local

Interview footage of a source who needs anonymity, medical content, legal depositions, anything under NDA — Local Whisper sends nothing to a cloud. The audio is read from disk, processed by whisper.cpp on your CPU, and the transcript is written back to disk. No network call is made.

Default to Scribe v2, drop down to Fal Whisper v3 only when it's truly a single clean speaker

Scribe v2's accuracy advantage is biggest on:

  • Multiple speakers (interviews, panels, conversations)

  • Heavy accents

  • Background music or noise

  • Code-switching between languages

If your material has none of those things (a solo voiceover record, a single-presenter tutorial recorded in a treated room), Fal Whisper v3 costs roughly a fifth as much (~$0.10 vs ~$0.48 per hour) with negligible accuracy loss.

Don't re-transcribe what you can convert

If you've already got an SRT, VTT, or Premiere transcript JSON sitting on disk, use Upload existing transcript instead of re-transcribing the audio. It's free, it's instant, and the only downside is sentence-level precision (which you can upgrade later with Re-run if Story Cutter needs the word boundaries).

Rename speakers before handoff

Story Cutter uses the speaker labels you see in the Review screen. Renaming Speaker 1 to Sarah before clicking Use in Story Cutter means Story Cutter's output will reference Sarah everywhere, which is much easier to scan than Speaker 1. Two seconds well spent.

Don't move clips on the source timeline after transcribing

Just like Story Cutter, Story Scribe locks timestamps to clip positions. If you transcribe a sequence, then re-arrange clips, then send the transcript to Story Cutter — the cuts will land on the old positions. Either transcribe last, or re-transcribe after any timeline reorganisation.


Troubleshooting

"In/Out range" stays greyed out

Premiere doesn't currently report both an in-mark and an out-mark on the active sequence. Press I and O again, then click out of and back into the Story Scribe panel. The poller will pick up the marks within about 1 second.

Local Whisper card is locked with "Windows only"

You're on macOS. Use Scribe v2 (recommended) or Fal Whisper v3. Both are cloud-based and run on any platform.

Engine card is locked with a "Not available for [Language]" badge

The language you picked isn't in that engine's supported set. The most common case is Cantonese on Fal Whisper v3. Switch to Scribe v2 or Local Whisper.

"Sentence-precise" badge after a re-run from audio

This usually means the audio path actually returned word timestamps, but the diarization step couldn't subdivide cleanly. Try re-running with Scribe v2 if you weren't already on it.

Transcript looks correct but Story Cutter cuts feel loose

Check the precision badge. If it's amber (sentence-precise), Story Cutter doesn't have word-level boundaries to snap to. Re-run with Scribe v2 from the audio source.

Long clip on Local Whisper feels frozen

It isn't frozen; whisper.cpp is just slow on CPU. The panel won't show per-second progress like the cloud engines because whisper.cpp reports progress less granularly. Switch to Scribe v2 if you need faster turnaround.

