LTX 2.3 on 8GB VRAM: Consistent Characters, Dialogue, and Multi-Scene Video in ComfyUI

What you will build

This guide turns the video tutorial into a repeatable operating procedure. The goal is not just to “run a workflow.” The goal is to produce a coherent short scene: same characters, logical cuts, readable motion, voice lines or sound effects in the prompt, and a final combined MP4.

Best use cases

Short films and story sequences.
Two-character dialogue scenes.
Commercial or product-ad style clips.
Character-consistent social video snippets.

Hardware target

8 GB VRAM is possible with the workflow’s low-VRAM strategy.
12 GB VRAM gives more breathing room.
Use conservative resolution and segment lengths first.

Primary constraints

Resolution must be divisible by 32.
Frame lengths must use supported values.
Timeline segment length must match segment frame values.
Starting images must match the beginning of the action.

Plain-English model: each uploaded image is the first frame of a scene. The timeline prompt tells LTX what happens after that frame. The workflow then morphs those keyframes into one multi-segment video and generates the audio at the same time.

How the shared workflow is organized

The tutorial describes the ComfyUI graph from left to right. Treat it as five zones:

1. Loaders

Base model: LTX 2.3 distilled 1.1.
LoRA: LTX 2.3 IC LoRA Dual Character from Civitai.
VAEs: separate video VAE, audio VAE, and tiny preview VAE.

2. Performance patches

Chunked feed-forward: keep enabled for 8 GB / 12 GB cards.
Sage attention nodes: left bypassed in the shared workflow because Windows support may be unstable.

3. Settings and story inputs

Width, height, FPS, length, and image strength.
Up to four starting images.
Main style prompt plus per-segment timeline prompts.

4. Two-pass engine

Pass 1 creates a rough low-resolution draft with CFG set around 1.0 in the tutorial.
The latent upscaler doubles the working size.
Pass 2 adds sharper faces, textures, and environment detail.

5. Final assembly

The video stream and audio stream meet in the Video Combine node, which outputs the ready-to-watch MP4.

Prerequisites

Required

A working ComfyUI installation.
Enough free disk space for model files, workflow files, temporary outputs, and final MP4s.
An NVIDIA GPU or other ComfyUI-supported acceleration setup. The workflow is specifically presented as usable on 8 GB VRAM with careful settings.
The shared workflow JSON from the video description: LTX.2.3 Latest (LTX MultiScene).json.
The models and LoRA referenced by the workflow, installed where ComfyUI expects them.

Prepare before opening the graph

Four keyframe images, or fewer if you are testing with a shorter sequence.
A simple shot list with one action per segment.
Dialogue and sound effects written as text.
A continuity note for each character: screen side, gaze direction, clothing, environment, and emotional state.

Do not start with maximum ambition. First prove that ComfyUI, the model files, the workflow JSON, and one short segment all run. Then move to four segments, longer durations, or larger resolution.

Update ComfyUI and load the workflow

Step 1 — Update ComfyUI

The tutorial calls this the first required step. Update ComfyUI before loading the workflow so the graph has the latest node behavior and model support.

If you installed ComfyUI with Git, a common update pattern is:

cd /path/to/ComfyUI
git pull
# Then update dependencies using the method appropriate for your install.

If you use a ComfyUI manager/launcher, use its update function and restart ComfyUI after it completes.

Step 2 — Download the workflow JSON

Open the Google Drive link in the video description and download LTX.2.3 Latest (LTX MultiScene).json. Save a copy somewhere you can find it again, then drag the JSON into ComfyUI or load it through ComfyUI’s workflow menu.

Step 3 — Resolve missing nodes and models

When the workflow opens, check for red/missing nodes or model warnings. Install missing custom nodes through your usual ComfyUI custom-node workflow. Place models, VAEs, and LoRA files in the folders expected by your ComfyUI setup, then refresh or restart ComfyUI.

Model naming note: the tutorial names the base model as LTX 2.3 distilled 1.1 and the LoRA as LTX 2.3 IC LoRA Dual Character from Civitai. If the exact filenames differ, match the model family and the loader expectations in the graph rather than guessing.
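
One way to see which files the graph expects is to read the workflow JSON directly. The sketch below assumes the file is a standard ComfyUI UI export, which normally has a top-level "nodes" list whose entries carry "type" and "widgets_values"; API-format exports are laid out differently and would need other handling. The helper names and the suffix list are illustrative, not part of the tutorial.

```python
import json
from pathlib import Path

# File extensions that usually indicate a model, VAE, or LoRA file.
MODEL_SUFFIXES = (".safetensors", ".ckpt", ".pt", ".pth", ".gguf")


def load_workflow(path: str) -> dict:
    """Read a workflow JSON file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def referenced_model_files(workflow: dict) -> list[tuple[str, str]]:
    """Return (node type, filename) pairs for widget values that look like model files."""
    found = []
    for node in workflow.get("nodes", []):
        values = node.get("widgets_values") or []
        # Some custom nodes store widget values as a dict instead of a list.
        if isinstance(values, dict):
            values = list(values.values())
        for value in values:
            if isinstance(value, str) and value.lower().endswith(MODEL_SUFFIXES):
                found.append((node.get("type", "?"), value))
    return found
```

Calling referenced_model_files(load_workflow("LTX.2.3 Latest (LTX MultiScene).json")) should give a list you can compare against the files in your model folders before the first run.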

Critical settings that prevent crashes

Most failures in this workflow come from incompatible dimensions, incompatible frame counts, or timeline values that do not match the segment frame values. Set these before running.

Resolution rule

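Width and height must both be divisible by 32. Sizes that satisfy the rule include 512×288, 640×384, 768×448, 832×480, 1024×576, and 1280×704. Common 16:9 sizes such as 768×432 and 1280×720 do not qualify, because 432 and 720 are not multiples of 32. On 8 GB VRAM, start smaller and scale up only after the workflow proves stable.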
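
The divisibility rule is easy to check before typing values into the settings nodes. The following is a minimal sketch; the function names are made up for illustration and are not part of the workflow. It rounds down rather than up so the suggested size never exceeds what you asked for, which is the safer direction on a low-VRAM card.

```python
def snap_down(value: int, multiple: int = 32) -> int:
    """Round value down to the nearest positive multiple."""
    return max(multiple, (value // multiple) * multiple)


def check_resolution(width: int, height: int, multiple: int = 32) -> tuple[int, int, bool]:
    """Return a size that satisfies the rule, plus whether the input already did."""
    already_ok = width % multiple == 0 and height % multiple == 0
    return snap_down(width, multiple), snap_down(height, multiple), already_ok


if __name__ == "__main__":
    for w, h in [(640, 384), (1280, 720)]:
        print((w, h), "->", check_resolution(w, h))
```

For example, 1280×720 fails the check because 720 / 32 = 22.5, and the sketch suggests 1280×704 instead.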

Frame-length rule

The tutorial specifically warns that frame lengths must be supported values such as 17, 25, or 33. If you drag a timeline segment and the visible length changes, update the corresponding segment frame value to match.
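
The tutorial only names 17, 25, and 33. All three have the form 8n + 1, which is also the pattern documented for earlier LTX-Video releases. The sketch below treats that pattern as an assumption, and the minimum of 9 frames is likewise a guess; confirm both against the values the workflow actually accepts. The function names are illustrative.

```python
def is_supported_length(frames: int) -> bool:
    """ASSUMPTION: supported lengths follow 8*n + 1, inferred from 17, 25, 33."""
    return frames >= 9 and frames % 8 == 1


def nearest_supported_length(frames: int) -> int:
    """Snap a frame count to the closest 8*n + 1 value (ties round up, minimum 9)."""
    n = max(1, (frames - 1 + 4) // 8)
    return 8 * n + 1


def duration_seconds(frames: int, fps: float) -> float:
    """Clip length in seconds for a given frame count and frame rate."""
    return frames / fps
```

Under that assumption, a segment dragged to 30 frames would be corrected to 33, and one dragged to 20 frames would be corrected to 17.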

Recommended first-run settings

Use one or two segments instead of all four.
Choose a modest landscape size divisible by 32, such as 512×288 (exact 16:9) or 640×384 (slightly taller than 16:9).
Use a supported frame value such as 17 or 25 for each test segment.
Keep chunked feed-forward enabled.
Leave Sage attention nodes bypassed unless you know your environment supports them.
Run once and watch the preview pass for broad motion only; do not judge final sharpness until Pass 2 finishes.

Crash warning: if the workflow immediately errors after a timeline edit, check the segment length/frame value match first. If it errors during generation, lower resolution or segment count before disabling important low-VRAM helpers.
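
The timeline check can be written down as a simple comparison. In the workflow these numbers live in node widgets, so the two lists below are a hypothetical representation you would fill in by hand, assuming both the timeline lengths and the segment frame values are expressed in frames.

```python
def find_mismatches(timeline_lengths: list[int], segment_frames: list[int]) -> list[str]:
    """Compare timeline segment lengths with segment frame values, position by position."""
    problems = []
    if len(timeline_lengths) != len(segment_frames):
        problems.append(
            f"timeline has {len(timeline_lengths)} segments "
            f"but {len(segment_frames)} frame values are set"
        )
    pairs = zip(timeline_lengths, segment_frames)
    for index, (timeline, frames) in enumerate(pairs, start=1):
        if timeline != frames:
            problems.append(
                f"segment {index}: timeline length {timeline} != frame value {frames}"
            )
    return problems
```

An empty result means the two sets of values agree; any message points at the segment to fix first.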

Plan keyframes and write timeline prompts

The shared workflow uses a global prompt plus colored segment prompts. In the tutorial: blue maps to segment 1, orange to segment 2, green to segment 3, and red to segment 4. Your image is the starting frame for that segment; the text box describes what happens next.

Step 1 — Create a global look prompt

Use the main prompt to control the style of the entire video. Keep it stable across segments.

cinematic short film, natural dialogue, consistent characters, warm interior lighting, realistic camera motion, shallow depth of field, polished color grade

Step 2 — Upload starting images

Upload up to four images. Choose images that show the exact beginning of each scene, not the middle or end of the movement.

Good: a character seated before reaching for a cup, if the prompt says they reach for the cup.
Risky: a character already holding the cup, if the prompt says they pick it up.
Good: two characters facing each other before a line of dialogue.
Risky: both characters already mid-gesture, if the prompt asks them to start that gesture.

Step 3 — Write one segment prompt per scene

Each segment prompt should include action, camera behavior, dialogue, and sound cues. Keep the text specific but not overloaded.

Segment 1: Daniel sits at the cafe table across from Emma. Soft background cafe ambience. Daniel smiles and says, "Mind if I sit here?" Emma looks up and replies, "Not at all, please." Camera holds a medium two-shot.

Segment 2: Close-up of Emma on screen-left, still looking screen-right toward Daniel. She laughs lightly and says, "Definitely lucky." A cup clinks softly on the table. Camera slowly pushes in.
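
If you draft many segments, it can help to keep the four ingredients separate and assemble the prompt text at the end. The class below is a hypothetical planning aid, not something the workflow reads; you would still paste the rendered string into the segment's text box.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentPrompt:
    action: str
    camera: str
    dialogue: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)
    sounds: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join action, dialogue, sound cues, and camera into one prompt string."""
        parts = [self.action.strip()]
        parts += [f'{speaker} says, "{line}"' for speaker, line in self.dialogue]
        parts += [sound.strip() for sound in self.sounds]
        parts.append(self.camera.strip())
        # Terminate each non-empty part with a period unless it already ends a sentence.
        return " ".join(
            p if p.endswith((".", '"')) else p + "." for p in parts if p
        )


if __name__ == "__main__":
    segment = SegmentPrompt(
        action="Close-up of Emma on screen-left",
        camera="Camera slowly pushes in",
        dialogue=[("Emma", "Definitely lucky.")],
        sounds=["A cup clinks softly on the table"],
    )
    print(segment.render())
```

Keeping dialogue as (speaker, line) pairs makes it harder to forget who is talking, which matters once two characters share a segment.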

Step 4 — Balance time and action

If the generated clip has dead space, the tutorial recommends two fixes: shorten that segment on the timeline, or add more action/dialogue so the characters stay busy. Do not let a long segment depend on a single tiny gesture.
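
A rough way to spot dead space before generating is to compare the segment's duration with the time its dialogue would take to speak. The sketch below is only an estimate: the speaking rate of 2.5 words per second is an assumption about conversational pace, not a model parameter, and the frame count and FPS in the usage note are illustrative values. It counts only words inside straight double quotes.

```python
import re

WORDS_PER_SECOND = 2.5  # ASSUMPTION: rough conversational pace


def spoken_words(prompt: str) -> int:
    """Count the words that appear inside straight double quotes."""
    quoted = re.findall(r'"([^"]*)"', prompt)
    return sum(len(line.split()) for line in quoted)


def timing_report(prompt: str, frames: int, fps: float) -> dict:
    """Estimate segment duration, speech time, and the slack left over, in seconds."""
    duration = frames / fps
    speech = spoken_words(prompt) / WORDS_PER_SECOND
    return {
        "duration_s": round(duration, 2),
        "speech_s": round(speech, 2),
        "slack_s": round(duration - speech, 2),
    }
```

A large positive slack suggests shortening the segment or adding a second beat of action; a negative slack suggests the lines will not fit and the segment needs more frames or fewer words.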

Run the two-pass generation

Before run

Confirm ComfyUI is updated, model loaders are not red, chunked feed-forward is enabled, resolution is divisible by 32, and segment frame values match the timeline.

Pass 1

The first sampler creates a blurry low-resolution draft. The tutorial shows CFG at 1.0 for this rough pass. Use this preview to check composition, action direction, and whether the scene broadly follows the prompt.

Upscale

The latent upscaler increases the working size before the detail pass.

Pass 2

The second sampler adds sharper detail to faces, textures, and environments.

Combine

The final node combines video and audio into an MP4. Review both image continuity and audio timing before changing prompts.

Expected runtime: the tutorial’s example says a generation can take roughly 5 to 10 minutes. Your actual time depends on GPU, resolution, segment count, and node settings.

Character consistency and continuity rules

Character consistency is partly a model/workflow benefit and partly a directing problem. The workflow can help preserve identity, but it cannot fix contradictory shot planning.

Keep screen direction stable

If a character sits on the right side of the frame speaking to someone on the left, a later close-up should keep that character looking left. If they suddenly face the opposite direction, the cut feels wrong even if the face is consistent.

Start at the beginning of the action

The tutorial highlights a mistake: if the keyframe already shows two characters shaking hands, a prompt that asks them to shake hands may cause the model to repeat, undo, or distort the action.

Use continuity notes

Character A: screen side, gaze direction, clothing, emotional state.
Character B: screen side, gaze direction, relationship to Character A.
Environment: time of day, lighting, props, camera distance.
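
Continuity notes can also be checked mechanically. The sketch below is a hypothetical planning helper: it records each character's screen side and gaze per segment and flags any change between consecutive appearances. It cannot tell an intentional reversal from a mistake, so treat each warning as a prompt to look at the keyframe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotNote:
    segment: int
    character: str
    screen_side: str  # "left" or "right"
    gaze: str         # "left" or "right"


def direction_flips(notes: list[ShotNote]) -> list[str]:
    """Report characters whose screen side or gaze changes between appearances."""
    last_seen: dict[str, ShotNote] = {}
    warnings = []
    for note in sorted(notes, key=lambda n: n.segment):
        prev = last_seen.get(note.character)
        if prev is not None:
            span = f"between segments {prev.segment} and {note.segment}"
            if prev.screen_side != note.screen_side:
                warnings.append(
                    f"{note.character}: screen side changes from "
                    f"{prev.screen_side} to {note.screen_side} {span}"
                )
            if prev.gaze != note.gaze:
                warnings.append(
                    f"{note.character}: gaze flips from "
                    f"{prev.gaze} to {note.gaze} {span}"
                )
        last_seen[note.character] = note
    return warnings
```

For example, noting Emma as screen-left looking right in segment 1 and screen-left looking left in segment 2 produces a single gaze warning.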

Prompt with edit points in mind

Do not ask every segment to do everything. Give each segment a clean job: establish, react, speak, move, reveal, or resolve.

Troubleshooting

Workflow crashes immediately

Check that width and height are divisible by 32.
Check that every segment frame value uses a supported number such as 17, 25, or 33.
Check that timeline segment lengths match the corresponding segment frame values.
Update ComfyUI and restart it before assuming the workflow is broken.

Out of VRAM

Keep chunked feed-forward enabled.
Use fewer segments for testing.
Lower the resolution, and raise it again only after a run completes.
Close other GPU-heavy applications.
Leave experimental acceleration nodes bypassed unless they are known to work on your system.

Characters face the wrong way between cuts

Rebuild the offending keyframe so the character looks in the correct direction.
Add screen direction to the segment prompt: “Emma remains on screen-left, looking right toward Daniel.”
Do not mix mirrored images unless you intentionally want a perspective change.

The clip has awkward pauses or dead space

Shorten that timeline segment.
Add a second beat of action: glance, gesture, prop interaction, reaction, or camera movement.
Add ambient sound or dialogue if the silence feels unintentional.

Audio/dialogue is not matching the scene

Put dialogue in the exact segment where it should happen.
Keep lines short and assign speakers clearly.
Describe sound effects directly, such as “soft cafe ambience,” “cup clink,” or “distant thunder.”
Avoid packing too many speakers and effects into one short segment.

Sources and related links

Skill Destiny — “LTX 2.3 on 8GB VRAM: Consistent Characters + Dialogue in One Workflow”. Primary tutorial source for workflow structure, settings, low-VRAM notes, continuity tips, and demo examples.
Workflow download — LTX.2.3 Latest (LTX MultiScene).json. Shared workflow linked from the video description.
ComfyUI official project repository. Reference for the ComfyUI application context and update/source project.
Lightricks LTX-Video official repository. Official LTX-Video project context.
Lightricks on Hugging Face. Public model/source context for Lightricks releases.