LTX 2.3 in ComfyUI: Character and Background Replacement Workflow Guide (2026-05-22)

1. What this workflow does

This workflow is a video-to-video editing system for ComfyUI built around LTX 2.3, Flux 2 Klein Edit, IC LoRA, SAM3 segmentation, and DWpose. The video describes two primary use cases:

Character Replacement

Swap the performer in the source video for a new character based on a single reference image. The goal is to preserve the original performance: facial expressions, eye movement, rhythm, body language, emotional delivery, and timing.

Background Replacement

Keep the original character but transform the environment around them. The workflow uses segmentation and pose guidance so the subject remains consistent while the scene becomes a newly generated animated environment.

Important: the video explicitly does not cover installing ComfyUI or LTX 2.3. This guide maps the required files and operational steps, but assumes ComfyUI is already working.

2. Requirements and model files

Start with the author’s workflow/resources folder, then place the model files in the ComfyUI folders listed below. Exact filenames and node expectations can change, so after placing the files, restart ComfyUI and check ComfyUI Manager’s missing-node and missing-model warnings to confirm everything is found.

Resource	Destination	Purpose	Link
Workflow & Resources	`Google Drive workflow/resources folder`	Import workflow JSON/resources from the author	source
Flux 2 Klein Edit	`/ComfyUI/models/diffusion_models`	First-frame image editing for character/background integration	source
Qwen text encoder for Flux Klein	`/ComfyUI/models/text_encoders`	Text encoder used by Flux Klein subgraph	source
Flux2 VAE	`/ComfyUI/models/vae`	VAE for Flux Klein	source
MelBand RoFormer audio model	`/ComfyUI/models/diffusion_models`	Audio/dialogue extraction or separation support	source
LTX 2.3 low-VRAM GGUF	`/ComfyUI/models/diffusion_models`	12 GB VRAM path / quantized model	source
LTX 2.3 FP8 transformer	`/ComfyUI/models/diffusion_models`	16 GB+ VRAM path	source
Gemma 3 text encoder	`/ComfyUI/models/text_encoders`	LTX 2.3 text encoder	source
LTX text projection	`/ComfyUI/models/text_encoders`	LTX 2.3 text projection	source
LTX audio VAE	`/ComfyUI/models/vae`	Audio VAE for LTX 2.3	source
LTX video VAE	`/ComfyUI/models/vae`	Video VAE for LTX 2.3	source
Tiny VAE preview	`/ComfyUI/models/vae` or a preview-related folder	Faster previews; verify the path the workflow expects	source
Spatial upscaler	`/ComfyUI/models/latent_upscale_models`	2x latent spatial upscaling	source
Distilled LoRA	`/ComfyUI/models/loras`	Faster/distilled LTX generation	source
Camera movement LoRAs	`/ComfyUI/models/loras`	Optional movement/control LoRAs	source

VRAM choice: the description identifies a low-VRAM LTX 2.3 GGUF path for about 12 GB VRAM and an FP8 transformer path for 16 GB+ VRAM. Use one path consistently with the workflow variant you load.
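The VRAM rule above can be written down as a tiny helper. This is only a sketch of the guide's two thresholds: the function name and the `"gguf"`/`"fp8"` labels are illustrative, not filenames or node options, and the source does not say what to do below roughly 12 GB, so the sketch simply falls back to the GGUF path there.

```python
def choose_ltx_variant(vram_gb: float) -> str:
    """Return 'fp8' for 16 GB or more, otherwise 'gguf'.

    The labels are illustrative only; they are not model filenames.
    Cards below about 12 GB are not covered by the source, so they
    fall through to the low-VRAM GGUF path as a best guess.
    """
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    return "fp8" if vram_gb >= 16 else "gguf"


print(choose_ltx_variant(12))  # gguf
print(choose_ltx_variant(24))  # fp8
```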

3. Installation map

Download the workflow/resources

Get the workflow JSON and any custom resource files from the Google Drive folder. Keep an untouched backup of the original workflow before editing node paths.
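One way to keep that untouched backup is a timestamped copy next to the original file. The helper below is a minimal sketch; the function name and the `.backup-<timestamp>` naming scheme are assumptions, not part of the author's resources.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_workflow(workflow_path: str) -> Path:
    """Copy the workflow JSON next to itself with a timestamped name."""
    src = Path(workflow_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.stem}.backup-{stamp}{src.suffix}")
    shutil.copy2(src, dst)  # copy2 also preserves file timestamps
    return dst
```

Calling `backup_workflow("my_workflow.json")` (a placeholder filename) would leave the original in place and return the path of the copy.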

Place model files in their ComfyUI folders

Put diffusion models in `ComfyUI/models/diffusion_models`, text encoders in `ComfyUI/models/text_encoders`, VAEs in `ComfyUI/models/vae`, LoRAs in `ComfyUI/models/loras`, and the latent upscaler in `ComfyUI/models/latent_upscale_models`.
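To double-check placement before restarting, a short script can list what each folder contains. This is a sketch that only uses the folder names given in this guide; it does not know which filenames the workflow expects, so compare the output against the workflow's model dropdowns by eye. The `"ComfyUI"` root in the example is a placeholder for your own install path.

```python
from pathlib import Path

# Folder names taken from this guide, relative to the ComfyUI root.
MODEL_SUBFOLDERS = [
    "models/diffusion_models",
    "models/text_encoders",
    "models/vae",
    "models/loras",
    "models/latent_upscale_models",
]


def list_model_files(comfy_root: str) -> dict[str, list[str]]:
    """Map each expected subfolder to the sorted file names found in it.

    A folder that does not exist maps to an empty list.
    """
    root = Path(comfy_root)
    found = {}
    for sub in MODEL_SUBFOLDERS:
        folder = root / sub
        if folder.is_dir():
            found[sub] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[sub] = []
    return found


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install.
    for sub, names in list_model_files("ComfyUI").items():
        print(f"{sub}: {len(names)} file(s)")
        for name in names:
            print(f"  {name}")
```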

Install/update custom nodes

The workflow references SAM3 segmentation, DWpose, LTX 2.3 nodes, Flux/Klein edit nodes, audio extraction/separation support, and likely Kijai/Lightricks-related ComfyUI node packs. Use ComfyUI Manager to install missing nodes after loading the workflow.
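If you want to see which node types a workflow file references before opening it, the JSON can be read directly. The sketch below relies on the two common ComfyUI export shapes: the UI export with a top-level `nodes` list whose entries carry a `type` field, and the API export keyed by node id with a `class_type` field. It does not walk nodes nested inside subgraphs, and the function name is illustrative.

```python
import json
from pathlib import Path


def workflow_node_types(workflow_path: str) -> set[str]:
    """Collect node type names from a ComfyUI workflow JSON file.

    Handles the UI export (top-level "nodes" list with a "type" field)
    and the API export (dict of node id -> {"class_type": ...}).
    Nodes nested inside subgraph definitions are not visited.
    """
    data = json.loads(Path(workflow_path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("unexpected workflow JSON: top level is not an object")
    if isinstance(data.get("nodes"), list):
        return {
            node["type"]
            for node in data["nodes"]
            if isinstance(node, dict) and "type" in node
        }
    return {
        value["class_type"]
        for value in data.values()
        if isinstance(value, dict) and "class_type" in value
    }
```

Printing `sorted(workflow_node_types(path))` gives a plain list to compare against what ComfyUI Manager reports as missing.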

Restart and reload

Restart ComfyUI after model placement. Then load the workflow and resolve red nodes or missing model dropdown entries before running a full video job.

4. Settings section

The video’s Settings section controls the workflow’s operating mode and output constraints.

Setting	What to choose	Practical guidance
Mode	Character Replacement or Background Replacement	Choose character replacement when the person/subject changes; choose background replacement when the original subject remains.
Video length	Length of the generated output	Start short for testing. Longer clips increase VRAM use, render time, and the chance of a failed run.
Starting frame	First source frame used for alignment	Pick a clear frame that represents the subject and lighting well.
FPS	Output frame rate	Use modest FPS for tests; increase after visual stability is proven.
Resolution	Output dimensions	Use lower resolution while debugging, then upscale or increase once settings work.
Keep original voice audio	`true` or `false`	`true` preserves the original voice; according to the video, `false` lets LTX generate a more natural voice for the new character.
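The settings in the table can be thought of as one small record that is checked before a run. The sketch below is purely illustrative: the field names, the mode strings, the assumption that video length is given in seconds, and the example values are all hypothetical and do not correspond to actual widget names in the workflow.

```python
from dataclasses import dataclass

# Hypothetical mode labels mirroring the two modes in the table.
MODES = ("character_replacement", "background_replacement")


@dataclass
class RunSettings:
    mode: str
    length_seconds: float  # assumes length is expressed in seconds
    start_frame: int
    fps: int
    width: int
    height: int
    keep_original_voice: bool

    def validate(self) -> None:
        """Raise ValueError if any setting is obviously unusable."""
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}")
        if self.length_seconds <= 0 or self.fps <= 0:
            raise ValueError("length_seconds and fps must be positive")
        if self.start_frame < 0:
            raise ValueError("start_frame must be >= 0")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")

    def frame_count(self) -> int:
        """Approximate number of output frames: seconds times fps."""
        return round(self.length_seconds * self.fps)


# A short, low-resolution test run (values are examples only).
test_run = RunSettings("character_replacement", 3, 0, 24, 768, 432, False)
test_run.validate()
print(test_run.frame_count())  # 72
```

The frame count makes the cost of a longer clip or higher FPS visible before any generation starts.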

5. Prompting strategy

The workflow is designed so you do not manually type dialogue. You write the character, environment, and action. The workflow reads the original video dialogue and injects it into the final prompt.

Initial prompt should include

Who/what the character is.
Clothing, style, era, realism level.
Environment and lighting.
Action/emotion matching the source performance.
Camera feel: cinematic, handheld, close-up, interview, etc.

Avoid overloading it with

Manual dialogue already present in the source video.
Contradictory motion instructions.
Too many character details that fight the reference image.
Background details in Character Replacement mode unless needed.

Example character prompt:
A realistic cinematic female astronaut commander in a white EVA suit, expressive face, natural skin texture, matching the original actor's emotion and timing, standing in the same lighting and camera angle.

Example background prompt:
The same speaker stands inside a neon rain-soaked cyberpunk alley, cinematic reflections, moody blue and magenta light, realistic atmosphere, preserve the original body motion and expression.
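The dialogue-injection idea from this section can be sketched in a few lines. How the workflow actually merges the two pieces is not described in the source, so the function name and the `The character says:` template below are assumptions used only to show the shape of the result.

```python
def build_final_prompt(initial_prompt: str, dialogue: str) -> str:
    """Append extracted dialogue to the user-written prompt.

    The 'The character says:' wording is a made-up template, not the
    workflow's real format.
    """
    prompt = initial_prompt.strip()
    spoken = " ".join(dialogue.split())  # collapse newlines and extra spaces
    if not spoken:
        return prompt  # nothing was extracted, so leave the prompt alone
    return f'{prompt} The character says: "{spoken}"'


print(build_final_prompt(
    "A realistic cinematic female astronaut commander in a white EVA suit.",
    "We launch at dawn.\nStay ready.",
))
```

Keeping dialogue out of the initial prompt, as advised above, avoids the same lines appearing twice in the final prompt.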

6. Character Replacement workflow

Prepare the source video. Use a clip where the performer is visible, motion is readable, and the first frame is clean.
Choose Character Replacement. Set length, start frame, FPS, resolution, and voice behavior.
Add the reference image. Use a clear single image of the new character. The closer the framing and lighting are to the first frame, the easier the edit.
Write the initial prompt. Describe the new character and how they should belong in the scene.
Run the first-frame generation. Flux 2 Klein Edit creates the first output frame. This frame must match the source first frame almost perfectly except for the replaced character.
Inspect the first frame before continuing. If the character, lighting, scale, or pose is wrong, adjust the prompt or reference image and regenerate the frame before committing to a full run.
Let dialogue injection happen. The workflow extracts source dialogue and combines it into the final prompt.
Generate the output. IC LoRA transfers the original motion/performance onto the new character.

Critical success factor: first-frame alignment. The transcript emphasizes that IC LoRA depends on the first output frame matching the first input frame almost perfectly, aside from the intended replacement.
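A rough way to quantify "matches almost perfectly except for the replaced character" is to compare the two first frames only outside the replaced region. The sketch below uses plain nested lists of grayscale values so it runs without extra libraries; the mask marking the replaced area is assumed to be available from somewhere (for example a segmentation mask), and any threshold for "close enough" is left to you.

```python
def mean_abs_diff_outside_mask(src, out, mask):
    """Mean absolute pixel difference over pixels where mask is False.

    src, out: 2D lists of grayscale values with the same shape.
    mask: 2D list of bools, True where the character was replaced.
    Returns 0.0 if every pixel is masked.
    """
    total = 0
    count = 0
    for src_row, out_row, mask_row in zip(src, out, mask):
        for s, o, m in zip(src_row, out_row, mask_row):
            if not m:
                total += abs(s - o)
                count += 1
    return total / count if count else 0.0


source_frame = [[10, 10, 10], [10, 200, 10]]
edited_frame = [[10, 12, 10], [10, 50, 10]]
replaced = [[False, False, False], [False, True, False]]
print(mean_abs_diff_outside_mask(source_frame, edited_frame, replaced))  # 0.4
```

A low value suggests the edit left the rest of the frame alone; a high value hints that framing, lighting, or background shifted and the first frame may need regenerating.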

7. Background Replacement workflow

Choose Background Replacement. Keep the original character and change the environment.
Usually keep original voice audio. The video keeps original voice in the background example because the character remains the same.
Write an environment-focused prompt. Describe the new world, lighting, mood, materials, and cinematic atmosphere.
Run first-frame background edit. Flux 2 Klein Edit replaces the background in the first input frame.
Use segmentation and pose isolation. SAM3 isolates the main character; DWpose creates a clean pose reference. This helps prevent random extra characters from appearing in the generated background.
Generate animated environment. IC LoRA preserves the original movements while integrating the subject into the new scene.

Best use case: interviews, monologues, short acting clips, or performance shots where the person should stay recognizable but the world around them should change dramatically.
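When inspecting the subject mask for background mode, one quick sanity check is to count how many separate regions it contains. The sketch below is a generic connected-region count over a binary mask stored as nested lists; it is not SAM3's API, and more than one region is only a hint to look closer, since occlusion can split a single subject.

```python
from collections import deque


def count_mask_regions(mask):
    """Count 4-connected regions of truthy cells in a 2D list."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Found an unvisited foreground cell: flood-fill its region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


example_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(count_mask_regions(example_mask))  # 2
```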

8. Quality control before publishing a result

First frame

Subject scale matches source.
Lighting direction matches.
Hands/face are not distorted.
Replacement area blends naturally.

Motion

Eyes track naturally.
Facial expression follows source.
Body rhythm is preserved.
No extra limbs/characters appear.

Audio/dialogue

Dialogue extraction is correct.
Prompt injection did not hallucinate lines.
Voice choice matches intent.
Lip timing is acceptable.
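The dialogue checks above can be partly automated by confirming that each extracted line actually appears in the final prompt. This is a sketch with hypothetical function names; it ignores case and punctuation, uses simple substring matching, and only detects missing lines, so hallucinated extra lines still need a manual read.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and drop punctuation so minor formatting differences do not matter."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def missing_dialogue_lines(final_prompt: str, dialogue_lines: list[str]) -> list[str]:
    """Return the dialogue lines that do not appear in the final prompt."""
    haystack = _normalize(final_prompt)
    missing = []
    for line in dialogue_lines:
        needle = _normalize(line)
        if needle and needle not in haystack:
            missing.append(line)
    return missing


final = 'An astronaut in a white EVA suit. The character says: "We launch at dawn. Stay ready."'
lines = ["We launch at dawn.", "Stay ready!", "Over and out."]
print(missing_dialogue_lines(final, lines))  # ['Over and out.']
```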

9. Troubleshooting

Problem	Likely cause	Fix
Red (missing) nodes	Custom nodes not installed	Use ComfyUI Manager to install the missing node packs, restart ComfyUI, and reload the workflow.
Model dropdown blank	File in wrong folder or not restarted	Verify exact folder, filename, and restart ComfyUI.
Output drifts from source motion	First frame mismatch or weak pose/motion guidance	Regenerate the first frame so it matches the input more closely; attempt a less drastic change; use a cleaner source clip.
Random extra people in background mode	Segmentation/pose not isolating main subject	Check SAM3 mask and DWpose output; use a clearer source frame; simplify background prompt.
Character does not match reference	Reference image conflicts with source pose/lighting	Use a reference with similar angle, face visibility, and lighting; simplify prompt.
Out of memory	Resolution, FPS, or clip length too high, or model too large for the available VRAM	Use the low-VRAM GGUF path, shorter clips, lower resolution, fewer frames, or FP8 variants.
Dialogue wrong	Audio extraction misread the source	Use cleaner audio; manually inspect the final prompt; optionally transcribe externally and paste in the corrected dialogue if the workflow allows.

10. Run checklist

☐ ComfyUI launches cleanly and workflow loads without missing nodes.
☐ Correct LTX model path chosen for VRAM level.
☐ Flux 2 Klein Edit model, Qwen encoder, and Flux VAE are installed.
☐ LTX text encoders, projection, video/audio VAEs, LoRAs, and upscaler are in place.
☐ Source video is short, clean, and has a strong first frame.
☐ Character reference image is clear and compatible with the source angle.
☐ Mode is set correctly: character vs. background replacement.
☐ First generated frame is inspected before full generation.
☐ Final prompt includes extracted dialogue correctly.
☐ Output is reviewed for motion, identity consistency, artifacts, and audio timing.

11. Sources

Video: How to Replace Characters or Backgrounds in Videos with LTX 2.3 by FutuTek.
Channel: https://www.youtube.com/channel/UCXG5FVJlVLSHREQUE-ha5OA
Source notes and transcript timing: ltx23-comfyui-character-background-replacement-guide-2026-05-22.sources.md.
Primary workflow/resources folder: Google Drive workflow resources.

Guide created 2026-05-22 from video transcript, metadata, and the resource links supplied in the video description.