1. What this workflow does
This workflow is a video-to-video editing system for ComfyUI built around LTX 2.3, Flux 2 Klein Edit, IC LoRA, SAM3 segmentation, and DWpose. The video describes two primary use cases:
Character Replacement
Swap the performer in the source video for a new character based on a single reference image. The goal is to preserve the original performance: facial expressions, eye movement, rhythm, body language, emotional delivery, and timing.
Background Replacement
Keep the original character but transform the environment around them. The workflow uses segmentation and pose guidance so the subject remains consistent while the scene becomes a newly generated animated environment.
2. Requirements and model files
Start with the author’s workflow/resources folder, then place each model file in the ComfyUI folder listed below. Exact filenames and node expectations can change between releases, so after placing the files, restart ComfyUI and use ComfyUI Manager’s missing-node and missing-model warnings to confirm that everything resolved.
| Resource | Destination | Purpose | Link |
|---|---|---|---|
| Workflow & Resources | Google Drive workflow/resources folder | Import workflow JSON/resources from the author | source |
| Flux 2 Klein Edit | /ComfyUI/models/diffusion_models | First-frame image editing for character/background integration | source |
| Qwen text encoder for Flux Klein | /ComfyUI/models/text_encoders | Text encoder used by Flux Klein subgraph | source |
| Flux2 VAE | /ComfyUI/models/vae | VAE for Flux Klein | source |
| MelBand RoFormer audio model | /ComfyUI/models/diffusion_models | Audio/dialogue extraction or separation support | source |
| LTX 2.3 low-VRAM GGUF | /ComfyUI/models/diffusion_models | 12 GB VRAM path / quantized model | source |
| LTX 2.3 FP8 transformer | /ComfyUI/models/diffusion_models | 16 GB+ VRAM path | source |
| Gemma 3 text encoder | /ComfyUI/models/text_encoders | LTX 2.3 text encoder | source |
| LTX text projection | /ComfyUI/models/text_encoders | LTX 2.3 text projection | source |
| LTX audio VAE | /ComfyUI/models/vae | Audio VAE for LTX 2.3 | source |
| LTX video VAE | /ComfyUI/models/vae | Video VAE for LTX 2.3 | source |
| Tiny VAE preview | /ComfyUI/models/vae or preview-related folder | Faster previews; verify the path the workflow expects | source |
| Spatial upscaler | /ComfyUI/models/latent_upscale_models | 2x latent spatial upscaling | source |
| Distilled LoRA | /ComfyUI/models/loras | Faster/distilled LTX generation | source |
| Camera movement LoRAs | /ComfyUI/models/loras | Optional movement/control LoRAs | source |
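The two LTX 2.3 rows in the table amount to a simple choice based on GPU memory. A minimal sketch of that choice follows, assuming the 16 GB threshold from the table is the only criterion; the function name and the `"fp8"`/`"gguf"` labels are hypothetical and do not correspond to any node setting in the workflow.

```python
def pick_ltx_variant(vram_gb: float) -> str:
    """Return which LTX 2.3 file to load for a given amount of GPU memory.

    The 16 GB threshold comes from the requirements table; the return
    labels ("fp8", "gguf") are illustrative names, not node settings.
    """
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    # 16 GB and above: the FP8 transformer path.
    if vram_gb >= 16:
        return "fp8"
    # Anything smaller: the quantized low-VRAM GGUF path (listed for 12 GB cards).
    return "gguf"


for card in (12, 16, 24):
    print(card, "GB ->", pick_ltx_variant(card))
```

Cards below 12 GB also fall into the GGUF branch here, but the table does not say whether the workflow runs on them.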
3. Installation map
Download the workflow/resources
Get the workflow JSON and any custom resource files from the Google Drive folder. Keep an untouched backup of the original workflow before editing node paths.
Place model files in their ComfyUI folders
Put diffusion models in ComfyUI/models/diffusion_models, text encoders in ComfyUI/models/text_encoders, VAEs in ComfyUI/models/vae, LoRAs in ComfyUI/models/loras, and the latent upscaler in ComfyUI/models/latent_upscale_models.
Install/update custom nodes
The workflow references SAM3 segmentation, DWpose, LTX 2.3 nodes, Flux/Klein edit nodes, audio extraction/separation support, and likely Kijai/Lightricks-related ComfyUI node packs. Use ComfyUI Manager to install missing nodes after loading the workflow.
Restart and reload
Restart ComfyUI after model placement. Then load the workflow and resolve red nodes or missing model dropdown entries before running a full video job.
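Before restarting, it can help to confirm that the folders named above exist and are not empty. The sketch below is one way to do that with the standard library. The folder list is taken from this guide; the `ComfyUI` root path and the function name are assumptions, and the script only counts files, it does not verify that the right models are present.

```python
from pathlib import Path

# Subfolders named in this guide, relative to the ComfyUI root.
EXPECTED_DIRS = [
    "models/diffusion_models",
    "models/text_encoders",
    "models/vae",
    "models/loras",
    "models/latent_upscale_models",
]


def report_model_dirs(comfy_root: str) -> dict[str, int]:
    """Map each expected folder to its file count, or -1 if the folder is missing."""
    root = Path(comfy_root)
    report = {}
    for rel in EXPECTED_DIRS:
        folder = root / rel
        if folder.is_dir():
            report[rel] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            report[rel] = -1
    return report


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; point this at your own root.
    for rel, count in report_model_dirs("ComfyUI").items():
        status = "MISSING" if count < 0 else f"{count} file(s)"
        print(f"{rel}: {status}")
```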
4. Settings section
The video’s Settings section controls the workflow’s operating mode and output constraints.
| Setting | What to choose | Practical guidance |
|---|---|---|
| Mode | Character Replacement or Background Replacement | Choose character replacement when the person/subject changes; choose background replacement when the original subject remains. |
| Video length | Length of the generated output | Start short for testing. Long clips multiply VRAM/time/failure risk. |
| Starting frame | First source frame used for alignment | Pick a clear frame that represents the subject and lighting well. |
| FPS | Output frame rate | Use modest FPS for tests; increase after visual stability is proven. |
| Resolution | Output dimensions | Use lower resolution while debugging, then upscale or increase once settings work. |
| Keep original voice audio | true or false | true preserves the original voice track; false lets LTX generate a new voice, which the video says sounds more natural for the replaced character. |
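To see why short, low-resolution tests are worth the effort, compare the amount of pixel data in a test run with a final run. The sketch below multiplies width, height, and frame count as a rough proxy for workload. The specific resolutions, lengths, and frame rates are hypothetical, and the result is not a VRAM or time measurement.

```python
def frame_count(seconds: float, fps: float) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return round(seconds * fps)


def relative_cost(width: int, height: int, seconds: float, fps: float) -> int:
    """Pixels processed across the whole clip: a rough proxy, not a VRAM figure."""
    return width * height * frame_count(seconds, fps)


# Hypothetical test and final settings, chosen only for illustration.
test_run = relative_cost(640, 360, seconds=3, fps=16)
final_run = relative_cost(1280, 720, seconds=6, fps=24)

print(frame_count(3, 16))    # frames in the test clip: 48
print(final_run / test_run)  # the final run handles 12.0x as many pixels
```

Doubling both dimensions quadruples the per-frame pixel count, and doubling the length while raising FPS from 16 to 24 triples the frame count, which is where the factor of 12 comes from.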
5. Prompting strategy
The workflow is designed so you do not manually type dialogue. You write the character, environment, and action. The workflow reads the original video dialogue and injects it into the final prompt.
Initial prompt should include
- Who/what the character is.
- Clothing, style, era, realism level.
- Environment and lighting.
- Action/emotion matching the source performance.
- Camera feel: cinematic, handheld, close-up, interview, etc.
Avoid overloading it with
- Manual dialogue already present in the source video.
- Contradictory motion instructions.
- Too many character details that fight the reference image.
- Background details in Character Replacement mode unless needed.
Example character prompt: A realistic cinematic female astronaut commander in a white EVA suit, expressive face, natural skin texture, matching the original actor's emotion and timing, standing in the same lighting and camera angle.
Example background prompt: The same speaker stands inside a neon rain-soaked cyberpunk alley, cinematic reflections, moody blue and magenta light, realistic atmosphere, preserve the original body motion and expression.
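The dialogue injection described in this section can be pictured as appending the extracted transcript to the prompt you wrote. The following is only an illustrative sketch: the function name and the `The character says: "..."` template are assumptions, since the source does not show the wording the workflow actually uses.

```python
def build_final_prompt(description: str, dialogue: str = "") -> str:
    """Append extracted dialogue to the user-written prompt.

    The 'The character says: "..."' phrasing is an assumed template,
    not the wording the workflow actually uses.
    """
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    # Collapse line breaks and repeated spaces in the transcript.
    dialogue = " ".join(dialogue.split())
    if not dialogue:
        return description
    if not description.endswith((".", "!", "?")):
        description += "."
    return f'{description} The character says: "{dialogue}"'


print(build_final_prompt(
    "A realistic cinematic female astronaut commander in a white EVA suit",
    "We launch\nat dawn.",
))
```

Because the dialogue is added automatically, the written prompt stays focused on character, environment, and action.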
6. Character Replacement workflow
- Prepare the source video. Use a clip where the performer is visible, motion is readable, and the first frame is clean.
- Choose Character Replacement. Set length, start frame, FPS, resolution, and voice behavior.
- Add the reference image. Use a clear single image of the new character. The closer the framing and lighting are to the first frame, the easier the edit.
- Write the initial prompt. Describe the new character and how they should belong in the scene.
- Run the first-frame generation. Flux 2 Klein Edit creates the first output frame. This frame must match the source first frame almost perfectly except for the replaced character.
- Inspect the first frame before continuing. If the character, lighting, scale, or pose is wrong, adjust the prompt or reference image and regenerate the frame before committing to a full run.
- Let dialogue injection happen. The workflow extracts source dialogue and combines it into the final prompt.
- Generate the output. IC LoRA transfers the original motion/performance onto the new character.
7. Background Replacement workflow
- Choose Background Replacement. Keep the original character and change the environment.
- Usually keep original voice audio. The video keeps original voice in the background example because the character remains the same.
- Write an environment-focused prompt. Describe the new world, lighting, mood, materials, and cinematic atmosphere.
- Run first-frame background edit. Flux 2 Klein Edit replaces the background in the first input frame.
- Use segmentation and pose isolation. SAM3 isolates the main character; DWpose creates a clean pose reference. This helps prevent random extra characters from appearing in the generated background.
- Generate animated environment. IC LoRA preserves the original movements while integrating the subject into the new scene.
8. Quality control before publishing a result
First frame
- Subject scale matches source.
- Lighting direction matches.
- Hands/face are not distorted.
- Replacement area blends naturally.
Motion
- Eyes track naturally.
- Facial expression follows source.
- Body rhythm is preserved.
- No extra limbs/characters appear.
Audio/dialogue
- Dialogue extraction is correct.
- Prompt injection did not hallucinate lines.
- Voice choice matches intent.
- Lip timing is acceptable.
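The dialogue checks above can be partly automated if you have a transcript you trust, for example one you typed by hand. The sketch below tests whether that transcript appears word for word inside the final prompt, ignoring case and punctuation. The function names are hypothetical, and the check only confirms that the expected lines are present; it does not detect extra lines added around them.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def dialogue_is_intact(transcript: str, final_prompt: str) -> bool:
    """True if the transcript's words appear in order, contiguously, in the prompt."""
    needle = normalize(transcript)
    haystack = normalize(final_prompt)
    if not needle:
        return False
    n = len(needle)
    # Slide a window of the transcript's length across the prompt's words.
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


prompt = 'An astronaut commander. The character says: "We launch at dawn."'
print(dialogue_is_intact("We launch at dawn.", prompt))
print(dialogue_is_intact("We launch at noon.", prompt))
```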
9. Troubleshooting
| Problem | Likely cause | Fix |
|---|---|---|
| Missing red nodes | Custom nodes not installed | Use ComfyUI Manager to install missing node packs, restart, reload workflow. |
| Model dropdown blank | File in wrong folder or not restarted | Verify exact folder, filename, and restart ComfyUI. |
| Output drifts from source motion | First frame mismatch or weak pose/motion guidance | Regenerate the first frame so it matches the input more closely; attempt a less drastic change; use a cleaner source clip. |
| Random extra people in background mode | Segmentation/pose not isolating main subject | Check SAM3 mask and DWpose output; use a clearer source frame; simplify background prompt. |
| Character does not match reference | Reference image conflicts with source pose/lighting | Use a reference with similar angle, face visibility, and lighting; simplify prompt. |
| Out of memory | Resolution, FPS, or clip length too high, or model variant too large for the GPU | Use the low-VRAM GGUF path, shorter clips, lower resolution, fewer frames, or FP8 variants. |
| Dialogue wrong | Audio extraction misread source | Use cleaner audio; manually inspect final prompt; optionally transcribe externally and paste corrected dialogue if workflow allows. |
10. Run checklist
- ☐ ComfyUI launches cleanly and workflow loads without missing nodes.
- ☐ Correct LTX model path chosen for VRAM level.
- ☐ Flux 2 Klein Edit model, Qwen encoder, and Flux VAE are installed.
- ☐ LTX text encoders, projection, video/audio VAEs, LoRAs, and upscaler are in place.
- ☐ Source video is short, clean, and has a strong first frame.
- ☐ Character reference image is clear and compatible with the source angle.
- ☐ Mode is set correctly: character vs. background replacement.
- ☐ First generated frame is inspected before full generation.
- ☐ Final prompt includes extracted dialogue correctly.
- ☐ Output is reviewed for motion, identity consistency, artifacts, and audio timing.
11. Sources
- Video: How to Replace Characters or Backgrounds in Videos with LTX 2.3 by FutuTek.
- Channel: https://www.youtube.com/channel/UCXG5FVJlVLSHREQUE-ha5OA.
- Source notes and transcript timing: ltx23-comfyui-character-background-replacement-guide-2026-05-22.sources.md.
- Primary workflow/resources folder: Google Drive workflow resources.
Guide created 2026-05-22 from video transcript, metadata, and the resource links supplied in the video description.