ACE-Step 1.5 XL Free Music Generation in ComfyUI (04/12/26)
Use this guide to run ACE-Step 1.5 XL in ComfyUI along a practical setup path: choose the right model branch for your VRAM, generate and refine lyrics with local LLM nodes, render first-pass songs, and then remix with conditioning/latent strategies for stronger final outputs.
Source Video
ACE-Step 1.5 XL = Free Music Generation in ComfyUI!
Channel: Nerdy Rodent
Published: 04/11/26
Length: 22m 33s
Watch on YouTube
1) What you will accomplish
- Set up an ACE-Step 1.5 XL workflow in ComfyUI with clean switch-based control groups.
- Use Adaptive Projected Guidance effectively for SFT/base workflows, with quick A/B bypass testing.
- Generate lyrics/tags with an LLM node chain (example: Ollama + Gemma 4), then constrain output for usable prompts.
- Create first-pass songs, then run structured remix passes with latent reuse and conditioning blending.
- Diagnose common failure points (slow LLM response, weak prompt adherence, unstable style blends, and VRAM issues).
Recommended path: Build one baseline song first, then test one remix strategy at a time. Avoid changing five controls at once.
2) Prerequisites
Hardware and runtime assumptions
- GPU with about 12 GB VRAM minimum for local XL model usage, especially if testing multiple branches.
- ComfyUI installed and launching reliably.
- ACE-Step model files in the correct ComfyUI model directories (XL files are substantially larger than the non-XL variants).
Software and nodes
- ComfyUI workflow with grouped controls for loader/settings/prompt/remix.
- LLM node path available if you want AI lyrics or tags (example in video: Ollama + Gemma 4).
- Switch nodes for fast branch toggling (prompting on/off, image conditioning on/off, remix branch selection).
Before you start
- Pick your target output style: cinematic, ambient, techno, metal, etc.
- Pick one objective for Run 1: either lyric quality, style quality, or vocal phrasing, not all three.
3) Step-by-step setup and workflow
Step 1, build a clean workflow layout
- Create clear groups: Loader, Settings, Prompt/Lyrics, Generation, Remix, Monitoring.
- Add switch nodes at branch entrances so you can test quickly without rewiring.
- Place Adaptive Projected Guidance in the loader path and add a bypass switch for direct comparisons.
Practical reason: this keeps experimentation fast, reproducible, and less error-prone than manual reconnecting.
Step 2, set baseline generation settings
- Choose a model branch: turbo, base, or SFT.
- Set song length and BPM to a realistic first-pass target.
- Set your initial CFG and step values conservatively, then raise in later runs only if needed.
Step 3, configure AI prompt and lyric generation (optional but powerful)
- Enable the LLM prompt switch and route topic/style inputs into your lyric/tag generator branch.
- If using Ollama + Gemma 4, verify performance settings first. In the video workflow, disabling flash-attention-related behavior improved responsiveness.
- Add a post-processor rule: cap tags and enforce output format (for example: max tag count and strict section structure).
Common issue: LLMs often ignore precise formatting requests. Add sanitizing nodes or regex cleanup before passing tags downstream.
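LLM formatting drift can be handled with a small post-processing step before tags reach the conditioning branch. A minimal sketch of such a sanitizer (the function name and limits are illustrative, not part of any ComfyUI node):

```python
import re

def sanitize_tags(raw: str, max_tags: int = 10) -> list[str]:
    """Extract a capped, deduplicated tag list from free-form LLM output.

    Hypothetical helper: splits on commas/newlines, strips markdown
    bullets and numbering, lowercases, and drops empty or overlong items.
    """
    candidates = re.split(r"[,\n]", raw)
    tags, seen = [], set()
    for item in candidates:
        tag = re.sub(r"^[\s\-\*\d\.\)]+", "", item).strip().lower()
        if tag and len(tag) <= 40 and tag not in seen:
            seen.add(tag)
            tags.append(tag)
        if len(tags) == max_tags:
            break
    return tags
```

Running the cleaned list downstream keeps a chatty model's numbered bullets and duplicates from polluting your tag conditioning.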
Step 4, run a first-pass song and store baseline artifacts
- Generate baseline output with prompt branch enabled and remix branches off.
- Save key artifacts for comparison: output audio, seed, prompt text, tag list, key settings.
- Listen once for structure and once for mix/style character, and take separate notes for each.
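The artifact-saving habit above can be sketched as a small JSON sidecar writer (a hypothetical helper; in practice the audio itself is written by your ComfyUI save node, and this only records the settings next to it):

```python
import json
import time
from pathlib import Path

def save_run_artifacts(run_dir: str, seed: int, prompt: str,
                       tags: list[str], settings: dict) -> Path:
    """Write one JSON record per run so baselines stay comparable.

    Stores seed, prompt, tag list, and key settings alongside the
    rendered audio for later A/B comparison.
    """
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "seed": seed,
        "prompt": prompt,
        "tags": tags,
        "settings": settings,
    }
    path = out / f"run_{seed}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```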
Step 5, run controlled remix experiments
- Create a second conditioning set with a new seed and slightly altered style/key instructions.
- Test approach A: reuse prior latent and apply updated conditioning in a resampler path.
- Test approach B: blend conditioning with a conditioning-average node, then render with unchanged core settings.
- Optionally inject latent-noise variants for more texture, then compare against plain latent reuse.
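For intuition, conditioning averaging (approach B) is essentially a linear interpolation between two embeddings. A minimal standalone sketch, using plain Python lists in place of ComfyUI's conditioning tensors:

```python
def blend_conditioning(a: list[float], b: list[float],
                       strength: float = 0.5) -> list[float]:
    """Blend two conditioning embeddings by a strength factor.

    A simplified stand-in for a conditioning-average node:
    result = a * strength + b * (1 - strength), element-wise.
    strength=1.0 keeps only `a`; strength=0.0 keeps only `b`.
    """
    assert len(a) == len(b), "embeddings must share a shape"
    return [x * strength + y * (1.0 - strength) for x, y in zip(a, b)]
```

Sweeping the strength between roughly 0.3 and 0.7 is a practical way to explore style blends without abandoning the original conditioning entirely.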
Step 6, select winning branch and harden for repeatability
- Pick the branch with the best lyric intelligibility and style confidence.
- Lock seed ranges and preserve exact node groups as a reusable template.
- Save as versioned presets (for example: ace-xl-metal-v1, ace-xl-ambient-v1).
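A versioned preset can be as simple as a JSON file stored alongside the workflow. All field names and values below are illustrative placeholders, not recommended settings:

```json
{
  "preset": "ace-xl-metal-v1",
  "model_branch": "sft",
  "seed_range": [421000, 421999],
  "cfg": 4.0,
  "steps": 30,
  "length_seconds": 90,
  "bpm": 120,
  "apg_enabled": true
}
```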
Suggested iteration matrix
| Pass | What to Change | What to Keep Fixed | What to Evaluate |
| --- | --- | --- | --- |
| Baseline | None (initial prompt + settings) | Model, length, BPM | Overall coherence, lyric fit |
| Remix A | New conditioning + latent reuse | CFG, steps, length | Style evolution without structural collapse |
| Remix B | Conditioning average blend | Seed strategy, duration | Smoother transitions, less harsh drift |
| Remix C | Latent-noise injection | Prompt core and model | Texture uniqueness vs artifact risk |
4) Practical examples you can copy
Example A, lyric-first workflow
- Goal: improve lyrical narrative without changing instrumentation too aggressively.
- Method: keep baseline latent, regenerate only lyric/tag branch, then render two variants.
Example B, style-first workflow
- Goal: shift mood (for example ambient to darker metal undertone) while preserving vocal phrasing.
- Method: update tag conditioning, average with original conditioning, keep seed neighborhood stable.
Example C, short-version remix
- Goal: create a shorter, punchier version of a long generation.
- Method: reduce lyric lines and target a shorter song length, then reuse strongest latent branch.
5) Success checks
- You can toggle Adaptive Projected Guidance on/off and hear a consistent quality difference in your comparisons.
- You can generate one stable baseline plus at least two remix variants from the same starting run.
- Your prompt/lyric branch produces usable output after sanitizing (not uncontrolled tag spam).
- You can reproduce a preferred result family using saved presets and branch settings.
6) Troubleshooting
Problem: XL runs fail or stall
- Reduce concurrent branches, close nonessential GPU processes, and lower intermediate complexity.
- Run first-pass at lower duration/complexity, then increase in later passes.
Problem: LLM branch is too slow
- Use a smaller local model for drafting, then upscale prompts manually.
- Disable problematic acceleration settings for the selected LLM runtime if known to conflict.
Problem: Lyrics are okay but style drifts
- Keep prompt core text fixed and adjust only tag emphasis.
- Prefer conditioning averaging over heavy latent noise when you need consistency.
Problem: Too many variables changed per run
- Return to baseline and change one variable per pass.
- Use a run log with columns: seed, CFG, steps, branch, notes, keep/drop decision.
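The run log above can be a plain CSV appended to after each pass. A minimal sketch (the file name, helper, and column set are hypothetical):

```python
import csv
from pathlib import Path

LOG_FIELDS = ["seed", "cfg", "steps", "branch", "notes", "keep"]

def log_run(path: str, **row) -> None:
    """Append one experiment row to a CSV run log.

    Writes the header on first use, then one line per pass so you can
    trace which single variable changed between runs.
    """
    p = Path(path)
    new_file = not p.exists()
    with p.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({k: row.get(k, "") for k in LOG_FIELDS})
```

Keeping the log append-only makes it trivial to diff passes and spot when more than one variable changed at once.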