ACE-Step 1.5 XL Free Music Generation in ComfyUI (04/12/26)
Use this guide to run ACE-Step 1.5 XL in ComfyUI along a practical setup path: choose the right model branch for your VRAM, generate and refine lyrics with local LLM nodes, render first-pass songs, and then remix with conditioning/latent strategies for stronger final outputs.
Source Video
ACE-Step 1.5 XL = Free Music Generation in ComfyUI!
Channel: Nerdy Rodent
Published: 04/11/26
Length: 22m 33s
Watch on YouTube
1) What you will accomplish
- Set up an ACE-Step 1.5 XL workflow in ComfyUI with clean switch-based control groups.
- Use Adaptive Projected Guidance effectively for SFT/base workflows, with quick A/B bypass testing.
- Generate lyrics/tags with an LLM node chain (example: Ollama + Gemma 4), then constrain output for usable prompts.
- Create first-pass songs, then run structured remix passes with latent reuse and conditioning blending.
- Diagnose common failure points (slow LLM response, weak prompt adherence, unstable style blends, and VRAM issues).
Recommended path: Build one baseline song first, then test one remix strategy at a time. Avoid changing five controls at once.
2) Prerequisites
Hardware and runtime assumptions
- GPU with about 12 GB VRAM minimum for local XL model usage, especially if testing multiple branches.
- ComfyUI installed and launching reliably.
- ACE-Step model files in the correct ComfyUI model directories (XL files are substantially larger than the non-XL variants).
Software and nodes
- ComfyUI workflow with grouped controls for loader/settings/prompt/remix.
- LLM node path available if you want AI lyrics or tags (example in video: Ollama + Gemma 4).
- Switch nodes for fast branch toggling (prompting on/off, image conditioning on/off, remix branch selection).
Before you start
- Pick your target output style: cinematic, ambient, techno, metal, etc.
- Pick one objective for Run 1: either lyric quality, style quality, or vocal phrasing, not all three.
3) Step-by-step setup and workflow
Step 1, build a clean workflow layout
- Create clear groups: Loader, Settings, Prompt/Lyrics, Generation, Remix, Monitoring.
- Add switch nodes at branch entrances so you can test quickly without rewiring.
- Place Adaptive Projected Guidance in the loader path and add a bypass switch for direct comparisons.
Practical reason: this keeps experimentation fast, reproducible, and less error-prone than manual reconnecting.
Step 2, set baseline generation settings
- Choose a model branch: turbo, base, or SFT.
- Set song length and BPM to a realistic first-pass target.
- Set your initial CFG and step values conservatively, then raise in later runs only if needed.
Step 3, configure AI prompt and lyric generation (optional but powerful)
- Enable the LLM prompt switch and route topic/style inputs into your lyric/tag generator branch.
- If using Ollama + Gemma 4, verify performance settings first. In the video workflow, disabling flash-attention-related behavior improved responsiveness.
- Add a post-processor rule: cap tags and enforce output format (for example: max tag count and strict section structure).
Common issue: LLMs often ignore precise formatting requests. Add sanitizing nodes or regex cleanup before passing tags downstream.
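LLM formatting drift can be handled with a small post-processing step before tags reach the conditioning branch. A minimal sketch of such a sanitizer (the function name and limits are illustrative, not part of any ComfyUI node):

```python
import re

def sanitize_tags(raw: str, max_tags: int = 10) -> list[str]:
    """Extract a capped, deduplicated tag list from free-form LLM output.

    Hypothetical helper: splits on commas/newlines, strips markdown
    bullets and numbering, lowercases, and drops empty or overlong items.
    """
    candidates = re.split(r"[,\n]", raw)
    tags, seen = [], set()
    for item in candidates:
        tag = re.sub(r"^[\s\-\*\d\.\)]+", "", item).strip().lower()
        if tag and len(tag) <= 40 and tag not in seen:
            seen.add(tag)
            tags.append(tag)
        if len(tags) == max_tags:
            break
    return tags
```

Running the cleaned list downstream keeps a chatty model's numbered bullets and duplicates from polluting your tag conditioning.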
Step 4, run a first-pass song and store baseline artifacts
- Generate baseline output with prompt branch enabled and remix branches off.
- Save key artifacts for comparison: output audio, seed, prompt text, tag list, key settings.
- Listen once for structure and once for mix/style character, and take separate notes for each.
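The artifact-saving habit above can be sketched as a small JSON sidecar writer (a hypothetical helper; in practice the audio itself is written by your ComfyUI save node, and this only records the settings next to it):

```python
import json
import time
from pathlib import Path

def save_run_artifacts(run_dir: str, seed: int, prompt: str,
                       tags: list[str], settings: dict) -> Path:
    """Write one JSON record per run so baselines stay comparable.

    Stores seed, prompt, tag list, and key settings alongside the
    rendered audio for later A/B comparison.
    """
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "seed": seed,
        "prompt": prompt,
        "tags": tags,
        "settings": settings,
    }
    path = out / f"run_{seed}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```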
Step 5, run controlled remix experiments
- Create a second conditioning set with a new seed and slightly altered style/key instructions.
- Test approach A: reuse prior latent and apply updated conditioning in a resampler path.
- Test approach B: blend conditioning with a conditioning-average node, then render with unchanged core settings.
- Optionally inject latent-noise variants for more texture, then compare against plain latent reuse.
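For intuition, conditioning averaging (approach B) is essentially a linear interpolation between two embeddings. A minimal standalone sketch, using plain Python lists in place of ComfyUI's conditioning tensors:

```python
def blend_conditioning(a: list[float], b: list[float],
                       strength: float = 0.5) -> list[float]:
    """Blend two conditioning embeddings by a strength factor.

    A simplified stand-in for a conditioning-average node:
    result = a * strength + b * (1 - strength), element-wise.
    strength=1.0 keeps only `a`; strength=0.0 keeps only `b`.
    """
    assert len(a) == len(b), "embeddings must share a shape"
    return [x * strength + y * (1.0 - strength) for x, y in zip(a, b)]
```

Sweeping the strength between roughly 0.3 and 0.7 is a practical way to explore style blends without abandoning the original conditioning entirely.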
Step 6, select winning branch and harden for repeatability
- Pick the branch with the best lyric intelligibility and style confidence.
- Lock seed ranges and preserve exact node groups as a reusable template.
- Save as versioned presets (for example: ace-xl-metal-v1, ace-xl-ambient-v1).
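A versioned preset can be as simple as a JSON file stored alongside the workflow. All field names and values below are illustrative placeholders, not recommended settings:

```json
{
  "preset": "ace-xl-metal-v1",
  "model_branch": "sft",
  "seed_range": [421000, 421999],
  "cfg": 4.0,
  "steps": 30,
  "length_seconds": 90,
  "bpm": 120,
  "apg_enabled": true
}
```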
Suggested iteration matrix
| Pass | What to Change | What to Keep Fixed | What to Evaluate |
| --- | --- | --- | --- |
| Baseline | None (initial prompt + settings) | Model, length, BPM | Overall coherence, lyric fit |
| Remix A | New conditioning + latent reuse | CFG, steps, length | Style evolution without structural collapse |
| Remix B | Conditioning average blend | Seed strategy, duration | Smoother transitions, less harsh drift |
| Remix C | Latent-noise injection | Prompt core and model | Texture uniqueness vs artifact risk |
4) Practical examples you can copy
Example A, lyric-first workflow
- Goal: improve lyrical narrative without changing instrumentation too aggressively.
- Method: keep baseline latent, regenerate only lyric/tag branch, then render two variants.
Example B, style-first workflow
- Goal: shift mood (for example ambient to darker metal undertone) while preserving vocal phrasing.
- Method: update tag conditioning, average with original conditioning, keep seed neighborhood stable.
Example C, short-version remix
- Goal: create a shorter, punchier version of a long generation.
- Method: reduce lyric lines and target a shorter song length, then reuse strongest latent branch.
5) Success checks
- You can toggle Adaptive Projected Guidance on/off and hear a consistent quality difference in your comparisons.
- You can generate one stable baseline plus at least two remix variants from the same starting run.
- Your prompt/lyric branch produces usable output after sanitizing (not uncontrolled tag spam).
- You can reproduce a preferred result family using saved presets and branch settings.
6) Troubleshooting
Problem: XL runs fail or stall
- Reduce concurrent branches, close nonessential GPU processes, and lower intermediate complexity.
- Run first-pass at lower duration/complexity, then increase in later passes.
Problem: LLM branch is too slow
- Use a smaller local model for drafting, then upscale prompts manually.
- Disable problematic acceleration settings for the selected LLM runtime if known to conflict.
Problem: Lyrics are okay but style drifts
- Keep prompt core text fixed and adjust only tag emphasis.
- Prefer conditioning averaging over heavy latent noise when you need consistency.
Problem: Too many variables changed per run
- Return to baseline and change one variable per pass.
- Use a run log with columns: seed, CFG, steps, branch, notes, keep/drop decision.
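The run log above can be a plain CSV appended to after each pass. A minimal sketch (the file name, helper, and column set are hypothetical):

```python
import csv
from pathlib import Path

LOG_FIELDS = ["seed", "cfg", "steps", "branch", "notes", "keep"]

def log_run(path: str, **row) -> None:
    """Append one experiment row to a CSV run log.

    Writes the header on first use, then one line per pass so you can
    trace which single variable changed between runs.
    """
    p = Path(path)
    new_file = not p.exists()
    with p.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({k: row.get(k, "") for k in LOG_FIELDS})
```

Keeping the log append-only makes it trivial to diff passes and spot when more than one variable changed at once.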