demobook

ComfyUI: Using Tiled VAE Decode for low VRAM audio generation

Demo summary

The narrator shows how to replace the standard VAE decode node with a tiled version in ComfyUI to handle longer audio generations (120 seconds) on hardware with lower VRAM.

Step-by-step

  1. Locate the standard VAE Decode Audio node in your workflow
  2. Delete or bypass the standard node
  3. Add the Tiled VAE Decode Audio node
  4. Connect the latent and the VAE to the inputs of the tiled node
  5. Set the desired duration (e.g., 120 seconds)
  6. Queue the prompt to generate the audio
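The same node swap can also be sketched outside the UI, on a workflow exported in ComfyUI's API format (a dictionary of node IDs, each with a class_type and inputs). This is a minimal sketch under assumptions: the node IDs, the input names, and especially the class name "TiledVAEDecodeAudio" are placeholders for illustration, so check the real class name in your own exported workflow. The tiled node may also expect extra inputs (such as a tile size) that this sketch does not add.

```python
import copy


def swap_node_class(workflow, old_class, new_class):
    """Return a patched copy of an API-format workflow plus the IDs that changed.

    Every node whose class_type equals old_class is retyped to new_class.
    Existing inputs (including links such as ["3", 0]) are kept untouched.
    """
    patched = copy.deepcopy(workflow)
    swapped = []
    for node_id, node in patched.items():
        if node.get("class_type") == old_class:
            node["class_type"] = new_class
            swapped.append(node_id)
    return patched, swapped


# Toy workflow: IDs, links, and input names are illustrative only.
workflow = {
    "3": {"class_type": "KSampler", "inputs": {}},
    "7": {
        "class_type": "VAEDecodeAudio",
        "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
    },
}

# "TiledVAEDecodeAudio" is a hypothetical class name, not a confirmed one.
patched, swapped = swap_node_class(
    workflow, "VAEDecodeAudio", "TiledVAEDecodeAudio"
)
print(swapped)
```

Working on a copy keeps the original workflow intact, so the standard decode node can be restored if the tiled one is not needed.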

Tips

  • Use the tiled version of the VAE decode node to handle longer audio generations on hardware with lower VRAM
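Why tiling helps with memory can be illustrated with a toy example. This is a conceptual sketch only, not ComfyUI's implementation: toy_decode, HOP, tile, and overlap are hypothetical names, and a real VAE decoder is not perfectly local, which is why real tiled decoders usually blend overlapping regions. The idea shown is that the decoder only ever sees a small window of the latent at a time, so peak memory depends on the tile size rather than on the full clip length.

```python
HOP = 4  # hypothetical: audio samples produced per latent frame


def toy_decode(latent):
    """Stand-in for a VAE decoder: repeat each latent frame HOP times."""
    audio = []
    for frame in latent:
        audio.extend([frame] * HOP)
    return audio


def tiled_decode(latent, decode, tile=8, overlap=2, hop=HOP):
    """Decode a latent in windows of at most tile + 2 * overlap frames.

    Each window carries `overlap` frames of context on both sides.
    The context is decoded and then cropped away, so only the core
    `tile` frames of each window contribute to the output.
    """
    if tile <= 0 or overlap < 0:
        raise ValueError("tile must be positive and overlap non-negative")
    audio = []
    for start in range(0, len(latent), tile):
        end = min(start + tile, len(latent))
        lo = max(0, start - overlap)          # window start with context
        hi = min(len(latent), end + overlap)  # window end with context
        chunk = decode(latent[lo:hi])
        left = (start - lo) * hop             # samples of left context to drop
        right = left + (end - start) * hop    # end of the core region
        audio.extend(chunk[left:right])
    return audio


latent = [float(i) for i in range(30)]
full = toy_decode(latent)
tiled = tiled_decode(latent, toy_decode)
print(tiled == full)
```

With this toy decoder the tiled result matches the full decode exactly, while the largest window passed to the decoder is 12 frames instead of all 30.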

Highlights

“It should work a little better with lower VRAM.”

All demos from “Stable Audio 3 in ComfyUI: Create AI Music and Sound Effects (Ep19)”

  1. 0:44 (1:57) Generate audio from text in ComfyUI with Stable Audio 3
     The creator demonstrates setting up a ComfyUI workflow using the Stable Audio 3 Medium model and T5 Gemma text encoder to generate a 30-second instrumental audio clip from a text prompt.
     ComfyUI · AI Music Generator
  2. 3:34 (0:28) Using Tiled VAE Decode for low VRAM audio generation (current)
     The narrator shows how to replace the standard VAE decode node with a tiled version in ComfyUI to handle longer audio generations (120 seconds) on hardware with lower VRAM.
     ComfyUI · AI Music Generator
  3. 4:27 (0:30) Copying and replacing prompts in ComfyUI
     The user demonstrates a custom node feature that allows hovering over prompt examples to copy and then replace the current prompt in the workflow with a single click.
     ComfyUI · AI Image Generator
  4. 8:42 (0:58) Enhance audio prompts with Gemma 4
     A workflow is shown where a simple piano prompt is processed by a Gemma 4 model to generate a more detailed audio prompt before being passed to the Stable Audio sampler.
     ComfyUI · AI Music Generator
  5. 10:55 (0:28) Image-to-Music generation in ComfyUI
     The creator demonstrates an experimental workflow that loads an image of a bunny, uses Gemma 4 to describe it as a music prompt, and then generates corresponding audio.
     ComfyUI · AI Music Generator
  6. 13:03 (0:51) Managing node colors with PixaRoma nodes
     The video demonstrates UI updates for PixaRoma nodes in ComfyUI, including selecting color swatches, copying/pasting colors between nodes, and setting favorite colors.
     ComfyUI · AI Image Generator
  7. 14:16 (1:11) Advanced image loading and padding in ComfyUI
     The narrator demonstrates the updated Load Image node featuring folder filtering, image previews, and manual padding controls for outpainting tasks.
     ComfyUI · AI Outpainting

AI Music Generator

  1. 0:44 (1:57) Generate audio from text in ComfyUI with Stable Audio 3
     The creator demonstrates setting up a ComfyUI workflow using the Stable Audio 3 Medium model and T5 Gemma text encoder to generate a 30-second instrumental audio clip from a text prompt.
     pixaroma
  2. 3:34 (0:28) Using Tiled VAE Decode for low VRAM audio generation (current)
     The narrator shows how to replace the standard VAE decode node with a tiled version in ComfyUI to handle longer audio generations (120 seconds) on hardware with lower VRAM.
     pixaroma
  3. 8:42 (0:58) Enhance audio prompts with Gemma 4
     A workflow is shown where a simple piano prompt is processed by a Gemma 4 model to generate a more detailed audio prompt before being passed to the Stable Audio sampler.
     pixaroma
  4. 10:55 (0:28) Image-to-Music generation in ComfyUI
     The creator demonstrates an experimental workflow that loads an image of a bunny, uses Gemma 4 to describe it as a music prompt, and then generates corresponding audio.
     pixaroma
  5. 1:19 (1:36) Generate music with Stable Audio 3 Medium in ComfyUI
     The creator demonstrates loading the Stable Audio 3 medium model in ComfyUI, setting up a prompt for 'gothic techno', and configuring the audio duration to 95 seconds before generating the track.
     Nerdy Rodent
  6. 3:33 (0:34) Low VRAM audio generation with Small model
     The user switches to the 2GB 'small' model in ComfyUI to demonstrate audio generation suitable for low-end GPUs.
     Nerdy Rodent
  7. 6:31 (1:11) AI Prompt Generation with Gemma
     The creator shows an 'audio to text to audio' pipeline using Gemma to describe an audio input and generate a new prompt for the music generation node.
     Nerdy Rodent
  8. 10:17 (0:59) Audio conditioning with voice input
     The user demonstrates using a 30-second vocal recording as an input combined with a text prompt and a linear quadratic scheduler to generate a new track.
     Nerdy Rodent
  9. 1:15 (1:28) Setting up Stable Audio 3.0 in ComfyUI
     The creator demonstrates how to access the Stable Audio 3.0 workflow via ComfyUI templates, download the necessary checkpoint and text encoder models, and navigate the prompt nodes for music and sound effect generation.
     REBEL AI