
Generate music with Stable Audio 3 Medium in ComfyUI

Demo summary

The creator loads the Stable Audio 3 Medium model in ComfyUI, enters the prompt 'gothic techno', and sets the audio duration to 95 seconds before generating the track.

Step-by-step

  1. Select the Stable Audio 3 Medium model in the loading group
  2. Set the sampling steps to 8 and CFG to 1
  3. Enter a music description in the prompt node
  4. Configure the audio duration in the Empty Latent Audio node
  5. Click to generate the audio track
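
As a rough sketch, the settings from the steps above can be written down in one place. `DemoSettings` and its field names are hypothetical, chosen only for illustration; they are not ComfyUI node inputs or part of any ComfyUI API. The values (8 steps, CFG 1, the 'gothic techno' prompt, 95 seconds) are the ones used in the demo.

```python
from dataclasses import dataclass, asdict


@dataclass
class DemoSettings:
    """Hypothetical record of the demo's settings; not a ComfyUI API."""
    model: str = "Stable Audio 3 Medium"  # step 1: model chosen in the loading group
    steps: int = 8                         # step 2: sampling steps
    cfg: float = 1.0                       # step 2: CFG
    prompt: str = "gothic techno"          # step 3: music description
    seconds: float = 95.0                  # step 4: duration set in Empty Latent Audio

    def check(self) -> None:
        """Reject values that could not describe a generation run."""
        if self.steps < 1:
            raise ValueError("steps must be at least 1")
        if self.seconds <= 0:
            raise ValueError("seconds must be positive")
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")


settings = DemoSettings()
settings.check()
print(asdict(settings))
```

Keeping the values together like this makes it easier to note what changed between runs, for example when trying a different duration or prompt.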

Options

  • Enable the Model Shift node to potentially improve audio quality
  • Use a negative prompt (only if CFG is set higher than 1)
  • Use the Any Switch to flip between latent and empty latent audio
  • Generate sound effects or one-shots instead of full music tracks

Watch out for

  • Negative prompts have no effect when CFG is set to 1; raise CFG above 1 to use one
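
Why a CFG of 1 cancels the negative prompt can be seen from the standard classifier-free guidance formula, sketched here on plain lists of numbers. This assumes the sampler combines the positive and negative predictions in the usual way; `cfg_combine` and the sample values are illustrative and are not taken from ComfyUI's code.

```python
def cfg_combine(positive, negative, cfg):
    """Standard classifier-free guidance: negative + cfg * (positive - negative)."""
    return [n + cfg * (p - n) for p, n in zip(positive, negative)]


# Made-up model predictions for the positive prompt and two different negatives.
positive = [1.0, 2.0]
negative_a = [0.5, -1.0]
negative_b = [0.0, 0.0]

# With cfg = 1 the negative term cancels, so both results equal the positive prediction.
print(cfg_combine(positive, negative_a, 1.0))
print(cfg_combine(positive, negative_b, 1.0))

# With cfg above 1 the choice of negative changes the result.
print(cfg_combine(positive, negative_a, 3.0))
print(cfg_combine(positive, negative_b, 3.0))
```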

Tips

  • Stable Audio 3 Medium is recommended as the best model to start with
  • Try the optional model shift node as some audio sounds better with it enabled
  • Check the provided examples node for the kinds of output the model supports, such as instruments or sound effects

Highlights

Sampling is super straightforward in comfy

All demos from “Make High Quality Music in ComfyUI - Low VRAM!”

  1. 1:19 (1:36) Generate music with Stable Audio 3 Medium in ComfyUI (current). The creator demonstrates loading the Stable Audio 3 medium model in ComfyUI, setting up a prompt for 'gothic techno', and configuring the audio duration to 95 seconds before generating the track. [ComfyUI, AI Music Generator]
  2. 3:33 (0:34) Low VRAM audio generation with Small model. The user switches to the 2GB 'small' model in ComfyUI to demonstrate audio generation suitable for low-end GPUs. [ComfyUI, AI Music Generator]
  3. 6:31 (1:11) AI Prompt Generation with Gemma. The creator shows an 'audio to text to audio' pipeline using Gemma to describe an audio input and generate a new prompt for the music generation node. [ComfyUI, AI Music Generator]
  4. 7:42 (1:39) Audio-to-Audio VAE Encoding. The demo shows loading an existing audio file (beats from Ace Step) into ComfyUI using VAE encoding to influence the style of the generated output. [ComfyUI, AI Song Remixer]
  5. 10:17 (0:59) Audio conditioning with voice input. The user demonstrates using a 30-second vocal recording as an input combined with a text prompt and a linear quadratic scheduler to generate a new track. [ComfyUI, AI Music Generator]
  6. 11:16 (1:07) Generate sound effects with Stable Audio. The creator demonstrates generating specific sound effects like 'creaky doors' and 'underwater fireworks' using the medium and small sound effects models. [ComfyUI, AI Sound Effect Generator]
  7. 12:23 (2:02) Multi-sampler audio modification. The video shows a workflow using two samplers in sequence, where the first stops at step four and the second continues, allowing for variations in the final audio output. [ComfyUI, AI Audio Editor]

AI Music Generator

  1. 0:44 (1:57) Generate audio from text in ComfyUI with Stable Audio 3. The creator demonstrates setting up a ComfyUI workflow using the Stable Audio 3 Medium model and T5 Gemma text encoder to generate a 30-second instrumental audio clip from a text prompt. [pixaroma]
  2. 3:34 (0:28) Using Tiled VAE Decode for low VRAM audio generation. The narrator shows how to replace the standard VAE decode node with a tiled version in ComfyUI to handle longer audio generations (120 seconds) on hardware with lower VRAM. [pixaroma]
  3. 8:42 (0:58) Enhance audio prompts with Gemma 4. A workflow is shown where a simple piano prompt is processed by a Gemma 4 model to generate a more detailed audio prompt before being passed to the Stable Audio sampler. [pixaroma]
  4. 10:55 (0:28) Image-to-Music generation in ComfyUI. The creator demonstrates an experimental workflow that loads an image of a bunny, uses Gemma 4 to describe it as a music prompt, and then generates corresponding audio. [pixaroma]
  5. 1:19 (1:36) Generate music with Stable Audio 3 Medium in ComfyUI (current). The creator demonstrates loading the Stable Audio 3 medium model in ComfyUI, setting up a prompt for 'gothic techno', and configuring the audio duration to 95 seconds before generating the track. [Nerdy Rodent]
  6. 3:33 (0:34) Low VRAM audio generation with Small model. The user switches to the 2GB 'small' model in ComfyUI to demonstrate audio generation suitable for low-end GPUs. [Nerdy Rodent]
  7. 6:31 (1:11) AI Prompt Generation with Gemma. The creator shows an 'audio to text to audio' pipeline using Gemma to describe an audio input and generate a new prompt for the music generation node. [Nerdy Rodent]
  8. 10:17 (0:59) Audio conditioning with voice input. The user demonstrates using a 30-second vocal recording as an input combined with a text prompt and a linear quadratic scheduler to generate a new track. [Nerdy Rodent]
  9. 1:15 (1:28) Setting up Stable Audio 3.0 in ComfyUI. The creator demonstrates how to access the Stable Audio 3.0 workflow via ComfyUI templates, download the necessary checkpoint and text encoder models, and navigate the prompt nodes for music and sound effect generation. [REBEL AI]