
XCAT 3.0: Animate multiple speakers in a podcast scene

Demo summary

The video shows how to configure Multi-Talk for two speakers by uploading an image of two people and two sequential audio clips, assigning voices based on their position in the frame.

Step-by-step

  1. Upload an image containing two people
  2. Select the number of speakers as two
  3. Choose the 'in a row' option for sequential audio playback
  4. Upload the first audio clip for the speaker on the left
  5. Upload the second audio clip for the speaker on the right
  6. Enter a prompt describing the scene and the subjects
  7. Adjust the number of frames to match the total duration of the audio clips
  8. Enable TCH and set the speed to 2x to accelerate generation
  9. Drag the TCH start slider to the 10% mark
  10. Click Generate
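
Step 7 comes down to simple arithmetic: total audio length multiplied by the output frame rate. The following is a minimal sketch of that calculation, not part of the tool itself. The function name `frames_needed`, the `mode` strings, and the default of 25 frames per second are all assumptions made for illustration; check the frame rate your model actually renders at. Treating parallel playback as "the longest clip sets the length" is also an assumption.

```python
import math


def frames_needed(clip_seconds, fps=25, mode="in_a_row"):
    """Return a frame count long enough to cover the audio.

    clip_seconds: list of clip durations in seconds.
    fps: output frame rate (25 is an assumed default, not a documented value).
    mode: "in_a_row" for sequential clips, "parallel" for simultaneous clips.
    """
    if not clip_seconds:
        raise ValueError("need at least one clip duration")
    if mode == "in_a_row":
        total = sum(clip_seconds)  # clips play one after another
    elif mode == "parallel":
        total = max(clip_seconds)  # clips start together (assumed behaviour)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Round up so the last fraction of a second of audio is not cut off.
    return math.ceil(total * fps)


# A 4-second clip followed by a 6-second clip at 25 fps.
print(frames_needed([4.0, 6.0]))  # 250
```

If the interface only accepts certain frame counts, round the result up to the next allowed value rather than down, so the second speaker's clip is not truncated.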

Options

  • Automatic detection of speakers from a single audio clip
  • Parallel audio playback for simultaneous speaking
  • Adjustable TCH start times and speed multipliers

Watch out for

  • The model assigns the first audio clip to the person on the left and the second clip to the person on the right, regardless of who actually spoke it
  • The automatic speaker detection option often fails to correctly identify which speaker says which part

Tips

  • Use the 'in a row' or 'parallel' methods instead of automatic detection for better results
  • Ensure your image composition matches the audio order (e.g., female on the left if the first voice is female)
  • Set the TCH slider to start at the 10% mark to optimize generation speed for longer videos
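
The left-to-right rule can be checked before uploading. The sketch below is a hypothetical helper, not something the tool provides: the `Speaker` class, its `x_center` field (horizontal position in the image, 0.0 at the left edge and 1.0 at the right), and the file names are all made up for illustration. It sorts speakers by position so the clips come out in the order they should be uploaded.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    x_center: float  # assumed convention: 0.0 = left edge, 1.0 = right edge
    clip: str        # path to this speaker's audio clip


def upload_order(speakers):
    """Return clip paths ordered from the leftmost to the rightmost speaker."""
    return [s.clip for s in sorted(speakers, key=lambda s: s.x_center)]


# Hypothetical podcast scene: the guest sits on the left, the host on the right.
scene = [
    Speaker("host", 0.7, "host.wav"),
    Speaker("guest", 0.3, "guest.wav"),
]
print(upload_order(scene))  # ['guest.wav', 'host.wav']
```

Here the guest's clip goes into the first audio slot and the host's into the second, matching the position-based assignment described above.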

Highlights

“You can see it animates this very well. The lip sync is very good.”

All demos from “Make AI videos with talking + pose + reference control. MultiTalk & VACE tutorial”

  1. 5:27 (1:27) Overview of the Wan2GP interface for Multi-Talk. The creator walks through the Wan2GP Gradio interface, explaining how to select the Multi-Talk model and the specific 'VACE Multi-Talk Fusion X' version for better performance on low VRAM. [XCAT 3.0, AI Animation Generator]
  2. 8:22 (4:37) Generate talking head video from image and audio. The user demonstrates uploading a reference image and an audio clip to Multi-Talk, configuring background removal and text prompts to generate a video of a woman speaking in a park. [XCAT 3.0, AI Avatar Video Generator]
  3. 13:57 (1:32) Simulate angry expressions with Multi-Talk. The demo shows how to use an angry reference image and matching audio to generate a highly expressive video that captures the pitch and intensity of the speaker's anger. [XCAT 3.0, AI Lip Sync Generator]
  4. 15:29 (1:10) Animate sad emotions and crying. The creator demonstrates Multi-Talk's ability to handle complex emotions by animating a sad character who pauses and breathes in sync with a crying audio track. [XCAT 3.0, AI Lip Sync Generator]
  5. 17:44 (1:28) Lip-syncing anime characters. A demonstration of applying Japanese audio to an anime still image, showing how the tool handles non-human characters and different languages. [XCAT 3.0, AI Lip Sync Generator]
  6. 19:39 (3:03) Animate multiple speakers in a podcast scene (current). The video shows how to configure Multi-Talk for two speakers by uploading an image of two people and two sequential audio clips, assigning voices based on their position in the frame. [XCAT 3.0, AI Avatar Video Generator]
  7. 22:17 (3:21) Parallel multi-speaker animation. The user demonstrates a more advanced multi-speaker setup where two audio tracks are played in parallel to animate a conversation between two people in a single reference image. [XCAT 3.0, AI Avatar Video Generator]
  8. 26:32 (2:34) Transfer human motion with VACE and Multi-Talk. The demo shows how to use a control video of a person dancing to drive the body movements of a reference image while simultaneously applying a Spanish lip-sync track. [XCAT 3.0, Video to Video]

AI Avatar Video Generator

  1. 8:22 (4:37) Generate talking head video from image and audio. The user demonstrates uploading a reference image and an audio clip to Multi-Talk, configuring background removal and text prompts to generate a video of a woman speaking in a park. [AI Search]
  2. 19:39 (3:03) Animate multiple speakers in a podcast scene (current). The video shows how to configure Multi-Talk for two speakers by uploading an image of two people and two sequential audio clips, assigning voices based on their position in the frame. [AI Search]
  3. 22:17 (3:21) Parallel multi-speaker animation. The user demonstrates a more advanced multi-speaker setup where two audio tracks are played in parallel to animate a conversation between two people in a single reference image. [AI Search]
  4. 0:50 (0:47) Multi-person conversational video generation. MultiTalk is shown animating a group image where two separate people interact and respond to each other using different audio tracks. [NadimExplainsAI]