XCAT 3.0: Multi-person conversational video generation

Demo summary

MultiTalk animates a group image in which two people interact and respond to each other, each driven by a separate audio track.

Use contrasting audio tracks, such as one person giving advice and another responding, to create a natural conversational flow.

“It gets even better. Select a group image. Upload two separate voice clips... and watch as multi-talk generates a real conversation”

8:22 (4:37) Generate talking head video from image and audio
The user demonstrates uploading a reference image and an audio clip to MultiTalk, configuring background removal and text prompts to generate a video of a woman speaking in a park.

19:39 (3:03) Animate multiple speakers in a podcast scene
The video shows how to configure MultiTalk for two speakers by uploading an image of two people and two sequential audio clips, assigning voices based on their position in the frame.

22:17 (3:21) Parallel multi-speaker animation
The user demonstrates a more advanced multi-speaker setup where two audio tracks are played in parallel to animate a conversation between two people in a single reference image.

0:50 (0:47) Multi-person conversational video generation (current clip), NadimExplainsAI