XCAT 3.0: Generate singing duet videos from photos

Demo summary
The demo shows MultiTalk processing two faces and two singing audio clips to create a synchronized musical performance video.
All demos from “MultiTalk: Open-source AI Turns Any Photo Into a Talking Video! (Free Veo 3 Alternative)”
0:22 (0:28) Single image lip-sync and emotion generation with MultiTalk: The video demonstrates MultiTalk taking a single still photo and an audio clip to generate a video where the character speaks with matching lip-sync and emotional expressions. [XCAT 3.0 · AI Avatar Video Generator]
0:50 (0:47) Multi-person conversational video generation: MultiTalk is shown animating a group image where two separate people interact and respond to each other using different audio tracks. [XCAT 3.0 · AI Avatar Video Generator]
1:37 (0:26) Generate singing duet videos from photos (current): The demo shows MultiTalk processing two faces and two singing audio clips to create a synchronized musical performance video. [XCAT 3.0 · AI Avatar Video Generator]
2:03 (0:37) Pose and movement transfer with MultiTalk: The tool demonstrates transferring body movements and dancing gestures from a reference video onto a static character image while maintaining lip-sync. [XCAT 3.0 · Video to Video]
AI Avatar Video Generator
8:22 (4:37) Generate talking head video from image and audio: The user demonstrates uploading a reference image and an audio clip to MultiTalk, then configuring background removal and text prompts to generate a video of a woman speaking in a park. [AI Search]
19:39 (3:03) Animate multiple speakers in a podcast scene: The video shows how to configure MultiTalk for two speakers by uploading an image of two people and two sequential audio clips, assigning voices based on each person's position in the frame. [AI Search]
22:17 (3:21) Parallel multi-speaker animation: The user demonstrates a more advanced multi-speaker setup in which two audio tracks play in parallel to animate a conversation between two people in a single reference image. [AI Search]
0:22 (0:28) Single image lip-sync and emotion generation with MultiTalk: The video demonstrates MultiTalk taking a single still photo and an audio clip to generate a video where the character speaks with matching lip-sync and emotional expressions. [NadimExplainsAI]
0:50 (0:47) Multi-person conversational video generation: MultiTalk is shown animating a group image where two separate people interact and respond to each other using different audio tracks. [NadimExplainsAI]
1:37 (0:26) Generate singing duet videos from photos (current): The demo shows MultiTalk processing two faces and two singing audio clips to create a synchronized musical performance video. [NadimExplainsAI]