ComfyUI: Generate temporal face swap with WAN Video

Demo summary
The user feeds the reference identity, pose data, original background, and masks into the WAN Video sampler to generate a temporally consistent video sequence.
Step-by-step
- Combine the reference identity, pose information, original background, and mask into the WAN Video sampler
- Run the sampler to generate the new sequence
- Decode the latent result back into image frames
- Reassemble the frames into a final video
- Import the original audio to sync with the new footage
Options
- Re-import original audio to preserve source sync
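The last two steps (reassembling the frames and re-importing the original audio) can also be done outside the node graph. The following is a minimal sketch, not the demo's own method: it assumes the decoded frames were saved as a numbered PNG sequence, that `ffmpeg` is installed, and that the frame rate passed in matches the source footage. The function name `build_mux_command`, the file names, and the default of 24 fps are hypothetical.

```python
def build_mux_command(frames_pattern, source_video, output_path, fps=24):
    """Build an ffmpeg command that encodes a frame sequence to video
    and copies the audio track from the original footage.

    All paths and the fps default are illustrative assumptions; fps
    should match the source clip or the audio will drift out of sync.
    """
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),    # rate at which the image sequence is read
        "-i", str(frames_pattern), # input 0: generated frames, e.g. frames/%05d.png
        "-i", str(source_video),   # input 1: original footage, used only for audio
        "-map", "0:v:0",           # video comes from the generated frames
        "-map", "1:a:0?",          # audio from the source; "?" makes it optional
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",     # widely compatible pixel format
        "-c:a", "copy",            # keep the original audio stream untouched
        "-shortest",               # stop at the end of the shorter stream
        str(output_path),
    ]


cmd = build_mux_command("frames/%05d.png", "source.mp4", "face_swap.mp4")
print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the encode. Copying the audio stream rather than re-encoding it keeps the source track unchanged, provided the output container supports the source audio codec.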
All demos from “Learn How To: Face Swap”
- 0:29 (0:24) Load source footage and models in ComfyUI: The user demonstrates importing source video footage and loading the necessary model nodes, including WAN Video, VAE, and Clip Vision, within the ComfyUI interface. ComfyUI · AI Animation Generator
- 0:53 (0:48) Generate and refine head masks with Florence 2 and SAM 2: The workflow shows using Florence 2 for object detection to target the head and SAM 2 for precise segmentation, including adjusting the 'grow mask expand' value to improve blending. ComfyUI · AI Inpainting
- 1:41 (0:46) Prepare driving data and auto-prompts with Qwen2-VL: The demo shows running pose detection on source footage and using Qwen2-VL (referred to as Gwen VL) to generate a semantic text description from a reference image for the face swap. ComfyUI · AI Face Swap Generator
- 2:27 (0:35) Generate temporal face swap with WAN Video (current): The user combines the reference identity, pose data, and masks into the WAN Video sampler to generate a temporally consistent video sequence. ComfyUI · AI Face Swap Video
- 3:28 (0:27) Simplified face swap using ComfyUI App Mode: A demonstration of the simplified 'App Mode' interface where users can upload footage and a reference image to perform a face swap without interacting with the node graph. ComfyUI · AI Face Swap Generator
AI Face Swap Video
- 16:07 (1:11) Replace character face in generated video: A demonstration of a character replacement pass that swaps a face in a Sora-generated video with a specific reference photo to maintain identity consistency. Yaroflasher
- 2:27 (0:35) Generate temporal face swap with WAN Video (current): The user combines the reference identity, pose data, and masks into the WAN Video sampler to generate a temporally consistent video sequence. ComfyUI