ComfyUI: Prepare driving data and auto-prompts with Qwen2-VL

ComfyUI·Jun 2026

Demo summary

The demo shows running pose detection on source footage and using Qwen2-VL (referred to as Gwen VL) to generate a semantic text description from a reference image for the face swap.

Step-by-step

Run pose and face detection on the original source footage
Import a reference image to define the new identity
Generate an auto-prompt using Qwen2-VL to analyze the reference image
Review the generated text description
Copy and paste the prompt into the One Video Text and Code Cached node

Tips

Use auto-prompts to build a better semantic understanding of the replacement person rather than just copying pixels

All demos from “Learn How To: Face Swap”

0:290:24Load source footage and models in ComfyUIThe user demonstrates importing source video footage and loading the necessary model nodes including WAN Video, VAE, and Clip Vision within the ComfyUI interface.ComfyUI· AI Animation Generator
0:530:48Generate and refine head masks with Florence 2 and SAM 2The workflow shows using Florence 2 for object detection to target the head and SAM 2 for precise segmentation, including adjusting the 'grow mask expand' value to improve blending.ComfyUI· AI Inpainting
1:410:46Prepare driving data and auto-prompts with Qwen2-VLCurrentThe demo shows running pose detection on source footage and using Qwen2-VL (referred to as Gwen VL) to generate a semantic text description from a reference image for the face swap.ComfyUI· AI Face Swap Generator
3:280:27Simplified face swap using ComfyUI App ModeA demonstration of the simplified 'App Mode' interface where users can upload footage and a reference image to perform a face swap without interacting with the node graph.ComfyUI· AI Face Swap Generator
Watch “Learn How To: Face Swap” →

AI Face Swap Generator

1:410:46Prepare driving data and auto-prompts with Qwen2-VLCurrentThe demo shows running pose detection on source footage and using Qwen2-VL (referred to as Gwen VL) to generate a semantic text description from a reference image for the face swap.ComfyUI
3:280:27Simplified face swap using ComfyUI App ModeA demonstration of the simplified 'App Mode' interface where users can upload footage and a reference image to perform a face swap without interacting with the node graph.ComfyUI
2:081:55Configuring LTX 2.3 Face Swap Workflow in ComfyUIThe creator walks through a custom ComfyUI workflow, showing how to load the LTX 2.3 model with the face swap LoRA and set up resolution scaling and frame count calculations.Veteran AI
9:521:03Testing Face Swap with Scene CutsThe video demonstrates how the LTX 2.3 face swap LoRA handles a reference video with a camera cut and clothing change, showing its ability to maintain character features across transitions.Veteran AI
1:110:46Inpainting face swap with ComfyUIThe creator demonstrates an inpainting face swap workflow in ComfyUI by loading a reference and target image to achieve a high-resemblance result on a close-up shot.Yaroflasher
1:09:444:55Face swapping with InstantIDDemonstrates the full setup and execution of InstantID to swap Will Smith's face onto a generated policeman character in both realistic and watercolor styles.AI Search