ComfyUI: Generate and refine head masks with Florence 2 and SAM 2

ComfyUI · Jun 2026

Demo summary

The workflow uses Florence 2 object detection to locate the head and SAM 2 to segment it precisely, then adjusts the 'grow mask expand' value so the swapped face blends in more naturally.

Step-by-step

Use Florence 2 to identify and target the head region
Pass the generated bounding box into SAM 2 for segmentation
Adjust the grow mask expand value to refine the mask boundaries
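
As a rough illustration of the last step, growing a mask can be thought of as dilating it outward by a number of pixels. The sketch below is a minimal, dependency-free version under stated assumptions: `grow_mask`, `area`, and the toy 7x7 mask are hypothetical names and data made up for illustration, and the 4-connected, one-pixel-per-pass dilation is an assumption, not the actual implementation of the node used in the demo.

```python
from typing import List

Mask = List[List[int]]  # 1 = inside the mask, 0 = outside


def grow_mask(mask: Mask, expand: int) -> Mask:
    """Dilate a binary mask by `expand` pixels (4-connected, one pixel per pass).

    Hypothetical helper for illustration; not the ComfyUI node's code.
    """
    height, width = len(mask), len(mask[0])
    current = [row[:] for row in mask]
    for _ in range(expand):
        grown = [row[:] for row in current]
        for y in range(height):
            for x in range(width):
                if current[y][x]:
                    # Mark the four direct neighbours, staying inside the image.
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            grown[ny][nx] = 1
        current = grown
    return current


def area(mask: Mask) -> int:
    """Count the pixels that are inside the mask."""
    return sum(sum(row) for row in mask)


# Toy "head" mask: a 3x3 block in the centre of a 7x7 frame.
tight = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)]
         for y in range(7)]
loose = grow_mask(tight, expand=2)

print(area(tight), area(loose))  # prints: 9 37
```

The larger the expand value, the more pixels around the detected silhouette fall inside the mask, which is the extra room the tips below refer to.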

Watch out for

If the mask is too close to the existing outline, the model has less freedom to reinterpret shape and integrate the new face naturally

Tips

Make the grow mask expand value large enough that the mask does not hug the silhouette too tightly
Adjust the grow mask expand setting first, tuning it for each shot, so the model has room to reinterpret the shape

All demos from “Learn How To: Face Swap”

0:29 (0:24) Load source footage and models in ComfyUI
The user demonstrates importing source video footage and loading the necessary model nodes including WAN Video, VAE, and Clip Vision within the ComfyUI interface.
ComfyUI · AI Animation Generator

0:53 (0:48) Generate and refine head masks with Florence 2 and SAM 2 (current)
The workflow shows using Florence 2 for object detection to target the head and SAM 2 for precise segmentation, including adjusting the 'grow mask expand' value to improve blending.
ComfyUI · AI Inpainting

1:41 (0:46) Prepare driving data and auto-prompts with Qwen2-VL
The demo shows running pose detection on source footage and using Qwen2-VL (referred to as Gwen VL) to generate a semantic text description from a reference image for the face swap.
ComfyUI · AI Face Swap Generator

3:28 (0:27) Simplified face swap using ComfyUI App Mode
A demonstration of the simplified 'App Mode' interface where users can upload footage and a reference image to perform a face swap without interacting with the node graph.
ComfyUI · AI Face Swap Generator

AI Inpainting

16:57 (1:43) Generate composite image with Flux 2
The creator demonstrates image-to-image generation by uploading two reference photos and using a text prompt to replace a logo on a coffee can with his own headshot using the Flux model.
WINBUSH

20:01 (0:28) Text-to-image manipulation in ComfyUI
The video shows how to use a text prompt to attempt to rotate an object in an image within a specific node-based workflow.
WINBUSH

1:11 (1:34) Configure LTX Latent Anchor Aware node
The video shows how to set up the LTX Latent Anchor Aware node in a ComfyUI workflow, explaining how to connect the reference image and adjust strength parameters to prevent identity drift.
The AI Girlie

0:36 (1:56) Creative replacement with Flux Kontext
The user demonstrates how to change a specific part of an image, such as replacing a building entrance with a glass door, using the Flux Kontext model in ComfyUI while maintaining the original scene context.
Urban Decoders

5:20 (0:53) Change season to winter in architectural scenes
The demonstration shows how to use a text prompt to transform a summer architectural shot into a winter scene with snow on rooftops and trees using Flux Kontext.
Urban Decoders

6:47 (1:52) Selective inpainting with crop and stitch
Using the 'ComfyUI inpaint crop and stitch' nodes, the user demonstrates masking a specific window area to insert a glowing LED digital facade.
Urban Decoders

12:54 (0:41) Editing text on shop fronts
The demonstration shows how to use Flux Kontext to change specific text on a shop sign while maintaining the original font style and position.
Urban Decoders

15:06 (3:12) Transform 3D scenes with Qwen2-VL (Qwen-Edit)
The creator uses the Qwen-Edit workflow to perform complex scene modifications, such as changing weather to rain or snow, while preserving the original 3D geometry and text.
Matt Hallett Visual

20:22 (1:07) Fix text and faces using Crop and Stitch
A demonstration of a custom 'crop image' node to isolate specific areas like signs or faces, regenerate them at native resolution, and stitch them back into the high-res image.
Matt Hallett Visual

7:20 (1:48) Edit image details with Flux Kontext
The user shows how to load a reference image and use a text prompt to change specific features, such as changing a cartoon bunny's eye color to red.
pixaroma

19:19 (1:08) Modify hairstyles and facial features
The creator shows how to change a subject's hair color and style (e.g., to a blunt bob with bangs) while attempting to maintain facial consistency using Flux Kontext.
pixaroma

21:03 (0:26) Edit text within an image
A demonstration of Flux Kontext's ability to change text on a 3D render from 'Welcome' to 'Pixaroma' while maintaining the original font style and perspective.
pixaroma