demobook

ComfyUI: Setting up Wan 2.1 and InfiniteTalk models

Demo summary

A walkthrough of the model group in ComfyUI, showing the configuration of Wan Video Block Swap for VRAM management, the LightX2V LoRA for faster generation, and the InfiniteTalk GGUF model for audio conditioning.

Step-by-step

  1. Configure the Wan video torch compile settings to enable faster subsequent runs.
  2. Set the Wan Video Block Swap 'blocks to swap' setting to 15 to manage VRAM via CPU offloading.
  3. Load the LightX2V LoRA to reduce generation steps from 30-50 down to 4-10.
  4. Load the InfiniteTalk GGUF model using the Wan MultiTalk model loader.
  5. Select the single-person variant for audio conditioning unless syncing multiple speakers.
  6. Load the UMT5-XXL text encoder for prompt processing.
  7. Use the CLIP Vision Loader to encode the reference image for identity preservation.
  8. Configure the Wav2Vec2 model for acoustic feature encoding.
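
The choices in the steps above can be kept together in one record for reference. This is a minimal sketch in plain Python, not ComfyUI's API: the class name, field names, and the describe helper are hypothetical, and only the values (15 blocks to swap, 30-50 versus 4-10 steps, the single-person variant) come from the walkthrough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelGroupSettings:
    """Hypothetical record of the model-group choices in this walkthrough."""

    torch_compile: bool = True    # step 1: slow first run, faster later runs
    blocks_to_swap: int = 15      # step 2: blocks offloaded to the CPU
    use_speed_lora: bool = True   # step 3: LoRA that lowers the step count
    talk_variant: str = "single"  # step 5: "single" or "multi" speakers

    @property
    def steps_range(self):
        # Step ranges quoted in the walkthrough, with and without the LoRA.
        return (4, 10) if self.use_speed_lora else (30, 50)


def describe(settings):
    """Return a one-line, human-readable summary of the settings."""
    low, high = settings.steps_range
    return (
        f"compile={settings.torch_compile}, "
        f"blocks_to_swap={settings.blocks_to_swap}, "
        f"variant={settings.talk_variant}, "
        f"steps={low}-{high}"
    )


print(describe(ModelGroupSettings()))
```

Keeping the step range as a derived property means that turning the LoRA off also restores the longer 30-50 step range, so the two values cannot drift apart.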

Options

  • Increase 'blocks to swap' if you encounter CUDA out-of-memory errors; moving more blocks to the CPU frees VRAM but slows each step.
  • Use the multi-person variant of InfiniteTalk if you need to sync multiple speakers.
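
The first option describes a manual loop: run, hit an out-of-memory error, raise 'blocks to swap', and try again. The sketch below shows that logic in plain Python under several assumptions: generate is a hypothetical callable standing in for a full workflow run, MemoryError stands in for a CUDA out-of-memory error, and the ceiling of 40 blocks and the increment of 5 are illustrative values that do not come from the video.

```python
def run_with_block_swap(generate, start_blocks=15, max_blocks=40, step=5):
    """Call generate(blocks_to_swap), raising the swap count after each OOM.

    Returns (blocks_used, result). Re-raises once max_blocks has also failed.
    """
    blocks = start_blocks
    while True:
        try:
            return blocks, generate(blocks)
        except MemoryError:
            if blocks >= max_blocks:
                raise  # nothing left to offload; give up
            blocks = min(blocks + step, max_blocks)


def fake_generate(blocks_to_swap):
    # Stand-in for a real run: pretend fewer than 25 swapped blocks runs out of VRAM.
    if blocks_to_swap < 25:
        raise MemoryError("simulated CUDA out of memory")
    return "video.mp4"


print(run_with_block_swap(fake_generate))  # → (25, 'video.mp4')
```

Starting from the walkthrough's value of 15 and only increasing on failure keeps as many blocks on the GPU as the card can hold.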

Watch out for

  • The first run after enabling torch compile will be slow.
  • Do not set the LightX2V LoRA strength to 1.0, as it causes color shifts and reduces identity preservation.
  • InfiniteTalk is not a standalone generator; it requires the Wan 2.1 video model to function.
  • You must use a video VAE rather than a standard image VAE to preserve motion coherence.
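
Three of the warnings above can be written as a small pre-flight check. This is a sketch only: check_settings, its parameters, and the "video" and "image" labels are hypothetical names for illustration, and nothing here reads a real ComfyUI workflow.

```python
import math


def check_settings(lora_strength, vae_kind, has_wan_video_model):
    """Return a list of problems matching the warnings in this section."""
    problems = []
    if math.isclose(lora_strength, 1.0):
        problems.append(
            "LoRA strength 1.0: expect color shifts and weaker identity preservation"
        )
    if vae_kind != "video":
        problems.append("use a video VAE, not an image VAE, to keep motion coherent")
    if not has_wan_video_model:
        problems.append("InfiniteTalk needs the Wan 2.1 video model loaded alongside it")
    return problems


# A setup that trips every warning, then one that passes cleanly.
for problem in check_settings(1.0, "image", False):
    print("warning:", problem)
print(check_settings(0.8, "video", True))  # → []
```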

Tips

  • Set the LightX2V LoRA strength to 0.8 for the best balance of speed and facial consistency.
  • For long videos, the LightX2V LoRA can cut generation time from several hours to a fraction of that.
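
The time saving follows from the step counts quoted earlier (30-50 steps without the LoRA, 4-10 with it). As rough arithmetic, assuming sampling time scales linearly with the number of steps and ignoring fixed costs such as model loading and decoding, the speedup can be estimated as below. The function names and the 3-hour baseline are hypothetical figures for illustration.

```python
def step_speedup(baseline_steps, lora_steps):
    """Speedup factor if time is proportional to the number of sampling steps."""
    return baseline_steps / lora_steps


def estimated_hours(baseline_hours, baseline_steps, lora_steps):
    """Scale a known baseline duration by the ratio of step counts."""
    return baseline_hours * lora_steps / baseline_steps


worst = step_speedup(30, 10)  # fewest baseline steps, most LoRA steps
best = step_speedup(50, 4)    # most baseline steps, fewest LoRA steps
print(f"{worst:.1f}x to {best:.1f}x")  # → 3.0x to 12.5x

# Hypothetical example: a 3-hour render at 30 steps, rerun at 6 steps.
print(f"{estimated_hours(3, 30, 6):.1f} hours")  # → 0.6 hours
```

Real runs will not match this exactly, since the first compiled run and the non-sampling stages take time that does not shrink with the step count.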

Highlights

This is where things get technically dense. We have six separate models loading here.

All demos from “AI Talking Head Videos With Perfect Lip Sync (ComfyUI + InfiniteTalk)”

  1. Image preparation for InfiniteTalk (starts 2:56, length 0:54). The video shows the process of loading a portrait image and resizing it to the specific 384x640 resolution required by the Wan video model using standard ComfyUI nodes. Tags: ComfyUI, AI Crop Image.
  2. Setting up Wan 2.1 and InfiniteTalk models (starts 3:50, length 3:32; current demo). A walkthrough of the model group in ComfyUI, showing the configuration of Wan Video Block Swap for VRAM management, the LightX2V LoRA for faster generation, and the InfiniteTalk GGUF model for audio conditioning. Tags: ComfyUI, AI Animation Generator.
  3. Sampling and video output generation (starts 9:08, length 1:03). The demonstration shows the final sampling process using the Lightning LoRA settings (CFG 1, 7-10 steps) and combining the decoded frames with audio for the final video file. Tags: ComfyUI, AI Animation Generator.

AI Animation Generator

  1. Load Wan 2.2 Animate workflow and install nodes (starts 2:07, length 0:23). The user demonstrates how to drag and drop the Wan 2.2 Animate workflow into ComfyUI and use the Manager to install missing custom nodes. Source: MDMZ.
  2. Configure video input and output settings (starts 2:55, length 1:05). The demo shows how to upload a source video to ComfyUI, set the frame count, and adjust the output dimensions to match the original aspect ratio. Source: MDMZ.
  3. Setting up HunyuanVideo 1.5 in ComfyUI (starts 16:36, length 1:56). The video demonstrates how to update ComfyUI and import the HunyuanVideo 1.5 JSON workflow files to create a node-based generation environment. Source: AI Search.
  4. Text-to-Video generation in ComfyUI (starts 20:15, length 1:53). A step-by-step demo of configuring the Hunyuan nodes in ComfyUI, entering a prompt for a 'giant cat', and rendering the final 720p video. Source: AI Search.
  5. Running HunyuanVideo with GGUF (Low VRAM) (starts 29:15, length 1:33). The video shows how to use the GGUF loader node to run a compressed version of HunyuanVideo 1.5, enabling video generation on GPUs with as little as 6GB of VRAM. Source: AI Search.
  6. Configure LTX 2.3 in ComfyUI (starts 0:59, length 4:52). The creator walks through the ComfyUI node setup for LTX 2.3, explaining the GGUF model loader, VAE settings, and how to adjust resolution and frame counts for optimal rendering. Source: AIKnowledge2Go.
  7. Setting up LTX-2.3 in ComfyUI (starts 1:23, length 0:41). The creator demonstrates how to browse templates in ComfyUI, search for LTX 2.3, and download the required missing models for the text-to-video workflow. Source: MDMZ.
  8. Applying LoRA to Wan 2.2 video generation (starts 13:07, length 1:21). The video shows how to integrate a LoRA (Low-Rank Adaptation) into a Wan 2.2 workflow to achieve specific cinematic movements like a face zoom. Source: pixaroma.
  9. Text-to-video with LTX-2 (starts 31:13, length 3:44). The video walks through setting up the LTX-2 model in ComfyUI to generate high-resolution video clips from text prompts and images. Source: pixaroma.
  10. Cloud-based ComfyUI on RunPod/RunHub (starts 39:07, length 5:27). The video shows how to run complex video workflows in the cloud using RunHub AI, demonstrating the interface and execution of InfiniteTalk and Wan 2.2 without local hardware. Source: pixaroma.
  11. Configure Infinite Talk models in ComfyUI (starts 2:35, length 1:46). The creator demonstrates how to organize the necessary models within the ComfyUI workflow, including the Lightning LoRA, quantized Infinite Talk UNET models, and the Wan 2.1 VAE and Clip Vision nodes. Source: Aiconomist.
  12. Combining audio, images, and prompts (starts 11:06, length 0:26). A demonstration of layering a specific action prompt (patting stomach) over a specific audio timestamp to create a fully directed AI scene. Source: What Dreams Cost.