Extract and merge audio in CapCut

Sirio · Dec 2025

Demo summary

The user demonstrates how to use CapCut to extract the audio from multiple AI-generated video clips and export it as a single MP3 to prepare for voice cloning.

Step-by-step

Right-click the video clip in the timeline
Select Extract audio
Add additional clips to the project as needed and extract their audio the same way
Go to Voice effects and adjust the pitch
Go to File and select Export
Select Audio only and choose MP3 format
Click Export
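
The extract-and-merge steps above can also be approximated outside CapCut. The following is a minimal sketch, assuming the ffmpeg command-line tool is installed; it is not the method shown in the demo. The function name `build_merge_command` and the file names are hypothetical, and the command has not been tested against real clips.

```python
from typing import List


def build_merge_command(video_paths: List[str], output_mp3: str) -> List[str]:
    """Build an ffmpeg command that joins the audio of several videos into one MP3.

    Assumption: ffmpeg is on the PATH and every input has an audio track.
    """
    if not video_paths:
        raise ValueError("at least one input clip is required")

    cmd = ["ffmpeg", "-y"]  # -y overwrites the output file if it exists
    for path in video_paths:
        cmd += ["-i", path]

    # Reference the first audio stream of each input: [0:a:0][1:a:0]...
    inputs = "".join(f"[{i}:a:0]" for i in range(len(video_paths)))
    # concat with v=0:a=1 joins audio only and drops the video streams
    filter_graph = f"{inputs}concat=n={len(video_paths)}:v=0:a=1[merged]"

    cmd += [
        "-filter_complex", filter_graph,
        "-map", "[merged]",
        "-c:a", "libmp3lame",  # MP3 encoder
        "-q:a", "2",           # variable bitrate quality setting
        output_mp3,
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    command = build_merge_command(["kling_clip.mp4", "veo_clip.mp4"], "voice_sample.mp3")
    print(" ".join(command))
    # To run it: import subprocess; subprocess.run(command, check=True)
```

The sketch only builds the command so it can be inspected before running. The pitch adjustment from the Voice effects step is not covered here.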

Options

Import videos from Sora 2, Kling 2, or Veo 3.1 to extract their audio
Merge two to five different audio clips to create a unique voice

Watch out for

You need at least 15 seconds of audio for voice training

Tips

Aim for 30 seconds of audio to get better results during the training phase
Adjust the pitch in Voice effects so the audio doesn't sound like a generic AI-generated voice
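
The 15-second minimum and 30-second target can be checked before exporting by adding up the clip lengths. This is a small sketch of that arithmetic; the function name `assess_training_audio` and the status labels are made up for illustration, and the clip durations are assumed to be known in seconds.

```python
MIN_SECONDS = 15.0     # minimum audio length for voice training, per the note above
TARGET_SECONDS = 30.0  # length suggested in the tips for better results


def assess_training_audio(clip_seconds):
    """Return the total length and a status label for a set of clip durations."""
    total = sum(clip_seconds)
    if total < MIN_SECONDS:
        status = "too short"
    elif total < TARGET_SECONDS:
        status = "usable"
    else:
        status = "recommended"
    return total, status


if __name__ == "__main__":
    # Hypothetical durations for three extracted clips
    total, status = assess_training_audio([8.0, 10.0, 12.5])
    print(f"{total:.1f} s of audio: {status}")
```

If the total falls under the minimum, adding another extracted clip to the timeline is the simplest fix.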

Highlights

“this is good”

All demos from “I gave my AI influencer a consistent voice (new method)”

1:50 · 1:20 · Generate a base image with Kora Pro: The creator demonstrates how to use the Kora Pro model within the Enhancor platform to generate a high-quality, realistic base image of an AI influencer on a white background. (Enhancor · AI Realistic Image Generator)
3:42 · 0:45 · Create a character contact sheet with Nano Banana: The user shows how to generate a 16x9 contact sheet using the Nano Banana model to create multiple poses and environments for the same character in a single generation to save costs. (Enhancor · AI Person Generator)
5:35 · 1:33 · Train a custom influencer model in Enhancor: The video walks through uploading a dataset of images to the Enhancor 'influencer' tab, setting a unique trigger word, and training a LoRA-style model for consistent character generation. (Enhancor · AI Avatar Generator)
8:27 · 0:55 · Generate talking clips with Kling and Veo 3.1: The creator demonstrates using the Kling 2.6 and Google Veo 3.1 models to generate short video clips of the AI influencer speaking, so the audio can be extracted for voice training. (Enhancor · AI Avatar Video Generator)
9:33 · 0:56 · Extract and merge audio in CapCut (current): The user demonstrates how to use CapCut to extract audio from multiple AI video clips and export them as a single MP3 to prepare for voice cloning. (CapCut · AI Audio Editor)
12:00 · 0:22 · Lip-sync AI video in Enhancor: The creator demonstrates using the lip-syncing tool in Enhancor to match the generated influencer image with the custom Hume AI voice clip. (Enhancor · AI Lip Sync Generator)

AI Audio Editor

3:34 · 0:39 · Adjust audio levels using waveforms: The creator shows how to open the waveform indication level and adjust clip volume to ensure spoken audio stays between -6 and -12 dB. (Matt Loui)
9:33 · 0:56 · Extract and merge audio in CapCut (current): The user demonstrates how to use CapCut to extract audio from multiple AI video clips and export them as a single MP3 to prepare for voice cloning. (Sirio)
40:51 · 3:31 · Manual audio ducking with keyframes: The tutorial provides a detailed walkthrough of setting volume keyframes to 'duck' background music when narration begins. (Metics Media)
30:46 · 3:59 · Adding sound effects and background music: The creator demonstrates syncing whoosh and glitch sound effects to visual animations and adjusting background music volume to -20 dB. (Content Creators)