When processing videos with complex audio—such as those with background noise or specific vocal requirements—fine-tuning your speaker settings is essential. Use this guide to determine which slider will best solve your specific audio challenges.
Quick Reference Table
Feature | When to Use | Recommended Value
Clean Reference | Source recorded in a public setting, with background cheering, or with poor mic quality. | On for noisy sources; Off if the audio has already been processed and cleaned.
Acoustic Boost | Output voice sounds "hazy" or muffled. | On to sharpen the voice; Off if the output is already clear.
Speaker Stability | To make the speech more "expressive" or dynamic. | Decrease the value toward 0 for more expressive speech.
Speaker Similarity | To control how closely the output matches the original voice, including when cloning permission is not available. | 0.7 (default) to 1.0 for close cloning; lower values (0.0-0.4) when cloning permission is not available.
Accent Boost | Non-English to English voiceover only. | 0 to keep the source-language accent; 1 for a regional accent of the target English (such as American or British).
Remember to click "Confirm" to save any changes to the settings. New settings apply only to dialogues of this speaker that are generated after the change. To apply them to all of a speaker's existing dialogues, select all dialogues of that speaker and click "Re-generate all dialogues".
The same speaker settings apply to all target languages, but you do not need to use the same settings for every speaker in a video. The best values depend largely on how much voice data each speaker has in the video.
You can use intermediate values to achieve varying levels of trade-off across the settings.
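For readers who keep track of their chosen values outside the tool, the table above can be sketched as a small per-speaker record. This is only an illustrative sketch, not a product API: the class name `SpeakerSettings`, the field names, the speaker labels, the Off defaults for the two toggles, and the 0 to 1 slider range are all assumptions. Only the 0.7 Speaker Similarity default and the 0 Accent Boost default come from this guide.

```python
from dataclasses import dataclass
from typing import Optional


def _check_unit_range(name: str, value: Optional[float]) -> None:
    """Raise ValueError unless value is None or inside the assumed 0..1 range."""
    if value is not None and not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")


@dataclass
class SpeakerSettings:
    """Hypothetical record of one speaker's settings (not a real product API)."""
    clean_reference: bool = False   # assumed default; turn on for noisy source audio
    acoustic_boost: bool = False    # assumed default; turn on if output sounds hazy
    # None means "keep the product default", which this guide does not state.
    speaker_stability: Optional[float] = None
    speaker_similarity: float = 0.7  # default per this guide
    accent_boost: float = 0.0        # default per this guide

    def __post_init__(self) -> None:
        # Validate every slider value against the assumed 0..1 range.
        _check_unit_range("speaker_stability", self.speaker_stability)
        _check_unit_range("speaker_similarity", self.speaker_similarity)
        _check_unit_range("accent_boost", self.accent_boost)


# Settings are chosen per speaker, so two speakers in one video can differ.
settings_by_speaker = {
    "host": SpeakerSettings(),
    "street_interview": SpeakerSettings(clean_reference=True, speaker_stability=0.3),
}
```

Keeping one record per speaker mirrors the guidance that settings are shared across target languages but need not match between speakers.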
Deep Dive: Setting Descriptions
1. Clean Reference
Use this setting when your source audio is "dirty." If the video was filmed in a public space with singing, cheering, or low-quality microphones, this will help isolate the primary speaker from the environment.
2. Acoustic Boost
This is your primary tool for clarity. If the output sounds hazy or lacks definition, toggle this on to sharpen the vocal profile.
3. Speaker Stability
This controls how "robotic" or "human" the voice sounds.
For more expressive speech: Decrease the value. This allows for more natural inflection and emotion.
4. Speaker Similarity
This adjusts how closely the AI mimics the original voice.
0.0 – 0.4: Use this when you want the speaker to sound like a different person.
0.7 (Default): The "sweet spot" for most standard speakers.
1.0: Reserved for special cases where the output doesn't sound enough like the original.
Note: Use lower settings if you do not have explicit permission to clone a specific voice.
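The bands above can be read as a simple decision rule. The sketch below is a hypothetical helper, not part of the product: the function name and both parameters are illustrative assumptions, and it returns a (low, high) band instead of a single value because the guide leaves the exact choice inside each band to you.

```python
from typing import Tuple


def suggest_similarity_range(has_cloning_permission: bool,
                             sounds_unlike_original: bool = False) -> Tuple[float, float]:
    """Return a (low, high) Speaker Similarity band following the guidance above."""
    if not has_cloning_permission:
        # Without explicit permission, keep the voice sounding like a different person.
        return (0.0, 0.4)
    if sounds_unlike_original:
        # Special case: the output does not sound enough like the original.
        return (1.0, 1.0)
    # Default sweet spot for most standard speakers.
    return (0.7, 0.7)
```

For example, `suggest_similarity_range(False)` gives the 0.0 to 0.4 band, while `suggest_similarity_range(True)` gives the 0.7 default.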
5. Accent Boost (Non-English to English only)
Controls how much of the speaker's native accent carries over into the English output.
0 (Default): Retains the native language's accent in the English output. Even a boost of 0.3 can remove the source-language accent in some cases.
0.7 – 1.0: Use this to achieve American or British English without the source-language accent.
