Camb.ai

🗣️ What is Speaker Similarity?

Speaker Similarity controls how closely the generated voice matches the original speaker’s voice from the reference audio. It helps maintain the same voice identity when audio is generated, translated, or dubbed into another language.

⭐ Why Is Speaker Similarity Important?

- The same speaker identity across languages
- Natural and believable AI-generated speech
- Consistency in tone, pitch, and vocal character
- Better listener experience without sudden voice changes

Speaker Similarity is especially useful for:

- 🎬 AI Dubbing
- 🧑‍🤝‍🧑 Voice Cloning
- 🌐 Multilingual Narration
- 📦 Content Localization

⚖️ How to Use Speaker Similarity

- Go to Speaker Settings: Open your project and navigate to Speaker Settings.
- Select a Speaker: Choose the speaker you want to configure (for example, Speaker 4).
- Set Basic Details
  - Enter or edit the Speaker Name
  - Select the Gender
- (Optional) Improve Reference Audio
  - Enable Clean Reference to remove noise from the reference audio
  - Enable Maintain Source Accent if you want to keep the original accent
- Adjust the Speaker Similarity Slider
  - Move the Speaker Similarity slider to control how closely the generated voice matches the original speaker
  - Recommended: The default Speaker Similarity value is 0.7, which provides a good balance between voice accuracy and natural sound.
  - Higher values sound more like the original voice; lower values give more flexibility.
- Select Voice Models
  - Choose the language
  - Select the voice model (for example, Voice From Original Media)
- Click Confirm and save the speaker.
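
For readers who track these settings outside the app (for example, in a spreadsheet or a script), the fields in the Speaker Settings dialog can be sketched as a small record. This is only an illustration of the steps above, not a Camb.ai API: the class name, field names, the 0.0 to 1.0 slider range, and the defaults for the two checkboxes are all assumptions. Only the 0.7 default and the "Voice From Original Media" option come from this article.

```python
from dataclasses import dataclass

DEFAULT_SPEAKER_SIMILARITY = 0.7  # default value stated in the article


@dataclass
class SpeakerSettings:
    """Hypothetical mirror of the Speaker Settings dialog (not a Camb.ai API)."""

    name: str
    gender: str
    language: str
    voice_model: str = "Voice From Original Media"
    clean_reference: bool = False         # assumed default; not stated in the article
    maintain_source_accent: bool = False  # assumed default; not stated in the article
    speaker_similarity: float = DEFAULT_SPEAKER_SIMILARITY

    def __post_init__(self) -> None:
        # Assumption: the slider spans 0.0 to 1.0.
        if not 0.0 <= self.speaker_similarity <= 1.0:
            raise ValueError(
                "speaker_similarity must be between 0.0 and 1.0, "
                f"got {self.speaker_similarity}"
            )


# Illustrative values only; "Female" and "Spanish" are placeholders.
speaker = SpeakerSettings(name="Speaker 4", gender="Female", language="Spanish")
print(speaker.speaker_similarity)
```

Keeping the similarity value next to the other speaker details makes it easier to reproduce the same configuration for each speaker in a project.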

___________________________________________________________

What Are Voice Fine-Tuning Sliders?

These sliders help balance realism and clarity. They are enabled by default. Use a slider to adjust its value, then click Confirm to apply. Each speaker can be adjusted individually.

Stability

- Controls how steady the voice sounds.
- Higher values make the voice more consistent and less expressive.

Speaker Similarity

- Adjusts how closely the AI voice matches the original speaker.
- Higher values mean the voice sounds more like the original person.

Accent Boost

- Strengthens the accent in the generated voice.
- This is helpful when the accent feels too neutral or weak.

🔹 Lower Values (e.g., 0.3 – 0.5):

- Voice is less similar to the original
- More flexibility in pronunciation and delivery
- Useful when:
  - Reference audio quality is low
  - Exact voice matching is not required

🔹 Medium Values (Recommended: ~0.6 – 0.75):

- Balanced similarity and clarity
- Voice sounds like the original speaker but remains natural
- Best for most dubbing and narration use cases

🔹 Higher Values (e.g., 0.8 – 1.0):

- Voice sounds very close to the original speaker
- Preserves vocal identity strongly
- Useful when:
  - High-quality reference audio is available
  - Voice consistency is critical
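
The three ranges above can be summarized as a small lookup. The sketch below is a hypothetical helper, not part of the product: the function name is made up, and the cut-offs at 0.6 and 0.8 are assumptions chosen to bridge the gaps between the example ranges (the article gives no guidance for values such as 0.55 or 0.78).

```python
GUIDANCE = {
    "lower": "Less similar to the original; more flexible delivery. "
             "Useful when reference audio quality is low.",
    "medium": "Balanced similarity and clarity. "
              "Best for most dubbing and narration use cases.",
    "higher": "Very close to the original speaker. "
              "Useful when high-quality reference audio is available.",
}


def similarity_band(value: float) -> str:
    """Map a Speaker Similarity value to 'lower', 'medium', or 'higher'."""
    # Assumption: the slider spans 0.0 to 1.0.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"value must be between 0.0 and 1.0, got {value}")
    if value < 0.6:   # assumed cut-off; the article's lower examples end at 0.5
        return "lower"
    if value < 0.8:   # assumed cut-off; the article's higher examples start at 0.8
        return "medium"
    return "higher"


for value in (0.4, 0.7, 0.9):
    band = similarity_band(value)
    print(f"{value}: {band} - {GUIDANCE[band]}")
```

With these assumed cut-offs, the default of 0.7 falls in the medium band, which matches the recommended range.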

⚠️ Note: Noise or distortion in the reference audio may reduce the naturalness of the generated voice.

Speaker Similarity lets you decide how much of the original speaker’s voice identity is preserved in the output.

Speaker Similarity: What It Is & How to Use It? 🎙️
