❓ What Is Text-to-Speech (TTS)?

Quick guide: Understand what Text-to-Speech is, how it functions, and how to manage its settings and downloads.

Updated over 2 weeks ago

Text-to-Speech (TTS) is a technology that converts written text into spoken audio using synthetic voices.

Click here to access this TTS tool.


What Does TTS Do? 🗣️

TTS reads digital text aloud. For example:

  • You type "Hello, how are you?"

  • The TTS system uses a computer-generated voice to say it out loud.

How Does TTS Work? ❓

1. Text Input

You enter or upload the text you want to convert to speech.

2. Linguistic Analysis

The system analyzes pronunciation, punctuation, and intonation to prepare the text for natural-sounding speech.

3. Speech Synthesis

The system generates the audio using either pre-recorded voice samples or AI-generated voices.
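To make the three stages concrete, here is a toy sketch in Python. It is not how CAMB.ai or any production TTS engine is implemented: the function names, pause lengths, and speaking rate are assumptions chosen for illustration, and the "synthesis" step only returns silent samples of the estimated length instead of real speech.

```python
import re

# Pause after each punctuation mark, in seconds (illustrative values only).
PAUSES = {",": 0.2, ";": 0.3, ":": 0.3, ".": 0.5, "?": 0.5, "!": 0.5}


def analyze(text: str) -> list[tuple[str, float]]:
    """Stage 2 (toy): split text into chunks and attach a pause to each."""
    normalized = " ".join(text.split())  # collapse runs of whitespace
    chunks = []
    # Each match is a run of non-punctuation plus an optional closing mark.
    for words, mark in re.findall(r"([^,;:.?!]+)([,;:.?!]?)", normalized):
        words = words.strip()
        if words:
            chunks.append((words, PAUSES.get(mark, 0.0)))
    return chunks


def synthesize(chunks, seconds_per_word=0.4, sample_rate=16000) -> list[float]:
    """Stage 3 (placeholder): return silence of the estimated duration."""
    duration = sum(
        len(words.split()) * seconds_per_word + pause for words, pause in chunks
    )
    return [0.0] * round(duration * sample_rate)


if __name__ == "__main__":
    # Stage 1: the text input.
    chunks = analyze("Hello, how are you?")
    audio = synthesize(chunks)
    print(chunks)
    print(len(audio), "samples")
```

A real engine replaces the placeholder with a voice model, but the shape is the same: text goes in, the analysis stage decides where pauses and emphasis belong, and the synthesis stage turns that plan into audio samples.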

What Settings Can I Adjust in CAMB.ai's TTS? ⚙️

  • Select Voice: Choose a default voice from the portal, or create and upload your own. For more about voices, check here.

  • Language: Select the language of the input text.

Enter Your Text

Start by typing or pasting the text you want to convert into speech in the Input text box.

Select Your Intent

Choose what you’re trying to build from the Select your intent dropdown. This helps optimize the speech output for your use case.

  • Expressive Dubbing
    Best for emotional and performance-driven voiceovers, such as movies or dramatic scenes.

  • Audiobooks
    Designed for long-form narration with clear pronunciation and consistent pacing.

  • Digital Media
    Ideal for ads, social media videos, explainer videos, and online marketing content.

  • Real-time Voice Agents
    Optimized for interactive systems that respond instantly, such as virtual assistants.

  • Call Centers
    Suitable for IVR systems, customer support, and automated call handling.

  • Live Conversational AI
    Built for natural, back-and-forth conversations with human-like flow.

  • Film & TV Dubbing
    Focuses on professional dubbing with natural timing and cinematic delivery.

  • Precise Prosody Control
    Enables fine control over tone, pitch, pauses, and emphasis for advanced voice tuning.

  • Creative Editing Workflows
    Designed for post-production, sound design, and creative audio editing workflows.

Choose Model

Select the voice generation model. By default, MARS8-Pro is used for high-quality results.

Language and Voice

  • Source Language – Select the language of your input text.

  • Voice – Choose a voice style that fits your content (for example, Sports Commentary).

  • Output Type – Select the audio format for the generated file (e.g., FLAC).

Note: The model works best when the selected voice matches the source language.

Generate Speech

Once everything is set, click Start Generating Speech to create your audio output.

You can also explore automation by clicking Try it as an API.
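This article does not describe the API itself, so the sketch below only shows how the form fields above could be collected into a request body before automating the process. Every key name, the function name, and the example values are hypothetical, not CAMB.ai's actual API; the intent names and the MARS8-Pro default are taken from this page. Check the official API documentation for the real endpoint and field names.

```python
import json

# Intents listed in the "Select your intent" dropdown on this page.
INTENTS = {
    "Expressive Dubbing",
    "Audiobooks",
    "Digital Media",
    "Real-time Voice Agents",
    "Call Centers",
    "Live Conversational AI",
    "Film & TV Dubbing",
    "Precise Prosody Control",
    "Creative Editing Workflows",
}


def build_tts_request(text, intent, source_language, voice,
                      model="MARS8-Pro", output_type="flac"):
    """Collect the form fields into a dict. All key names are hypothetical."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    return {
        "text": text,
        "intent": intent,
        "model": model,
        "source_language": source_language,
        "voice": voice,
        "output_type": output_type,
    }


if __name__ == "__main__":
    body = build_tts_request(
        text="Hello, how are you?",
        intent="Audiobooks",
        source_language="English",
        voice="Sports Commentary",
    )
    print(json.dumps(body, indent=2))
```

Validating the text and intent before sending anything keeps obvious mistakes, such as an empty input box or a misspelled intent, from reaching the service.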


How to Generate and Download Audio? ▶️

Steps:

  1. Click "Generate Speech" to create the audio from your text input.

  2. Once the audio is generated, listen to it, then click "Download" to save the output.

  • To try TTS without entering your own text, click "Try with sample text", then click "Generate Speech" to hear the sample audio.

Voice Settings (Advanced)

The Voice Settings section allows you to fine-tune how the generated voice sounds. These options help improve clarity, accuracy, and overall voice quality. All settings are optional and can be enabled or disabled as needed.

Clean Reference

  • This option cleans the reference audio used for voice generation.

  • Enable it to reduce minor inconsistencies and improve overall voice clarity.

Enhance Reference Audio Quality

  • Improves the quality of the reference audio by enhancing sharpness and reducing distortions.

  • Recommended when the uploaded reference audio is not studio-quality.

Maintain Source Accent

  • Preserves the original accent of the source voice.

  • Turn this on if you want the generated output to stay true to the speaker’s natural accent.

Enhance Named Entities

  • Improves the pronunciation of proper nouns such as names, places, brands, and technical terms.

  • Useful for professional content, narration, and informational audio.

Output Configuration

  • The output configuration section allows you to control how the final audio is generated and exported.

  • These settings help align the voice output with your project requirements.

Applying Settings

  • Each setting can be toggled on or off individually.

  • Once you’ve selected the required options, proceed with generating the audio to apply the changes.
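If you keep track of these toggles in a script or a project checklist, one way to model them is a small settings object. This is a sketch only: the class name and field names are hypothetical labels for the four options described above, and the all-off defaults are an assumption, since this article does not state the portal's default values.

```python
from dataclasses import asdict, dataclass


@dataclass
class VoiceSettings:
    """The four optional toggles; names and defaults are hypothetical."""
    clean_reference: bool = False
    enhance_reference_audio_quality: bool = False
    maintain_source_accent: bool = False
    enhance_named_entities: bool = False

    def enabled(self) -> list[str]:
        """Return the names of the toggles that are switched on."""
        return [name for name, on in asdict(self).items() if on]


if __name__ == "__main__":
    settings = VoiceSettings(
        maintain_source_accent=True,
        enhance_named_entities=True,
    )
    print(settings.enabled())
```

Because each field is independent, any combination of toggles is valid, which matches how the settings can be switched on or off individually.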
