Text-to-Speech AI: Lifelike Speech Synthesis | Google Cloud (2024)

Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.

New customers get up to $300 in free credits to try Text-to-Speech and other Google Cloud products.

  • Improve customer interactions with intelligent, lifelike responses

  • Engage users with voice user interface in your devices and applications

  • Personalize your communication based on user preference of voice and language

Benefits

High fidelity speech

Deploy Google’s groundbreaking technologies to generate speech with humanlike intonation. Built on DeepMind’s speech synthesis expertise, the API delivers near-human-quality voices.

Widest voice selection

Choose from a set of 380+ voices across 50+ languages and variants, including Mandarin, Hindi, Spanish, Arabic, Russian, and more. Pick the voice that works best for your user and application.

One-of-a-kind voice

Create a unique voice to represent your brand across all your customer touchpoints, instead of using a common voice shared with other organizations.

Key features

Journey voices (Experimental)

Build engaging agents using the latest spontaneous conversational voices based on AudioLM.

Studio voices

Dazzle your listeners with professionally narrated content recorded in a studio-quality environment. Make sure to put your headphones on!

Neural2 voices

Internationalize your voice experience with ready-to-use voices powered by the latest research behind Custom Voice.

Custom Voice

Train a custom voice model using your own audio recordings to create a unique, more natural-sounding voice for your organization. You can define and choose the voice profile that suits your organization and quickly adjust to changes in voice needs without recording new phrases.

Text and SSML support

Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions.
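As a minimal sketch of the tags this paragraph mentions, the Python function below wraps plain text in `<speak>`, inserts a pause with `<break>`, and marks a date with `<say-as>`. The helper name `build_ssml`, its parameters, and the sample values are hypothetical, and the input date is assumed to be in YYYY-MM-DD form; check the SSML reference for the exact attributes each voice supports.

```python
from xml.sax.saxutils import escape


def build_ssml(greeting: str, date_iso: str, pause_ms: int = 500) -> str:
    """Return SSML: a greeting, a pause, then a date read as a date.

    Hypothetical helper for illustration; `date_iso` is assumed to be YYYY-MM-DD.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    # Escape &, <, > so user-supplied text cannot break the XML structure.
    safe_greeting = escape(greeting)
    safe_date = escape(date_iso)
    return (
        "<speak>"
        f"{safe_greeting}"
        # An explicit pause between the greeting and the next sentence.
        f'<break time="{pause_ms}ms"/>'
        # format="yyyymmdd" describes the field order of the date string.
        f'Your appointment is on <say-as interpret-as="date" format="yyyymmdd">{safe_date}</say-as>.'
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Hello & welcome", "2024-03-15"))
```

The resulting string is sent in the `ssml` input field instead of plain `text`.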

What's new

  • Blog post: Google Cloud Text-to-Speech API now supports custom voices
  • Blog post: Conversational AI drives better customer experiences
  • Blog post: New voices and languages for Text-to-Speech

Documentation

Google Cloud Basics

Text-to-Speech basics

A guide to the fundamental concepts of using the Text-to-Speech API.

Quickstart

Quickstart: Using the command line

Set up your Google Cloud project and authorization and make a request for Text-to-Speech to create audio from text.
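The quickstart sends a JSON request to the v1 `text:synthesize` REST method. A minimal Python sketch of that request body follows; the helper name `synthesize_request_body` and its defaults are hypothetical, and only the body is built here (authentication and sending are left to the quickstart).

```python
import json
from typing import Optional


def synthesize_request_body(text: str, language_code: str = "en-US",
                            voice_name: Optional[str] = None,
                            audio_encoding: str = "MP3") -> str:
    """Build the JSON body for a v1 text:synthesize request (sketch only)."""
    voice = {"languageCode": language_code}
    if voice_name:
        # Optionally pin one specific voice instead of letting the service pick.
        voice["name"] = voice_name
    body = {
        "input": {"text": text},  # use {"ssml": ...} instead for SSML input
        "voice": voice,
        "audioConfig": {"audioEncoding": audio_encoding},
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(synthesize_request_body("Hello from Text-to-Speech"))
```

Saved as a file, this body can be POSTed to `https://texttospeech.googleapis.com/v1/text:synthesize` with an OAuth access token; the response's `audioContent` field holds the base64-encoded audio.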

Google Cloud Basics

Supported voices and languages

Browse the voices and languages supported by Text-to-Speech.

Google Cloud Basics

Custom Voice (beta) overview

Learn how you can create a unique and more natural-sounding voice with Custom Voice using your own studio-quality audio recordings.

Tutorial

WaveNet and other synthetic voices

Learn about the different synthetic voices available for use in Text-to-Speech, including the premium WaveNet voices.

Tutorial

Speaking addresses with SSML

This tutorial demonstrates how to use Speech Synthesis Markup Language (SSML) to speak a text file of addresses.
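The tutorial's own code may differ; the Python sketch below only shows the general idea, assuming the input file holds one address per line. Each line is XML-escaped and followed by a `<break>` so listeners hear a gap between addresses. The function name `addresses_to_ssml` and the 2-second default pause are illustrative choices.

```python
import html


def addresses_to_ssml(text: str, pause: str = "2s") -> str:
    """Turn newline-separated addresses into SSML with a pause after each one.

    Blank lines are skipped; special characters such as & are escaped.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # html.escape also escapes quotes, which is harmless inside element text.
    parts = [f'{html.escape(line)}<break time="{pause}"/>' for line in lines]
    return "<speak>" + "\n".join(parts) + "</speak>"


if __name__ == "__main__":
    print(addresses_to_ssml("123 Main St\n456 Oak Ave & Co\n"))
```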

Use cases

Use case

Voicebots in contact centers

Deliver a better voice experience for customer service with Dialogflow voicebots that generate speech dynamically instead of playing static, pre-recorded audio. High-quality synthesized voices give callers a sense of familiarity and personalization.

Use case

Voice generation in devices

Enable natural communication with your users by letting your devices read text aloud in humanlike voices. Combine Text-to-Speech with Speech-to-Text and Natural Language to build an end-to-end voice user interface with easy, engaging interactions.

Use case

Accessible EPGs (Electronic Program Guides)

Have EPGs read text aloud to provide a better user experience for your customers and to meet accessibility requirements for your services and applications.

All features

Custom Voice

Train a custom speech synthesis model using your own audio recordings to create a unique and more natural-sounding voice for your organization. You can define and choose the voice profile that suits your organization and quickly adjust to changes in voice needs without needing to record new phrases.

Long audio synthesis

Asynchronously synthesize up to 1 million bytes of input with Long Audio Synthesis.
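Because the limit is stated in bytes rather than characters, texts in scripts such as Mandarin or Hindi reach it sooner than English text of the same length. The sketch below assumes the service counts UTF-8 bytes; the helper names are hypothetical.

```python
LONG_AUDIO_MAX_BYTES = 1_000_000  # limit stated above for Long Audio Synthesis


def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when UTF-8 encoded."""
    return len(text.encode("utf-8"))


def fits_long_audio(text: str, limit: int = LONG_AUDIO_MAX_BYTES) -> bool:
    """True if the UTF-8 encoded input is within the byte limit (assumed encoding)."""
    return utf8_size(text) <= limit


if __name__ == "__main__":
    print(utf8_size("hello"), utf8_size("你好"))
```

Most Chinese characters take 3 bytes in UTF-8, so a 400,000-character Mandarin text is about 1.2 million bytes and would exceed the limit, while 1,000,000 ASCII characters fit exactly.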

Voice and language selection

Choose from an extensive selection of 380+ voices across 50+ languages and variants, with more to come soon.

WaveNet voices

Take advantage of 90+ WaveNet voices built on DeepMind’s groundbreaking research to generate speech that significantly closes the gap with human performance.
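To pick among this many voices programmatically, you can list them with the v1 `voices` REST method (`GET https://texttospeech.googleapis.com/v1/voices`), whose response contains a `voices` array with `languageCodes` and `name` fields. The Python sketch below filters such a list; narrowing by family (for example `Wavenet`) relies on the `<language>-<Family>-<Letter>` naming pattern, which is an assumption here rather than a documented contract, and the helper name is hypothetical.

```python
from typing import Dict, List


def voices_for(voices: List[Dict], language_code: str, family: str = "") -> List[str]:
    """Sorted names of voices supporting `language_code`, optionally one family.

    `voices` is the "voices" list from a voices.list response. Family matching
    assumes names like "en-US-Wavenet-A" (hypothetical convention check).
    """
    names = []
    for voice in voices:
        if language_code not in voice.get("languageCodes", []):
            continue
        name = voice.get("name", "")
        if family and f"-{family}-" not in name:
            continue
        names.append(name)
    return sorted(names)


if __name__ == "__main__":
    sample = [
        {"languageCodes": ["en-US"], "name": "en-US-Wavenet-A"},
        {"languageCodes": ["en-US"], "name": "en-US-Standard-B"},
    ]
    print(voices_for(sample, "en-US", "Wavenet"))
```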

Text and SSML support

Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions.

Pitch tuning

Personalize the pitch of your selected voice, up to 20 semitones above or below the default.

Speaking rate tuning

Adjust the speaking rate anywhere from 4x slower to 4x faster than the normal rate.

Volume gain control

Increase the output volume by up to 16 dB or decrease it by up to 96 dB.
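The pitch, speaking-rate, and volume controls above correspond to the `pitch`, `speakingRate`, and `volumeGainDb` fields of the request's `audioConfig`. The Python sketch below builds that object and rejects values outside the ranges described on this page (with "4x slower" read as a rate of 0.25). The client-side check and the helper names are illustrative assumptions; the service performs its own validation.

```python
from typing import Any, Dict, Tuple

# Ranges taken from the feature descriptions above.
PITCH_RANGE = (-20.0, 20.0)           # semitones relative to the voice default
SPEAKING_RATE_RANGE = (0.25, 4.0)     # 1.0 is the normal rate
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)  # dB relative to the default volume


def _check(name: str, value: float, bounds: Tuple[float, float]) -> float:
    """Return `value` unchanged, or raise if it falls outside `bounds`."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} outside [{low}, {high}]")
    return value


def tuned_audio_config(pitch: float = 0.0, speaking_rate: float = 1.0,
                       volume_gain_db: float = 0.0,
                       audio_encoding: str = "MP3") -> Dict[str, Any]:
    """Build an audioConfig dict with validated tuning values (sketch)."""
    return {
        "audioEncoding": audio_encoding,
        "pitch": _check("pitch", pitch, PITCH_RANGE),
        "speakingRate": _check("speakingRate", speaking_rate, SPEAKING_RATE_RANGE),
        "volumeGainDb": _check("volumeGainDb", volume_gain_db, VOLUME_GAIN_DB_RANGE),
    }


if __name__ == "__main__":
    print(tuned_audio_config(pitch=-2.5, speaking_rate=1.25))
```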

Integrated REST and gRPC APIs

Easily integrate with any application or device that can send a REST or gRPC request including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers).
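On the REST side, a request is an ordinary HTTPS POST with a bearer token. The Python sketch below prepares, but does not send, such a request using only the standard library; obtaining the access token is out of scope, the helper name is hypothetical, and gRPC clients are not shown.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(access_token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a REST call to text:synthesize."""
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_synthesize_request("YOUR_TOKEN", {"input": {"text": "hi"}})
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; any HTTP client on a phone, PC, or embedded device can issue the same request.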

Audio format flexibility

Convert text to MP3, Linear16, OGG Opus, and a number of other audio formats.
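Whichever `audioEncoding` you request, the REST response returns the audio as a base64 string in `audioContent`. The Python sketch below decodes it and suggests a file extension; the extension mapping is an illustrative assumption (LINEAR16 output is given `.wav` because the REST API documents it as including a WAV header), and unknown encodings fall back to `.bin`.

```python
import base64
from typing import Dict, Tuple

# Suggested file extensions for a few encodings (illustrative mapping).
EXTENSIONS: Dict[str, str] = {"MP3": ".mp3", "LINEAR16": ".wav", "OGG_OPUS": ".ogg"}


def decode_audio(response: dict, audio_encoding: str) -> Tuple[bytes, str]:
    """Return (audio_bytes, suggested_extension) from a text:synthesize response."""
    audio = base64.b64decode(response["audioContent"])
    return audio, EXTENSIONS.get(audio_encoding, ".bin")


if __name__ == "__main__":
    fake = {"audioContent": base64.b64encode(b"not real audio").decode("ascii")}
    data, ext = decode_audio(fake, "MP3")
    print(len(data), ext)
```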

Audio profiles

Optimize the output for the device it will play on, such as headphones or phone lines.
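Audio profiles are selected through the `effectsProfileId` list in `audioConfig`. In the Python sketch below, the friendly target names and the helper are hypothetical, and the profile IDs reflect the audio profiles documentation as best understood here; verify them against the current list before relying on them.

```python
from typing import Dict, List, Union

# Friendly playback targets mapped to effectsProfileId values (verify IDs).
PROFILES: Dict[str, str] = {
    "headphones": "headphone-class-device",
    "phone_line": "telephony-class-application",
    "car": "large-automotive-class-device",
    "small_speaker": "small-bluetooth-speaker-class-device",
}


def audio_config_for(target: str, audio_encoding: str = "MP3") -> Dict[str, Union[str, List[str]]]:
    """Build an audioConfig that applies the profile for `target`."""
    try:
        profile = PROFILES[target]
    except KeyError:
        raise ValueError(f"unknown playback target: {target!r}") from None
    # effectsProfileId takes a list of profile IDs.
    return {"audioEncoding": audio_encoding, "effectsProfileId": [profile]}


if __name__ == "__main__":
    print(audio_config_for("phone_line", "LINEAR16"))
```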

Pricing

Text-to-Speech is priced based on the number of characters sent to the service to be synthesized into audio each month. The first 1 million characters for WaveNet voices are free each month. For Standard (non-WaveNet) voices, the first 4 million characters are free each month. After the free tier has been reached, Text-to-Speech is priced per 1 million characters of text processed.

If you pay in a currency other than USD, the prices listed in your currency on Google Cloud SKUs apply.
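As a worked example of the free tier, the Python sketch below estimates one month's charge for a single voice type. The free allowances come from the paragraph above; the per-million price is a placeholder you must look up, and pro-rating by character above the free tier is an assumption.

```python
# Monthly free characters per voice type, from the pricing text above.
FREE_CHARS = {"wavenet": 1_000_000, "standard": 4_000_000}


def monthly_cost(chars: int, voice_type: str, price_per_million: float) -> float:
    """Estimated monthly charge for one voice type.

    `price_per_million` is a placeholder, not a real rate; billing is assumed
    to be pro-rated per character above the free tier.
    """
    billable = max(0, chars - FREE_CHARS[voice_type])
    return billable / 1_000_000 * price_per_million


if __name__ == "__main__":
    # 3.5M WaveNet characters: 2.5M billable at a hypothetical 16.0 per million.
    print(monthly_cost(3_500_000, "wavenet", 16.0))
```

With a hypothetical price of 16.0 per million characters, 3.5 million WaveNet characters leave 2.5 million billable, or 40.0; usage under the free allowance costs nothing.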