Google speech to text online demo

12/3/2023

This repository contains a reference implementation demonstrating how the Google Cloud Text-to-Speech API can be used to implement text-to-speech functionality for Electronic Program Guides (EPGs). This is particularly relevant due to new Ofcom guidance that mandates that EPGs must offer text-to-speech functionality in order to meet customers' accessibility requirements.

The repository contains two supporting components that augment the Google Cloud Text-to-Speech API to deliver a reference implementation and demo for the EPG use case. These components, together with the Google Cloud services they use, make it possible to deliver a highly performant and cost-effective implementation of text-to-speech for EPGs and similar use cases.

A hosted version of this demo is available here.

Note: This is not an officially supported Google product.

Components

get-speech-service

The get-speech-service is a web service written in Go that is responsible for handling requests for speech synthesis.

Clients send a POST request to the /getSpeech endpoint containing the text that needs to be synthesised, alongside some additional configuration parameters. The get-speech-service then sends the text from the request to the Google Cloud Text-to-Speech API and saves the resulting audio file in Google Cloud Storage. Finally, a time-bound Signed URL is generated for the resulting audio file and returned to the client so the audio can be played to the user.

On each request, the get-speech-service also checks whether synthesised audio for the requested text payload (and associated configuration) already exists. If so, a new Signed URL for the existing file is immediately generated and returned to the client, avoiding the need to re-synthesise the audio. Skipping repeat synthesis reduces both response time and Text-to-Speech API cost.

epg-ui

The epg-ui is an extremely simple static website used to emulate an EPG for demo purposes. It provides a mock EPG where users can click elements. Some lightweight JavaScript then calls the get-speech-service to fetch a URL for the synthesised audio, which is then played back to the user.

Google Cloud Services

This demo makes use of the following Google Cloud services:

- Text-to-Speech: Used for synthesising text to audio.
- Cloud Run: Used for hosting the get-speech-service and epg-ui.
- Cloud Storage: Used for storage of the synthesised speech audio.
- Cloud CDN: Content delivery network used for delivering synthesised speech audio.
- Secret Manager: Used for storing the signing key for Cloud CDN to provide secure, time-bound URLs for accessing synthesised audio.

Architecture

The diagram below illustrates how the components communicate:

[Architecture diagram]

Flow

1. When the get-speech-service starts, it calls Secret Manager and loads the Cloud CDN signing key for later use in generating secure, time-bound URLs for accessing synthesised audio.
2. A user clicks an item to be spoken on the epg-ui (or a real-life client such as a set-top box).
3. The epg-ui (or real-life client) sends a POST request to the get-speech-service with a JSON payload containing the text to be synthesised, alongside optional configuration parameters.
4. The get-speech-service generates a hash of the text to be synthesised and any optional configuration passed with the request.
5. It checks whether a synthesised audio file for that hash already exists in the GCS Bucket.
6. If there is no existing synthesised audio file in the GCS Bucket, the get-speech-service sends the text to the Text-to-Speech service to be synthesised.
7. The get-speech-service writes the synthesised audio from the Text-to-Speech service to the GCS Bucket.
8. The get-speech-service generates a Signed URL to provide secure, time-bound access to the synthesised audio.
9. The get-speech-service returns a response to the epg-ui (or alternate consuming client) containing the Cloud CDN Signed URL.
10. The epg-ui (or alternate consuming client) loads the audio file from Cloud CDN. If the file is in cache, it is returned directly from the CDN. If it is not in cache, it is loaded from the GCS Bucket and cached for future requests.
11. The epg-ui plays the synthesised audio to the user.

For more information on these components, see the get-speech-service and epg-ui instructions.

Deployment

Deployment instructions are contained within the component directories. See the get-speech-service instructions and epg-ui instructions.
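The hash-then-check step described above can be sketched as follows. This is an illustration under stated assumptions rather than the service's actual code: the request fields, the SHA-256 choice, the .mp3 suffix, and the names CacheKey, AudioStore, Synthesiser and GetOrSynthesise are all hypothetical, and an in-memory map stands in for the GCS Bucket.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// SpeechRequest is a hypothetical payload: text plus optional configuration.
type SpeechRequest struct {
	Text         string  `json:"text"`
	LanguageCode string  `json:"languageCode,omitempty"`
	VoiceName    string  `json:"voiceName,omitempty"`
	SpeakingRate float64 `json:"speakingRate,omitempty"`
}

// CacheKey derives a deterministic object name from the text and its
// configuration. Struct fields marshal in declaration order, so equal
// requests always produce the same key.
func CacheKey(req SpeechRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]) + ".mp3", nil
}

// AudioStore is an in-memory stand-in for the bucket (assumption).
type AudioStore map[string][]byte

// Synthesiser stands in for the call to the Text-to-Speech API (assumption).
type Synthesiser func(req SpeechRequest) ([]byte, error)

// GetOrSynthesise returns the object key for the request, synthesising and
// storing the audio only when no object exists for that key yet.
func GetOrSynthesise(store AudioStore, synth Synthesiser, req SpeechRequest) (key string, cached bool, err error) {
	key, err = CacheKey(req)
	if err != nil {
		return "", false, err
	}
	if _, ok := store[key]; ok {
		return key, true, nil // already synthesised: skip the API call
	}
	audio, err := synth(req)
	if err != nil {
		return "", false, err
	}
	store[key] = audio
	return key, false, nil
}

func main() {
	store := AudioStore{}
	calls := 0
	synth := func(req SpeechRequest) ([]byte, error) {
		calls++
		return []byte("fake-audio:" + req.Text), nil
	}
	req := SpeechRequest{Text: "News at Ten", LanguageCode: "en-GB"}
	for i := 0; i < 2; i++ {
		key, cached, err := GetOrSynthesise(store, synth, req)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("key=%s cached=%t\n", key, cached)
	}
	fmt.Println("synthesis calls:", calls)
}
```

Including the configuration in the hashed bytes matters: the same text requested with a different voice or language yields a different key, so a cached file is only reused when it matches the whole request.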
