- From: Adam Sobieski <adamsobieski@hotmail.com>
- Date: Sat, 15 Sep 2018 06:35:43 +0000
- To: "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CY4PR0101MB309564A2D9B2B655760DC48DC5180@CY4PR0101MB3095.prod.exchangelabs.com>
Introduction

We can envision client-side, server-side and third-party speech recognition, synthesis and translation scenarios for a next version of the Web Speech API.

Advancing the State of the Art

Speech Recognition

Beyond speech-to-text, speech recognition includes speech-to-SSML and speech-to-hypertext. With speech-to-SSML and speech-to-hypertext, a higher degree of fidelity is possible when round-tripping speech audio through speech recognition and synthesis components or services.

Speech Synthesis

Beyond text-to-speech, speech synthesis includes SSML-to-speech and hypertext-to-speech<https://github.com/w3c/speech-api/issues/36>.

Translation

Translation scenarios include processing text, SSML, hypertext or audio in a source language into text, SSML, hypertext or audio in a target language. Desirable features include interoperability between client-side, server-side and third-party translation and WebRTC<https://www.w3.org/TR/webrtc/>, with translations available as subtitles or audio tracks.

Multimodal Dialogue Systems

Interesting scenarios include Web-based multimodal dialogue systems that efficiently use client-side, server-side and third-party speech recognition, synthesis and translation.

Client-side Scenarios

Client-side Speech Recognition

These scenarios are considered in the current version of the Web Speech API.

Client-side Speech Synthesis

These scenarios are considered in the current version of the Web Speech API.

Client-side Translation

These scenarios are new to the Web Speech API and involve the client-side translation of text, SSML, hypertext or audio into text, SSML, hypertext or audio.

Server-side Scenarios

Server-side Speech Recognition

These scenarios are new to the Web Speech API and involve one or more audio streams from a client being streamed to a server which performs speech recognition, optionally providing speech recognition results to the client.
Server-side Speech Synthesis

These scenarios are new to the Web Speech API and involve a client sending text, SSML or hypertext to a server which performs speech synthesis and streams audio to the client.

Server-side Translation

These scenarios are new to the Web Speech API and involve a client sending text, SSML, hypertext or audio to a server for translation into text, SSML, hypertext or audio.

Third-party Scenarios

Third-party Speech Recognition

These scenarios are new to the Web Speech API and involve one or more audio streams from a client or server being streamed to a third-party service which performs speech recognition and provides the results to the client or server.

Third-party Speech Synthesis

These scenarios are new to the Web Speech API and involve a client or server sending text, SSML or hypertext to a third-party service which performs speech synthesis and streams audio to the client or server.

Third-party Translation

These scenarios are new to the Web Speech API and involve a client sending text, SSML, hypertext or audio to a third-party translation service for translation into text, SSML, hypertext or audio.
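
To make the scenario space above easier to discuss, here is a rough sketch that enumerates it as data. All of the type and function names (Site, SpeechTask, ContentFormat, listScenarios and so on) are hypothetical and exist only for illustration; none of them is part of the Web Speech API. The formats attached to each task are the ones envisioned in this message, not what current implementations accept or produce.

```typescript
// Hypothetical model of the scenarios described in this message.
// None of these names come from the Web Speech API.
type Site = "client" | "server" | "third-party";
type SpeechTask = "recognition" | "synthesis" | "translation";
type ContentFormat = "text" | "ssml" | "hypertext" | "audio";

interface Scenario {
  site: Site;
  task: SpeechTask;
  inputs: ContentFormat[];
  outputs: ContentFormat[];
  // True only for the scenarios this message says the current API considers.
  consideredInCurrentApi: boolean;
}

const SITES: Site[] = ["client", "server", "third-party"];
const TASKS: SpeechTask[] = ["recognition", "synthesis", "translation"];
const MARKUP_FORMATS: ContentFormat[] = ["text", "ssml", "hypertext"];
const ALL_FORMATS: ContentFormat[] = [...MARKUP_FORMATS, "audio"];

// Input and output formats per task, as envisioned in the message.
function formatsFor(task: SpeechTask): { inputs: ContentFormat[]; outputs: ContentFormat[] } {
  switch (task) {
    case "recognition":
      return { inputs: ["audio"], outputs: MARKUP_FORMATS };
    case "synthesis":
      return { inputs: MARKUP_FORMATS, outputs: ["audio"] };
    case "translation":
      return { inputs: ALL_FORMATS, outputs: ALL_FORMATS };
  }
}

// Cross every site with every task to get the nine scenarios.
function listScenarios(): Scenario[] {
  const scenarios: Scenario[] = [];
  for (const site of SITES) {
    for (const task of TASKS) {
      scenarios.push({
        site,
        task,
        ...formatsFor(task),
        consideredInCurrentApi: site === "client" && task !== "translation",
      });
    }
  }
  return scenarios;
}

const newScenarios = listScenarios().filter((s) => !s.consideredInCurrentApi);
console.log(newScenarios.map((s) => `${s.site} ${s.task}`));
```

Listing the scenarios this way shows that only client-side recognition and client-side synthesis are already considered, and that translation alone admits every pairing of the four source and target formats.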

Hyperlinks

Amazon Web Services<https://aws.amazon.com/>
* Speech to Text<https://aws.amazon.com/transcribe/>
* Text to Speech<https://aws.amazon.com/polly/>
* Translation<https://aws.amazon.com/translate/>

Google Cloud AI<https://cloud.google.com/products/ai/>
* Speech to Text<https://cloud.google.com/speech-to-text/>
* Text to Speech<https://cloud.google.com/text-to-speech/>
* Translation<https://cloud.google.com/translate/>

IBM Watson Products and Services<https://www.ibm.com/watson/products-services/>
* Speech to Text<https://www.ibm.com/watson/services/speech-to-text/>
* Text to Speech<https://www.ibm.com/watson/services/text-to-speech/>
* Translation<https://www.ibm.com/watson/services/language-translator/>

Microsoft Cognitive Services<https://azure.microsoft.com/en-us/services/cognitive-services/>
* Speech to Text<https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/>
* Text to Speech<https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/>
* Translation<https://azure.microsoft.com/en-us/services/cognitive-services/speech-translation/>

Real Time Translation in WebRTC<https://www.youtube.com/watch?v=EPBWR_GNY9U>

Best regards,
Adam Sobieski

P.S.: https://github.com/w3c/speech-api/issues/41
Received on Saturday, 15 September 2018 06:36:07 UTC