Speech Synthesis - Length parameter from Jerry Smith (WPT) on 2016-11-10 (public-speech-api@w3.org from November 2016)

From: Jerry Smith (WPT) <jdsmith@microsoft.com>
Date: Thu, 10 Nov 2016 23:58:21 +0000
To: "public-speech-api@w3.org" <public-speech-api@w3.org>
CC: Glen Shires <gshires@google.com>
Message-ID: <BY2PR03MB041222C6D13D24FBFFDE4B8A4B80@BY2PR03MB041.namprd03.prod.outlook.com>

We’ve implemented speech synthesis in Edge on the Windows 10 Anniversary Update, and have recently been revising it to support the word boundary features. We have an internal partner that wants to use these. They’ve also requested that we support word “length”, which isn’t included in the community group Web Speech API Specification. Knowing the length in addition to the boundary offset makes it very simple to highlight text while it is being spoken. We already support this in the WinRT APIs, and would like to do the same in Edge.
Our goal would be to receive equivalents of the following WinRT API event fields for both paragraphs and words:

TimeSpan StartTime
    Position in the audio stream. Required by IMediaCue.

HSTRING Text
    Text of the bookmark. For sentence and word boundary this can provide the text snippet from the original text.
    Note: We do not have a strong requirement to support text for word and sentence boundary markers.

Nullable<UINT32> Offset
    Offset in the input text associated with the current position in the audio playback.
    This is not populated for SSML bookmarks.

Nullable<UINT32> Length
    The length of the text starting from the Offset associated with the position in the audio playback.
    This is not populated for SSML bookmarks.
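
To illustrate the highlighting case, here is a rough sketch of how a page might use Offset and Length if both were reported with a boundary event. The names BoundaryInfo, HighlightParts, guessWordLength and splitForHighlight are hypothetical and exist only for this example; in particular, the length field is the proposed addition and is not part of the current specification. The fallback shows the kind of guess a page has to make when only the offset is available.

```typescript
// Hypothetical data carried by a boundary event.
// `offset` plays the role of the existing character index;
// `length` is the proposed addition and may be absent.
interface BoundaryInfo {
  offset: number;
  length?: number;
}

// The input text split around the span currently being spoken.
interface HighlightParts {
  before: string;
  spoken: string;
  after: string;
}

// Fallback when no length is reported: take the run of
// non-whitespace characters starting at `offset`.
// This is only a heuristic and can include trailing punctuation.
function guessWordLength(text: string, offset: number): number {
  const match = text.slice(offset).match(/^\S+/);
  return match ? match[0].length : 0;
}

// Split `text` so the spoken span can be wrapped in a highlight.
// Out-of-range offsets and lengths are clamped to the text bounds.
function splitForHighlight(text: string, info: BoundaryInfo): HighlightParts {
  const start = Math.min(Math.max(info.offset, 0), text.length);
  const len =
    info.length !== undefined ? info.length : guessWordLength(text, start);
  const end = Math.min(start + Math.max(len, 0), text.length);
  return {
    before: text.slice(0, start),
    spoken: text.slice(start, end),
    after: text.slice(end),
  };
}
```

With a reported length, splitForHighlight("Hello, world again", { offset: 7, length: 5 }) isolates "world" exactly. Without it, the whitespace heuristic at offset 0 yields "Hello," including the comma, which is the sort of mismatch a reported length would avoid.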

The existing speech API spec has been around for a while. Is there a way to evaluate and process spec additions/edits?

Glen: I’d appreciate hearing your take on this suggestion. The Speech API community report dates to 2012. Is there much interest in revising it in other ways?

Jerry Smith
Microsoft – Web Platform Team

Received on Thursday, 10 November 2016 23:58:58 UTC