- From: T.V Raman <raman@google.com>
- Date: Tue, 6 Sep 2011 08:35:30 -0700
- To: marc.schroeder@dfki.de
- Cc: Robert.Brown@microsoft.com, public-xg-htmlspeech@w3.org
In addition, accessibility use cases, e.g., screen readers for the
blind, need the following (sketched in code below):
1. Rapid response when speaking individual letters; in general
it is advantageous to have a separate "speakLetter" call that
reacts instantaneously
2. Index markers and a callback when the index mark has been
processed
3. A callback when speech synthesis playback is complete.
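
A rough sketch of what such an interface might look like; all names
here are purely illustrative, not taken from the group's API draft:

// Hypothetical synthesizer interface covering the three points above;
// none of these names come from the protocol or API drafts.
interface Synthesizer {
  // 1. Low-latency path for echoing single characters, bypassing
  //    any utterance queue so screen readers get instant feedback.
  speakLetter(letter: string): void;

  // 2. Regular synthesis of an utterance; markers embedded in the
  //    input (e.g. SSML <mark/>) invoke onMark as playback reaches them.
  speak(utterance: string, onMark?: (markName: string) => void): void;

  // 3. Invoked once audio playback of the utterance has completed.
  onDone?: () => void;
}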
Marc Schroeder writes:
> Hi Robert,
>
> I have given this some thought and I think that, while the TTS mechanics
> for the protocol will be almost the same, for educational purposes we
> might want to have three TTS examples that cover:
>
> (1) unplannable just-in-time TTS where the next utterance depends on
> the most recent user action, e.g. a dialogue system or an in-car
> navigation system;
>
> (2) bulk requests of TTS for pre-determined elements of the web page
> (say, spoken help texts to be played on mouseover);
>
> (3) a multimodal presentation example, showcasing the requirement for
> marking points in time in the synthesized audio ("would you like to sit
> HERE at the window or rather HERE at the aisle?").
>
> For each of these, the protocol document would give only the use case
> scenario and the web API integration in the form of a short
> explanatory story, but flesh out the protocol aspects in detail.
>
> Examples (1) and (2) could be either plain text or SSML, whereas (3)
> must be SSML with <ssml:mark> tags.
>
>
> Does that match what you had in mind?
>
> Story text for each of these could be as follows (see below).
>
> If you think these examples make sense, I would be most grateful if you
> could write the protocol part going with them... it is beyond me at the
> moment.
>
> Thanks and best,
> Marc
>
>
>
> (1) The most straightforward use case for TTS is the synthesis of one
> utterance at a time. This is unavoidable for just-in-time rendition of
> speech, for example in dialogue systems or in-car navigation
> scenarios. Here, the web application will send a single speech
> synthesis request to the speech service and retrieve the resulting
> speech output as described (elsewhere).
>
> On the protocol level, the synthesis of a single utterance would look as
> follows.
>
> (a) plain-text example
>
> The utterance to be spoken can be sent as plain text. In this case, it
> is necessary to specify the language to use:
>
> (protocol level details here...)
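>
> (As a non-normative illustration, the web API side of such a request
> might look roughly like this; the function and parameter names are
> invented for the example, not taken from the draft:)
>
> // hypothetical single-utterance request; names are illustrative only
> declare function requestSynthesis(options: {
>   text: string;          // plain text, so the language must be given
>   lang: string;          // e.g. a BCP 47 language tag
>   onDone?: () => void;   // fired when playback has completed
> }): void;
>
> requestSynthesis({
>   text: "Please make your choice.",
>   lang: "en-US",
>   onDone: () => console.log("utterance finished"),
> });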
>
>
> (b) Speech Synthesis Markup example
>
> For richer markup of the text, the SSML format can be used to send an
> annotated request, for example to propose an appropriate
> pronunciation or to indicate where to insert pauses:
>
> (example adapted from http://www.w3.org/TR/speech-synthesis11/#edef_break):
>
> <?xml version="1.0"?>
> <speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
> xml:lang="en-US">
> Please make your choice. <break time="3s"/>
> Click any of the buttons to indicate your preference.
> </speak>
>
> (protocol level details here...)
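>
> (Presumably the request itself would differ from the plain-text case
> only in its content type (application/ssml+xml rather than
> text/plain), with the language now coming from xml:lang inside the
> document; a hypothetical variant of the call from (a):)
>
> // same invented API as in (a), but the body is an SSML document
> declare function requestSynthesis(options: {
>   body: string;
>   contentType: "text/plain" | "application/ssml+xml";
> }): void;
>
> requestSynthesis({
>   body: `<?xml version="1.0"?> ... </speak>`, // the document above
>   contentType: "application/ssml+xml",
> });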
>
>
> (2) Some use cases require relatively static speech output which is
> already known at the time of loading a web page. In these cases, all
> required speech output can be requested in parallel as multiple
> concurrent requests. Callback methods in the web API are responsible
> for relating each speech stream to the appropriate place in the web
> application.
>
> On the protocol level, requesting multiple speech streams
> concurrently is realized as follows.
>
> (for educational purposes, maybe use several languages and voices but
> with plain text, such as "Hola, me llamo Maria." (Spanish), "Hi, I'm
> George." (UK English), or "Hallo, ich heiße Peter." (German).)
>
> (protocol level details here...)
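>
> (Again as an invented illustration, the web API side might simply
> issue the requests in a loop, with each callback closing over the
> page element it belongs to:)
>
> // hypothetical bulk pre-fetch at page-load time; API names invented
> declare function requestSynthesis(options: {
>   text: string;
>   lang: string;
>   onAudioReady?: (audioUrl: string) => void; // invented callback
> }): void;
>
> const helpTexts = [
>   { element: "#btn-maria",  text: "Hola, me llamo Maria.",   lang: "es-ES" },
>   { element: "#btn-george", text: "Hi, I'm George.",         lang: "en-GB" },
>   { element: "#btn-peter",  text: "Hallo, ich heiße Peter.", lang: "de-DE" },
> ];
>
> for (const h of helpTexts) {
>   requestSynthesis({
>     text: h.text,
>     lang: h.lang,
>     // relate the returned stream to its place in the web application
>     onAudioReady: (url) => console.log(`audio for ${h.element}: ${url}`),
>   });
> }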
>
>
> (3) In order to synchronize the speech content with other events in
> the web application, it is possible to mark relevant points in time
> using the SSML <mark> tag. When the speech is played back, a callback
> method is invoked for each of these markers, allowing the web
> application to present, e.g., visual displays in synchrony with the
> audio.
>
> (example adapted from http://www.w3.org/TR/speech-synthesis11/#S3.3.2):
>
> <?xml version="1.0"?>
> <speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
> xml:lang="en-US">
> Would you like to sit <mark name="window_seat"/> here at the window, or
> rather <mark name="aisle_seat"/> here at the aisle?
> </speak>
>
>
> (protocol level details here...)
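>
> (The web API side of this is where the marker callback becomes
> visible; a sketch with invented names, using the SSML document above:)
>
> // hypothetical request whose onMark callback fires as playback
> // passes each <mark/>; all names invented for illustration
> declare function requestSynthesis(options: {
>   ssml: string;
>   onMark?: (markName: string) => void;
> }): void;
>
> declare function highlight(selector: string): void; // page-side effect
>
> declare const seatingSsml: string; // the <speak> document shown above
>
> requestSynthesis({
>   ssml: seatingSsml,
>   onMark: (markName) => {
>     // synchronize the visual display with the audio
>     if (markName === "window_seat") highlight("#window-seat");
>     if (markName === "aisle_seat")  highlight("#aisle-seat");
>   },
> });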
>
>
> On 30.08.11 15:46, Robert Brown wrote:
> > Hi Marc,
> >
> > No problem if you're short of time. If you can suggest the examples, I can create them and send them to you for comment.
> >
> > Cheers,
> >
> > /Rob
> > ________________________________________
> > From: Marc Schroeder [marc.schroeder@dfki.de]
> > Sent: Tuesday, August 30, 2011 6:15 AM
> > To: Robert Brown
> > Subject: Re: more protocol examples for synthesis?
> >
> > Hi Robert,
> >
> > sorry for my relative silence recently, time is in short supply at my
> > end at the moment.
> >
> > I definitely think there should be some more synthesis examples. I can
> > certainly think of some, and attempt, with my limited understanding of
> > MRCP and thus of this protocol, to formulate them.
> >
> > An issue might be time; by when are they needed?
> >
> > Best wishes,
> > Marc
> >
> > On 30.08.11 02:13, Robert Brown wrote:
> >> Hi Marc,
> >>
> >> I was wondering if you think the protocol draft needs any additional
> >> synthesis examples?
> >>
> >> (here’s the current link:
> >> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/att-0004/speech-protocol-draft-04.html
> >> )
> >>
> >> Also, if you think it does, would you like to write them?
> >>
> >> Let me know what you think.
> >>
> >> Cheers,
> >>
> >> /Rob
> >>
>
> --
> Dr. Marc Schröder, Senior Researcher at DFKI GmbH
> Project leader for DFKI in SSPNet http://sspnet.eu
> Team Leader DFKI TTS Group http://mary.dfki.de
> Editor W3C EmotionML Working Draft http://www.w3.org/TR/emotionml/
> Portal Editor http://emotion-research.net
>
> Homepage: http://www.dfki.de/~schroed
> Email: marc.schroeder@dfki.de
> Phone: +49-681-85775-5303
> Postal address: DFKI GmbH, Campus D3_2, Stuhlsatzenhausweg 3, D-66123
> Saarbrücken, Germany
> --
> Official DFKI coordinates:
> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
> Geschaeftsfuehrung:
> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> Dr. Walter Olthoff
> Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
> Amtsgericht Kaiserslautern, HRB 2313
Received on Tuesday, 6 September 2011 15:36:01 UTC