Re: inline grammar example (CSS Speech, EPUB3)

On 3 Nov 2011, at 19:42, Olli Pettay wrote:
>> It seems appropriate for this group to advocate SRGS and SSML namespaces.
>> 
> HTML does have special cases for MathML and SVG,
> the first one perhaps mainly because a certain browser engine has
> supported it for a long time, and the latter because it is actually
> being used in the web.
> Namespace attributes aren't handled, but when the parser gets <svg>,
> it knows that the element is in svg namespace.
> 
> I would be a bit surprised to see WhatWG/HTML WG to accept
> SRGS and SSML elements inline in the HTML.
> But I'll ask hsivonen (the HTML parsing guy).
> 
> In general I think HTML Speech API has better chances to get
> accepted outside this group if it complicates the rest of the web platform
> as little as possible.
> Adding a JS API is totally ok, and probably also <reco>,
> but bringing in two new XML languages to HTML sounds quite a bit
> more complicated.


Just for everyone's information, the recently released EPUB3 e-book standard uses a strict subset (and the XML serialisation) of HTML5 that includes a couple of SSML-namespaced attributes for inline phonemes (ssml:ph and ssml:alphabet). These attributes, the W3C PLS format (pronunciation lexicons), and CSS Speech are the building blocks that enable text-to-speech within EPUB publications:

http://idpf.org/epub/30/spec/epub30-overview.html#sec-tts
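
To make the inline-phoneme idea concrete, here is a minimal sketch of how a reading system might pick up the SSML-namespaced attributes from an XHTML document. The sample markup, the function name and the "tomato" pronunciation are my own illustrations, not taken from the EPUB3 specification; only the namespace URI and the ssml:ph / ssml:alphabet attribute names are real.

```python
import xml.etree.ElementTree as ET

# SSML namespace used by the EPUB3 inline-phoneme attributes.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
PH = f"{{{SSML_NS}}}ph"
ALPHABET = f"{{{SSML_NS}}}alphabet"

# Hypothetical sample document (illustration only).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"
      ssml:alphabet="ipa">
  <body>
    <p>I say <span ssml:ph="təˈmɑːtəʊ">tomato</span>.</p>
  </body>
</html>"""


def inline_phonemes(xhtml_text):
    """Return (text, phonemes, alphabet) for every element carrying ssml:ph.

    The alphabet is taken from the nearest ancestor-or-self element that
    declares ssml:alphabet, or None if no element declares one.
    """
    found = []

    def walk(element, alphabet):
        # An ssml:alphabet on this element overrides the inherited value.
        alphabet = element.get(ALPHABET, alphabet)
        phonemes = element.get(PH)
        if phonemes is not None:
            found.append(("".join(element.itertext()), phonemes, alphabet))
        for child in element:
            walk(child, alphabet)

    walk(ET.fromstring(xhtml_text), None)
    return found


if __name__ == "__main__":
    for text, phonemes, alphabet in inline_phonemes(SAMPLE):
        print(text, phonemes, alphabet)
```

Note that this only works because the content is well-formed XML with declared namespaces; an HTML (non-XML) parser would not resolve the ssml prefix, which is exactly the concern raised in the quoted message.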

Although I think it is unreasonable to hope that fully fledged SSML markup will make its way into the HTML5 specification, I do hope that browser vendors will implement at least some of CSS Speech, and will support the 'pronunciation' rel value for referencing external PLS files:

http://microformats.org/wiki/rel-pronunciation
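
As a rough illustration of what supporting this could involve, the sketch below finds rel="pronunciation" links in an XHTML page and turns a small PLS lexicon into a lookup table. The sample page, the file name lexicon.pls, the lexicon entry and the function names are all hypothetical; the PLS namespace URI and the lexeme / grapheme / phoneme element names come from the W3C PLS 1.0 format.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Hypothetical page referencing an external lexicon (illustration only).
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <link rel="pronunciation" type="application/pls+xml"
          hreflang="en" href="lexicon.pls"/>
    <link rel="stylesheet" href="style.css"/>
  </head>
  <body/>
</html>"""

# Hypothetical one-entry lexicon (illustration only).
LEXICON = """<lexicon version="1.0" alphabet="ipa" xml:lang="en"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>"""


def pronunciation_hrefs(page_text):
    """Return the href of every <link> whose rel list contains 'pronunciation'."""
    root = ET.fromstring(page_text)
    return [
        link.get("href")
        for link in root.iter(XHTML + "link")
        if "pronunciation" in (link.get("rel") or "").split()
    ]


def load_lexicon(pls_text):
    """Map each grapheme to the first phoneme string of its lexeme."""
    root = ET.fromstring(pls_text)
    table = {}
    for lexeme in root.iter(PLS + "lexeme"):
        phoneme = lexeme.findtext(PLS + "phoneme")
        for grapheme in lexeme.iter(PLS + "grapheme"):
            table[grapheme.text] = phoneme
    return table


if __name__ == "__main__":
    print(pronunciation_hrefs(PAGE))
    print(load_lexicon(LEXICON))
```

A real implementation would also have to fetch the referenced file, honour hreflang, and handle lexemes with several alternative pronunciations; none of that is attempted here.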

I guess I digressed a little bit here! :) I thought that this would be valuable information for this group though.

I look forward to reading more about your speech-synthesis-related work, in order to identify the potential pitfalls and inconsistencies that may arise between the CSS and JavaScript/API content authoring approaches. My gut feeling is that the risk of conflicting overlap is very low. Also, I expect the TTS API of HTML Speech to remain simple, given the ability to feed SSML directly to the screen reader (conversely, CSS Speech does its best to mimic some of SSML's features, so it is "natively" richer).

Kind regards, Daniel
(disclaimer: editor of the CSS Speech Module Level 3)

Received on Thursday, 3 November 2011 20:52:49 UTC