Re: [css3-speech] Heads-up: CSS WG plans last call for css3-speech (part 2) from Daniel Weck on 2011-09-29 (www-style@w3.org from September 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Thu, 29 Sep 2011 14:49:43 +0100
To: W3C style mailing list <www-style@w3.org>, "paul.bagshaw@orange.com" <paul.bagshaw@orange.com>
Cc: w3c-voice-wg@w3.org
Message-Id: <CFB503F0-8C25-4314-B603-F21808BD90F4@gmail.com>

When required="", isn't 'xml:lang' used to determine the list of candidate TTS voices? In that case, the SSML1.1 document resulting from a translation from CSS3-Speech would conform exactly to the author's intent.

Obviously, I do agree with you that allowing the CSS author to override the language/accent for voice selection purposes would be ideal.

Regards, Dan
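
For illustration only, here is a rough Python sketch of how the value of 'required' could change the list of candidate voices. It is not the normative SSML 1.1 voice selection algorithm: the voice catalogue, the names Voice, matches and candidates, and the simplified prefix matching (in the spirit of BCP47 basic filtering) are all assumptions made for this example. It models the reading in which an empty 'required' leaves no mandatory feature; whether 'xml:lang' still narrows the candidates in that case is exactly the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str         # hypothetical voice name
    languages: tuple  # language tags the voice is documented to speak


def matches(language_range: str, tag: str) -> bool:
    """Simplified basic filtering: '*', an exact match, or a prefix followed by '-'."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def candidates(voices, desired_languages, required):
    """Return the voices that satisfy every feature named in `required`.

    Only the 'languages' feature is modelled. With an empty `required`
    list nothing is mandatory, so every voice stays a candidate.
    """
    if "languages" not in required:
        return list(voices)
    return [
        v for v in voices
        if all(any(matches(rng, tag) for tag in v.languages)
               for rng in desired_languages)
    ]


# Hypothetical catalogue of installed voices.
VOICES = [
    Voice("claire", ("fr-FR",)),
    Voice("tom", ("en-GB", "en-US")),
    Voice("marie", ("fr-FR", "en-GB")),
]

# required="languages": only voices documented for English remain.
strict = candidates(VOICES, ["en"], required=["languages"])
# required="": the languages constraint is no longer mandatory.
loose = candidates(VOICES, ["en"], required=[])

print([v.name for v in strict])
print([v.name for v in loose])
```

In this toy model the strict call keeps only the two voices documented for English, while the loose call keeps all three, including a French-only voice that may not be able to read English text.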

On 29 Sep 2011, at 14:26, <paul.bagshaw@orange.com> wrote:

> Well, if you put required="", that's fine for SSML syntax. Your conformant SSML1.1 processor will now select you a voice that can speak any language regardless of whether or not it can interpret the language of the text you give it to read. That is a drastic situation to end up in, by default. And moreover, you've left your style author with no way of controlling that. That is a handicap.
> 
> Have fun discussing this....
> 
> -- Paul
> 
> -----Original Message-----
> From: Daniel Weck [mailto:daniel.weck@gmail.com] 
> Sent: Thursday, September 29, 2011 3:06 PM
> To: W3C style mailing list; BAGSHAW Paul RD-TECH-REN
> Cc: w3c-voice-wg@w3.org
> Subject: Re: [css3-speech] Heads-up: CSS WG plans last call for css3-speech (part 2)
> 
> Hi Paul,
> CSS3-Speech doesn't allow content authors to control the 'required' attribute, but the translation to SSML1.1 results in <voice required="" ...> ... </voice>, right? (note the empty attribute value) How does that qualify as "handicapped", when it is valid SSML syntax?
> 
> At any rate, the issue raised is that voice selection based on author-specified languages is deemed a crucial feature, which should not be omitted in Level 3 of CSS Speech. I don't disagree in principle; I am just reluctant to introduce strong conformance requirements on speech synthesisers ("voice descriptions") that are hard to enforce and to test in practice (from the point of view of web browser vendors), and consequently to increase the risk of lacking reference implementations.
> 
> As said before, please allow me to consult the Working Group on this issue, and I will report back in this public mailing-list discussion.
> 
> Kind regards, Daniel
> 
> 
> On 29 Sep 2011, at 13:13, <paul.bagshaw@orange.com> wrote:
> 
>> Hi,
>> 
>> I note that CSS3-Speech does not provide access to ssml:voice behaviour control attributes ('required', 'ordered' and 'onvoicefailure'). That's fine; I have no issue with that (although IMHO it would have been nice to have, but it's probably too implementation specific for CSS3's "broader implementation spectrum"). Please note, however, that the default value of 'required' is 'languages'. Oh dear, CSS3-speech provides no means to control this REQUIRED attribute (not optional, by default).
>> 
>> It is extremely important when working with speech to make a distinction between the languages a text is written in and the languages / accents in which it is spoken. That is why SSML 1.1 made a significant move from SSML 1.0 when it removed 'xml:lang' from <ssml:voice> and replaced it with 'languages'. If this distinction is not maintained in CSS3-Speech, then the advance made from SSML 1.0 to 1.1 in its voice selection is severely hindered. You will be prohibiting authors' access to SSML 1.1 functionality, and CSS3-Speech -> SSML1.1 translation is handicapped.
>> 
>> I greatly suspect that CSS3's "broader implementation spectrum" would be grateful to lean on SSML's experience in its need to distinguish text and voice languages. There seems little to lose by adding it and a lot to lose by its current omission.
>> 
>> Yours (watching the space)
>> -- Paul
>> 
>> -----Original Message-----
>> From: Daniel Weck [mailto:daniel.weck@gmail.com] 
>> Sent: Thursday, September 29, 2011 11:02 AM
>> To: W3C style mailing list; BAGSHAW Paul RD-TECH-REN
>> Cc: w3c-voice-wg@w3.org
>> Subject: Re: [css-speech] Heads-up: CSS WG plans last call for css3-speech (part 2)
>> 
>> Hi Paul,
>> in my reply, I only mentioned 'xml:lang' in parentheses, to clarify the context within which BCP47 is referenced (in SSML1.1), and to draw a parallel with CSS3-Speech. Rest assured that I do not find the semantics of 'languages' confounding. Note that I brought up SSML's 'onlangfailure' feature because CSS-Speech would need to normatively document conformance requirements for synthesis processors ("voice descriptions", BCP47, etc.).
>> 
>> Given that the 'languages' attribute is optional in SSML1.1, why do you think that "the ability to generate a conformant SSML 1.1 document through application of a CSS Speech Model" would be "severely hindered"? I appreciate that round-trip engineering would be an issue (as with other features in CSS3-Speech), but the CSS -> SSML translation would seem to work fine.
>> 
>> Personally (like many others, based on the feedback I received), I hope to see a broader implementation spectrum - i.e. beyond the existing niche and somewhat incomplete "aural" / "speech" stylesheet applications - for the features provided by CSS-Speech. I would love Level3 of the Speech Module to reach critical mass, so I am naturally concerned about CR reference implementations, and ultimately about support from browser vendors. By instinct and (hopefully) based on my technical insight, I would mark the 'languages' feature "at-risk" in CSS3-Speech. I could be wrong though, and note that this issue has not been debated yet with the rest of the CSS Working Group. Blame me personally for any oversight :)
>> 
>> To conclude, this issue needs to be further discussed. Watch this space.
>> 
>> Thank you very much for your feedback Paul!
>> Regards, Daniel
>> 
>> On 29 Sep 2011, at 08:10, <paul.bagshaw@orange.com> wrote:
>> 
>>> Hi,
>>> 
>>> You seem to be mixing (and consequently misunderstanding) the interpretation of the 'xml:lang' and 'languages' attributes in SSML 1.1. The mismatch between CSS Level 3 language model [5] and SSML's use of 'xml:lang' is justified by your comment. However, 'languages' is independent of that, even though the values it may take are related to those of xml:lang.
>>> 
>>> The 'languages' attribute is REQUIRED by the voice selection algorithm of an SSML 1.1 interpreter. It must be made available to it in the same way that 'age' is a REQUIRED attribute. If it is not added, then the ability to generate a conformant SSML 1.1 document through application of a CSS Speech Model will be severely hindered. Its omission is a serious deficiency.
>>> 
>>> -- Paul
>>> 
>>> -----Original Message-----
>>> From: Daniel Weck [mailto:daniel.weck@gmail.com] 
>>> Sent: Thursday, September 08, 2011 6:08 PM
>>> To: BAGSHAW Paul RD-TECH-REN; www-style list
>>> Cc: w3c-voice-wg@w3.org
>>> Subject: Re: [css-speech] Heads-up: CSS WG plans last call for css3-speech (part 2)
>>> 
>>> Hello Paul, thank you for your comments.
>>> 
>>> SSML1.1 normatively references BCP47, which defines not only the syntax for language identifiers (used in xml:lang [1] and in the 'languages' attribute [2] of SSML markup), but also the language matching algorithm. Furthermore, the voice selection algorithm in SSML 1.1 relies on the "voice descriptions" [3] and "onlangfailure" [4] conformance requirements ('must') for speech processors.
>>> 
>>> CSS Level 3 supports a looser language model [5] (which doesn't normatively reference BCP47), and the Speech Module relies on a number of implementation-dependent properties of the underlying speech processor [6] (i.e. areas where there are no such strict conformance requirements as the ones in SSML). Although we aim at matching most SSML features, there is no 1-to-1 mapping due to intrinsic technical differences between the SSML and CSS document models.
>>> 
>>> On a more pragmatic note, I am in favor of managing complexity in order to encourage uptake amongst browser vendors and implementors within the e-book publishing field, at least for Level 3. The next Level of CSS Speech is likely to address shortcomings such as the lack of strict language support, 3D spatial audio, etc.
>>> 
>>> To conclude, I propose to defer the introduction of an equivalent to SSML's 'languages' attribute to the next major revision of CSS Speech.
>>> 
>>> Let us know if this satisfies you.
>>> Kind regards, Daniel
>>> 
>>> [1]
>>> http://www.w3.org/TR/speech-synthesis11/#adef_xmllang
>>> 
>>> [2]
>>> http://www.w3.org/TR/speech-synthesis11/#adef_voice
>>> 
>>> [3]
>>> http://www.w3.org/TR/speech-synthesis11/#voice_descriptions
>>> 
>>> [4]
>>> http://www.w3.org/TR/speech-synthesis11/#adef_onlangfailure
>>> 
>>> [5]
>>> http://www.w3.org/TR/css3-selectors/#lang-pseudo
>>> 
>>> [6]
>>> http://www.w3.org/TR/css3-speech/#voice-selection
>>> 
>>> On 18 Aug 2011, at 12:00, <paul.bagshaw@orange-ftgroup.com> wrote:
>>> 
>>>> Bert,
>>>> 
>>>> Continuing comments...
>>>> 
>>>> 2. <languages> value needed for voice selection in 'voice-family' property.
>>>> 
>>>> SSML 1.1 has made a significant move from SSML 1.0 in its voice selection algorithm. It has brought a much needed clarification between the language of written content and the language(s) spoken by a particular voice (to allow, for example, a French voice to read an English film title). The xml:lang attribute is used by SSML 1.1 only to designate the language of written content. The SSML 1.1 <voice> element takes an optional 'languages' attribute "indicating the list of languages the voice is desired to speak".
>>>> 
>>>> It is desirable to give access to this attribute via the 'voice-family' property.
>>>> 
>>>> The voice selection algorithm proposed by the CSS-Speech module should consequently be put in line with that specified by SSML 1.1.
>>>> 
>>>> Regards,
>>>> Paul Bagshaw
>>>> Co-author of SSML 1.1 and PLS 1.0.
>>>> 
>>>> -----Original Message-----
>>>> From: w3c-voice-wg-request@w3.org [mailto:w3c-voice-wg-request@w3.org] On Behalf Of Bert Bos
>>>> Sent: Sunday, August 14, 2011 12:32 AM
>>>> To: w3c-wai-pf@w3.org; w3c-voice-wg@w3.org; member-xg-htmlspeech@w3.org; wai-xtech@w3.org
>>>> Cc: chairs@w3.org
>>>> Subject: Heads-up: CSS WG plans last call for css3-speech
>>>> 
>>>> Hello chairs,
>>>> 
>>>> The CSS WG decided to issue a last call for the CSS Speech Module. We're planning to publish next week, with a deadline for comments of 30 September, i.e., about 6 weeks.
>>>> 
>>>> Please, let us know if that deadline is too soon.
>>>> 
>>>> We'd especially like to hear from
>>>> 
>>>> - WAI PF and/or HTML Accessibility TF
>>>> - Voice Browser WG
>>>> - HTML Speech XG
>>>> 
>>>> The latest editor's draft is here:
>>>> 
>>>>  http://dev.w3.org/csswg/css3-speech/
>>>> 
>>>> (The content is what will be published, after reformatting for Working Draft.)
>>>> 
>>>> The CSS Speech module contains properties to style the rendering of documents via a speech synthesizer: voice, volume, speed, pitch, pauses, etc. It is designed to be compatible with SSML, i.e., the rendering of the document could be in the form of an SSML stream.
>>>> 
>>>> 
>>>> 
>>>> For the CSS WG,
>>>> 
>>>> Bert
>>>> --
>>>> Bert Bos                                ( W 3 C )http://www.w3.org/
>>>> http://www.w3.org/people/bos                               W3C/ERCIM
>>>> bert@w3.org                             2004 Rt des Lucioles / BP 93
>>>> +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
>>>> 
>>>> 
>>> 
>> 
>
Received on Thursday, 29 September 2011 13:50:19 UTC