- From: fantasai <fantasai.lists@inkedblade.net>
- Date: Mon, 02 May 2011 11:53:01 -0700
- To: "www-style@w3.org" <www-style@w3.org>

The SSML spec gives an algorithm for selecting voice families:
  http://www.w3.org/TR/speech-synthesis/#edef_voice

This algorithm is roughly approximated in the CSS3 Speech spec for
'voice-family':
  http://dev.w3.org/csswg/css3-speech/#voice-family

  # The ‘voice-family’ property is used to guide the selection of the voice to be
  # used for speech synthesis. The overriding priority is to match the language
  # specified by the xml:lang attribute as per the XML 1.0 specification [XML10],
  # and as inherited by nested elements until overridden by a further xml:lang
  # attribute.
  #
  # If there is no voice available for the requested value of xml:lang, the
  # processor should select a voice that is closest to the requested language
  # (e.g. a variant or dialect of the same language). If there are multiple
  # such voices available, the processor should use a voice that best matches
  # the values provided with the ‘voice-volume’ property. It is an error if
  # there are no such matches.

Firstly, the prose here needs some tightening up. Copying the list
structure from SSML is probably a good idea.

Second, CSS doesn't use xml:lang directly, since CSS (unlike SSML) is
not an XML language. Looking up "the language of the element" is an
abstract operation; the closest thing we have to a definition is in
Selectors Level 3:
  http://www.w3.org/TR/css3-selectors/#lang-pseudo

Third, the SSML algorithm is somewhat imprecise about what "best matches"
means. We either need a definition here, or we need a note that this is
undefined.

Lastly, we need to figure out, for CSS, when the voice family is
recalculated. In SSML, it's recalculated on every element, which means
that if an element has a different language value than its parent, the
voice family changes. The SSML spec notes that this is not always
desirable (e.g. a French phrase embedded in an English sentence) and in
such cases suggests that the xml:lang attribute not indicate the language
of the foreign phrase, thus avoiding the recalculation.
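
To make the per-element recalculation concrete, here is a minimal Python
sketch of language-driven voice selection. It is only an illustration
under stated assumptions, not the SSML or CSS3 Speech algorithm: the
names Voice, Element, element_language and select_voice are hypothetical,
the `families` argument is a stand-in for whatever property values end up
guiding the choice, and since "best matches" is undefined, the sketch
simply takes the first preferred name it finds among the candidates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Voice:
    name: str
    lang: str  # language tag such as "en-US" (hypothetical voice record)


@dataclass
class Element:
    lang: Optional[str] = None          # language declared on this element
    parent: Optional["Element"] = None


def element_language(el: Optional[Element]) -> Optional[str]:
    """Walk up the tree until some ancestor declares a language."""
    while el is not None:
        if el.lang:
            return el.lang
        el = el.parent
    return None


def primary(tag: str) -> str:
    """Primary language subtag, lowercased ("en-US" -> "en")."""
    return tag.split("-")[0].lower()


def select_voice(voices: Sequence[Voice], lang: str,
                 families: Sequence[str] = ()) -> Optional[Voice]:
    """Return a voice for `lang`, or None for the 'no match' error case."""
    # Language comes first: exact tag match if one exists ...
    candidates = [v for v in voices if v.lang.lower() == lang.lower()]
    if not candidates:
        # ... otherwise a variant or dialect of the same language.
        candidates = [v for v in voices if primary(v.lang) == primary(lang)]
    if not candidates:
        return None
    # Assumption: "best match" means first preferred name that is available.
    for family in families:
        for v in candidates:
            if v.name == family:
                return v
    return candidates[0]


# A French phrase nested in an English sentence (hypothetical voices).
VOICES = [Voice("alice", "en-US"), Voice("bob", "en-GB"), Voice("claire", "fr-FR")]
sentence = Element(lang="en")
phrase = Element(lang="fr", parent=sentence)
word = Element(parent=phrase)  # no language of its own, inherits "fr"
```

Selecting for `sentence` and then for `phrase` yields two different
voices, which is exactly the mid-sentence voice switch described above.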

This isn't particularly practical in CSS. We don't actually want to
discourage people from marking up their documents correctly, even if
many don't bother, and messing with the markup to change the speech
rendering interferes with the separation of content and style.

Probably the simplest solution would be to add a 'match-parent' keyword
to 'voice-family'. The computed value would then be the inherited value
with the 'match-parent' keyword added, and the voice selection would not
be recalculated.

We could also consider something similar to the CSS3 Fonts
'font-language-override' property, e.g.

  voice-language: auto | <language-code> | inherit;
  inherited: yes
  computed value: as specified

  auto - The used value is taken from the language of the element,
         or some UA-chosen value if unknown. (The computed value
         is the keyword 'auto'.)

I'm somewhat less in favor of this option, as
  a) 'match-parent' seems easier to use (imho)
  b) 'match-parent' is just a keyword instead of an additional property
  c) you can do more intelligent things with 'match-parent' if you have
     the ability. E.g., use French phonics to map the embedded phrase
     to the closest English phonemes, so "à propos" could be rendered
     as "ah pro-POE" instead of "a PROP-uss".

But it's something to consider.

~fantasai
Received on Monday, 2 May 2011 18:53:34 UTC