Re: [css3-speech] voice-family selection and language

All good points, thanks!

I added the 'preserve' keyword value, a dedicated section to describe  
the language-dependent voice selection mechanism, and a short example:

http://dev.w3.org/csswg/css3-speech/#voice-props-voice-family
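
For readers of the thread, here is a minimal sketch of how the new keyword might be used for the embedded-foreign-phrase case discussed below. The markup, selectors and voice choice are illustrative assumptions, not the example from the draft; the section linked above remains the authoritative definition.

```css
/* Hypothetical markup (an assumption for illustration):
   <p lang="en">It was said <span lang="fr">à propos</span> of nothing.</p> */

/* Voice requested for English paragraphs; "female" is a generic voice. */
p:lang(en) {
  voice-family: female;
}

/* Keep the parent's voice for the embedded French phrase, so that the
   change of content language does not trigger a new voice selection. */
p:lang(en) span:lang(fr) {
  voice-family: preserve;
}
```

The markup keeps its correct language annotation; only the style sheet decides that the voice should not change.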

On 2 May 2011, at 19:53, fantasai wrote:

> The SSML spec gives an algorithm for selecting voice families:
>  http://www.w3.org/TR/speech-synthesis/#edef_voice
>
> This algorithm is roughly approximated in the CSS3 Speech spec for  
> 'voice-family':
>  http://dev.w3.org/csswg/css3-speech/#voice-family
>
> # The ‘voice-family’ property is used to guide the selection of the voice
> # to be used for speech synthesis. The overriding priority is to match the
> # language specified by the xml:lang attribute as per the XML 1.0
> # specification [XML10], and as inherited by nested elements until
> # overridden by a further xml:lang attribute.
> #
> # If there is no voice available for the requested value of xml:lang, the
> # processor should select a voice that is closest to the requested language
> # (e.g. a variant or dialect of the same language). If there are multiple
> # such voices available, the processor should use a voice that best matches
> # the values provided with the ‘voice-family’ property. It is an error if
> # there are no such matches.
>
> Firstly, the prose here needs some tightening up. Copying the list
> structure from SSML is probably a good idea.
>
> Second, CSS doesn't use xml:lang directly, since CSS (unlike SSML) is not
> an XML language. Looking up "the language of the element" is an abstract
> operation; the closest thing we have to a definition is in Selectors
> Level 3:
>  http://www.w3.org/TR/css3-selectors/#lang-pseudo
>
> Third, the SSML algorithm is somewhat imprecise about what "best matches"
> means. We either need a definition here, or we need a note that this is
> undefined.
>
>
> Lastly, we need to figure out, for CSS, when the voice family is
> recalculated. In SSML, it's recalculated on every element, which means that
> if an element has a different language value than its parent, the voice
> family changes. The SSML spec notes that this is not always desirable (e.g.
> a French phrase embedded in an English sentence) and in such cases suggests
> that the xml:lang attribute not indicate the language of the foreign
> phrase, thus avoiding the recalculation.
>
> This isn't particularly practical in CSS. We don't actually want to
> discourage people from marking up their documents correctly, even if many
> don't bother, and messing with the markup to change the speech rendering
> interferes with the separation of content and style.
>
> Probably the simplest solution would be to add a 'match-parent' keyword to
> 'voice-family'. This would add the 'match-parent' keyword to the inherited
> value for the computed value, and would prevent the voice selection from
> being recalculated.
>
> We could also consider something similar to the CSS3 Fonts
> 'font-language-override' property, e.g.
>
>  voice-language: auto | <language-code> | inherit;
>  inherited: yes
>  computed value: as specified
>
>  auto -
>    The used value is taken from the language of the element, or some
>    UA-chosen value if unknown. (The computed value is the keyword 'auto'.)
>
> I'm somewhat less in favor of this option, as
>  a) 'match-parent' seems easier to use (imho)
>  b) 'match-parent' is just a keyword instead of an additional property
>  c) you can do more intelligent things with 'match-parent' if you have the
>     ability. E.g., use French phonics to map the embedded phrase to the
>     closest English phonemes, so "à propos" could be rendered as
>     "ah pro-POE" instead of "a PROP-uss".
> But it's something to consider.
>
> ~fantasai
>
>

Daniel Weck
daniel.weck@gmail.com

Received on Wednesday, 11 May 2011 10:22:24 UTC