
Re: [css3-speech] voice-family selection and language

From: Daniel Weck <daniel.weck@gmail.com>
Date: Wed, 11 May 2011 17:48:52 +0100
Message-ID: <BANLkTimMz0STRn_WV_yD+68-SFqb2AHR-g@mail.gmail.com>
To: fantasai <fantasai.lists@inkedblade.net>
Cc: "www-style@w3.org" <www-style@w3.org>

Hi Fantasai,

I forgot that SSML 1.1 changed the language handling algorithm
("voice" now has a "languages" attribute) [1].
I still believe that the prose in the current CSS Speech editor's
draft is suitable [2], because I think that the language should *only*
be specified in the content layer (SSML encapsulates both text content
and aural presentation).
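
To make that split concrete, here is a rough Python sketch of
language-dependent voice selection as I read the draft prose quoted
below: the language comes from the content, and 'voice-family' only
chooses among voices that already fit that language. The Voice class,
the voice names and select_voice() are made up for illustration. Since
"best matches" is not defined anywhere, reading it as "the first
voice-family entry that names a candidate" is purely my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str       # hypothetical voice identifier
    language: str   # language tag of the voice, e.g. "en-GB"


def select_voice(voices, content_lang, voice_family):
    """Pick a voice: content language first, voice-family second."""
    wanted = content_lang.lower()
    primary = wanted.split("-")[0]

    # 1. Voices whose language matches the content language exactly.
    candidates = [v for v in voices if v.language.lower() == wanted]
    # 2. Otherwise the closest language: same primary subtag
    #    (a variant or dialect of the same language).
    if not candidates:
        candidates = [v for v in voices
                      if v.language.lower().split("-")[0] == primary]
    # 3. No voice for the language at all: the draft calls this an error.
    if not candidates:
        raise LookupError(f"no voice available for {content_lang!r}")

    # 4. Assumed reading of "best matches": the earliest voice-family
    #    entry that names one of the candidates wins.
    for family in voice_family:
        for v in candidates:
            if v.name == family:
                return v
    # No voice-family entry applies: fall back to the first candidate.
    return candidates[0]
```

With voices "paul" (en-US), "claire" (fr-FR) and "kate" (en-GB), a
request for "en-AU" content with voice-family "kate, paul" would settle
on "kate": neither English voice is an exact match, so both become
candidates and the style sheet's preference order decides.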

Thoughts?



On Wed, May 11, 2011 at 11:21 AM, Daniel Weck <daniel.weck@gmail.com> wrote:
> All good points, thanks!
> I added the 'preserve' keyword value, a dedicated section to describe the
> language-dependent voice selection mechanism, and a short example:
> http://dev.w3.org/csswg/css3-speech/#voice-props-voice-family
> On 2 May 2011, at 19:53, fantasai wrote:
>> The SSML spec gives an algorithm for selecting voice families:
>>  http://www.w3.org/TR/speech-synthesis/#edef_voice
>> This algorithm is roughly approximated in the CSS3 Speech spec for
>> 'voice-family':
>>  http://dev.w3.org/csswg/css3-speech/#voice-family
>> # The ‘voice-family’ property is used to guide the selection of the
>> # voice to be used for speech synthesis. The overriding priority is to
>> # match the language specified by the xml:lang attribute as per the
>> # XML 1.0 specification [XML10], and as inherited by nested elements
>> # until overridden by a further xml:lang attribute.
>> #
>> # If there is no voice available for the requested value of xml:lang,
>> # the processor should select a voice that is closest to the requested
>> # language (e.g. a variant or dialect of the same language). If there
>> # are multiple such voices available, the processor should use a voice
>> # that best matches the values provided with the ‘voice-family’
>> # property. It is an error if there are no such matches.
>>
>> Firstly, the prose here needs some tightening up. Copying the list
>> structure from SSML is probably a good idea.
>>
>> Second, CSS doesn't use xml:lang directly, since CSS (unlike SSML) is
>> not an XML language. Looking up "the language of the element" is an
>> abstract operation; the closest thing we have to a definition is in
>> Selectors Level 3:
>>  http://www.w3.org/TR/css3-selectors/#lang-pseudo
>>
>> Third, the SSML algorithm is somewhat imprecise about what "best
>> matches" means. We either need a definition here, or we need a note
>> that this is undefined.
>>
>> Lastly, we need to figure out, for CSS, when the voice family is
>> recalculated. In SSML, it's recalculated on every element, which means
>> that if an element has a different language value than its parent, the
>> voice family changes. The SSML spec notes that this is not always
>> desirable (e.g. a French phrase embedded in an English sentence) and in
>> such cases suggests that the xml:lang attribute not indicate the
>> language of the foreign phrase, thus avoiding the recalculation.
>>
>> This isn't particularly practical in CSS. We don't actually want to
>> discourage people from marking up their documents correctly, even if
>> many don't bother, and messing with the markup to change the speech
>> rendering interferes with the separation of content and style.
>>
>> Probably the simplest solution would be to add a 'match-parent' keyword
>> to 'voice-family'. This would add the 'match-parent' keyword to the
>> inherited value for the computed value, and would prevent the voice
>> selection from being recalculated.
>>
>> We could also consider something similar to the CSS3 Font's
>> 'font-language-override' property, e.g.
>>  voice-language: auto | <language-code> | inherit;
>>  inherited: yes
>>  computed value: as specified
>>  auto -
>>   The used value is taken from the language of the element, or some
>>   UA-chosen value if unknown. (The computed value is the keyword 'auto'.)
>> I'm somewhat less in favor of this option, as
>>  a) 'match-parent' seems easier to use (imho)
>>  b) 'match-parent' is just a keyword instead of an additional property
>>  c) you can do more intelligent things with 'match-parent' if you have the
>>    ability. E.g., use French phonics to map the embedded phrase to the
>>    closest English phonemes, so "à propos" could be rendered as
>>    "ah pro-POE" instead of "a PROP-uss".
>> But it's something to consider.
>> ~fantasai
> Daniel Weck
> daniel.weck@gmail.com
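
As a rough illustration of the recalculation issue discussed above, the
following Python sketch walks a small element tree and re-selects the
voice whenever the content language changes, unless the element asks to
keep its parent's voice. It assumes that 'preserve' behaves like the
'match-parent' keyword suggested above, i.e. the inherited voice is used
regardless of any language change. The Element class, the VOICES table
and the default language "en" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical installed voices, keyed by primary language subtag.
VOICES = {"en": "english-voice", "fr": "french-voice"}


@dataclass
class Element:
    text: str
    lang: Optional[str] = None   # language declared in the content, if any
    preserve: bool = False       # models 'voice-family: preserve' (assumed semantics)
    children: list = field(default_factory=list)


def assign_voices(el, parent_lang="en", parent_voice=None, out=None):
    """Return (text, voice) pairs for the tree rooted at el, in document order."""
    out = [] if out is None else out
    # The element's language is inherited from the parent unless declared.
    lang = el.lang or parent_lang
    if el.preserve and parent_voice is not None:
        # Keep the parent's voice: no recalculation on language change.
        voice = parent_voice
    else:
        # Recalculate for this element; if no voice exists for the
        # language, this sketch simply keeps the parent's voice.
        voice = VOICES.get(lang.lower().split("-")[0], parent_voice)
    out.append((el.text, voice))
    for child in el.children:
        assign_voices(child, lang, voice, out)
    return out
```

For an English sentence containing two French phrases, only the one
without 'preserve' would switch to the French voice; the other stays in
the English voice even though its language is still marked up as French
in the content.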
Received on Wednesday, 11 May 2011 16:49:20 UTC
