Re: [css3-speech] voice-volume from Daniel Weck on 2011-05-11 (www-style@w3.org from May 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Wed, 11 May 2011 01:27:25 +0100
To: W3C style mailing list <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Message-Id: <E9B2480C-F88D-447D-9602-79075CDB2758@gmail.com>
Fixed in the latest editors' draft. This was actually a regression,
since the CSS21 Aural Stylesheet Appendix defined volume levels
correctly. Note that I added the value "silent" to audio cues, as "0"
now means something different.

http://dev.w3.org/csswg/css3-speech/#mixing-props-voice-volume

http://dev.w3.org/csswg/css3-speech/#cue-props

http://www.w3.org/TR/CSS21/aural.html#propdef-volume

On 28 Apr 2011, at 23:50, Daniel Weck wrote:

> On 28 Apr 2011, at 08:00, fantasai wrote:
>> voice-volume
>>
>> # silent, x-soft, soft, medium, loud, and x-loud
>> #    A sequence of monotonically non-decreasing volume levels.
>> #    The value of ‘silent’ is mapped to ‘0’ and ‘x-loud’ is
>> #    mapped to ‘100’. The mapping of other values to numerical
>> #    volume levels is implementation-dependent and may vary
>> #    from one speech synthesizer to another.
>>
>> Because this definition doesn't map 'medium' to anything, it
>> makes it near-impossible for an author to use the absolute
>> values, assuming 'medium' (and not 'x-loud') is user's
>> preferred volume and the author intends to use that as the
>> baseline volume.
>
> Well, the volume scale is linear amplitude, so (for the sake of  
> argument) a simple fix would be to explicitly state the actual  
> values corresponding to each keyword:
>
> silent => 0
>
> x-soft => 15
> soft => 30
> medium => 50
> loud => 75
>
> x-loud => 100 (max tolerable loudness, defined by user)
>
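A minimal sketch of the "named values" reading above, assuming the keyword table proposed for the sake of argument. The names KEYWORD_LEVELS and resolve_volume are hypothetical, and the numbers are not normative:

```python
# Illustrative values only: they are the ones floated in this message,
# not values taken from any specification.
KEYWORD_LEVELS = {
    "silent": 0,
    "x-soft": 15,
    "soft": 30,
    "medium": 50,
    "loud": 75,
    "x-loud": 100,
}


def resolve_volume(value):
    """Return a level on the linear [0, 100] scale for a keyword or a number."""
    if isinstance(value, str):
        # Keywords are treated as plain shortcuts to numbers.
        return float(KEYWORD_LEVELS[value])
    if not 0 <= value <= 100:
        raise ValueError("volume must lie in [0, 100]")
    return float(value)
```

With such a table, resolve_volume("medium") and resolve_volume(50) are interchangeable, which is exactly why the keywords add little beyond the numbers in this reading.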
> _however_, this has limited usefulness, because the keywords are  
> just "shortcuts" to numerical values (i.e. "named values"). As you  
> rightly said, a more useful feature would be a keyword enumeration  
> that maps to "softest audible", "loudest tolerable", and "preferred  
> volume". My feeling is that the 5 values (excluding silence) defined  
> by SSML aim to express just that:
>
> x-soft => "softest audible"
> soft => ?
> medium => "preferred volume"
> loud => ?
> x-loud => "loudest tolerable"
>
> ...but of course the "soft" and "loud" values remain slightly
> under-specified (i.e. what should implementors do, and what should
> authors expect when using these values?).
>
>> Afaict, it's unlikely that the absolute
>> scale can be used for anything other than fading from x-loud
>> to silence.
>
> Sure, a cursor can be moved on the linear volume scale to animate  
> the wave amplitude, that's a useful feature in itself.
>
> I agree that without a deterministic mapping between keywords (which
> we assume represent "softest", "preferred" and "loudest" + two
> in-between steps) and absolute values, authors cannot produce content
> using numerical values that predictably meet concrete user needs or
> a user agent's "reasonable" pre-defined settings, because, for
> example, "medium" (or "preferred volume") may not necessarily
> correspond to 50.0 ... it could be 90 for a reading system operating
> in a loud environment.
>
> However, this doesn't mean that numerical values are pointless; in
> fact, there might also be use-cases where the enumerated keywords are
> not used at all.
>
>> Percentages are tricky, because due to nesting, it's not
>> possible to reference against 'medium', which I assume in
>> most cases is what you'd want to do, right?
>
> Well, the remark above about the usefulness of absolute numerical
> values applies to percentages too, given that they are relative to the
> inherited computed value which is situated on the somewhat-abstract
> linear [0,100] amplitude scale.
>
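A rough sketch of the nesting problem, assuming a percentage simply scales the inherited computed value on the linear [0, 100] scale. The function name apply_percentage and the clamping to [0, 100] are assumptions made for illustration:

```python
def apply_percentage(inherited, percent):
    """Scale the inherited level by a percentage, clamped to [0, 100]."""
    scaled = inherited * percent / 100.0
    return max(0.0, min(100.0, scaled))


# Two nested elements that each ask for 50%, starting from an inherited 50.
outer = apply_percentage(50.0, 50)
inner = apply_percentage(outer, 50)
```

Here outer is 25.0 and inner is 12.5: each level compounds on its parent, so a nested element has no way to express "half of medium".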
> We would need another syntax of property value in order to provide  
> volume adjustment relative to a keyword. For example:
>
> span.half-x-loud
> {
> voice-volume: 50% x-loud;
> }
>
> Are you requesting this feature, or merely pointing out that it is
> not currently doable? In my opinion, this is still as
> non-deterministic as the absolute values case ("50% x-loud" may
> effectively resolve to "medium"...but maybe not).
>
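To illustrate why a keyword-relative value would stay non-deterministic, here is a sketch with two hypothetical user-agent mappings. The function resolve_relative and both tables are invented for this example; the value 90 for "medium" echoes the loud-environment case mentioned earlier:

```python
def resolve_relative(percent, keyword, levels):
    """Resolve a hypothetical '<percent>% <keyword>' value against a mapping."""
    return levels[keyword] * percent / 100.0


# Hypothetical mappings: only the 0 and 100 endpoints are fixed.
LINEAR_UA = {"medium": 50.0, "x-loud": 100.0}
LOUD_ROOM_UA = {"medium": 90.0, "x-loud": 100.0}

half_linear = resolve_relative(50, "x-loud", LINEAR_UA)
half_loud_room = resolve_relative(50, "x-loud", LOUD_ROOM_UA)
```

Both calls yield 50.0, which coincides with "medium" under the first mapping but falls well below it under the second.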
>> It seems to me that what an author would really need is a
>> scale that varies between "softest audible", "loudest
>> tolerable", and "preferred volume", where each of these are
>> set by the listener. The keywords give you that scale, but
>> there are only 5 points on this scale, as opposed to infinite
>> on the absolute scale, which strikes me as less useful in
>> general...
>
> Well, we either have a (short) enumeration, with a tangible,
> easily usable mapping to user values, or we have a scale with a large
> number (technically, near-infinite) of abstract steps. Currently, we
> provide both, and the only direct connection between the two is the
> 0/min and 100/max boundaries. It works (i.e. it can be implemented
> unambiguously), but I agree that we lack a good understanding of how
> authors benefit from the enormous number of absolute values.
>
>> I'm having a hard time understanding how the capabilities
>> of this property would be used, but I suspect it's not matching
>> the authoring story very well. Perhaps you could explain how
>> voice-volume values other than the keywords would be used?
>
> I don't have a concrete usage in mind where absolute numerical  
> values would be more useful to authors than 3 (or 5) pre-defined  
> user-centric keyword-based volume levels.
>
> I am not aware of SSML's rationale for this design choice, but I
> think CSS-Speech should aim to remain compatible with SSML
> notation. It doesn't really hurt anyone, right? Unless of course
> the specification itself is ambiguous, which I think it isn't.
>
> Regards, Daniel

Daniel Weck
daniel.weck@gmail.com
Received on Wednesday, 11 May 2011 00:27:51 UTC