- From: Daniel Weck <daniel.weck@gmail.com>
- Date: Wed, 11 May 2011 01:27:25 +0100
- To: W3C style mailing list <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Fixed in the latest editors' draft. This was actually a regression,
since the CSS21 Aural Stylesheet Appendix defined volume levels
correctly. Note that I added the value "silent" to audio cues, because
"0" now means something different.
http://dev.w3.org/csswg/css3-speech/#mixing-props-voice-volume
http://dev.w3.org/csswg/css3-speech/#cue-props
http://www.w3.org/TR/CSS21/aural.html#propdef-volume
On 28 Apr 2011, at 23:50, Daniel Weck wrote:
> On 28 Apr 2011, at 08:00, fantasai wrote:
>> voice-volume
>>
>> # silent, x-soft, soft, medium, loud, and x-loud
>> # A sequence of monotonically non-decreasing volume levels.
>> # The value of ‘silent’ is mapped to ‘0’ and ‘x-loud’ is
>> # mapped to ‘100’. The mapping of other values to numerical
>> # volume levels is implementation-dependent and may vary
>> # from one speech synthesizer to another.
>>
>> Because this definition doesn't map 'medium' to anything, it
>> makes it near-impossible for an author to use the absolute
>> values, assuming 'medium' (and not 'x-loud') is user's
>> preferred volume and the author intends to use that as the
>> baseline volume.
>
> Well, the volume scale is linear amplitude, so (for the sake of
> argument) a simple fix would be to explicitly state the actual
> values corresponding to each keyword:
>
> silent => 0
>
> x-soft => 15
> soft => 30
> medium => 50
> loud => 75
>
> x-loud => 100 (max tolerable loudness, defined by user)
>
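The explicit keyword-to-value mapping listed above can be sketched as a simple lookup table. This is only an illustration: the names KEYWORD_VOLUME and resolve_keyword are hypothetical, and the numbers are the "for the sake of argument" figures from the quoted message, not normative values.

```python
# Hypothetical lookup table: keyword -> position on the linear [0, 100] scale.
# The numbers are the illustrative figures from the quoted message; only
# 'silent' => 0 and 'x-loud' => 100 are fixed by the draft text under discussion.
KEYWORD_VOLUME = {
    "silent": 0,
    "x-soft": 15,
    "soft": 30,
    "medium": 50,
    "loud": 75,
    "x-loud": 100,
}


def resolve_keyword(keyword: str) -> int:
    """Return the numeric volume for a keyword, or raise ValueError."""
    try:
        return KEYWORD_VOLUME[keyword]
    except KeyError:
        raise ValueError(f"unknown volume keyword: {keyword!r}") from None


# The draft describes the keywords as a monotonically non-decreasing sequence.
levels = list(KEYWORD_VOLUME.values())
assert all(a <= b for a, b in zip(levels, levels[1:]))
```

With a fixed table like this, an author could rely on "medium" always meaning 50, which is exactly the determinism the draft text does not promise.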
> _however_, this has limited usefulness, because the keywords are
> just "shortcuts" to numerical values (i.e. "named values"). As you
> rightly said, a more useful feature would be a keyword enumeration
> that maps to "softest audible", "loudest tolerable", and "preferred
> volume". My feeling is that the 5 values (excluding silence) defined
> by SSML aim to express just that:
>
> x-soft => "softest audible"
> soft => ?
> medium => "preferred volume"
> loud => ?
> x-loud => "loudest tolerable"
>
> ...but of course the "soft" and "loud" values remain slightly
> underspecified (i.e. what should implementors do, and what should
> authors expect when using these values?).
>
>> Afaict, it's unlikely that the absolute
>> scale can be used for anything other than fading from x-loud
>> to silence.
>
> Sure, a cursor can be moved on the linear volume scale to animate
> the wave amplitude; that's a useful feature in itself.
>
> I agree that without a deterministic mapping between keywords (which
> we assume represent "softest", "preferred" and "loudest" + two in-
> between steps) and absolute values, authors cannot produce content
> using numerical values that predictably meet concrete user needs or
> user-agent's "reasonable" pre-defined settings, because, for
> example, "medium" (or "preferred volume") may not necessarily
> correspond to 50.0 ... it could be 90 for a reading system operating
> in a loud environment.
>
> However, this doesn't mean that numerical values are pointless. In
> fact, there might also be use cases where the enumerated keywords
> are not used at all.
>
>> Percentages are tricky, because due to nesting, it's not
>> possible to reference against 'medium', which I assume in
>> most cases is what you'd want to do, right?
>
> Well, the remark above about the usefulness of absolute numerical
> values applies to percentages too, given that they are relative to
> the inherited computed value, which is situated on the
> somewhat-abstract linear [0,100] amplitude scale.
>
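The nesting problem with percentages can be sketched as follows. This is a minimal illustration, assuming a percentage simply scales the inherited computed value and that the result is clamped to [0, 100]; the function name apply_percentage and the starting value of 50 are hypothetical.

```python
def apply_percentage(inherited: float, percent: float) -> float:
    """Scale the inherited volume by a percentage.

    Assumption: the result is clamped to the linear [0, 100] scale.
    """
    return max(0.0, min(100.0, inherited * percent / 100.0))


# Assumed starting point: the outermost element inherits 50.
outer = apply_percentage(50.0, 80.0)   # 80% of the inherited value
inner = apply_percentage(outer, 80.0)  # 80% of the already-reduced value

print(outer, inner)
```

Because each level is computed against its parent, the nested element ends up at 80% of 80% of the starting value, not at 80% of any fixed reference such as "medium".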
> We would need another property value syntax in order to provide
> volume adjustment relative to a keyword. For example:
>
> span.half-x-loud
> {
> voice-volume: 50% x-loud;
> }
>
> Are you requesting this feature, or merely pointing out that it is
> not currently doable? In my opinion, this is still as
> non-deterministic as the absolute values case ("50% x-loud" may
> effectively resolve to "medium"...but maybe not).
>
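Why a keyword-relative percentage stays non-deterministic can be sketched like this. The syntax "50% x-loud" is only the hypothetical one from the example above; the function resolve_relative and the two user profiles are likewise assumptions, with "medium" at 90 borrowed from the loud-environment case mentioned earlier in the message.

```python
def resolve_relative(percent: float, keyword: str, user_levels: dict) -> float:
    """Resolve a hypothetical '<percent>% <keyword>' value against a user's levels."""
    return user_levels[keyword] * percent / 100.0


# Two assumed user profiles. 'x-loud' is always 100, but 'medium' is whatever
# the user (or the reading system) prefers.
quiet_room = {"medium": 50.0, "x-loud": 100.0}
noisy_room = {"medium": 90.0, "x-loud": 100.0}

for profile in (quiet_room, noisy_room):
    half = resolve_relative(50.0, "x-loud", profile)
    print(half, half == profile["medium"])
```

In the first profile "50% x-loud" happens to coincide with "medium"; in the second it does not, so the author still cannot predict where the result lands relative to the user's preferred volume.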
>> It seems to me that what an author would really need is a
>> scale that varies between "softest audible", "loudest
>> tolerable", and "preferred volume", where each of these are
>> set by the listener. The keywords give you that scale, but
>> there are only 5 points on this scale, as opposed to infinite
>> on the absolute scale, which strikes me as less useful in
>> general...
>
> Well, we either have a (short) enumeration, with a tangible, easily
> usable mapping to user values, or we have a scale with a large
> number (technically, near-infinite) of abstract steps. Currently, we
> provide both, and the only direct connection between the two is the
> 0/min and 100/max boundaries. It works (i.e. it can be implemented
> unambiguously), but I agree that we lack a good understanding of how
> authors benefit from the enormous number of absolute values.
>
>> I'm having a hard time understanding how the capabilities
>> of this property would be used, but I suspect it's not matching
>> the authoring story very well. Perhaps you could explain how
>> voice-volume values other than the keywords would be used?
>
> I don't have a concrete usage in mind where absolute numerical
> values would be more useful to authors than 3 (or 5) pre-defined
> user-centric keyword-based volume levels.
>
> I am not aware of SSML's rationale for this design choice, but I
> think CSS-Speech should aim to remain compatible with SSML
> notation. It doesn't really hurt anyone, right? Unless of course
> the specification itself is ambiguous, which I don't think it is.
>
> Regards, Daniel
Daniel Weck
daniel.weck@gmail.com
Received on Wednesday, 11 May 2011 00:27:51 UTC