- From: Daniel Weck <daniel.weck@gmail.com>
- Date: Thu, 28 Apr 2011 23:50:35 +0100
- To: W3C style mailing list <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
On 28 Apr 2011, at 08:00, fantasai wrote:
> voice-volume
>
> # silent, x-soft, soft, medium, loud, and x-loud
> # A sequence of monotonically non-decreasing volume levels.
> # The value of ‘silent’ is mapped to ‘0’ and ‘x-loud’ is
> # mapped to ‘100’. The mapping of other values to numerical
> # volume levels is implementation-dependent and may vary
> # from one speech synthesizer to another.
>
> Because this definition doesn't map 'medium' to anything, it
> makes it near-impossible for an author to use the absolute
> values, assuming 'medium' (and not 'x-loud') is user's
> preferred volume and the author intends to use that as the
> baseline volume.
Well, the volume scale is linear amplitude, so (for the sake of
argument) a simple fix would be to explicitly state the actual values
corresponding to each keyword:
silent => 0
x-soft => 15
soft => 30
medium => 50
loud => 75
x-loud => 100 (max tolerable loudness, defined by user)
_however_, this has limited usefulness, because the keywords are just
"shortcuts" to numerical values (i.e. "named values"). As you rightly
said, a more useful feature would be a keyword enumeration that maps
to "softest audible", "loudest tolerable", and "preferred volume". My
feeling is that the 5 values (excluding silence) defined by SSML aim
to express just that:
x-soft => "softest audible"
soft => ?
medium => "preferred volume"
loud => ?
x-loud => "loudest tolerable"
...but of course the "soft" and "loud" values remain slightly
underspecified (i.e. what should implementors do, and what should
authors expect when using these values?).
> Afaict, it's unlikely that the absolute
> scale can be used for anything other than fading from x-loud
> to silence.
Sure, a cursor can be moved on the linear volume scale to animate the
wave amplitude, that's a useful feature in itself.
I agree that without a deterministic mapping between keywords (which
we assume represent "softest", "preferred" and "loudest" + two in-
between steps) and absolute values, authors cannot produce content
using numerical values that predictably meet concrete user needs or
user-agent's "reasonable" pre-defined settings, because, for example,
"medium" (or "preferred volume") may not necessarily correspond to
50.0 ... it could be 90 for a reading system operating in a loud
environment.
However, this doesn't mean that numerical values are pointless, in
fact there might also be use-cases where the enumerated keywords are
not used at all.
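To illustrate the mismatch, here is a minimal sketch, assuming the draft-era syntax under discussion (numbers on the [0,100] scale alongside keywords). The class names are hypothetical:

```css
/* Sketch only: class names are made up for illustration. */

/* Resolves to whatever level the user or user agent treats as
   "preferred"; the actual number is implementation-dependent. */
p.narration
{
voice-volume: medium;
}

/* A fixed point on the linear amplitude scale. Nothing guarantees
   that it matches 'medium': in a loud environment 'medium' could
   sit at 90, which would make this rule noticeably softer. */
p.aside
{
voice-volume: 50;
}
```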
> Percentages are tricky, because due to nesting, it's not
> possible to reference against 'medium', which I assume in
> most cases is what you'd want to do, right?
Well, the remark above about the usefulness of absolute numerical
values applies to percentages too, given that they are relative to the
inherited computed value, which is situated on the somewhat-abstract
linear [0,100] amplitude scale.
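As a rough sketch of how nesting compounds, assuming percentages resolve against the inherited computed value as described, and using hypothetical selectors:

```css
/* Sketch only: selectors are made up; numbers follow from
   "percentage of the inherited computed value". */
div.chapter
{
voice-volume: 80;   /* computed: 80 */
}
div.chapter p.note
{
voice-volume: 50%;  /* 50% of 80 = 40 */
}
div.chapter p.note span.whisper
{
voice-volume: 50%;  /* 50% of 40 = 20 */
}
```

At no point in this chain can a rule say "50% of medium": each percentage only sees the value it inherits.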
We would need another syntax of property value in order to provide
volume adjustment relative to a keyword. For example:
span.half-x-loud
{
  voice-volume: 50% x-loud;
}
Are you requesting this feature, or merely pointing out that it is not
currently doable? In my opinion, this is still as non-deterministic
as the absolute values case ("50% x-loud" may effectively resolve to
"medium"...but maybe not).
> It seems to me that what an author would really need is a
> scale that varies between "softest audible", "loudest
> tolerable", and "preferred volume", where each of these are
> set by the listener. The keywords give you that scale, but
> there are only 5 points on this scale, as opposed to infinite
> on the absolute scale, which strikes me as less useful in
> general...
Well, we either have a (short) enumeration, with a tangible, easily
usable mapping to user values, or we have a scale with a large number
(technically, near-infinite) of abstract steps. Currently, we provide
both, and the only direct connection between the two is the 0/min and
100/max boundaries. It works (i.e. it can be implemented
unambiguously), but I agree that we lack a good understanding of how
authors benefit from the enormous number of absolute values.
> I'm having a hard time understanding how the capabilities
> of this property would be used, but I suspect it's not matching
> the authoring story very well. Perhaps you could explain how
> voice-volume values other than the keywords would be used?
I don't have a concrete usage in mind where absolute numerical values
would be more useful to authors than 3 (or 5) pre-defined user-centric
keyword-based volume levels.
I am not aware of SSML's rationale for this design choice, but I think
CSS-Speech should aim to remain compatible with SSML notation. It
doesn't really hurt anyone, right? Unless of course the specification
itself is ambiguous, which I don't think it is.
Regards, Daniel
Received on Thursday, 28 April 2011 22:51:06 UTC