- From: Daniel Weck <daniel.weck@gmail.com>
- Date: Thu, 28 Apr 2011 23:50:35 +0100
- To: W3C style mailing list <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
On 28 Apr 2011, at 08:00, fantasai wrote:

> voice-volume
>
> # silent, x-soft, soft, medium, loud, and x-loud
> # A sequence of monotonically non-decreasing volume levels.
> # The value of ‘silent’ is mapped to ‘0’ and ‘x-loud’ is
> # mapped to ‘100’. The mapping of other values to numerical
> # volume levels is implementation-dependent and may vary
> # from one speech synthesizer to another.
>
> Because this definition doesn't map 'medium' to anything, it
> makes it near-impossible for an author to use the absolute
> values, assuming 'medium' (and not 'x-loud') is user's
> preferred volume and the author intends to use that as the
> baseline volume.

Well, the volume scale is linear amplitude, so (for the sake of argument) a simple fix would be to explicitly state the actual values corresponding to each keyword:

silent => 0
x-soft => 15
soft => 30
medium => 50
loud => 75
x-loud => 100 (max tolerable loudness, defined by user)

_however_, this has limited usefulness, because the keywords are just "shortcuts" to numerical values (i.e. "named values"). As you rightly said, a more useful feature would be a keyword enumeration that maps to "softest audible", "loudest tolerable", and "preferred volume". My feeling is that the 5 values (excluding silence) defined by SSML aim to express just that:

x-soft => "softest audible"
soft => ?
medium => "preferred volume"
loud => ?
x-loud => "loudest tolerable"

...but of course the "soft" and "loud" values remain slightly under-specified (i.e. what should implementors do, and what should authors expect when using these values?).

> Afaict, it's unlikely that the absolute
> scale can be used for anything other than fading from x-loud
> to silence.

Sure, a cursor can be moved on the linear volume scale to animate the wave amplitude, that's a useful feature in itself.
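To make the two readings of the keywords concrete, here is a minimal Python sketch. It is only an illustration, not anything taken from SSML or the CSS-Speech draft. `FIXED_LEVELS` copies the values floated above for the sake of argument. `user_levels` is a hypothetical helper that derives the keywords from two listener-set anchors ("softest audible" and "preferred volume"), with 'x-loud' pinned to 100 as "loudest tolerable". Placing 'soft' and 'loud' halfway between their neighbours is purely an assumption, since those two values are exactly the under-specified ones.

```python
# Values floated in the message for the sake of argument (not normative).
FIXED_LEVELS = {
    "silent": 0.0,
    "x-soft": 15.0,
    "soft": 30.0,
    "medium": 50.0,
    "loud": 75.0,
    "x-loud": 100.0,
}


def user_levels(softest_audible: float, preferred: float) -> dict:
    """Derive keyword levels on the linear [0, 100] scale from listener settings.

    Hypothetical helper: 'x-loud' stays at 100 ("loudest tolerable"), and
    'soft' / 'loud' are placed midway between their neighbours, which is an
    assumption made only for this sketch.
    """
    if not 0.0 <= softest_audible <= preferred <= 100.0:
        raise ValueError("expected 0 <= softest_audible <= preferred <= 100")
    return {
        "silent": 0.0,
        "x-soft": softest_audible,
        "soft": (softest_audible + preferred) / 2,
        "medium": preferred,
        "loud": (preferred + 100.0) / 2,
        "x-loud": 100.0,
    }


if __name__ == "__main__":
    # A listener whose preferred volume sits high on the scale.
    for keyword, level in user_levels(10.0, 90.0).items():
        print(f"{keyword:>7} => {level}")
```

With a fixed table, an author can treat 50 and 'medium' as interchangeable. With listener-defined anchors that equivalence disappears, even though the sequence stays monotonically non-decreasing.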
I agree that without a deterministic mapping between keywords (which we assume represent "softest", "preferred" and "loudest" + two in-between steps) and absolute values, authors cannot produce content using numerical values that predictably meet concrete user needs or a user agent's "reasonable" pre-defined settings, because, for example, "medium" (or "preferred volume") may not necessarily correspond to 50.0 ... it could be 90 for a reading system operating in a loud environment. However, this doesn't mean that numerical values are pointless; in fact there might also be use-cases where the enumerated keywords are not used at all.

> Percentages are tricky, because due to nesting, it's not
> possible to reference against 'medium', which I assume in
> most cases is what you'd want to do, right?

Well, the remark above about the usefulness of absolute numerical values applies to percentages too, given that they are relative to the inherited computed value, which is situated on the somewhat-abstract linear [0,100] amplitude scale. We would need another property value syntax in order to provide volume adjustment relative to a keyword. For example:

span.half-x-loud { voice-volume: 50% x-loud; }

Are you requesting this feature, or merely pointing out that it is not currently doable? In my opinion, this is still as non-deterministic as the absolute values case ("50% x-loud" may effectively resolve to "medium"...but maybe not).

> It seems to me that what an author would really need is a
> scale that varies between "softest audible", "loudest
> tolerable", and "preferred volume", where each of these are
> set by the listener. The keywords give you that scale, but
> there are only 5 points on this scale, as opposed to infinite
> on the absolute scale, which strikes me as less useful in
> general...
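The percentage behaviour discussed above can be sketched in Python. This is an illustration under stated assumptions, not an implementation of the draft. `resolve` treats a percentage as relative to the inherited computed value on the linear [0, 100] scale. `resolve_chain` walks a list of nested declarations and assumes the initial value is 'medium'. `resolve_relative` models the hypothetical `50% x-loud` form, which exists in no specification. All function names and the `EXAMPLE_LEVELS` table are invented for this sketch.

```python
# Example keyword table (illustrative values only, not normative).
EXAMPLE_LEVELS = {
    "silent": 0.0, "x-soft": 15.0, "soft": 30.0,
    "medium": 50.0, "loud": 75.0, "x-loud": 100.0,
}


def clamp(x: float) -> float:
    """Keep a value inside the linear [0, 100] amplitude scale."""
    return max(0.0, min(100.0, x))


def resolve(value: str, inherited: float, levels: dict) -> float:
    """Resolve one declaration: a keyword, a percentage, or a plain number."""
    value = value.strip()
    if value in levels:
        return levels[value]
    if value.endswith("%"):
        # Percentages scale the inherited computed value, not 'medium'.
        return clamp(inherited * float(value[:-1]) / 100.0)
    return clamp(float(value))


def resolve_chain(declarations: list, levels: dict) -> float:
    """Apply declarations from outermost to innermost element."""
    current = levels["medium"]  # assumed initial value for this sketch
    for declaration in declarations:
        current = resolve(declaration, current, levels)
    return current


def resolve_relative(percent: float, keyword: str, levels: dict) -> float:
    """Hypothetical '<percentage> <keyword>' form, e.g. '50% x-loud'."""
    return clamp(levels[keyword] * percent / 100.0)


if __name__ == "__main__":
    loud_room = dict(EXAMPLE_LEVELS, medium=90.0)
    nested = ["medium", "50%", "50%"]
    print(resolve_chain(nested, EXAMPLE_LEVELS))  # 'medium' taken as 50
    print(resolve_chain(nested, loud_room))       # 'medium' taken as 90
    print(resolve_relative(50, "x-loud", loud_room))
```

Two nested `50%` declarations compound against whatever was inherited, so the result depends on both the nesting depth and the user agent's value for 'medium'. Likewise, `50% x-loud` lands on 'medium' only when 'medium' happens to be 50.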
Well, we either have a (short) enumeration, with tangible, easily-usable mapping to user values, or we have a scale with a large number (technically, near-infinite) of abstract steps. Currently, we provide both, and the only direct connection between the two is the 0/min and 100/max boundaries. It works (i.e. it can be implemented unambiguously), but I agree that we lack a good understanding of how authors benefit from the enormous number of absolute values.

> I'm having a hard time understanding how the capabilities
> of this property would be used, but I suspect it's not matching
> the authoring story very well. Perhaps you could explain how
> voice-volume values other than the keywords would be used?

I don't have a concrete usage in mind where absolute numerical values would be more useful to authors than 3 (or 5) pre-defined user-centric keyword-based volume levels. I am not aware of SSML's rationale for this design choice, but I think CSS-Speech should aim to remain compatible with SSML notation. It doesn't really hurt anyone, right? Unless of course the specification itself is ambiguous, which I think it isn't.

Regards,
Daniel
Received on Thursday, 28 April 2011 22:51:06 UTC