- From: Daniel Weck <daniel.weck@gmail.com>
- Date: Wed, 11 May 2011 01:27:25 +0100
- To: W3C style mailing list <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Fixed in the latest editors' draft. This was actually a regression bug,
since the CSS21 Aural Stylesheet Appendix defined volume levels
correctly. Note that I added the value "silent" to audio cues as "0" now
means something different.

http://dev.w3.org/csswg/css3-speech/#mixing-props-voice-volume
http://dev.w3.org/csswg/css3-speech/#cue-props
http://www.w3.org/TR/CSS21/aural.html#propdef-volume

On 28 Apr 2011, at 23:50, Daniel Weck wrote:

> On 28 Apr 2011, at 08:00, fantasai wrote:
>> voice-volume
>>
>> # silent, x-soft, soft, medium, loud, and x-loud
>> # A sequence of monotonically non-decreasing volume levels.
>> # The value of ‘silent’ is mapped to ‘0’ and ‘x-loud’ is
>> # mapped to ‘100’. The mapping of other values to numerical
>> # volume levels is implementation-dependent and may vary
>> # from one speech synthesizer to another.
>>
>> Because this definition doesn't map 'medium' to anything, it
>> makes it near-impossible for an author to use the absolute
>> values, assuming 'medium' (and not 'x-loud') is user's
>> preferred volume and the author intends to use that as the
>> baseline volume.
>
> Well, the volume scale is linear amplitude, so (for the sake of
> argument) a simple fix would be to explicitly state the actual
> values corresponding to each keyword:
>
> silent => 0
>
> x-soft => 15
> soft => 30
> medium => 50
> loud => 75
>
> x-loud => 100 (max tolerable loudness, defined by user)
>
> _however_, this has limited usefulness, because the keywords are
> just "shortcuts" to numerical values (i.e. "named values"). As you
> rightly said, a more useful feature would be a keyword enumeration
> that maps to "softest audible", "loudest tolerable", and "preferred
> volume". My feeling is that the 5 values (excluding silence) defined
> by SSML aim to express just that:
>
> x-soft => "softest audible"
> soft => ?
> medium => "preferred volume"
> loud => ?
> x-loud => "loudest tolerable"
>
> ...but of course the "soft" and "loud" values remain slightly
> under-specified (i.e. what should implementors do, and what should
> authors expect when using these values ?).
>
>> Afaict, it's unlikely that the absolute
>> scale can be used for anything other than fading from x-loud
>> to silence.
>
> Sure, a cursor can be moved on the linear volume scale to animate
> the wave amplitude, that's a useful feature in itself.
>
> I agree that without a deterministic mapping between keywords (which
> we assume represent "softest", "preferred" and "loudest" + two
> in-between steps) and absolute values, authors cannot produce content
> using numerical values that predictably meet concrete user needs or
> user-agent's "reasonable" pre-defined settings, because, for
> example, "medium" (or "preferred volume") may not necessarily
> correspond to 50.0 ... it could be 90 for a reading system operating
> in a loud environment.
>
> However, this doesn't mean that numerical values are pointless, in
> fact there might also be use-cases where the enumerated keywords are
> not used at all.
>
>> Percentages are tricky, because due to nesting, it's not
>> possible to reference against 'medium', which I assume in
>> most cases is what you'd want to do, right?
>
> Well, the remark above about the usefulness of absolute numerical
> values apply to percentages too, given that they are relative to the
> inherited computed value which is situated on the somewhat-abstract
> linear [0,100] amplitude scale.
>
> We would need another syntax of property value in order to provide
> volume adjustment relative to a keyword. For example:
>
> span.half-x-loud
> {
>   voice-volume: 50% x-loud;
> }
>
> Are you requesting this feature, or merely pointing-out that it is
> not currently doable ? In my opinion, this is still as
> non-deterministic as the absolute values case ("50% x-loud" may
> effectively resolve to "medium"...but maybe not).
>
>> It seems to me that what an author would really need is a
>> scale that varies between "softest audible", "loudest
>> tolerable", and "preferred volume", where each of these are
>> set by the listener. The keywords give you that scale, but
>> there are only 5 points on this scale, as opposed to infinite
>> on the absolute scale, which strikes me as less useful in
>> general...
>
> Well, we either have a (short) enumeration, with tangible,
> easily-usable mapping to user values, or we have a scale with a large
> number (technically, near-infinite) of abstract steps. Currently, we
> provide both, and the only direct connection between the two is the
> 0/min and 100/max boundaries. It works (i.e. it can be implemented
> unambiguously), but I agree that we lack a good understanding of how
> authors benefit from the enormous number of absolute values.
>
>> I'm having a hard time understanding how the capabilities
>> of this property would be used, but I suspect it's not matching
>> the authoring story very well. Perhaps you could explain how
>> voice-volume values other than the keywords would be used?
>
> I don't have a concrete usage in mind where absolute numerical
> values would be more useful to authors than 3 (or 5) pre-defined
> user-centric keyword-based volume levels.
>
> I am not aware of SSML's rationale for this design choice, but I
> think CSS-Speech should aim to remain compatible with SSML
> notation. It doesn't really hurt anyone, right ? Unless of course
> the specification itself is ambiguous, which I think isn't.
>
> Regards, Daniel

Daniel Weck
daniel.weck@gmail.com
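
To make the nesting issue discussed in the quoted thread concrete, here is a minimal sketch of how the three value forms might resolve on the linear [0,100] scale. It is illustrative only: the keyword numbers are the ones floated "for the sake of argument" in the quoted message (not values taken from any draft), the function name resolve_voice_volume is hypothetical, and the "50% x-loud" form is the hypothetical syntax from the quoted example, not an existing feature.

```python
# Keyword values proposed "for the sake of argument" in the quoted message.
# Assumption: these are illustrative, not normative spec values.
KEYWORD_VOLUME = {
    "silent": 0.0,
    "x-soft": 15.0,
    "soft": 30.0,
    "medium": 50.0,
    "loud": 75.0,
    "x-loud": 100.0,
}


def clamp(level: float) -> float:
    """Keep a level inside the linear [0, 100] amplitude scale."""
    return max(0.0, min(100.0, level))


def resolve_voice_volume(value: str, inherited: float) -> float:
    """Resolve a voice-volume value to a number on the [0, 100] scale.

    Hypothetical helper: handles a keyword, a plain number, a percentage
    of the inherited computed value, and the hypothetical
    "<percentage> <keyword>" form from the quoted example.
    """
    parts = value.split()

    # Hypothetical syntax: percentage relative to a keyword, e.g. "50% x-loud".
    if len(parts) == 2 and parts[0].endswith("%"):
        base = KEYWORD_VOLUME[parts[1]]
        return clamp(base * float(parts[0][:-1]) / 100.0)

    token = parts[0]
    if token in KEYWORD_VOLUME:
        return KEYWORD_VOLUME[token]

    # A bare percentage is relative to the inherited computed value,
    # so it cannot be anchored to 'medium' once elements are nested.
    if token.endswith("%"):
        return clamp(inherited * float(token[:-1]) / 100.0)

    # Otherwise treat the token as an absolute number.
    return clamp(float(token))
```

With a parent whose computed volume is 90 (say, a reading system set up for a loud environment), "50%" resolves to 45 rather than to anything tied to 'medium', which is the limitation raised in the thread.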
Received on Wednesday, 11 May 2011 00:27:51 UTC