
Re: [css3-speech] Proposal: an aural box model

From: csant <csant@csant.info>
Date: Fri, 06 Aug 2004 00:31:45 +0200
To: "www-style.w3.org" <www-style@w3.org>
Message-ID: <opsb9567rtd84skq@csant.info>

On Thu,  5 Aug 2004 14:34:25 -0700 (PDT), T. V. Raman <tvraman@us.ibm.com> wrote:

> As we think about the box or flow model for audio
> in ACSS with respect to pause vs cue, all we may  need to define is
> that pause is  "stretchable" i.e. when you have multiple chunks
> of pause coming together, it stretches or shrinks to fill a
> bounding interval.

I would base this on the already well-accepted mechanism of collapsing:
if two pauses meet, they do not add up; the longer of the two is
applied.

> The stretch/shrink notion is important, since if the bulleted
> list also ends the containing section, the pause after it *does*
> need to be a little longer than if it were being followed by a
> paragraph.

In my opinion a stretch/shrink notion would introduce a lot of overhead
for little benefit. If stretch/shrink is simply based on the concept of
'margin/pause-collapsing', little needs to be added. Take your example of
a list with some default 'pause' around it and a somewhat shorter default
pause around list items: when the last list item is read out, pause
collapsing happens and the pause after the last list item will equal the
pause after the list - that is, it will be a bit longer than the pauses
inside the list.
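
To make the collapsing rule concrete, here is a minimal sketch in Python.
The function names and the millisecond values are made up for
illustration and come from no spec; only the pause after each item is
modelled, to keep it short.

```python
# Hypothetical default pauses in milliseconds (illustrative values only).
LIST_PAUSE_AFTER = 600
ITEM_PAUSE_AFTER = 300


def collapse_pauses(*pauses_ms):
    """Adjacent pauses do not add up; the longest one is applied."""
    return max(pauses_ms, default=0)


def pauses_after_items(item_count):
    """Return the silence (in ms) that follows each item of one list."""
    gaps = []
    for index in range(item_count):
        if index == item_count - 1:
            # The last item's pause-after meets the list's pause-after,
            # so the two collapse into the longer one.
            gaps.append(collapse_pauses(ITEM_PAUSE_AFTER, LIST_PAUSE_AFTER))
        else:
            gaps.append(ITEM_PAUSE_AFTER)
    return gaps


print(pauses_after_items(3))  # [300, 300, 600]
```

The last gap comes out longer than the gaps inside the list without any
stretch/shrink machinery.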

Of course these defaults can be overridden ad libitum by the author.

> I would avoid pushing the visual analogy too far except where it
> makes sense, i.e. there is no notion of a margin in audio since
> the model is that of a linearly scrolling ticker tape --- rather
> than a two-dimensional scroll.

I totally agree on this last point - the audio canvas is one-dimensional,  
monolinear. But there are sounds and silences, and they happen in a  
certain sequence: my proposed model for this sequence of silences and  
sounds is (from left-to-right, only!)

silence - sound - silence - <element> - silence - sound - silence

or, using a more technical terminology

pause-before - cue-before - "padding" - <element> - "padding" - cue-after  
- pause-after

where "padding" should please be replaced with a good property-name...
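
As a rough sketch of that sequence (Python, purely illustrative): the
class name AuralBox and the field names padding_before / padding_after
are placeholders of mine, since the "padding" property still has no
proper name.

```python
from dataclasses import dataclass


@dataclass
class AuralBox:
    """One element on the one-dimensional audio timeline."""
    content: str
    pause_before: int = 0    # silence, in ms
    cue_before: str = ""     # sound played before the element
    padding_before: int = 0  # silence, in ms (placeholder name)
    padding_after: int = 0   # silence, in ms (placeholder name)
    cue_after: str = ""      # sound played after the element
    pause_after: int = 0     # silence, in ms

    def timeline(self):
        """Return the events in the order they are heard."""
        return [
            ("silence", self.pause_before),
            ("sound", self.cue_before),
            ("silence", self.padding_before),
            ("element", self.content),
            ("silence", self.padding_after),
            ("sound", self.cue_after),
            ("silence", self.pause_after),
        ]


box = AuralBox("Item one", pause_before=300, cue_before="ding",
               padding_before=50, padding_after=50,
               cue_after="dong", pause_after=300)
for kind, value in box.timeline():
    print(kind, value)
```

The events come out in a single fixed order, which is the whole point:
there is one direction only.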

This model closely resembles the box model: take the visual box model,
cut through it to reduce it to a two-dimensional plane, and move across
this plane only in the one direction in which time flows. For the
English-speaking culture (and, btw, for musical scores) this goes from
left to right when transcribed visually (as my string above is).

> As a case in point, a bulleted list in the visual domain is often
> cued by indentation -- with multiple levels of nesting causing
> multiple levels of indentation. An effective means of conveying
> this auditorily turns out to be to use something like voice pitch
> -- rather than pause -- which is what you'd end up with if you
> simply mapped margin to pause.

I allow myself to respectfully disagree on using pitch for this purpose.
IMHO we should aim at creating an aural flow as natural as possible, and
ending up with an extremely high-pitched voice to hear the third nested
list is not the best means to achieve this.

My take would rather be (in a default voice stylesheet) to include some  
additional information :before a list: "list start", and then "first  
nested list start" - we have all the means in the generated content to  
make this as flexible as possible, including counters.
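
A small sketch of what such announcements could look like, written in
Python rather than as a stylesheet. The function name, the ordinal
table and the fallback wording are assumptions of mine, only meant to
show how nesting depth and a counter could drive the spoken text.

```python
# Hypothetical ordinal words for the counter; extend as needed.
ORDINALS = ["first", "second", "third", "fourth", "fifth"]


def list_announcement(depth, nested_index=1):
    """Return the text spoken before a list.

    depth: 0 for a top-level list, 1 or more for a nested list.
    nested_index: 1-based counter of nested lists (assumed semantics).
    """
    if depth == 0:
        return "list start"
    if 1 <= nested_index <= len(ORDINALS):
        return f"{ORDINALS[nested_index - 1]} nested list start"
    # Fallback wording once the ordinal table runs out.
    return f"nested list {nested_index} start"


print(list_announcement(0))     # list start
print(list_announcement(1, 1))  # first nested list start
```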

There is no such thing as centuries of typographical tradition to offer a  
standard model on how to style lists aurally - but there is our experience  
as human beings reading out lists to other people over the phone: and  
never, I would guess, did anybody raise the pitch of his voice to read out  
a nested list. It seems more likely that he/she would add some additional  
information which does not directly belong to the list ("here comes a  
nested list"); and, yes, this additional, non-structural information is  
clearly set off against the structural information by changing (usually,  
in Western cultures, by lowering) the pitch of the voice, and by  
appropriately pausing in the right moments. (After all, CSS is there to  
add non-structural information to allow the user to better perceive the  


"He is old". But she is wrong. It is not age; it is that a drop has  
fallen; another drop.
~~~ Virginia Woolf
Received on Thursday, 5 August 2004 18:34:38 UTC
