RE: order inversion 'twixt speech and vision from Al Gilman on 2003-02-08 (public-tt@w3.org from February 2003)

From: Al Gilman <asgilman@iamdigex.net>
Date: Sat, 08 Feb 2003 11:06:22 -0500
To: W3C Timed Text <public-tt@w3.org>
Message-Id: <5.1.0.14.2.20030208101850.025b2600@pop.iamdigex.net>
At 01:06 AM 2003-02-08, Glenn A. Adams wrote:


>There is a problem with the proposal below, which is that it
>requires that the timing containment and ordering relationships
>be congruous with the layout/formatting (box model) relationships;
>however, in general, this is not the case. This can potentially
>be handled, albeit, with a loss of generality, by flattening out
>either the timing model or the formatting model, i.e., assigning
>fixed times or positions to every element.

Techical response first:  Doesn't require that either one be flattened, that
is to say its order or hierarchy be suppressed.

What it requires is that the semantics of the binding be defined with
respect to a bag [an un-ordered superclass] view of the collection of stuff
which has been presented as a list or bag.

SMIL handles this by having one tree declare the planar structure of visual
display regions and assign [select from flat bag] IDs to them.  Then media
objects are associated with the display regions using the [flat bag] ID
index space.  So there is a [time X bag] order to the playlist and a planar
order of the display regions, but a media object in the playlist that
consumes display resources is bound to the display structure in an unordered
domain, there is no heredity (not even as an initial or default hint) of the
order in either domain into the binding.

The order in which linearly-connected-chains of things
are presented will often to be preferred in one order in visual layout
and in the reverse order in speech[1].  Thank you for raising this
theme.

Please note, however, that this [TT and its application case as HTML+time]
is the first place where so far as I know there may be an explicit
articulation of this model: that there is a topological chain which is
totally ordered except that there is no one definition of which end of the
chain is first; and the chain is presented as an ordered sequence in
different orders in different contexts.

In XML we have the default that the wire-format order has force of default
for the semantics of the content, so we are more likely to have a 'reversed'
semantic as in the bi-directionality algorithm for sequences of sub-structures
where 'reversed' means in the order opposite to the lexical order in the XML.

This topological class of graph -- an ambi-ordered linear chain -- is not
well handled yet in the habits of the web media.  But we may get there yet.

Al

[1] My iconic instance of this is the trail of scopes that one uses to 
locate current
context.  In vision this is presented outer-to-inner.  In speech one 
presents this
inner-to-outer.  Please note that such a breadcrumb trail is a potential 
assistive
option on the so-called "context menu" which may be likened to the "situation
summary" that one performs just before committing to a transaction when dealing
with a call center to order goods from a direct merchant or to book travel 
through
an agent.  In the GUI the context menu is only verbs because the depiction 
of the
context is clear.  In voice the context is not in a persistent buffer and 
should
be reviewed in this sort of a summary view.  Where am I, and What can I do 
here?

See also (string search for 'breadcrumb' in):

  W3C DIAT Workshop - Statements
  http://www.w3.org/2002/07/DIAT/stmts/Overview.html#gilman


The inner to outer recapitulation is part of longstanding oral tradition as
reflected in the vernacular song "There's a hole in the bottom of the 
sea."  While
the contexts are introduced top-down in successive verses, the breadcrumb 
trail
through the contexts is recapitulated bottom-up at the conclusion of each 
verse.

The actual constraint flow or decision process here is that the audio 
display medium
is ordered; the presentation controls the order of apperception, and the visual
display protocol, painting a whole screen as an atomic transaction, does 
not control
the order of the apperception of the information.  Since the screen does 
not in
this case perform a staged release of information there is no suspense 
value to be
had and the first prize is listed first as the skimming visual reader will 
often
only look for this and skim on.  In speech, however, where the listener is 
captive,
the suspense value is exploited to get people to listen to the lower ranks 
as the
suspense builds.

-- All quote below here
>As an example, consider the case:
>
><ol>
>   <li>Gold Winner</li>
>   <li>Silver Winner</li>
>   <li>Bronze Winner</li>
></ol>
>
>vs
>
><seq>
>   <p>Bronze Winner</p>
>   <p>Silver Winner</p>
>   <p>Gold Winner</p>
></seq>
>
>The natural layout order turns out to be reverse of the natural
>timing order.
>
>Regards,
>Glenn
>
> > -----Original Message-----
> > From: Charles Wiltgen [mailto:lists@wiltgen.net]
> > Sent: Friday, February 07, 2003 9:19 PM
> > To: List EUR W3C Timed Text
> > Subject: Re: TT and subtitling
> >
> >
> >
> > John Glauert wrote...
> >
> > > * Some semantics should be built in.
> > >  - Language
> >
> > Again stealing from a mature specification, here's Proposal 0.0 in two
> > languages:
> >
> > <switch>
> >     <seq systemLanguage="en">
> >         <tt:p begin="0s" dur="5s">One</tt:p>
> >         <tt:p dur="10s">Two</tt:p>
> >         <tt:p begin="1s" dur="5s" class="important">Three</tt:p>
> >     </seq>
> >     <seq systemLanguage="fr">
> >         <tt:p begin="0s" dur="5s">Un</tt:p>
> >         <tt:p dur="10s">Deux</tt:p>
> >         <tt:p begin="1s" dur="5s">Trois</tt:p>
> >     </seq>
> > </switch>
> >
> > This could also be done like this, I believe:
> >
> > <seq>
> >     <switch>
> >         <tt:p begin="0s" dur="5s" systemLanguage="en">One</tt:p>
> >         <tt:p begin="0s" dur="5s" systemLanguage="fr">Un</tt:p>
> >     </switch>
> >     <switch>
> >         <tt:p dur="10s" systemLanguage="en">Two</tt:p>
> >         <tt:p dur="10s" systemLanguage="fr">Deux</tt:p>
> >     </switch>
> >     <switch>
> >         <tt:p begin="1s" dur="5s" systemLanguage="en">Three</tt:p>
> >         <tt:p begin="1s" dur="5s" systemLanguage="fr">Trois</tt:p>
> >     </switch>
> > </seq>
> >
> > > I would like to add a plea to consider non-text!
> >
> > If TT is an extension of SMIL, then we get audio/video
> > integration for free.
> >
> > -- Charles Wiltgen
> >    <http://playbacktime.com/>
> >
> >
Received on Saturday, 8 February 2003 11:33:26 UTC