some points w.r.t. streaming and buffering scenarios from Al Gilman on 2003-02-07 (public-tt@w3.org from February 2003)

From: Al Gilman <asgilman@iamdigex.net>
Date: Fri, 07 Feb 2003 16:24:51 -0500
To: public-tt@w3.org
Message-Id: <5.1.0.14.2.20030207151109.023beec0@pop.iamdigex.net>
** quickLook

- creation cases include 'capture' in an end-to-end real-time context.
'Authoring' does not cover this case.  Add "real-time capture" to the
origination case repertory.

- Braille users may need to control the release timing of access units,
i.e. not always allow the content to push onto the display.  This calls
for a mini-buffer as part of the implementation of this adapted access
in the playback process.

- We want to define the timing function as purely as we can, use XML for
what it helps us with but not slavishly use XML if in the end it is
seriously getting in the way.  We also want a ready-to-use profile for
captioners, not just the resources leaving a programming project for
instance creators.  There is some module+profile stuff in here, and possibly
some timingModel+demonstrationScenarios stuff as well.  It will be more
productive to define proofOfPudding demonstration/evaluation scenarios than
to argue Requirements in the absence of this framing.  If the SMIL timing
*model* doesn't work for our applications, MMI needs to know this.

** floodGate

There has already been an excellent discussion of this.

Since it's in a previous-month archive I will include the link[1] rather
than SMTP:reply to the thread started.

1. Contexts of use.  The requirements for 'capture' are more severe than for
'authoring.'  Make 'capture' [in a real-time context] one of the contexts of
use.  While people thought about Erik's mention of real-time applications in
terms of the transport and chunking issues, nobody seems to have pulled up
to the surface and really looked at the implications for the originating
process in a real-time application.

The term "authoring" is too static.  The discussion to date has not really
grasped the climate of values that arises in the creation of timed text in a
live event situation[2].  The point is that, by character count, more timed
text arises in the process of instrumenting meetings for access by the
hearing impaired and for instant replay than in the preparation and
distribution of highly polished publications such as movies and other
merchant multimedia products.

The TT format should meet the needs of the MMI group to support real time
collaboration among geographically distributed people.  In this case the
creation sets the streaming tempo and the transmission has to keep up and
can only deal with as much structure as has been recognized in the capture
process.

The playback need not be quite so lockstep, as will be discussed in a
subsequent point.

2.  Limits on control of playback timing in consideration of accommodating
individual needs.

Someone talked about the mission of the timing information as controlling the
display of the text.  The previous point attempts to make it clear that there
are equally valid use cases where no control is implied, that the timing marks
are there to document what was observed during capture.

In addition, it is necessary to put an inch of daylight between the timing
as marked and the display process as realized[3,4].  The point is that there
are relaxed display protocols which are plausible propositions as reasonable
accommodation, and these should be in our library of timeDisciplineDomainLinking
scenarios.

Glenn asked whether anything needed to be considered on behalf of those
reading the text via a Braille display.  Initially I said that there was no
special markup.  The timing information should be able to be orthogonal to
Braille-specific issues without too much difficulty (I still want to make
sure that it is; see more on orthogonality below).  However, in kicking this
around in the WAI/PF group a *playback* requirement did seem to emerge.  This
was the remark that a Braille presentation of timed text that had been
authored on the assumption of visual display might frequently be too fast for
the user to follow, or to digest quickly enough to be satisfied with lockstep
push control of the display.

So we came up with a playback concept where the authored access units of
text would become available to the assistive playback module at the
author-planned events within the timebase of the playback or meeting
context; but that these would not necessarily pre-emptively overwrite the
display, but could be presented in a scrollable stream-of-lines buffer much
like a virtual scroll-mode terminal.  Both manualAdvanceToNextLine and
popOnUnlessScrollLock modes are candidates, and either one would probably
need a catchUp verb, that is, escapeToNow, as well.
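
A minimal sketch of such a buffer, assuming a hypothetical AccessUnitQueue
class that sits between timed delivery and the Braille display.  The class,
method, and field names are illustrative only and are not drawn from any
specification or existing implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AccessUnit:
    begin: float  # author-planned release time, seconds on the playback timebase
    text: str


@dataclass
class AccessUnitQueue:
    """Hypothetical client-side buffer; earlier units are never discarded."""
    scroll_lock: bool = False
    units: List[AccessUnit] = field(default_factory=list)
    cursor: int = -1  # index of the unit on the display, -1 if none yet

    def deliver(self, unit: AccessUnit) -> None:
        """Called when the author-planned event fires on the timebase."""
        self.units.append(unit)
        if not self.scroll_lock:
            # popOnUnlessScrollLock: the newest unit takes over the display
            self.cursor = len(self.units) - 1

    def current(self) -> Optional[str]:
        """Text currently shown, or None if nothing has been shown yet."""
        return self.units[self.cursor].text if self.cursor >= 0 else None

    def advance(self) -> Optional[str]:
        """manualAdvanceToNextLine: step forward by one unit if one is waiting."""
        if self.cursor + 1 < len(self.units):
            self.cursor += 1
        return self.current()

    def escape_to_now(self) -> Optional[str]:
        """catchUp / escapeToNow: jump to the most recently delivered unit."""
        if self.units:
            self.cursor = len(self.units) - 1
        return self.current()


# Units arrive at the authored times while the reader holds scroll lock.
queue = AccessUnitQueue(scroll_lock=True)
for begin, text in [(0.0, "one"), (2.0, "two"), (4.0, "three")]:
    queue.deliver(AccessUnit(begin, text))
```

With scroll lock held, delivery only fills the buffer; the reader then steps
through with advance() or jumps ahead with escape_to_now().  The authored
times still govern when text becomes available, not when it is displayed.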

In any case, without claiming that the exact relaxed-sync mode of accessing
the text in the Braille line display has been fully developed, please be
aware that having the timed text dumped into an accessUnitQueue at the
client node to accommodate more user-directed display timing in the case of
Braille access is something that asForNow *should be possible and
permissible*, because the risk that the lockstep display or scrolling display
will not be usable is too real based on the current state of knowledge.  Where
I say 'permissible', this means: do not place language in the specification
that appears to forbid such user-controlled final display timing; rather,
integrate the user control requirements of the UAAG into the final
specification.
[Note that while the UAAG specifically states that buffering is not required,
in the case of timed text it may be reasonable to require, or anticipate the
provision of, some buffering at this point.]

3. Architectural approach:  IPPD -- Integrated Product and Process Development:
Develop Proof-of-Pudding demonstration cases, not just format requirements.

As Glenn said, the goal is to support a timing aspect in as modular a fashion
as possible.  On the other hand, the caption industry will need a fully-formed
ready-to-use profile of data formats.  The project probably has to have some
module encapsulation and some profile integration *both* in its work products.

Making the timing model independent of XML is attractive.  At the least, a
wire format binding over bare RTP, where the RTP packet payloads are not
guaranteed to be well-formed XML in themselves but the timing is still
entirely documented, is something that should be considered before being
rejected.
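
To illustrate the idea only, here is a sketch in which the timing travels
beside the payload rather than inside it.  TimedPacket, packetize, and
reassemble are hypothetical names, and the layout shown is not the RTP header
format; it assumes only that each packet can carry a timestamp alongside an
arbitrary slice of bytes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimedPacket:
    timestamp_ms: int  # capture time, carried outside the payload
    payload: bytes     # arbitrary slice; need not be well-formed XML


def packetize(fragments: List[Tuple[int, bytes]], max_payload: int) -> List[TimedPacket]:
    """Split each captured fragment into packets of at most max_payload bytes.

    Every packet cut from a fragment keeps that fragment's timestamp, so the
    timing stays fully documented even when a cut lands in the middle of a tag.
    """
    packets: List[TimedPacket] = []
    for timestamp_ms, data in fragments:
        for offset in range(0, len(data), max_payload):
            packets.append(TimedPacket(timestamp_ms, data[offset:offset + max_payload]))
    return packets


def reassemble(packets: List[TimedPacket]) -> bytes:
    """Concatenate payloads in arrival order to recover the byte stream."""
    return b"".join(p.payload for p in packets)


# Two fragments captured 1200 ms apart, cut into 4-byte payloads.
fragments = [(0, b"<p>Hello "), (1200, b"world</p>")]
packets = packetize(fragments, max_payload=4)
```

No single payload here parses as XML on its own, yet a receiver can recover
both the text and the capture time of every slice.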

What is emerging in my mind as the right process approach to the development
and qualification of modules is that there should be some evaluation profiles
defined even though they do not become part of the normative provisions of
any specification, just of the test and evaluation master plan for the module.

There are a bunch of -ability operational requirements such as streamability.
I tried above to argue that there are applications that are intrinsically
streaming at the source, and so it is not just a distribution convenience to
be compatible with streaming conditions.  These -ability requirements should
be demonstrated before the work product becomes a Proposed Recommendation.
So the determination of the "CR exit criteria" -- scenarios to have been
demonstrated -- is as important as line by line Requirements at the "what the
format can represent" level.

There are suitabilityForIntendedUse concerns that do not map well to
'requirements' that one can verify in a desk check of the format spec.  If we
spend some time outlining cases for evaluation at various stages in the
development and deployment of the technology, we can perhaps spend less time
in endless circles debating Requirements that are an awkward medium to
capture what we really care about.


[1] http://lists.w3.org/Archives/Public/public-tt/2003Jan/thread.html#5

[2] timed-text applications with emphasis on meetings and other collaborations

  experimental speech-to-text service demonstration at SC2002
  http://trace.wisc.edu/handouts/sc2002/index.htm

.. and more generally

  http://trace.wisc.edu/world/modtrans/

[3] user control of timing requirements in User Agent Accessibility Guidelines

   http://www.w3.org/TR/UAAG10/guidelines.html#gl-user-control-styles

[4] general philosophy for blending source and destination constraints on
presentation

  http://lists.w3.org/Archives/Public/www-archive/2001Nov/0069.html
Received on Friday, 7 February 2003 17:10:13 UTC