Issues raised during discussion with RealNetworks and Microsoft

Hello,

During my recent visits to Microsoft and RealNetworks, a couple
of issues arose while reviewing the 9 April 2001 draft [1]. 
Please see my comments below.

 - Ian

[1] http://www.w3.org/TR/2001/WD-UAAG10-20010409/

---------------------------------------------
Issue 1: What should be done with lost packets?
[Proposal]
---------------------------------------------

Several checkpoints (at least 2.4, 3.5, and 4.4) involve
requirements that, in certain scenarios, may result in "lost
packets", i.e., information that may not be viewed when it is
available and may not be available in the same manner at a later
time. 

For instance, imagine a real-time presentation of a baseball
game. The presentation includes some enabled elements (e.g.,
links) that are only available for 30 seconds each.  If the user
pauses the presentation (per checkpoint 2.4), what should the
user agent do with the lost packets? I don't believe that we
intended 2.4 to require the user agent to buffer information
(potentially infinitely) while the presentation is paused.

Proposal:

Clarify in checkpoints 2.4, 3.5, and 4.4 that for some
presentations, the required functionality may result in
information loss. It may be possible to determine from the format
that a presentation is "live". In this case, I think we should
suggest in the techniques document that the user agent should
alert the user (notably in the configuration to pause
automatically) that pausing may lead to information loss. We can
also recommend some buffering.
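
A minimal sketch of the "some buffering" idea, assuming a fixed
packet budget. The class name, the `max_packets` limit, and the
wording of the warning are all hypothetical; nothing here is
required by the checkpoints.

```python
from collections import deque


class PauseBuffer:
    """Hold packets that arrive while a live presentation is paused."""

    def __init__(self, max_packets):
        # A deque with maxlen discards its oldest entry when it is full,
        # so memory use stays bounded however long the pause lasts.
        self._packets = deque(maxlen=max_packets)
        self.dropped = 0  # number of packets lost during the pause

    def receive(self, packet):
        # Count the packet that is about to be pushed out (or, for a
        # zero-size buffer, the packet that cannot be stored at all).
        if len(self._packets) == self._packets.maxlen:
            self.dropped += 1
        self._packets.append(packet)

    def resume(self):
        """Return the buffered packets in arrival order and empty the buffer."""
        packets = list(self._packets)
        self._packets.clear()
        return packets


def pause_warning(is_live):
    """Return an alert for live content, or None when no alert is needed."""
    if is_live:
        return "This presentation is live; pausing may lead to information loss."
    return None
```

With a budget of 3 packets and 5 arrivals during a pause, the two
oldest packets are lost and `dropped` records that, which gives the
user agent something concrete to report when the user resumes.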

----------------------------------------------
Issue 2: Will checkpoint 2.4 be useful in heavily
interactive presentations?
[No proposal]
---------------------------------------------

In many situations, dynamic content may be accompanied by banner
advertisements. Imagine a presentation where the top of the
presentation is occupied by a series of eighty banner ads, one
after the other, each lasting 30 seconds.  It would seem
that pausing the presentation every 30 seconds to allow for
user input (for ads or some other content) would not make for a
very positive user experience. In short, dynamic content with
frequent and numerous opportunities for interaction would not
be very usable if paused so frequently. Consider also a stock
ticker, where each symbol is a link to that company's home page
(or data about that company). How would 2.4 work in this case?

I don't have any alternatives to suggest. I am satisfied to leave
2.4 as is, and to be aware that for some content, the pause
functionality may not produce very useful results.

I do note that checkpoint 2.4 is a global configuration
requirement. It might be very useful if the user agent were to
allow the user to exempt certain elements from pausing (chosen
interactively, say). That way, the user could say "for this
presentation, ignore this stock ticker and this other set of
banner ads, but pause for everything else that requires input in
a finite time interval." I do not want to add element-level
control as a requirement in UAAG 1.0.
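
To make the element-level idea concrete, here is a sketch of the
decision a user agent might make. The function, its parameters,
and the element identifiers are hypothetical, and this goes beyond
what 2.4 requires.

```python
def should_pause(element_id, needs_timed_input, pause_enabled, excluded_ids):
    """Decide whether to pause for an element that accepts input for a limited time.

    pause_enabled is the global configuration of checkpoint 2.4;
    excluded_ids holds the elements the user asked to ignore.
    """
    if not pause_enabled or not needs_timed_input:
        return False
    return element_id not in excluded_ids


# The user has excluded the stock ticker and one set of banner ads.
excluded = {"stock-ticker", "banner-ads"}
print(should_pause("stock-ticker", True, True, excluded))   # False
print(should_pause("vote-button", True, True, excluded))    # True
```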

----------------------------------------------
Issue 3: What is the scope of 2.4? What must be paused?
[No proposal]
---------------------------------------------

Imagine some content where two interactive streams are playing at
the same time, but they are not explicitly synchronized with each
other (the synchronization case is covered by 2.6). When the user
agent pauses (per 2.4) to allow for user input related to the
first stream, what should happen to the second stream?  Should it
be paused as well, or should it continue?  How would the user
agent recognize whether two streams are synchronized (e.g., in
SMIL, would the <par> element suffice to indicate
synchronization)?
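
For discussion only, this is what a test based on <par> could look
like if the group decided that a shared <par> ancestor is enough.
The SMIL fragment, its ids, and its file names are invented, the
fragment omits namespaces, and whether <par> really suffices is
exactly the open question.

```python
import xml.etree.ElementTree as ET

# Hypothetical presentation: a video and its captions inside one <par>,
# followed by an audio clip that is outside that <par>.
SMIL = """
<smil>
  <body>
    <par>
      <video id="game" src="game.rm"/>
      <textstream id="captions" src="captions.rt"/>
    </par>
    <audio id="jingle" src="jingle.rm"/>
  </body>
</smil>
"""


def share_par(root, id_a, id_b):
    """Return True if both elements are descendants of the same <par> element."""
    for par in root.iter("par"):
        ids = {element.get("id") for element in par.iter()}
        if id_a in ids and id_b in ids:
            return True
    return False


root = ET.fromstring(SMIL.strip())
print(share_par(root, "game", "captions"))  # True
print(share_par(root, "game", "jingle"))    # False
```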

----------------------------------------------------------
Issue 4: Conformance for some formats only must be clarified
[Proposal]
----------------------------------------------------------

It is my understanding that our document allows conformance for a
subset of all formats implemented by the user agent. For
instance, the claimant might choose to claim conformance for HTML
and PNG, but not for JPEG, even if it implements JPEG. Imagine a
media player that implements 20 formats. A developer may not wish
to claim conformance for all 20, and shouldn't be required to.

Checkpoint 2.1 reads:

   "For all format specifications that the user agent implements,
   make content available through the rendering processes
   described by those specifications."

I think that "for all" needs to be restricted to "for all that
are part of the conformance claim". I think the same change needs
to be made for checkpoint 2.2 ("For all text formats..."),
8.1 ("of all implemented specifications"), and 8.2.

I'm not sure how to rewrite them (as I don't want to mention
conformance in the checkpoints if I can avoid it). Something like
this might be reasonable:

<NEW 2.1> 
For the format specifications implemented to satisfy the
requirements of this document, make content available through
the rendering processes described by those specifications.
</NEW 2.1> 

------------------

<OLD 2.2> 
For all text formats that the user agent implements, provide
a view of the text source.
</OLD 2.2> 

<NEW 2.2> 
For text formats implemented to satisfy the requirements
of this document, provide a view of the text source.
</NEW 2.2> 

------------------

<OLD 8.1> 
Implement the accessibility features of all implemented
specifications (etc.).
</OLD 8.1> 

<NEW 8.1> 
Implement the accessibility features of all specifications
implemented to satisfy the requirements of this document (etc.).
</NEW 8.1> 

------------------

<OLD 8.2> 
8.2 Use and conform to ...
</OLD 8.2> 

<NEW 8.2> 
8.2 To satisfy the requirements of this document, 
    use and conform to ...
</NEW 8.2> 

----------------------------------------------------------
Issue 5: Checkpoint 3.3 (blinking/animation) and streams
[No proposal]
----------------------------------------------------------

What is the relationship between streaming and animated text?
Animated text may be part of a text stream (so the text content
is not all available at time "t"). How should the animated text
be rendered as motionless text (per 3.3) in that case?

I don't think that the user agent should have to wait for the
entire text stream to render part of it as motionless text.

I can imagine the "subtitles" technique, where a phrase of text
is rendered for a few seconds, then another phrase, etc.  (I
don't think that this should be considered an animation, since
this is not a "visual movement effect" as mentioned in the
glossary.) Furthermore, in this case, I don't think there's
interaction between checkpoint 3.3 (animated text) and 2.6
(respect synchronization cues).

----------------------------------------------------------
Issue 6: Checkpoint 4.6: Captions positioning
[Proposal]
----------------------------------------------------------

What happens when the author has laid out captions with some
particular constraints (e.g., take up fifty percent of the
parent's available horizontal width and be centered within that
width)? Should the user be able to override that? What happens to
the rest of the layout?

Checkpoint 4.6 reads: "For graphical viewports, allow the user to
position text transcripts, collated text transcripts, and
captions in the viewport." However, I can imagine techniques
(that might even address the previous question) where a solution
would be to render the captions in a separate viewport (i.e., not
in the same viewport, which is suggested by the end of 4.6). Did
we mean to exclude the technique of rendering in a separate (and
positionable) viewport? 

Proposed:

<NEW 4.6>
  "For graphical viewports, allow the user to position text
  transcripts, collated text transcripts, and captions in the
  same or another viewport."  
</NEW 4.6>

For example, the user might be able to select captions and
"extract them" from the presentation into a second viewport,
leaving the layout otherwise intact.

----------------------------------------------------------
Issue 8: Checkpoint 10.9: Scope of position indicator?
[Proposal]
----------------------------------------------------------

Checkpoint 10.9 reads:

 "Indicate the relative position of the viewport in rendered
 content (e.g., the proportion of an audio or video clip that has
 been played, the proportion of a Web page that has been viewed,
 etc.)."

Imagine a presentation with 80 audio clips in a row (this could
be done in SMIL with a <seq> element). Should the position
indicator account for all 80? Or each one, one at a time?  I
wouldn't want the user agent to have to go out to the Web to get
duration information about all 80 clips in advance in order to
build a proportional position indicator. Instead, I think it
would be reasonable to display in that case something like "First
of 80 clips, 20% of first clip".

I think we should state explicitly that we do *not* specify how
such cases should be handled, only that the user have some
indication of elapsed time.
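
One possible indicator for the 80-clip case, sketched under the
assumption that only the current clip's duration is known. The
function name and the label wording are hypothetical.

```python
def position_label(clip_index, clip_count, elapsed_s, duration_s=None):
    """Describe the position using only the current clip's duration.

    clip_index is 1-based. duration_s may be None when the duration of
    the current clip is not known; no proportion is shown in that case.
    """
    label = f"Clip {clip_index} of {clip_count}"
    if duration_s:
        # Clamp so that a slightly late clock never reports more than 100%.
        percent = round(100 * min(elapsed_s, duration_s) / duration_s)
        label += f", {percent}% of this clip"
    return label


print(position_label(1, 80, 12, 60))  # Clip 1 of 80, 20% of this clip
```

No duration information for the other 79 clips is needed, so the
user agent does not have to fetch anything in advance.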

---------
Editorial
---------

 - Section 1.2 talks about "mainstream" user agents. The Palm
 Pilot is mainstream (lots of people have one), but is not a
 target platform. Instead, we should be more precise and talk
 about personal computers or desktop personal computers or
 something similar.

 - It might be useful to mention the term "audio mixer" in
 checkpoint 4.10 since that's a likely technique.

 - The 11 April version of the document is split into a number of
 sections. There should be a next/previous/contents navigation
 bar at the bottom of each section.

---------
Techniques
---------

Checkpoint 2.4:

 - For a presentation that is not "live", present the user with
   a list of time-sensitive links (essentially making them
   time-independent).
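
A sketch of this technique, assuming the user agent can read every
link and its active interval from the presentation before or during
playback. The `TimedLink` type and the sample data are made up.

```python
from dataclasses import dataclass


@dataclass
class TimedLink:
    href: str
    label: str
    begin_s: float  # when the link becomes active, in seconds
    end_s: float    # when the link stops being active, in seconds


def link_list(links):
    """Return every link, ordered by start time, whether or not it is active now."""
    return sorted(links, key=lambda link: link.begin_s)


links = [
    TimedLink("http://example.org/stats", "Player statistics", 30.0, 60.0),
    TimedLink("http://example.org/lineup", "Starting lineup", 0.0, 30.0),
]
for link in link_list(links):
    print(f"{link.label} <{link.href}>")
```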

-- 
Ian Jacobs (ij@w3.org)   http://www.w3.org/People/Jacobs
Tel:                     +1 831 457-2842
Cell:                    +1 917 450-8783

Received on Wednesday, 18 April 2001 19:29:17 UTC