Re: Issues raised during discussion with RealNetworks and Microsoft

At 07:29 PM 2001-04-18 -0400, Ian Jacobs wrote:
>Hello,
>
>During my recent visits to Microsoft and RealNetworks, a couple
>of issues arose while reviewing the 9 April 2001 draft [1]. 
>Please see my comments below.
>

AG::  I can't tell from the styling of this communication what is yours and
what is the commenter's.

> - Ian
>
>[1]
>http://www.w3.org/TR/2001/WD-UAAG10-20010409/
>
>---------------------------------------------
>Issue 1: What should be done with lost packets?
>[Proposal]
>---------------------------------------------
>
>Several checkpoints (at least 2.4, 3.5, and 4.4) involve
>requirements that, in certain scenarios, may result in "lost
>packets", i.e., information that may not be viewed when it is
>available and may not be available in the same manner at a later
>time. 
>
>For instance, imagine a real-time presentation of a baseball
>game. The presentation includes some enabled elements (e.g.,
>links) that are only available for 30 seconds each.  If the user
>pauses the presentation (per checkpoint 2.4), what should the
>user agent do with the lost packets? I don't believe that we
>intended 2.4 to require the user agent to buffer information
>(potentially infinitely) while the presentation is paused.
>
>Proposal:
>
>Clarify in checkpoints 2.4, 3.5, and 4.4 that for some
>presentations, the required functionality may result in
>information loss. It may be possible to determine from the format
>that a presentation is "live". In this case, I think we should
>suggest in the techniques document that the user agent should
>alert the user (notably in the configuration to pause
>automatically) that pausing may lead to information loss. We can
>also recommend some buffering.
>

AG:: Yes, the extent of buffering is implementation-dependent.  If
information loss is a consequence of pausing, pains should be taken to
allow the user to decide which information to miss -- the old or the new.

If information loss is inevitable when pausing a real-time stream, the
user should perhaps have to confirm the pause.  If loss begins only
after a finite timeout, an advisory when the user pauses is probably
appropriate, possibly with a transient (no response required) alert
when the buffer is full and bits start dumping over the edge.
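
By way of illustration only, here is a rough sketch of that policy in
Python.  Every name in it (BufferedStream, confirm, advise,
transient_alert) is invented for this example; nothing here comes from
UAAG or from any particular player.

  class BufferedStream:
      def __init__(self, capacity_bytes, live=True):
          self.capacity = capacity_bytes  # finite buffer; 0 means no buffering
          self.buffered = 0
          self.live = live                # real-time stream?
          self.paused = False
          self.dropping = False

      def request_pause(self, confirm, advise):
          """confirm() blocks for a yes/no answer; advise() just informs."""
          if self.live and self.capacity == 0:
              # Loss is certain: ask the user to confirm the pause.
              if not confirm("Pausing this live stream will lose content. "
                             "Pause anyway?"):
                  return False
          elif self.live:
              # Loss begins only once the buffer fills: advise, don't block.
              advise("Content will be buffered for a limited time while paused.")
          self.paused = True
          return True

      def on_incoming(self, nbytes, transient_alert):
          """Called as packets arrive while the stream is paused."""
          if not self.paused:
              return
          if self.buffered + nbytes > self.capacity:
              if not self.dropping:
                  # One transient (no response required) alert at overflow.
                  transient_alert("Buffer full; newly arriving content is "
                                  "being discarded.")
                  self.dropping = True
              # Which end gets dropped (old or new) should itself be a user
              # choice; dropping the new content is just one possible policy.
          else:
              self.buffered += nbytes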

>----------------------------------------------
>Issue 2: Will checkpoint 2.4 be useful in heavily
>interactive presentations?
>[No proposal]
>---------------------------------------------
>
>In many situations, dynamic content may be accompanied by banner
>advertisements, for instance. Imagine a presentation where the
>top of the presentation is occupied by a series of eighty banner
>ads, one after the other, each lasting 30 seconds.  It would seem
>that pausing the presentation every thirty seconds to allow for
>user input (for ads or some other content) would not make for a
>very positive user experience. In short, dynamic content with
>frequent and numerous opportunities for interaction would not
>be very usable if paused so frequently. Consider also a stock
>ticker, where each symbol is a link to that company's home page
>(or data about that company). How would 2.4 work in this case?
>
>I don't have any alternatives to suggest. I am satisfied to leave
>2.4 as is, and to be aware that for some content, the pause
>functionality may not produce very useful results.
>
>I do note that checkpoint 2.4 is a global configuration
>requirement. It might be very useful if the user agent were to
>allow the user to not pause selected elements (that might be
>selected interactively). That way, the user could say "for this
>presentation, ignore this stock ticker and this other set of
>banner ads, but pause for everything else that requires input in
>a finite time interval." I do not want to add element-level
>control as a requirement in UAAG 1.0.
>

AG:: Leave as is.  For a user with ADD (attention deficit disorder),
the tone of the presentation described by the commenter could be
totally defeating.  These users will use the form field where you enter
your stock symbol, or scroll the list, and not wait for a symbol to
stream by on a ticker.  Other users, on the other hand, work quite the
opposite way, as illustrated by the fact that scanning software is a
standard piece of assistive technology: it simply cycles the focus
through the plausible next actions until the user says 'yes' to one of
them.
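
For illustration, a minimal sketch of that scanning behaviour in
Python; every name here is invented and is not taken from any real
assistive-technology product.

  import itertools
  import time

  def scan(actions, accept_pressed, dwell_seconds=1.5, highlight=print):
      """Cycle focus through the plausible next actions until the
      user's single switch says 'yes' to the highlighted one."""
      for action in itertools.cycle(actions):
          highlight(f"Focus on: {action}")
          time.sleep(dwell_seconds)   # give the user time to react
          if accept_pressed():        # the user's 'yes' switch
              return action

  # Example: the switch fires on the third item the scanner lands on.
  hits = iter([False, False, True])
  print("Chosen:", scan(["Play", "Pause", "Next link", "Enter symbol"],
                        accept_pressed=lambda: next(hits),
                        dwell_seconds=0.0))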

>----------------------------------------------
>Issue 3: What is the scope of 2.4? What must be paused?
>[No proposal]
>---------------------------------------------
>
>Imagine some content where two interactive streams are playing at
>the same time, but they are not explicitly synchronized with each
>other (the synchronization case is covered by 2.6). When the user
>agent pauses (per 2.4) to allow for user input related to the
>first stream, what should happen to the second stream?  Should it
>be paused as well, or should it continue?  When would the user
>agent recognize that two streams are synchronized or not (e.g.,
>in SMIL would the <par> element suffice to indicate
>synchronization?)
>

AG:: In SMIL you always know the synchronization requirements across
streams, so the SMIL player knows whether you can pause one component
independently or not.  If the two media objects are referenced from the
same SMIL file, the answer to this question is given by the SMIL markup
and its semantics.

 http://www.w3.org/TR/smil20/smil-timing.html#adef-syncBehavior

Beyond that, pausing should propagate only to the smallest
super-presentation (where the collection of all open windows in a
windowing system is one such super-presentation) whose synchronization
constraints require it.  So the whole screen doesn't necessarily have
to stop when you stop one thing.
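
As a rough illustration (not a conforming SMIL implementation), a
player could read the syncBehavior attribute cited above to decide
whether one media object may be paused without pausing its siblings.
The value names come from the SMIL 2.0 Timing module; the handling of
"default" below is simplified and ignores syncBehaviorDefault
inheritance.

  import xml.etree.ElementTree as ET

  def can_pause_independently(media_elem, document_default="locked"):
      behavior = media_elem.get("syncBehavior", "default")
      if behavior == "default":
          behavior = document_default   # simplified: no inheritance chain
      return behavior in ("canSlip", "independent")

  par = ET.fromstring("""
  <par>
    <video src="game.mpg" syncBehavior="locked"/>
    <textstream src="captions.rt" syncBehavior="canSlip"/>
  </par>
  """)
  for child in par:
      print(child.get("src"), can_pause_independently(child))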

>----------------------------------------------------------
>Issue 4: Conformance for some formats only must be clarified
>[Proposal]
>----------------------------------------------------------
>
>It is my understanding that our document allows conformance for a
>subset of all formats implemented by the user agent. For
>instance, the claimant might choose to claim conformance for HTML
>and PNG, but not for JPEG, even if it implements JPEG. Imagine a
>media player that implements 20 formats. A developer may not wish
>to claim conformance for all 20, and shouldn't be required to.
>
>Checkpoint 2.1 reads:
>
>   "For all format specifications that the user agent implements,
>   make content available through the rendering processes
>   described by those specifications."
>
>I think that "for all" needs to be restricted to "for all that
>are part of the conformance claim". I think the same change needs
>to be made for checkpoint 2.2 ("For all text formats..."),
>8.1 ("of all implemented specifications"), and 8.2.
>
>I'm not sure how to rewrite them (as I don't want to mention
>conformance in the checkpoints if I can avoid it). Something like
>this might be reasonable:
>
><NEW 2.1> 
>For the format specifications implemented to satisfy the
>requirements of this document, make content available through
>the rendering processes described by those specifications.
></NEW 2.1> 
>
>------------------
>
><OLD 2.2> 
>For all text formats that the user agent implements, provide
>a view of the text source.
></OLD 2.2> 
>
><NEW 2.2> 
>For text formats implemented to satisfy the requirements
>of this document, provide a view of the text source.
></NEW 2.2> 
>
 
AG:: I would like to have an example of hardship imposed by the simpler
rule before considering this level of hair-splitting.  That is, for 2.1
and 2.2, I don't see the problem.  8.1 is another matter.

>------------------
>
><OLD 8.1> 
>Implement the accessibility features of all implemented
>specifications (etc.).
></OLD 8.1> 
>
><NEW 8.1> 
>Implement the accessibility features of all specifications
>implemented to satisfy the requirements of this document (etc.)
></NEW 8.1> 
>
>------------------
>
><OLD 8.2> 
>8.2 Use and conform to ...
></OLD 8.2> 
>
><NEW 8.2> 
>8.2 To satisfy the requirements of this document, 
>    use and conform to ...
></NEW 8.2> 
>

AG:: This is a genuine problem.  The browser configuration item does
not decide whether the browser implements Flash or not; it just checks
locally for a compatible handler for the medium.
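
A hypothetical illustration (not any real browser's API) of "checks
locally for a compatible handler": the set of formats a user agent
actually handles is discovered at run time from whatever handlers or
plug-ins are registered, not fixed when a conformance claim is written.

  registered_handlers = {
      "text/html": "built-in HTML renderer",
      "image/png": "built-in PNG decoder",
      # "application/x-shockwave-flash" appears only if a plug-in is installed
  }

  def handler_for(content_type):
      """Return the locally registered handler for a medium, or None."""
      return registered_handlers.get(content_type)

  print(handler_for("application/x-shockwave-flash"))  # None unless installed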

>----------------------------------------------------------
>Issue 5: Checkpoint 3.3 (blinking/animation) and streams
>[No proposal]
>----------------------------------------------------------
>
>What is the relationship between streaming and animated text?
>Animated text may be part of a text stream (so the text content
>is not all available at time "t"). How should the animated text
>be rendered as motionless text (per 3.3) in that case?
>
>I don't think that the user agent should have to wait for the
>entire text stream to render part of it as motionless text.
>
>I can imagine the "subtitles" technique, where a phrase of text
>is rendered for a few seconds, then another phrase, etc.  (I
>don't think that this should be considered an animation, since
>this is not a "visual movement effect" as mentioned in the
>glossary.) Furthermore, in this case, I don't think there's
>interaction between checkpoint 3.3 (animated text) and 2.6
>(respect synchronization cues).
>

AG::  Who is saying what, here?  I can't follow the boundaries of 'animation'
in any of the above.

Yes, there is interaction between 3.3 and 2.6 where timed text captions are
synchronized with a video, for example.  Stopping the text from changing will
(depending on the SMIL synchronization details as discussed above) sometimes
require that the video be stopped.

There may be a requirement buried in here for a pop-on presentation
method to be an option for any timed-text stream, but we haven't studied
it in enough detail to define a hard-and-fast requirement.

>----------------------------------------------------------
>Issue 6: Checkpoint 4.6: Captions positioning
>[Proposal]
>----------------------------------------------------------
>
>What happens when the author has laid out captions with some
>particular constraints (e.g., take up fifty percent of the
>parent's available horizontal width and be centered within that
>width)? Should the user be able to override that? What happens to
>the rest of the layout?
>
>Checkpoint 4.6 reads: "For graphical viewports, allow the user to
>position text transcripts, collated text transcripts, and
>captions in the viewport." However, I can imagine techniques
>(that might even address the previous question) where a solution
>would be to render the captions in a separate viewport (i.e., not
>in the same viewport, which is suggested by the end of 4.6). Did
>we mean to exclude the technique of rendering in a separate (and
>positionable) viewport? 
>
>Proposed:
>
><NEW 4.6>
>  "For graphical viewports, allow the user to position text
>  transcripts, collated text transcripts, and captions in the
>  same or another viewport."  
></NEW 4.6>
>
>For example, the user might be able to select captions and
>"extract them" from the presentation into a second viewport,
>leaving the layout otherwise intact.
>

AG:: I do not concur with this proposal.  In the earlier discussion, we
pointed out that this positioning capability exists so that a low-vision
user can control which portion of the video is adjacent to the captions,
and hence visible along with the captions within the cropped effective
viewport, when the user is applying a high power of magnification.
Moving the captions into a separate window prevents the user from having
the juxtaposition capability they need.

Another viewport is sometimes desirable, but it does not satisfy this
requirement.

What happens to the rest of the layout?  It is unaffected, as if grouped
into wallpaper, for the purposes of positioning the caption display.
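
By way of a sketch only (invented names, simple x/y/width/height
rectangles), the juxtaposition point can be put this way: the user
places the caption block directly against the region of the video they
care about, so that both fit inside the small crop a screen magnifier
actually shows, while the rest of the layout is left alone.

  def place_captions(region_of_interest, caption_size):
      """Return a caption rectangle sitting just below the chosen region."""
      x, y, w, h = region_of_interest
      cw, ch = caption_size
      return (x, y + h, cw, ch)

  # At high magnification only a small crop is visible at once; keeping
  # the captions adjacent to the chosen region keeps both in that crop.
  print(place_captions((120, 80, 200, 150), (200, 40)))  # -> (120, 230, 200, 40)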


>----------------------------------------------------------
>Issue 8: Checkpoint 10.9: Scope of position indicator?
>[Proposal]
>----------------------------------------------------------
>
>Checkpoint 10.9 reads:
>
> "Indicate the relative position of the viewport in rendered
> content (e.g., the proportion of an audio or video clip that has
> been played, the proportion of a Web page that has been viewed,
> etc.)."
>
>Imagine a presentation with 80 audio clips in a row (this could
>be done in SMIL with a <seq> element). Should the position
>indicator account for all 80? Or each one, one at a time?  I
>wouldn't want the user agent to have to go out to the Web to get
>duration information about all 80 clips in advance in order to
>build a proportional position indicator. Instead, I think it
>would be reasonable to display in that case something like "First
>of 80 clips, 20% of first clip".
>
>I think we should state explicitly that we do *not* specify how such
>cases should be handled, only that the user have some indication of
>elapsed time.
>

AG:: Yes.  The indication should be relative where possible, and an
absolute offset from some articulable landmark where relative is not
possible.  "20% of first of 80 clips" is useless; "20% of 32 sec. for
1st of 80 clips" is meaningful.
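
As a sketch of that format (the function name and wording are made up
for this example, not taken from the guidelines):

  def position_indicator(clip_index, clip_count, elapsed_sec,
                         clip_duration_sec=None):
      if clip_duration_sec:
          # Duration known: relative position plus the absolute length.
          pct = round(100 * elapsed_sec / clip_duration_sec)
          return (f"{pct}% of {clip_duration_sec:g} sec. "
                  f"for clip {clip_index} of {clip_count}")
      # Duration unknown (e.g., live): absolute offset from the clip start.
      return f"{elapsed_sec:g} sec. into clip {clip_index} of {clip_count}"

  print(position_indicator(1, 80, 6.4, 32))  # 20% of 32 sec. for clip 1 of 80
  print(position_indicator(1, 80, 6.4))      # 6.4 sec. into clip 1 of 80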

>---------
>Editorial
>---------
>
> - Section 1.2 talks about "mainstream" user agents. The Palm
> Pilot is mainstream (lots of people have one), but is not a
> target platform. Instead, we should be more precise and talk
> about personal computers or desktop personal computers or
> something similar.
>
> - It might be useful to mention the term "audio mixer" in
> checkpoint 4.10 since that's a likely technique.
>
> - The 11 April version of the document is split into a number of
> sections. There should be a next/previous/contents navigation
> bar at the bottom of each section.
>
>---------
>Techniques
>---------
>
>Checkpoint 2.4:
>
> - For a presentation that is not "live", present the user with
>   a list of time-sensitive links (essentially making them
>   time-independent).
>
>-- 
>Ian Jacobs (ij@w3.org)   http://www.w3.org/People/Jacobs
>Tel:                     +1 831 457-2842
>Cell:                    +1 917 450-8783
>  

Received on Saturday, 21 April 2001 00:37:52 UTC