RE: [media] change proposals for issue-152 from Sean Hayes on 2011-03-29 (public-html-a11y@w3.org from March 2011)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Tue, 29 Mar 2011 16:58:39 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B913FB27762@DB3EX14MBXC303.europe.corp.microsoft.c>

It's possible that my issues aren't all tightly coupled to the idea of having track elements handled in the same way as video elements. Let me be clear, then: my objection to your proposal is to having tracks containing text handled in a fundamentally different way from tracks containing video, and constrained to the video rectangle. I don't need text tracks to be top-level elements, nor indeed any specific markup/API solution, but it does seem to me that striving for a smaller set of components, and having them share a common model where possible, is a good thing when designing a new feature.

"I'm sorry, but it seems to me that you might have mistaken a joke for agreement."
Ten hours of discussion over three sessions, plus writing up a summary on the wiki page, makes for a very elaborate joke. I'm not sure what the point of it was, but it clearly worked: it does indeed seem to have been a waste of my time being present at the f2f.

In order for me to understand your proposal perhaps you'd address the following:

If it's difficult to put the text track in the viewport of a video when it's a separate element, how do you propose doing it for video?

Can you describe how, in the "embedded in viewport" model, I would spread captions across two videos placed side by side?

The networkState states (or something very like them) are likely to be required if we ever intend to support live streamed captions, what's the plan for that?

Separating out the videos does not necessarily make life easier. Not only do you have to explain away the redundant attributes and continually repeat the timeline='...' attribute on slave elements, which makes the markup more verbose and error prone, you also open up a whole class of coding errors that you wouldn't have to deal with in a nested model, for example:

What is the behavior if video A references video B's timeline, and video B references video A's timeline? Who gets the controls?

Is it legal for video A to slave to video B, which slaves to video C? If not, what is the error behavior? If so, what is the behavior if there is a cycle?
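
To make the last two questions concrete, here is a rough TypeScript sketch of the check a user agent would need once slaving is expressed by cross-references. It assumes a hypothetical model in which each element id maps to the id named in its timeline='...' attribute (or null for an element with no such attribute, i.e. a master); the names `TimelineRefs`, `Resolution` and `resolveMaster` are illustrative only and come from no spec.

```typescript
// Hypothetical model: element id -> id named in its timeline='...' attribute,
// or null when the element has no timeline attribute (it is a master).
type TimelineRefs = Map<string, string | null>;

type Resolution =
  | { kind: "master"; id: string }        // chain ends at an element with no reference
  | { kind: "cycle"; path: string[] }     // e.g. A -> B -> A
  | { kind: "dangling"; missing: string }; // reference to an id that does not exist

// Follow timeline references from `start` until we reach a master,
// revisit an element (a cycle), or hit an id that is not in the map.
function resolveMaster(refs: TimelineRefs, start: string): Resolution {
  const path: string[] = [];
  const seen = new Set<string>();
  let current = start;
  while (true) {
    if (!refs.has(current)) {
      return { kind: "dangling", missing: current };
    }
    if (seen.has(current)) {
      // Report only the loop itself, starting from its first visit.
      return { kind: "cycle", path: path.slice(path.indexOf(current)).concat(current) };
    }
    seen.add(current);
    path.push(current);
    const next = refs.get(current) ?? null;
    if (next === null) {
      return { kind: "master", id: current };
    }
    current = next;
  }
}
```

With A and B referencing each other this reports the cycle A, B, A; with D referencing E and E referencing C it resolves D to master C. Under this sketch, "who gets the controls?" only has an answer when resolution ends at a master; the cycle and dangling cases each need error behavior written down somewhere, which is part of the extra surface area a nested model avoids.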

I strongly disagree that it is a good thing to have to make audio into a visual container in order to put captions into the page; it makes it less likely that authors are going to do the right thing. Moreover, since you still have to use CSS to make the null video have a sensible shape, why not apply CSS directly to the content you want to put in there?

" Your proposal starts with the use of a text track that stands alone.
What would be its visual representation? What the use case?"
It would be a block container, like a div. Its use case is the ability to present timed markup anywhere in the layout, and in particular to provide one encompassing caption area over any number of videos (for example, a page full of thumbnails).

"So, are you saying that you still favor the #10 solution that we first discussed in San Diego?"
As addressing the principles I'm concerned about, and as the starting point for continued discussion, yes. As a concrete solution, no, not necessarily.

"Are you concerned about black bars and the like?"
No. I'm concerned with a page that contains a set of videos, some of which may be too small (e.g. thumbnails) to effectively display captions in their viewport, and having a place to put those captions over the set as a whole.

"The main reason for moving away from it is that we realized that we
were re-inventing for audio and video tracks exactly the same
functionality that is already present for audio and video elements."
Except that you are still re-inventing, because now you have to add a whole bunch of special-case code for top-level video elements that aren't really top-level elements (to unhook their controls, handle their text tracks, remove the poster, etc.), plus logic to deal with errors in hooking the elements together.

"it makes a lot of sense to have multiple tracks
displayed next to each other rather than obstruct each other by trying
to render into the same viewport. An author would be utterly confused
if he defined multiple video tracks, but would only every by default
see a single video track".
Right, but that's only a result of the insistence that the video creates a viewport. If instead it created the equivalent of an absolutely positioned containing box that behaves as a flow container and expands to accommodate its children, authors would get a different experience. Since all of your examples put the videos in a parent div that does essentially that anyway, it seems to me that's the most likely scenario. Text tracks, in order to overlay the parent, can be defined as position:absolute, with a default origin and extent calculated from the video rendering area.
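
To make that layout model a little more concrete, here is a rough TypeScript sketch of the two geometric calculations it implies. It assumes, purely for illustration, a hypothetical `Rect` type in CSS pixels, that the container's extent is just the bounding box of its child videos, and that a video's rendering area is the centred, letterboxed rectangle CSS `object-fit: contain` would produce. The function names `enclosingBox` and `videoRenderingArea` are made up for this sketch.

```typescript
// Hypothetical rectangle type, in CSS pixels.
interface Rect { x: number; y: number; width: number; height: number; }

// Extent of a container that grows to accommodate all of its children:
// the smallest rectangle enclosing every child rectangle.
function enclosingBox(children: Rect[]): Rect {
  if (children.length === 0) return { x: 0, y: 0, width: 0, height: 0 };
  const left = Math.min(...children.map(r => r.x));
  const top = Math.min(...children.map(r => r.y));
  const right = Math.max(...children.map(r => r.x + r.width));
  const bottom = Math.max(...children.map(r => r.y + r.height));
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Largest rectangle with the video's intrinsic aspect ratio that fits inside
// `box`, centred (the geometry of CSS object-fit: contain). If the intrinsic
// size is not known yet, this sketch falls back to the whole box.
function videoRenderingArea(box: Rect, intrinsicWidth: number, intrinsicHeight: number): Rect {
  if (intrinsicWidth <= 0 || intrinsicHeight <= 0) return { ...box };
  const scale = Math.min(box.width / intrinsicWidth, box.height / intrinsicHeight);
  const width = intrinsicWidth * scale;
  const height = intrinsicHeight * scale;
  return {
    x: box.x + (box.width - width) / 2,
    y: box.y + (box.height - height) / 2,
    width,
    height,
  };
}
```

A text track overlaying a single video would take its default origin and extent from `videoRenderingArea`; one caption area spanning two side-by-side players, or a page of thumbnails, would take them from `enclosingBox` over those videos, which is the case the "embedded in viewport" model has no natural place for.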

Received on Tuesday, 29 March 2011 17:00:38 UTC