- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Tue, 29 Mar 2011 17:37:20 -0700
- To: Sean Hayes <Sean.Hayes@microsoft.com>
- Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>
Hi Sean,

On Tue, Mar 29, 2011 at 9:58 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:

> It's possible that my issues aren't all tightly coupled to the idea of having track elements handled in the same way as video elements. Let's be clear then: my objection to your proposal is having tracks containing text be handled in a fundamentally different way to tracks containing video, and being constrained to the video rectangle. I don't need text tracks to be top level elements, nor indeed any specific markup/API solution, but it does seem to me that striving for a smaller set of components and having them share a common model where possible is a good thing when designing a new feature.
>
> "I'm sorry, but it seems to me that you might have mistaken a joke for agreement."
>
> 10 hours of discussion over 3 sessions and writing up a summary in the wiki-page is a very elaborate joke. I'm not sure what the point of it was, but it clearly worked, as it does indeed seem to have been a waste of my time being present at the f2f.

Sorry I was unclear about the situation. The two days of hard work on option 10 were indeed no joke at all, but serious design work, as was the design work on the eventual change proposal. The eventual change proposal is a compromise (as are many things in HTML). The only bit that I joked about was pulling all tracks out from underneath <video>, including <track>. I'm terribly sorry about that misunderstanding.

I understand your reasoning for supporting option 10 and considered it a valid alternative during much of the F2F, too. From a "clean design" point of view for the markup, it still makes a lot of sense to me. Going through the exercise of defining all the details of that proposal was a very good learning experience and showed me how much duplication it actually entails, in particular in the JavaScript API for audio and video. That is why we arrived at the compromise at the last minute.
I would appreciate your input into the compromise, since many of your arguments hold. There are indeed things we didn't fully spec out in proposal 2 given the short amount of remaining time.

> In order for me to understand your proposal perhaps you'd address the following:

Thanks. More than happy to.

> If it's difficult to put the text track in the viewport of a video when it's a separate element, how do you propose doing it for video?

Difficulty in rendering is about the default rendering, which should make immediate sense to the user and cover the 80% use case. Text should by default be rendered on top of the main video's viewport. Additional video tracks should by default be rendered next to the main video. This is achieved by proposal 2. Any other custom rendering has to be done through CSS, including picture-in-picture if that is required.

Your proposal 3 has the same default rendering for video as proposal 2, plus it has the additional problem that the cues are also rendered separately somewhere on the page, giving the user little default indication of the relationship. Option 10, in contrast, has the problem that everything is rendered into the same video viewport. Its proposed default rendering was to have each of the multiple video tracks rendered on top of the main video, thus randomly obstructing that video, which would not satisfy the 80% use case. So only proposal 2 provides the correct default rendering for both video and text.

> Can you describe in the "embedded in viewport" model how I spread captions across two videos placed side by side.

This can only be done using CSS. This is true for all existing proposals and therefore not an objection to any of them.

> The networkState states (or something very like them) are likely to be required if we ever intend to support live streamed captions, what's the plan for that?

As I said: I can appreciate that we might need the networkState for text tracks.
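For concreteness, the CSS-only approach to the "captions across two side-by-side videos" case above might look like the sketch below. The wrapper markup, class names, and the idea of painting cues into a styled div from script are assumptions for illustration, not part of any of the proposals.

```html
<!-- Two videos side by side inside a positioned wrapper, with a single
     caption area overlaying the pair. Script (not shown) would write the
     active cue text into the .captions div. -->
<div class="pair" style="position: relative; display: inline-block;">
  <video src="left.webm" width="320"></video>
  <video src="right.webm" width="320"></video>
  <!-- one caption area stretched across both viewports -->
  <div class="captions"
       style="position: absolute; left: 0; right: 0; bottom: 0;
              text-align: center; color: white;
              background: rgba(0, 0, 0, 0.5);"></div>
</div>
```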
That is a separate issue from the one we are discussing here, and a change that may be necessary for <track> anyway. I think we have a lot of feedback on <track> itself and need to address that properly. Let's keep it for another day.

> Separating out the videos does not necessarily make life easier. Not only do you have to explain away the redundant attributes, and continually repeat the timeline='...' attribute on slave elements, making it more verbose and error prone; you now have the opportunity for a whole bunch of coding errors that you wouldn't have to deal with in a nested model, for example:
>
> What is the behavior if video A references video B's timeline, and video B references video A's timeline? Who gets the controls?
>
> Is it legal for video A to slave to video B which slaves to video C? If not, what is the error behavior? If so, what is the behavior if there is a cycle?

This is a good point. I guess a slaved video cannot also be a master, so the intention here is probably transitive, and thus the master video should probably get controls for all of them. Also, a cyclic reference would make all of them slaves, so none gets the controls. The behaviour would probably be undefined, since it would be a markup error. But you are right: the use of the @timeline attribute makes the relationship definition prone to faulty markup. This applies to both proposal 2 and proposal 3.

> I strongly disagree that it is a good thing to have to make audio into a visual container in order to put captions into the page; it makes it less likely that authors are going to do the right thing. Moreover, since you still have to use CSS to make the null video have a sensible shape, why not apply CSS directly to the content you want to put in there.

This is again a different discussion to have, since it expresses a general disagreement with the way in which the current spec works for <audio>.
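To make the @timeline cases above concrete, a sketch of the slaving markup is given below. The attribute syntax was never finalized in the proposals, so the exact form shown here is an assumption for illustration only.

```html
<!-- Intended use: one master, two slaves sharing its timeline. -->
<video id="main" src="main.webm" controls></video>
<video src="signlanguage.webm" timeline="main"></video>
<audio src="description.ogg" timeline="main"></audio>

<!-- The error-prone cases Sean raises: a cycle (A slaves to B, B slaves
     to A), where neither element can be the master and neither should
     get the controls; chained slaving (C to B to A) raises the same
     transitivity question. -->
<video id="a" src="a.webm" timeline="b"></video>
<video id="b" src="b.webm" timeline="a"></video>
```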
<audio> is an element that has no visual presentation on screen, so there is no container into which you can render text, and therefore there is no default rendering for <track> elements. They are, however, allowed, and the cues are then exposed through JavaScript, so custom display still works. If you want to change the way in which the <audio> element works in general, that is another discussion to have. Let's keep it separate from the multitrack discussion.

> "Your proposal starts with the use of a text track that stands alone. What would be its visual representation? What is the use case?"
>
> It would be a block container, like div. Its use case is the ability to present timed markup anywhere in the layout, and in particular to provide one encompassing caption area over any number of videos (for example a page full of thumbnails).

When there is no video element on the page, do the cues continue to display on a timeline? Do they have default controls? What if there are captions that are only activated 5 min in - does the user have to wait 5 min for them to display?

And when there actually is a video element on the page - or several, assuming they are all slaved to a master - do you expect a default rendering across all of them? What if they are in different locations on the page? If you don't have such a default rendering, your proposal is in worse shape than proposal 2, which at least has a default rendering, and where you can use JavaScript and CSS to create the display that you are proposing.

> "So, are you saying that you still favor the #10 solution that we first discussed in San Diego?"
>
> As addressing the principles I'm concerned about, and as the starting point for continued discussion, yes. As a concrete solution, no, not necessarily.
>
> "Are you concerned about black bars and the like?"
>
> No. I'm concerned with a page that contains a set of videos, some of which may be too small (e.g.
> thumbnails) to effectively display captions in their viewport, and having a place to put those captions over the set as a whole.

That's an authoring issue. If your video viewport is too small, you should turn off automatic rendering and render your captions manually.

> "The main reason for moving away from it is that we realized that we were re-inventing for audio and video tracks exactly the same functionality that is already present for audio and video elements."
>
> Only you are still re-inventing, because now you have to add a whole bunch of special case code for top level video elements that aren't really top level elements to unhook their controls, handle their text tracks, remove the poster etc., and logic to deal with errors in hooking the elements together.

You are right, there is some special functionality: the controls on the master element get a menu to turn tracks on and off, which includes the tracks of the slave elements. Also, the timelines of all the elements are one and the same. Because all the timelines are the same, some of the IDL attributes related to playback of the slaves obviously can no longer mean the same as when the elements were standalone. These changes are necessary for any multitrack solution. However, everything else is still possible; in particular:

* It is possible to turn on controls on the slave videos and audios. Interacting with them is like interacting with the master.
* It is still possible to attach text tracks to each video individually, with individual presentation. You could, for example, have a translation of a sign language track displayed directly on top of the sign language video, while the main video carries the transcript of what is being said (thus helping people who are learning sign language).
* It is still possible to read the state of the individual elements, determine what they are up to, associate events with them, etc.
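The "turn off automatic rendering and render your captions manually" path can be sketched with the text track API: hiding a track suppresses its default rendering while still firing cue events, and script paints the cues wherever the author wants (this is also the script-only display path available for <audio>). Element IDs and file names here are made up, and the API details follow the eventual TextTrack interface rather than any one 2011 proposal draft.

```html
<video id="thumb" src="clip.webm" width="120">
  <track kind="captions" src="clip.en.vtt" srclang="en">
</video>
<!-- captions rendered outside the too-small viewport -->
<div id="caption-area"></div>
<script>
  var video = document.getElementById("thumb");
  var track = video.textTracks[0];
  track.mode = "hidden"; // load cues and fire events, but no default rendering
  track.oncuechange = function () {
    var cue = track.activeCues[0];
    document.getElementById("caption-area").textContent = cue ? cue.text : "";
  };
</script>
```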
> "it makes a lot of sense to have multiple tracks displayed next to each other rather than obstruct each other by trying to render into the same viewport. An author would be utterly confused if he defined multiple video tracks, but would only ever by default see a single video track."
>
> Right, but that's only a result of the insistence that the video creates a viewport. If instead it created the equivalent of an absolutely positioned containing box that behaves as a flow container that expands to accommodate its children, authors would get a different experience. Since all of your examples put the videos in a parent div anyway that does essentially that, it seems to me that's the most likely scenario anyway. Text tracks, in order to overlay the parent, can be defined as position:absolute with default origin and extent calculated to the video rendering area.

In our original approach to the problem with option 10, I suggested changing the meaning of the video viewport to an element that is filled with the video frames from all the tracks of the resource, arranged inside the viewport as neighbors ("tiling"). This introduces a whole new flow model for the viewport, in particular when we also want to position and display text tracks. I think it was this part of option 10 that made Eric and Frank cringe the most: they don't want to introduce a new layout engine for the video viewport when the CSS layout engine already provides what is needed for multiple videos.

I still maintain that the most typical layouts of multitrack video are: tiled, picture-in-picture, and a scrollable list. I would personally like to see these encoded in CSS, and thus have multiple videos laid out by just choosing one CSS value. However, I can see how that is an enormous burden on a browser, and am happy to use a different approach and expect authors to do the styling.

Cheers,
Silvia.
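The tiled and picture-in-picture layouts mentioned above can already be approximated with ordinary CSS rather than a new viewport layout model; a sketch follows, with made-up class names and file names.

```html
<style>
  /* tiled: videos flow next to each other inside the wrapper */
  .tiled video { display: inline-block; width: 320px; }

  /* picture-in-picture: the slave video sits in a corner of the master */
  .pip { position: relative; display: inline-block; }
  .pip .inset {
    position: absolute;
    right: 10px;
    bottom: 10px;
    width: 120px;
  }
</style>

<div class="tiled">
  <video src="main.webm"></video>
  <video src="signing.webm"></video>
</div>

<div class="pip">
  <video src="main.webm" width="640"></video>
  <video class="inset" src="signing.webm"></video>
</div>
```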
Received on Wednesday, 30 March 2011 00:38:13 UTC