- From: Philip Jägenstedt <philipj@opera.com>
- Date: Wed, 25 Nov 2009 17:26:38 +0100
- To: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>
- Cc: "HTML Accessibility Task Force" <public-html-a11y@w3.org>
On Wed, 25 Nov 2009 14:29:37 +0100, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> Hi Philip, all,
>
> See comments below inline.
>
> On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>>
>> I agree that syncing separate video and audio files is a big challenge.
>> I'd prefer leaving this kind of complexity either to scripting or an
>> external manifest like SMIL.
>
> We have to at minimum deal with multi-track video and audio files
> inside HTML, since they can potentially expose accessibility data:
> audio descriptions (read by a human), sign language (signed by a
> person), and captions are the particular tracks I am concerned about.

I agree and think that the tracks of the resource should be exposed via a DOM API. From a script's point of view it should look the same whether the resource is Ogg, MPEG-4 or SMIL linking several tracks together.

> There is also always the need for different recording angles, but
> let's leave that to javascript, where the whole media resource is
> exchanged. Similarly, when we deal with different devices, we can also
> exchange the complete media resource markup.
>
> So, focusing on a video with a + v + audio description + sign language
> track + caption track, we still need to expose these tracks to the Web
> browser to decide based on user preference settings whether to display
> them or not. This is on top of and beyond the <itext> proposals I have
> previously discussed.
>
> The Google accessibility experts wanted at least the in-line caption
> tracks exposed in declarative language. This is because otherwise you
> cannot build a menu of all available tracks without having to start
> downloading and decoding the file. With this in mind, I think we have
> to expose all of the tracks available in a file in declarative
> language.
>

Who is building the menu? I really don't see a problem with waiting until loadedmetadata for the menu to be available.
Picking a language in the < 1 sec before that seems like a fringe use case which can be solved by sending the information in a site-specific format using data-* attributes or similar.

>> Below I focus on the HTML-specific parts:
>>
>> Captions/subtitles... The main problem of reusing <source> is that it
>> doesn't work with the resource selection algorithm.[1]
>
> Yes, I have noticed that problem, too. The resource selection
> algorithm regards all of the <source> elements as alternatives to each
> other.
>
>> However, that algorithm only considers direct children of the media
>> element, so adding a wrapping element would solve this problem and
>> allow us to spec different rules for selecting timed-text sources.
>> Example:
>>
>> <video>
>>   <source src="video.ogg" type="video/ogg">
>>   <source src="video.mp4" type="video/mp4">
>>   <overlay>
>>     <source src="en.srt" lang="en-US">
>>     <source src="hans.srt" lang="zh-CN">
>>   </overlay>
>> </video>
>
> Yes, this works for external additional tracks. Maybe then we can add
> the internal tracks inside the source elements, something like this:
>
> <video>
>   <source src="video.ogg" type="video/ogg">
>     <track id='v' role='video' ref='serialno:1505760010'>
>     <track id='a' role='audio' lang='en' ref='serialno:0821695999'>
>     <track id='ad' role='auddesc' lang='en' ref='serialno:1421614520'>
>     <track id='s' role='sign' lang='ase' ref='serialno:1413244634'>
>     <track id='cc' role='caption' lang='en' ref='serialno:1421849818'>
>   </source>
>   <source src="video.mp4" type="video/mp4">
>     <track id='v' role='video' ref='trackid:1'>
>     <track id='a' role='audio' lang='en' ref='trackid:2'>
>   </source>
>   <overlay>
>     <source src="en.srt" lang="en-US">
>     <source src="hans.srt" lang="zh-CN">
>   </overlay>
> </video>
>
> Note I have made the track reference explicit through introducing a
> new "ref" attribute which uses encapsulation format specific
> references to track identifiers.
>

<source> is a void element, so this markup does not degrade nicely in any shipped <video>-capable browser. Try <http://software.hixie.ch/utilities/js/live-dom-viewer/saved/318>. Firefox puts the second <source> element inside nested <track> elements and Safari just drops it. That aside, I'm not convinced this is actually needed, as per above, and I agree with what Eric Carlson said.

>> We could possibly allow <overlay src="english.srt"></overlay> as a
>> shorthand when there is only one captions file, just like the
>> <video src=""></video> shorthand.
>>
>> I'm suggesting <overlay> instead of e.g. <itext> because I have some
>> special behavior in mind: when no (usable) source is found in
>> <overlay>, the content of the element should be displayed overlaid on
>> top of the video element as if it were inside a CSS box of the same
>> size as the video. This gives authors a simple way to make overlay
>> content such as custom controls and complex "subtitles" like animated
>> karaoke work the same both in normal rendering and in fullscreen mode.
>> (I don't know what kind of CSS spec magic would be needed to allow
>> such rendering, but I don't believe overlaying the content is very
>> difficult implementation-wise.)
>>
>> Naturally, CSS is used to style the captions:
>>
>> <video src="video.ogg">
>>   <overlay src="en.srt"
>>            style="font-size:2em;padding:1em;text-align:center"></overlay>
>> </video>
>>
>> If there is a use case, displaying several captions/subtitles at once
>> could be allowed as such:
>>
>> <video src="video.ogg">
>>   <overlay src="en.srt" class="centerTop"></overlay>
>>   <overlay src="hans.srt" class="centerBottom"></overlay>
>> </video>
>
> Ah yes, that is replicating the hierarchical approach I took with
> itextlist / itext.[2] They could also be more generic text than just
> subtitles and captions - in particular textual audio descriptions have
> been confirmed at TPAC to be very useful indeed.
>

Sibling <overlay>s with <source> children make at most a two-level hierarchy, but sure. Anything that can be displayed graphically is suitable for <overlay>, although the natively supported formats will probably be limited to timed text via SRT and maybe something more complex like DFXP.

>> centerTop/centerBottom are appropriately defined in CSS.
>
> Those are almost like the default styling approaches I suggested for
> itextlist / itext.[2] There, I also assumed there was a display area
> as large as the video or actually just a little larger available to
> render the time-aligned text into. It's larger since sometimes it is
> better not to overlay stuff but to place it right next to the video,
> e.g. just above it (title-like) or just below it but visually part of
> the video window.
>

Just to be clear, centerTop/centerBottom are user defined, nothing magic. As for the default stylesheet for <overlay> I'm not sure, maybe just "display:box" and the rest should be defined by the user.

>> For what it's worth, it's easy to get this behavior (sans fullscreen)
>> using scripting today, simply by cloning/moving the overlay elements
>> outside of <video> and positioning them on top using CSS. Even SRT
>> retrieval (XHR), decoding (RegExp) and syncing (timeupdate event) is
>> easy enough to do.
>
> It's indeed how I implemented the demos [3]. E.g.
> http://www.annodex.net/~silvia/itext/elephant_no_skin_v2.html has divs
> defined just outside the video element, but styled to sit directly
> over the video. Is this something that we would need to declare
> explicitly into the DOM or would that be something that the browser
> can introduce at that position and expose to the DOM? Without the DOM
> exposure, there is no adaptive styling.
>

Yes, that's where I got the idea, sorry for not linking...

>> Comments?
>
> I think your ideas re CSS are great! I am as yet unsure how that can
> be solved in the browser, so any ideas are very much welcome.
>

I think it's mainly a spec problem, not an implementation problem. For implementation one would simply render the CSS box for <overlay> separately and then blit that on top of the video, whether it is in fullscreen or not. When switching to fullscreen the <overlay> box would be resized to the size of the screen of course, and possibly media="projection" should then apply to make it possible to use different CSS in fullscreen and normal view.

> Cheers,
> Silvia.
>
> [2] https://wiki.mozilla.org/Accessibility/HTML5_captions_v2
> [3] http://www.annodex.net/~silvia/itext/
>
>> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-algorithm

-- 
Philip Jägenstedt
Core Developer
Opera Software
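
For reference, a minimal sketch of the scripted approach mentioned in the thread (SRT decoding with a RegExp, then picking the cue to show on each timeupdate event). This is only an illustration under stated assumptions, not code from the demos or from any browser: the function names srtTimeToSeconds, parseSrt and activeCue are hypothetical, and the parser handles only plain SRT cues of the form "counter, timing line, text lines".

```javascript
// Hypothetical helpers for illustration; none of these names come from the thread.

// Convert an SRT timestamp such as "00:01:02,500" to seconds.
function srtTimeToSeconds(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) return NaN;
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Decode SRT text into an array of { start, end, text } cues (times in seconds).
function parseSrt(srt) {
  const cues = [];
  // Normalize line endings; cues are separated by one or more blank lines.
  const blocks = srt.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // Locate the timing line; the numeric counter before it is ignored.
    const i = lines.findIndex((line) => line.includes("-->"));
    if (i === -1) continue;
    const [from, to] = lines[i].split("-->");
    const start = srtTimeToSeconds(from);
    // Use only the first token after "-->" so trailing extras are skipped.
    const end = srtTimeToSeconds(to.trim().split(/\s+/)[0]);
    if (Number.isNaN(start) || Number.isNaN(end)) continue;
    cues.push({ start, end, text: lines.slice(i + 1).join("\n") });
  }
  return cues;
}

// Return the text of the cue active at time t (seconds), or "" if none.
function activeCue(cues, t) {
  const cue = cues.find((c) => t >= c.start && t < c.end);
  return cue ? cue.text : "";
}

// In a page, assuming `video` is the media element, `overlay` is an element
// positioned over it with CSS, and `cues` came from parseSrt on text fetched
// via XHR, the syncing step could look like:
//
//   video.addEventListener("timeupdate", () => {
//     overlay.textContent = activeCue(cues, video.currentTime);
//   });
```

The linear search in activeCue is kept for brevity; for long subtitle files one would probably remember the last matched index instead.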
Received on Wednesday, 25 November 2009 16:27:24 UTC