- From: Geoff Freed <geoff_freed@wgbh.org>
- Date: Wed, 25 Nov 2009 11:40:45 -0500
- To: Eric Carlson <eric.carlson@apple.com>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- CC: HTML Accessibility Task Force <public-html-a11y@w3.org>
- Message-ID: <C732C83D.737A%geoff_freed@wgbh.org>
Just a few comments inline.

Geoff/NCAM


On 11/25/09 11:01 AM, "Eric Carlson" <eric.carlson@apple.com> wrote:

On Mon, Nov 23, 2009 at 1:02 PM, Silvia Pfeiffer wrote:

As a three-sentence summary: basically, I believe that the 90% use case for the Web is that of a time-linear media resource. Any other, more complex needs that require multiple timelines can be realised using JavaScript and the APIs to audio and video that we still need to define, which will expose companion tracks to the Web page and therefore to JavaScript. I don't believe that there will be many use cases that such a combination cannot satisfy, but if there are, one can always use the "object" tag and external plugins to render an Adobe Flash, Silverlight or SMIL experience.

BTW, talking about SMIL: I would be very curious to find out if somebody has tried implementing SMIL in HTML5 and JavaScript yet. I think much of what a SMIL file defines should now be presentable in a Web browser using existing HTML5 and JavaScript constructs. It would be an interesting exercise, and I'd be curious to hear if somebody has tried and where they found limitations.

While it is possible to implement some (many?) of the simplest SMIL constructs using HTML, CSS and JavaScript, things will fall apart as soon as you try to do anything that requires synchronization of media elements (e.g. a <par> containing <audio> and <video>). The latency between the JavaScript context and the media engine is just too high to synchronize starting and stopping multiple elements, never mind the clipping and sync-behavior attributes.

On Nov 25, 2009, at 5:29 AM, Silvia Pfeiffer wrote:

On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com> wrote:

I agree that syncing separate video and audio files is a big challenge. I'd prefer leaving this kind of complexity either to scripting or to an external manifest like SMIL.

We have to deal, at a minimum, with multi-track video and audio files inside HTML, since they can potentially expose accessibility data: audio descriptions (read by a human), sign language (signed by a person), and captions are the particular tracks I am concerned about.

I agree that we must support multi-track audio and video files. If a container format permits references to external media files (as QuickTime does), it is the job of the media engine to keep them in sync, so we don't need to worry about it.

GF: I agree, and also want to reiterate that maintaining external files for things like captions or descriptions is a lot easier than dealing with embedded files. Syncing with external resources is, in my view, the best way to go.

The Google accessibility experts wanted at least the in-line caption tracks exposed in declarative language, because otherwise you cannot build a menu of all available tracks without having to start downloading and decoding the file. With this in mind, I think we have to expose all of the tracks available in a file in declarative language.

GF: Agreed.

Yes, this works for external additional tracks.
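As a minimal sketch of that menu use case: assuming tracks were exposed as <track> elements carrying a "role" attribute, along the lines proposed just below (this markup is a proposal, not part of any current specification, and the element names are illustrative), a script could list the available accessibility tracks without fetching or decoding any media data:

    // Build a simple menu of declared caption/description/sign tracks.
    // Nothing here touches the media data itself; the information comes
    // entirely from the markup.  For simplicity this walks every <track>
    // descendant, i.e. across all <source> alternatives.
    var video  = document.getElementsByTagName('video')[0];
    var tracks = video.getElementsByTagName('track');
    var menu   = document.createElement('ul');

    for (var i = 0; i < tracks.length; i++) {
      var role = tracks[i].getAttribute('role');
      if (role === 'caption' || role === 'auddesc' || role === 'sign') {
        var item = document.createElement('li');
        item.appendChild(document.createTextNode(
          role + ' (' + tracks[i].getAttribute('lang') + ')'));
        menu.appendChild(item);
      }
    }
    document.body.appendChild(menu);

Without some declarative exposure of the in-band tracks, the only way to build the same list is to download enough of the file to parse its headers.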
Maybe then we can add the internal tracks inside the source elements, something like this:

    <video>
      <source src="video.ogg" type="video/ogg">
        <track id='v' role='video' ref='serialno:1505760010'>
        <track id='a' role='audio' lang='en' ref='serialno:0821695999'>
        <track id='ad' role='auddesc' lang='en' ref='serialno:1421614520'>
        <track id='s' role='sign' lang='ase' ref='serialno:1413244634'>
        <track id='cc' role='caption' lang='en' ref='serialno:1421849818'>
      </source>
      <source src="video.mp4" type="video/mp4">
        <track id='v' role='video' ref='trackid:1'>
        <track id='a' role='audio' lang='en' ref='trackid:2'>
      </source>
      <overlay>
        <source src="en.srt" lang="en-US">
        <source src="hans.srt" lang="zh-CN">
      </overlay>
    </video>

Note I have made the track reference explicit by introducing a new "ref" attribute, which uses encapsulation-format-specific references to track identifiers.

I *really* don't like the idea of requiring page authors to declare the track structure in the markup. It seems to me that because it will require new, specialized tools to get the information, and because it will be really difficult to do correctly (ten-digit serial numbers?), people are likely to just skip it completely. We need to create a specification that makes it as simple as possible for people to do the right thing.

GF: Right - the idea of having to discover and include a complex structural element sounds off-putting for authors. Identifying the resource as a caption/description/subtitle, then pointing directly to it, would be simple and not unlike what authors are used to with SMIL.

If we do allow this, what happens when the structure declared in the markup differs from the structure of the media file?

On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com> wrote:

centerTop/centerBottom are appropriately defined in CSS.

Those are almost like the default styling approaches I suggested for itextlist / itext. [2] There, I also assumed there was a display area available to render the time-aligned text into, as large as the video or actually just a little larger. It's larger since sometimes it is better not to overlay stuff but to place it right next to the video, e.g. just above it (title-like) or just below it, but still visually part of the video window.

We shouldn't make assumptions about the size of an overlay; if someone wants to display something outside of the media element's bounds, they can use CSS.

GF: This would also take into account the possibility that someone would want to create a caption region that is actually wider than the video region - useful when the video region itself is too small to contain a useful amount of overlaid caption or subtitle text.

eric
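To make the earlier point about script-driven synchronisation concrete, here is a minimal sketch of keeping two separate media elements, roughly the equivalent of a SMIL <par>, in step from JavaScript. The element ids and the drift threshold are illustrative assumptions, and the round trips between the script context and the media engine are exactly where the latency described above shows up:

    // A naive attempt at playing a video and a separate audio-description
    // file in parallel and keeping them in sync from script.
    var video = document.getElementById('video');       // <video> element
    var audio = document.getElementById('description'); // <audio> element

    video.play();
    audio.play();   // already starts slightly later than the video

    // Periodically pull the audio back onto the video's clock.  Every read
    // of currentTime and every correction has to cross from the script
    // context into the media engine, so the fix always lags the drift.
    setInterval(function () {
      var drift = audio.currentTime - video.currentTime;
      if (Math.abs(drift) > 0.08) {
        audio.currentTime = video.currentTime;
      }
    }, 250);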
Received on Wednesday, 25 November 2009 16:41:26 UTC