- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Wed, 27 Jan 2010 22:57:51 +1100
- To: Philip Jägenstedt <philipj@opera.com>, Eric Carlson <eric.carlson@apple.com>
- Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>, Ken Harrenstien <klh@google.com>
Hi,

I've spent the last day and a bit trying to catch up on this whole
conversation, and it's going to be a bit difficult to give good feedback
without replying to several emails. I will try Ian's approach of cutting
and pasting relevant bits from different emails to get something of a
consistent discussion together again. Sorry if that's confusing.

The first part of this email will focus on how to expose the track
composition to the UA & javascript. The second part will focus on the
<overlay> proposal.

On Thu, Nov 26, 2009 at 3:26 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Wed, 25 Nov 2009 14:29:37 +0100, Silvia Pfeiffer wrote:
>> On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com>
>> wrote:
>>>
>>> I agree that syncing separate video and audio files is a big challenge.
>>> I'd prefer leaving this kind of complexity either to scripting or an
>>> external manifest like SMIL.
>>
>> We have to at minimum deal with multi-track video and audio files
>> inside HTML, since they can potentially expose accessibility data:
>> audio descriptions (read by a human), sign language (signed by a
>> person), and captions are the particular tracks I am concerned about.
>
> I agree and think that the tracks of the resource should be exposed via a
> DOM API. From a script's point of view it should look the same whether the
> resource is Ogg, MPEG-4 or SMIL linking several tracks together.

I agree with this, which is why I tried to create a markup that has
elements inside the <source> element: the idea is to express the contained
tracks of a media resource explicitly in declarative markup, such that the
DOM API is obvious and javascript can deal with it.
Let me cite my proposal again and clarify some things / ask some questions:

<video>
  <source src="video.ogv" type="video/ogg">
    <track id='ogg_v' role='video' ref='serialno:1505760010'>
    <track id='ogg_a' role='audio' lang='en' ref='serialno:0821695999'>
    <track id='ogg_ad' role='auddesc' lang='en' ref='serialno:1421614520'>
    <track id='ogg_s' role='sign' lang='ase' ref='serialno:1413244634'>
    <track id='ogg_cc' role='caption' lang='en' ref='serialno:1421849818'>
  </source>
  <source src="video.mp4" type="video/mp4">
    <track id='mp4_v' role='video' ref='trackid:1'>
    <track id='mp4_a' role='audio' lang='en' ref='trackid:2'>
  </source>
  <overlay>
    <source src="en.srt" lang="en-US">
    <source src="hans.srt" lang="zh-CN">
  </overlay>
</video>

Eric said:
> I *really* don't like the idea of requiring page authors to declare the
> track structure in the markup.

And Philip added to this:
> I really don't see a problem with waiting until metadataloaded for the
> menu to be available. Picking a language in the < 1 sec before that seems
> like a fringe use case which can be solved by sending the information in
> a site-specific format using data-* attributes or similar.

Ken Harrenstien from Google wrote this to me (and allowed me to quote him,
which is why I cc-ed him):
> The principal reason for wanting to allow explicit markup is latency
> and infrastructure overhead.
>
> Without the markup, the only way to know what's in-band is to start
> streaming the video. How long will it take to find out what kinds of
> captions it contains and whether they are supported? How much
> bandwidth and setup is wasted in the process? At Google we care very
> deeply about those things.
>
> I think this information is very, if not exactly, analogous to the
> other markup provided for <video>. I need it to tell immediately if
> the video is even playable/watchable for me (as a hearing-impaired
> person).

I believe he has a strong case.
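For illustration, here is a tiny helper that splits the ref attribute from my markup above into its addressing scheme and value. To be clear, nothing about this attribute is standardized; 'serialno' for Ogg and 'trackid' for MPEG-4 are just the conventions used in my sketch, and the helper itself is purely hypothetical.

```javascript
// Hypothetical helper for the proposed (non-standard) ref attribute,
// which pairs an addressing scheme with a container-specific value.
function parseTrackRef(ref) {
  var i = ref.indexOf(':');
  if (i === -1) {
    throw new Error('malformed track reference: ' + ref);
  }
  return {
    scheme: ref.slice(0, i),   // e.g. 'serialno' (Ogg) or 'trackid' (MPEG-4)
    value: ref.slice(i + 1)    // e.g. an Ogg ten-digit serial number
  };
}

// Example: an Ogg track addressed by serial number.
var ref = parseTrackRef('serialno:1505760010');
// ref.scheme === 'serialno', ref.value === '1505760010'
```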
Further, if the media elements will indeed change from using @autobuffer to
using @preload, where it is possible that no media data is prefetched at
all, then the UA has to be told in some other way what the resource
composition is. After all, the UA should display to the user what
accessibility tracks are available and allow the user to turn them on/off
(suggested to happen through a menu that is built by the UA and added to
the video transport bar).

Also, it is really important to expose the role (and the language) that a
track takes on within a multitrack media file, such that a UA can decide
whether to display a track or not and where to display it. I do believe
that the control of which tracks are being displayed should stay with the
UA and not be forced by the file or the media framework.

I cannot see a better way of exposing this functionality uniformly across
multiple media file types than explicit markup. If we buried the track
information in a javascript API, we would introduce an additional
dependency and we would remove the ability to simply parse the Web page to
get at such information. For example, a crawler would not be able to find
out that there is a resource with captions and would probably not bother
requesting the resource for its captions (or other text tracks).

Eric further said:
> It seems to me that because it will require new specialized tools to get
> the information, and because it will be really difficult to do correctly
> (ten digit serial numbers?), people are likely to just skip it completely.

There is a need to address a track in a unique way, i.e. javascript needs
to be able to tell the media framework exactly which track it is talking
about (e.g. to turn it on or off). In Ogg, the serial numbers of each track
are ten-digit numbers. They can be easily obtained using ogg-info or
oggz-info and can be easily exposed by players. There is currently no other
way of uniquely identifying a specific track in Ogg.
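To illustrate the crawler point: under the assumption that tracks are declared in markup as proposed, a crawler could answer "does this video offer captions?" from the page source alone, without ever requesting the media resource. The regex and the page snippet below are mine, purely for illustration.

```javascript
// Minimal sketch, assuming the declarative <track> markup proposed above:
// detect a declared caption track from the raw page source alone.
function declaresCaptionTrack(pageSource) {
  return /<track\b[^>]*\brole=['"]caption['"]/.test(pageSource);
}

// A page snippet using the proposed (non-standard) markup.
var page =
  "<video><source src='video.ogv' type='video/ogg'>" +
  "<track id='ogg_cc' role='caption' lang='en' ref='serialno:1421849818'>" +
  "</source></video>";

declaresCaptionTrack(page); // true - no media bytes were fetched
```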
(On a side note: we are working with Xiph to require encoders to also give
each track in an Ogg file a unique name so it can be addressed through
this, but this is not currently the case.) For MPEG, I believe the tracks
are numbered through, so it is easier to identify them (though also easier
to make mistakes).

Eric further stated:
> We need to create a specification that makes it as simple as possible
> for people to do the right thing.

Mostly this information will be created by tools anyway (typically a CMS),
such that it's not up to the user to do this. Also, there is no need for a
user to do this - it's optional, and the ordinary user will most likely not
produce in-band captions and audio descriptions for their video files
anyway. This is for power users of video. But when we have a power user who
wants to make use of all the functionality their media files offer and has
no way of exposing this in a standard way, we will create a lot of
frustration and incompatible implementations when people try to implement
this with javascript.

Eric further wrote:
> If we do allow this, what happens when the structure declared in the
> markup differs from the structure of the media file?

The same as what happens when other markup is wrong or points to something
that doesn't exist: we 404 or deal with the error. HTML is well known for
its ability to deal with errors gracefully.

Incidentally, we do need to develop the javascript API for exposing the
video's tracks no matter whether we do it in declarative syntax or not.
Here's a start at a proposal for this (obviously inspired by the markup):

video.numberTracks();  -> returns the number of available tracks
video.firstTrack();    -> returns the first track ("first" to be defined -
                          e.g. there is no inherent order in Ogg)
video.lastTrack();     -> returns the last track ("last" to be defined)
track.next();          -> returns the next track in the list

A track has the following attributes: type, ref, lang, role, media (and the
usual contenders, e.g. id, style).

Philip said:
> <source> is a void element, so this markup does not degrade nicely in any
> shipped <video>-capable browsers. Try
> <http://software.hixie.ch/utilities/js/live-dom-viewer/saved/318>. Firefox
> puts the second <source> element inside nested <track> elements and Safari
> just drops it.

That is disappointing. This means we have to try and find a different way
of marking it up. Maybe we can just throw a <tracks> element underneath
each <source> element, as in this:

<video>
  <source src="video.ogv" type="video/ogg">
  <tracks>
    <track id='ogg_v' role='video' ref='serialno:1505760010'></track>
    <track id='ogg_a' role='audio' lang='en' ref='serialno:0821695999'></track>
    <track id='ogg_ad' role='auddesc' lang='en' ref='serialno:1421614520'></track>
    <track id='ogg_s' role='sign' lang='ase' ref='serialno:1413244634'></track>
    <track id='ogg_cc' role='caption' lang='en' ref='serialno:1421849818'></track>
  </tracks>
  <source src="video.mp4" type="video/mp4">
  <tracks>
    <track id='mp4_v' role='video' ref='trackid:1'></track>
    <track id='mp4_a' role='audio' lang='en' ref='trackid:2'></track>
  </tracks>
  <overlay>
    <source src="en.srt" lang="en-US">
    <source src="hans.srt" lang="zh-CN">
  </overlay>
</video>

Is it guaranteed that the order is retained, and can we therefore guarantee
the association of a <tracks> element with the preceding <source> element?

An alternative would be to store such resource composition in a separate
file - a resource composition xml file (?) - on the server and to link to
it in the <source> element (or the <video> element if there's only one).
Then it's not polluting the html markup, and the UA doesn't have to parse a
lengthy media file but rather only a separately retrieved xml file.
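Since none of this API exists in any UA yet, here is a toy simulation of the proposed traversal with plain javascript objects, just to pin down the semantics I have in mind. The method names are from the proposal above; everything else (the array backing, null as the end-of-list marker, the makeVideo helper) is my assumption.

```javascript
// Toy simulation of the proposed track API - not a real implementation.
// numberTracks/firstTrack/lastTrack/next come from the proposal above;
// the internals are assumed purely for illustration.
function makeVideo(trackList) {
  trackList.forEach(function (t, i) {
    t.next = function () { return trackList[i + 1] || null; };
  });
  return {
    numberTracks: function () { return trackList.length; },
    firstTrack: function () { return trackList[0] || null; },
    lastTrack: function () { return trackList[trackList.length - 1] || null; }
  };
}

var video = makeVideo([
  { type: 'video/ogg', ref: 'serialno:1505760010', role: 'video' },
  { type: 'video/ogg', ref: 'serialno:1421849818', role: 'caption', lang: 'en' }
]);

// Walk all tracks, e.g. to build the UA's track menu.
var roles = [];
for (var t = video.firstTrack(); t; t = t.next()) {
  roles.push(t.role);
}
// roles is ['video', 'caption']
```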
For example:

<video>
  <source src="video.ogg" type="video/ogg" rcf="video.ogg.rcf">
  <source src="video.mp4" type="video/mp4" rcf="video.mp4.rcf">
  <overlay>
    <source src="en.srt" lang="en-US">
    <source src="hans.srt" lang="zh-CN">
  </overlay>
</video>

Now, let's talk about the <overlay> element.

I am not too fussed about renaming <itextlist> to <overlay>. I can see why
you would go for this name - because most text will generally be rendered
on top of or next to the video. It essentially provides a "div" into which
the data can be rendered, rather than an abstract structure like my
"itextlist". My intention was to keep the structure and the presentation
separate from each other. But if the general agreement is that "overlay" is
a better name, I'm happy to go with it. (Also, I'm happy to rename "itext"
to "source", since that was already what I had started doing in
http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/
, where I've also renamed "category" to "role".)

I'm assuming that in an example like the one below (no matter in which way
the tracks are exposed), the caption track of the ogg file would become
another <source> in the <overlay> element if the UA chose the video.ogv
file over the video.mp4 file?

<video>
  <source src="video.ogv" type="video/ogg">
  <tracks>
    <track id='ogg_v' role='video' ref='serialno:1505760010'></track>
    <track id='ogg_a' role='audio' lang='en' ref='serialno:0821695999'></track>
    <track id='ogg_ad' role='auddesc' lang='en' ref='serialno:1421614520'></track>
    <track id='ogg_s' role='sign' lang='ase' ref='serialno:1413244634'></track>
    <track id='ogg_cc' role='caption' lang='en' ref='serialno:1421849818'></track>
  </tracks>
  <source src="video.mp4" type="video/mp4">
  <tracks>
    <track id='mp4_v' role='video' ref='trackid:1'></track>
    <track id='mp4_a' role='audio' lang='en' ref='trackid:2'></track>
  </tracks>
  <overlay>
    <source src="en.srt" lang="en-US">
    <source src="hans.srt" lang="zh-CN">
  </overlay>
</video>

I.e.
it would be parsed to something like:

<video>
  <source src="video.ogv" type="video/ogg">
  <overlay>
    <source src="en.srt" lang="en-US">
    <source src="hans.srt" lang="zh-CN">
    <source ref='serialno:1421849818' lang="en">
  </overlay>
</video>

This makes it an additional caption track to display. Is this right? There
are no alternative choices between tracks?

I would actually suggest that if we want to go with <overlay>, we need to
specify different overlays for different types of text. In this way we can
accommodate textual audio descriptions, captions, subtitles etc. Then, I
would suggest that for every type of text there should only ever be one
<source> displayed. It is not often that you want more than one subtitle
track displayed. You most certainly never want more than one caption track
displayed, and never more than one textual audio description track. But you
do want each one of them displayed in addition to the others.

For example:

<video src="video.ogg">
  <overlay role="caption"
           style="font-size:2em;padding:1em;text-align:center; display: block;">
    <source src="en-us.srt" lang="en-US">
    <source src="en.srt" lang="en">
  </overlay>
  <overlay role="tad" style="z-index: -100; display: block;"
           aria-live="assertive">
    <source src="tad-en.srt" lang="en">
    <source src="tad-de.srt" lang="de">
  </overlay>
  <overlay role="subtitle"
           style="font-size:2em;padding:1em;text-align:center; display: block;">
    <source src="de.srt" lang="de">
    <source src="sv.srt" lang="sv">
    <source src="fi.srt" lang="fi">
  </overlay>
</video>

BTW: somewhere along the discussion between Philip and Maciej you lost me,
so no comments on those.

Cheers,
Silvia.
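PS: for concreteness, here is a sketch of how the "only one <source> per overlay" selection could work. The matching policy (try the user's preferred languages in order, accept primary-subtag matches like en for en-US, fall back to the first source) is just my assumption, not part of the proposal.

```javascript
// Sketch: pick exactly one <source> per overlay. Only the
// one-source-per-overlay rule comes from the proposal above; the
// selection policy here is assumed for illustration.
function pickOverlaySource(sources, preferredLangs) {
  for (var i = 0; i < preferredLangs.length; i++) {
    var want = preferredLangs[i].toLowerCase();
    for (var j = 0; j < sources.length; j++) {
      var lang = sources[j].lang.toLowerCase();
      // Accept exact matches and primary-subtag matches (en matches en-US).
      if (lang === want || lang.split('-')[0] === want.split('-')[0]) {
        return sources[j];
      }
    }
  }
  return sources[0]; // fall back rather than display nothing
}

// The caption overlay from the example above.
var captionSources = [
  { src: 'en-us.srt', lang: 'en-US' },
  { src: 'en.srt', lang: 'en' }
];
pickOverlaySource(captionSources, ['en']).src; // 'en-us.srt'
```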
Received on Wednesday, 27 January 2010 11:58:43 UTC