- From: Philip Jägenstedt <philipj@opera.com>
- Date: Thu, 28 Oct 2010 13:05:57 +0200
- To: public-html-a11y@w3.org
On Thu, 28 Oct 2010 12:06:19 +0200, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> On Thu, Oct 28, 2010 at 6:12 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>> On Tue, 19 Oct 2010 19:51:31 +0200, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> This is to start a technical discussion on how to solve the multitrack
>>> audio / video requirements in HTML5.
>>>
>>> We've got the following related bug and I want to make a start on
>>> discussing the advantages / disadvantages of different approaches:
>>> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9452
>>>
>>> Ian's comment on this was this - and I agree that his conclusion
>>> should be a general goal in the technical solution that we eventually
>>> propose:
>>>
>>>> The ability to control multiple internal media tracks (sign language
>>>> video overlays, alternate angles, dubbed audio, etc) seems like
>>>> something we'd want to do in a way consistent with handling of
>>>> multiple external tracks, much like how internal subtitle tracks and
>>>> external subtitle tracks should use the same mechanism so that they
>>>> can be enabled and disabled and generally manipulated in a consistent
>>>> way.
>>>
>>> I can think of the following different mark-up approaches towards
>>> solving this issue:
>>>
>>>
>>> 1. Overload <track>
>>>
>>> For example, synchronizing an external audio description and sign
>>> language video with the main video:
>>>
>>> <video id="v1" poster="video.png" controls>
>>>   <source src="video.ogv" type="video/ogg">
>>>   <source src="video.mp4" type="video/mp4">
>>>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>>>   <track src="audesc.ogg" kind="descriptions" type="audio/ogg"
>>>          srclang="en" label="English Audio Description">
>>>   <track src="signlang.ogv" kind="signings" type="video/ogg"
>>>          srclang="asl" label="American Sign Language">
>>> </video>
>>>
>>> This adds a @type attribute to the <track> element, allowing it to
>>> also be used with audio and video and not just text tracks.
>>>
>>> There are a number of problems with such an approach:
>>>
>>> * How do we reference alternative encodings?
>>> It would probably require the introduction of <source> elements
>>> inside <track>, making <track> more complex for selecting currentSrc
>>> etc. Also, if we needed different encodings for different devices, a
>>> @media attribute would be necessary.
>>>
>>> * How do we control synchronization issues?
>>> The main resource would probably always be the one whose timeline
>>> dominates, and for the others we do a best effort to keep in sync with
>>> it. So, what happens if a user does not want to miss anything from one
>>> of the auxiliary tracks, e.g. wants the sign language track to be the
>>> time keeper? That is not possible with this approach.
>>>
>>> * How do we design the JavaScript API?
>>> There are no cues, so the TimedTrack cues and activeCues lists would
>>> be empty and the cuechange event would never fire. The audio and video
>>> tracks would be in the same TimedTrack list as the text ones, possibly
>>> creating confusion, for example in an accessibility menu for track
>>> selection, in particular where the track @kind goes beyond mere
>>> accessibility, such as alternate viewing angles or director's comments.
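For illustration only, a rough sketch of that last point, assuming the draft
TimedTrack API where a media element exposes all of its tracks in a single
tracks list (the attribute name and the kind values are taken from the example
above and are by no means settled); a script building a track selection menu
would have to special-case the non-text kinds by hand:

  var tracks = document.getElementById("v1").tracks;
  for (var i = 0; i < tracks.length; i++) {
    var t = tracks[i];
    if (t.kind == "descriptions" || t.kind == "signings") {
      // Audio/video tracks: no cues, no activeCues, no cuechange events,
      // so a generic menu built from this list must treat them differently.
    } else {
      // Text tracks (subtitles, chapters) behave like TimedTracks today.
    }
  }
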
>>>
>>> * What about other a/v related features, such as width/height and
>>> placement of the sign language video, or the volume of the audio
>>> description?
>>> Having control over such extra features would be rather difficult to
>>> specify, since the data is only regarded as abstract alternative
>>> content to the main video. The rendering algorithm would become a lot
>>> more complex, and attributes from the audio and video elements might
>>> need to be introduced on the <track> element, too. It seems that would
>>> lead to quite some duplication of functionality between different
>>> elements.
>>>
>>>
>>> 2. Introduce <audiotrack> and <videotrack>
>>>
>>> Instead of overloading <track>, one could consider creating new track
>>> elements for audio and video, such as <audiotrack> and <videotrack>.
>>>
>>> This allows keeping different attributes on these elements and having
>>> audio / video / text track lists separate in JavaScript.
>>>
>>> Also, it allows for <source> elements inside <track> more easily, e.g.:
>>>
>>> <video id="v1" poster="video.png" controls>
>>>   <source src="video.ogv" type="video/ogg">
>>>   <source src="video.mp4" type="video/mp4">
>>>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>>>   <audiotrack kind="descriptions" srclang="en">
>>>     <source src="description.ogg" type="audio/ogg">
>>>     <source src="description.mp3" type="audio/mp3">
>>>   </audiotrack>
>>> </video>
>>>
>>> But fundamentally we have the same issues as with approach 1, in
>>> particular the need to replicate some of the audio / video
>>> functionality from the <audio> and <video> elements.
>>>
>>>
>>> 3. Introduce a <par>-like element
>>>
>>> The fundamental challenge that we are facing is to find a way to
>>> synchronise multiple audio-visual media resources, be they in-band,
>>> where the overall timeline is clear, or separate external resources,
>>> where the overall timeline has to be defined. Then we are suddenly no
>>> longer talking about a master resource and auxiliary resources, but
>>> about audio-visual resources that are equals. This is more along the
>>> SMIL way of thinking, which is why I called this section the
>>> "<par>-like element".
>>>
>>> An example markup for synchronizing external audio description and
>>> sign language video with a main video could then be something like:
>>>
>>> <par>
>>>   <video id="v1" poster="video.png" controls>
>>>     <source src="video.ogv" type="video/ogg">
>>>     <source src="video.mp4" type="video/mp4">
>>>     <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>>     <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>>     <track kind="chapters" srclang="en" src="chapters.wsrt">
>>>   </video>
>>>   <audio controls>
>>>     <source src="audesc.ogg" type="audio/ogg">
>>>     <source src="audesc.mp3" type="audio/mp3">
>>>   </audio>
>>>   <video controls>
>>>     <source src="signing.ogv" type="video/ogg">
>>>     <source src="signing.mp4" type="video/mp4">
>>>   </video>
>>> </par>
>>>
>>> This synchronisation element could of course be called something else:
>>> <mastertime>, <coordinator>, <sync>, <timeline>, <container>,
>>> <timemaster> etc.
>>>
>>> The synchronisation element needs to provide the main timeline. It
>>> would make sure that the elements play and seek in parallel.
>>>
>>> Audio and video elements can then be styled individually as their own
>>> CSS block elements and deactivated with "display: none".
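A minimal sketch of that deactivation, assuming the sign language <video> in
the example were given a hypothetical id="signing" and that the <par>-style
container keeps its clock running while it is hidden:

  var signing = document.getElementById("signing");
  // Deactivate the sign language video; the container would be expected
  // to keep it in sync so it can be re-enabled at the right position.
  signing.style.display = "none";
  // ... and later re-enable it:
  signing.style.display = "";
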
>>>
>>> The sync element could have an attribute to decide whether to allow
>>> drop-outs in contained elements when the main timeline progresses but
>>> some of them are starved of data, or whether to go into overall
>>> buffering mode if one of the elements goes into buffering mode. It
>>> could also designate one element as the master, whose timeline takes
>>> precedence, and the others as slaves, for which buffering situations
>>> would be ignored. Something like @synchronize=[block/ignore] and
>>> @master="v1" attributes.
>>>
>>> Also, a decision would need to be made about what to do with
>>> @controls. Should there be a controls display on the first/master
>>> element if any of them has a @controls attribute? Should the slave
>>> elements not have controls displayed?
>>>
>>>
>>> 4. Nest media elements
>>>
>>> An alternative means of re-using <audio> and <video> elements for
>>> synchronisation is to put the "slave" elements inside the "master"
>>> element like so:
>>>
>>> <video id="v1" poster="video.png" controls>
>>>   <source src="video.ogv" type="video/ogg">
>>>   <source src="video.mp4" type="video/mp4">
>>>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>>>   <par>
>>>     <audio controls>
>>>       <source src="audesc.ogg" type="audio/ogg">
>>>       <source src="audesc.mp3" type="audio/mp3">
>>>     </audio>
>>>     <video controls>
>>>       <source src="signing.ogv" type="video/ogg">
>>>       <source src="signing.mp4" type="video/mp4">
>>>     </video>
>>>   </par>
>>> </video>
>>>
>>> This makes it clear whose timeline the elements are following. But it
>>> certainly looks recursive, and we would have to specify that elements
>>> inside a <par> cannot themselves contain another <par> to stop that.
>>>
>>> ===
>>>
>>> These are some of the thoughts I had on this topic. I am not yet
>>> decided on which of the above proposals - or an alternative proposal -
>>> makes the most sense. I have a gut feeling that it is probably useful
>>> to be able to define both a dominant container for synchronization and
>>> one where all contained elements are valued the same. So, maybe the
>>> third approach would be the most flexible, but it certainly needs a
>>> bit more thinking.
>>>
>>> Cheers,
>>> Silvia.
>>>
>>
>> I think that if we want to synchronize several video tracks with
>> non-trivial styling, then the only sensible option is to have multiple
>> <video> elements which are linked together by some attribute. Otherwise
>> we'd be limited to displaying one video over the other, or similar. A
>> benefit of this approach is that it's easy to fake to within 100s of
>> milliseconds in existing browsers, while <audiotrack> or nested
>> <video>s would require more elaborate tricks to emulate (much like
>> <track>).
>>
>> I can see the requirements on what to synchronize having a rather
>> serious impact on the complexity. Mainly, these are the options:
>>
>> 1. Only synchronize tracks at their starting points, typically for
>> extra audio tracks. This is very much like <track>.
>>
>> 2. Synchronize tracks at arbitrary offsets, including synchronizing the
>> end of one track to the start of another. This is rather more SMIL-like.
>>
>> For option 1, something like this would do:
>>
>> <video id="bla"></video>
>> <video sync="bla"></video>
>>
>> For option 2, things would be rather more complicated and I'm not going
>> to make suggestions unless it's clear that we need it.
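A rough sketch of how such linking could be faked with script in existing
browsers, assuming the proposed (unimplemented) sync attribute in the markup
above and an arbitrary drift tolerance of about 100 ms; it is only meant to
illustrate the kind of script a native sync attribute would make unnecessary:

  var master = document.getElementById("bla");
  var slave  = document.querySelector("video[sync='bla']");

  // Mirror play/pause/seek from the master onto the slave.
  master.addEventListener("play",  function () { slave.play(); }, false);
  master.addEventListener("pause", function () { slave.pause(); }, false);
  master.addEventListener("seeked", function () {
    slave.currentTime = master.currentTime;
  }, false);

  // Re-seek whenever the drift exceeds the tolerance; timers and decoding
  // jitter make tighter synchronization hard with script alone.
  setInterval(function () {
    if (Math.abs(slave.currentTime - master.currentTime) > 0.1) {
      slave.currentTime = master.currentTime;
    }
  }, 250);
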
> How would you suggest we solve the issue of audio descriptions that
> are provided as separate audio files from the main video resource?
> They have to be synchronized to the main video resource since the
> spoken description is timed to be given at the exact times where the
> main audio resource has pauses. I have previously made an experiment at
> http://www.annodex.net/~silvia/itext/elephant_separate_audesc_dub.html
> to demonstrate some of the involved issues, in particular the
> synchronization problem.

It can be solved with frequent re-syncing. Assuming that the external audio
track has the same timeline as the main video file, something like this:

<video id="v" src="video.webm"></video>
<audio sync="v" src="description.webm"></audio>

This wouldn't be particularly hard to implement, I think; most of the
complexity seems to be in what the state of the video should be when the
audio has stopped for buffering, and vice versa.

> Is this not an acceptable use case for you?

Perhaps I was unclear: if we support syncing separate <audio>/<video>
elements, then the above is actually the very least we could do. It's beyond
this most basic case that I'd like to understand the actual use cases.

To clarify, option 2 would allow things like this, borrowing SMIL syntax as
seen in SVG:

<video id="v" src="video.webm"></video>
<video begin="v.begin+10s" src="video2.webm"></video>
<!-- video and video2 should be synchronized with a 10s offset -->

or

<video id="v" src="video.webm"></video>
<video begin="v.end" src="video2.webm"></video>
<!-- video and video2 should play gapless back-to-back -->

Are there compelling reasons to complicate things to this extent? The last
example could be abused to achieve gapless playback between chunks in an HTTP
live streaming setup, but I'm not a fan of the solution myself.

--
Philip Jägenstedt
Core Developer
Opera Software
Received on Thursday, 28 October 2010 11:06:34 UTC