- From: Philip Jägenstedt <philipj@opera.com>
- Date: Thu, 28 Oct 2010 09:12:29 +0200
- To: public-html-a11y@w3.org
On Tue, 19 Oct 2010 19:51:31 +0200, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> Hi all,
>
> This is to start a technical discussion on how to solve the multitrack
> audio / video requirements in HTML5.
>
> We've got the following related bug and I want to make a start on
> discussing the advantages / disadvantages of different approaches:
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9452
>
> Ian's comment on this was as follows - and I agree that his conclusion
> should be a general goal of the technical solution that we eventually
> propose:
>
>> The ability to control multiple internal media tracks (sign language
>> video overlays, alternate angles, dubbed audio, etc) seems like
>> something we'd want to do in a way consistent with the handling of
>> multiple external tracks, much like how internal subtitle tracks and
>> external subtitle tracks should use the same mechanism so that they
>> can be enabled and disabled and generally manipulated in a consistent
>> way.
>
> I can think of the following different mark-up approaches towards
> solving this issue:
>
>
> 1. Overload <track>
>
> For example, synchronizing an external audio description and a sign
> language video with the main video:
>
> <video id="v1" poster="video.png" controls>
>   <source src="video.ogv" type="video/ogg">
>   <source src="video.mp4" type="video/mp4">
>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>   <track src="audesc.ogg" kind="descriptions" type="audio/ogg"
>          srclang="en" label="English Audio Description">
>   <track src="signlang.ogv" kind="signings" type="video/ogg"
>          srclang="asl" label="American Sign Language">
> </video>
>
> This adds a @type attribute to the <track> element, allowing it to be
> used not only with text tracks but also with audio and video tracks.
>
> There are a number of problems with such an approach:
>
> * How do we reference alternative encodings?
> It would probably require introducing <source> elements inside
> <track>, making <track> more complex for selecting currentSrc etc.
> Also, if we needed different encodings for different devices, a
> @media attribute would be necessary.
>
> * How do we control synchronization?
> The main resource would probably always be the one whose timeline
> dominates, and for the others we would make a best effort to keep in
> sync with it. So, what happens if a user does not want to miss
> anything from one of the auxiliary tracks, e.g. wants the sign
> language track to be the time keeper? That's not possible with this
> approach.
>
> * How do we design the JavaScript API?
> There are no cues, so the TimedTrack cues and activeCues lists would
> be empty and the cuechange event would never fire. The audio and
> video tracks would be in the same TimedTrack list as the text ones,
> possibly creating confusion, for example in an accessibility menu for
> track selection, in particular where the track @kind goes beyond mere
> accessibility, such as alternate viewing angles or director's
> comments.
>
> * What about other a/v related features, such as the width/height and
> placement of the sign language video, or the volume of the audio
> description?
> Having control over such extra features would be rather difficult to
> specify, since the data is only regarded as abstract alternative
> content to the main video. The rendering algorithm would become a lot
> more complex, and attributes from the audio and video elements might
> have to be introduced on the <track> element, too. It seems that
> would lead to quite some duplication of functionality between
> different elements.
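To make the first of those problems concrete: the overloaded <track>
would presumably have to grow something like the following. This is a
purely hypothetical sketch - no spec defines <source> inside <track>,
and the @media value is just an example media query:

<track kind="descriptions" srclang="en" label="English Audio Description">
  <source src="audesc.ogg" type="audio/ogg"
          media="all and (min-device-width: 800px)">
  <source src="audesc.mp3" type="audio/mpeg">
</track>

In other words, the resource selection machinery of the media elements
would have to be duplicated inside <track>.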
> 2. Introduce <audiotrack> and <videotrack>
>
> Instead of overloading <track>, one could consider creating new track
> elements for audio and video, such as <audiotrack> and <videotrack>.
>
> This allows keeping different attributes on these elements and having
> the audio / video / text track lists separate in JavaScript.
>
> Also, it more easily allows for <source> elements inside the track
> elements, e.g.:
>
> <video id="v1" poster="video.png" controls>
>   <source src="video.ogv" type="video/ogg">
>   <source src="video.mp4" type="video/mp4">
>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>   <audiotrack kind="descriptions" srclang="en">
>     <source src="description.ogg" type="audio/ogg">
>     <source src="description.mp3" type="audio/mpeg">
>   </audiotrack>
> </video>
>
> But fundamentally we have the same issues as with approach 1, in
> particular the need to replicate some of the audio / video
> functionality of the <audio> and <video> elements.
>
>
> 3. Introduce a <par>-like element
>
> The fundamental challenge that we are facing is to find a way to
> synchronise multiple audio-visual media resources, whether in-band,
> where the overall timeline is clear, or as separate external
> resources, where the overall timeline has to be defined. Then we are
> suddenly not talking any more about a master resource and auxiliary
> resources, but about audio-visual resources that are equals. This is
> more along the SMIL way of thinking, which is why I called this
> section the "<par>-like element".
>
> Example markup for synchronizing an external audio description and a
> sign language video with a main video could then look something like:
>
> <par>
>   <video id="v1" poster="video.png" controls>
>     <source src="video.ogv" type="video/ogg">
>     <source src="video.mp4" type="video/mp4">
>     <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>     <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>     <track kind="chapters" srclang="en" src="chapters.wsrt">
>   </video>
>   <audio controls>
>     <source src="audesc.ogg" type="audio/ogg">
>     <source src="audesc.mp3" type="audio/mpeg">
>   </audio>
>   <video controls>
>     <source src="signing.ogv" type="video/ogg">
>     <source src="signing.mp4" type="video/mp4">
>   </video>
> </par>
>
> This synchronisation element could of course be called something
> else: <mastertime>, <coordinator>, <sync>, <timeline>, <container>,
> <timemaster> etc.
>
> The synchronisation element needs to provide the main timeline. It
> would make sure that the elements play and seek in parallel.
>
> The audio and video elements can then be styled individually as their
> own CSS block elements and deactivated with "display: none".
>
> The sync element could have an attribute that decides whether to
> accept drop-outs in contained elements that starve while the main
> timeline progresses, or whether to go into overall buffering mode if
> any one of the elements goes into buffering mode. It could also
> designate one element as the master, whose timeline must not be
> interrupted, and the others as slaves, whose buffering situations
> would be ignored. Something like @synchronize=[block/ignore] and
> @master="v1" attributes.
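To spell out that last suggestion - the attribute names are just the
placeholders from the paragraph above, nothing here is specified
anywhere:

<par synchronize="ignore" master="v1">
  <video id="v1" src="video.ogv" controls></video>
  <audio src="audesc.ogg"></audio>
  <video src="signing.ogv"></video>
</par>

Here v1's timeline would be authoritative, and the description and
signing tracks would simply drop out whenever they cannot keep up;
with synchronize="block", the whole group would instead pause and
buffer together.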
> Also, a decision would need to be made about what to do with
> @controls. Should controls be displayed on the first/master element
> if any of the elements has a @controls attribute? Should the slave
> elements not display controls at all?
>
>
> 4. Nest media elements
>
> An alternative means of re-using <audio> and <video> elements for
> synchronisation is to put the "slave" elements inside the "master"
> element, like so:
>
> <video id="v1" poster="video.png" controls>
>   <source src="video.ogv" type="video/ogg">
>   <source src="video.mp4" type="video/mp4">
>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>   <par>
>     <audio controls>
>       <source src="audesc.ogg" type="audio/ogg">
>       <source src="audesc.mp3" type="audio/mpeg">
>     </audio>
>     <video controls>
>       <source src="signing.ogv" type="video/ogg">
>       <source src="signing.mp4" type="video/mp4">
>     </video>
>   </par>
> </video>
>
> This makes it clear whose timeline each element is following. But it
> sure looks recursive, and we would have to define that elements
> inside a <par> cannot themselves contain another <par> to stop that.
>
> ===
>
> These are some of the thoughts I had on this topic. I have not yet
> decided which of the above proposals - or which alternative proposal -
> makes the most sense. I have a gut feeling that it is probably useful
> to be able to define both a dominant container for synchronization
> and one where all containers are valued equally. So maybe the third
> approach is the most flexible, but it certainly needs a bit more
> thinking.
>
> Cheers,
> Silvia.

I think that if we want to synchronize several video tracks with
non-trivial styling, then the only sensible option is to have multiple
<video> elements which are linked together by some attribute.
Otherwise we'd be limited to displaying one video on top of the other,
or similar.

A benefit of this approach is that it's easy to fake to within 100s of
milliseconds in existing browsers (a sketch of what I mean follows at
the end of this mail), while <audiotrack> or nested <video>s would
require more elaborate tricks to emulate (much like <track>).

I can see the requirements on what to synchronize having a rather
serious impact on the complexity. Mainly, these are the options:

1. Only synchronize tracks at their starting points, typically for
extra audio tracks. This is very much like <track>.

2. Synchronize tracks at arbitrary offsets, including synchronizing
the end of one track to the start of another. This is rather more
SMIL-like.

For option 1, something like this would do:

<video id="bla"></video>
<video sync="bla"></video>

For option 2, things would be rather more complicated and I'm not
going to make suggestions unless it's clear that we need it.
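To illustrate the kind of faking I mean for option 1 - a rough sketch
only, where the sync attribute is the one proposed above and the 0.2 s
drift threshold and the choice of events are guesses at what an
emulation might use:

<video id="bla" src="video.ogv" controls></video>
<video id="signing" sync="bla" src="signing.ogv"></video>
<script>
  var master = document.getElementById("bla");
  var slave = document.getElementById("signing");

  // Mirror play/pause from the master onto the slave.
  master.addEventListener("play", function () { slave.play(); }, false);
  master.addEventListener("pause", function () { slave.pause(); }, false);

  // Nudge the slave back towards the master whenever it has drifted
  // by more than 0.2 seconds. timeupdate typically fires only a few
  // times per second, which is why this only gets you sync to within
  // 100s of milliseconds.
  function resync() {
    if (Math.abs(slave.currentTime - master.currentTime) > 0.2) {
      slave.currentTime = master.currentTime;
    }
  }
  master.addEventListener("seeked", resync, false);
  master.addEventListener("timeupdate", resync, false);
</script>

--
Philip Jägenstedt
Core Developer
Opera Software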
Received on Thursday, 28 October 2010 07:13:19 UTC