Re: [media] handling multitrack audio / video

On Wed, Oct 20, 2010 at 4:51 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:
> Hi all,
>
> This is to start a technical discussion on how to solve the multitrack
> audio / video requirements in HTML5.
>
> We've got the following related bug and I want to make a start on
> discussing the advantages / disadvantages of different approaches:
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9452
>
> Ian's comment on this was this - and I agree that his conclusion
> should be a general goal in the technical solution that we eventually
> propose:
>> The ability to control multiple internal media tracks (sign language video
>> overlays, alternate angles, dubbed audio, etc) seems like something we'd want
>> to do in a way consistent with handling of multiple external tracks, much like
>> how internal subtitle tracks and external subtitle tracks should use the same
>> mechanism so that they can be enabled and disabled and generally manipulated in
>> a consistent way.
>
> I can think of the following different mark-up approaches towards
> solving this issue:
>
>
> 1. Overload <track>
>
> For example synchronizing external audio description and sign language
> video with main video:
> <video id="v1" poster=“video.png” controls>
>  <source src=“video.ogv” type=”video/ogg”>
>  <source src=“video.mp4” type=”video/mp4”>
>  <track kind=”subtitles” srclang=”fr” src=”sub_fr.wsrt”>
>  <track kind=”subtitles” srclang=”ru” src=”sub_ru.wsrt”>
>  <track kind=”chapters” srclang=”en” src=”chapters.wsrt”>
>  <track src="audesc.ogg" kind="descriptions" type="audio/ogg"
> srclang="en" label="English Audio Description">
>  <track src="signlang.ogv" kind="signings" type="video/ogg"
> srclang="asl" label="American Sign Language">
> </video>
>
> This adds a @type attribute to the <track> element, allowing it to
> be used for audio and video tracks as well, not just text tracks.
>
> There are a number of problems with such an approach:
>
> * How do we reference alternative encodings?
>   It would probably require the introduction of <source> elements
> inside <track>, making <track> more complex, e.g. for selecting its
> currentSrc. Also, if we needed different encodings for different
> devices, a @media attribute would be necessary.
>
> * How do we handle synchronization?
>   The main resource would probably always be the one whose timeline
> dominates, and for the others a best effort is made to keep them in
> sync with it (see the script sketch at the end of this list). So,
> what happens if a user does not want to miss anything from one of
> the auxiliary tracks, e.g. wants the sign language track to be the
> time keeper? That's not possible with this approach.
>
> * How do we design the JavaScript API?
>   There are no cues, so the TimedTrack cues and activeCues lists
> would be empty and the cuechange event would never fire. The audio
> and video tracks would sit in the same TimedTrack list as the text
> ones, possibly creating confusion, for example in an accessibility
> menu for track selection, in particular where the track @kind goes
> beyond mere accessibility, such as alternate viewing angles or
> director's comments.
>
> * What about other a/v related features, such as width/height and
> placement of the sign language video, or the volume of the audio
> description?
>   Having control over such extra features would be rather difficult
> to specify, since the data is only regarded as abstract alternative
> content to the main video. The rendering algorithm would become a
> lot more complex, and attributes from the audio and video elements
> might have to be introduced on the <track> element, too. That would
> lead to quite some duplication of functionality between different
> elements.
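>
> For illustration, the kind of best-effort sync logic the UA would
> have to implement internally could look something like this in
> script, shown here with two separate media elements (just a sketch;
> the "audesc" id and the 0.3s drift threshold are hypothetical):
>
> var master = document.getElementById("v1");
> var desc = document.getElementById("audesc"); // auxiliary description
>
> // Nudge the auxiliary element back towards the master's timeline
> // whenever it drifts too far.
> master.addEventListener("timeupdate", function () {
>   if (Math.abs(desc.currentTime - master.currentTime) > 0.3)
>     desc.currentTime = master.currentTime;
> }, false);
> master.addEventListener("play", function () { desc.play(); }, false);
> master.addEventListener("pause", function () { desc.pause(); }, false);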
>
>
> 2. Introduce <audiotrack> and <videotrack>
>
> Instead of overloading <track>, one could consider creating new track
> elements for audio and video, such as <audiotrack> and <videotrack>.
>
> This allows keeping different attributes on these elements and having
> audio / video / text track lists separate in JavaScript.
>
> Also, it more easily allows for <source> elements inside these track
> elements, e.g.:
> <video id="v1" poster="video.png" controls>
>  <source src="video.ogv" type="video/ogg">
>  <source src="video.mp4" type="video/mp4">
>  <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>  <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>  <track kind="chapters" srclang="en" src="chapters.wsrt">
>  <audiotrack kind="descriptions" srclang="en">
>    <source src="description.ogg" type="audio/ogg">
>    <source src="description.mp3" type="audio/mpeg">
>  </audiotrack>
> </video>
> But fundamentally we have the same issues as with approach 1, in
> particular the need to replicate some of the audio / video
> functionality of the <audio> and <video> elements.
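>
> To give an idea of what the separate lists would buy us in script,
> something like this (the attribute names and the menu helpers are
> hypothetical):
>
> var v = document.getElementById("v1");
> // Text tracks and audio tracks no longer share a single list, so an
> // accessibility menu can be built per track type:
> for (var i = 0; i < v.tracks.length; i++)       // TimedTracks only
>   addToSubtitleMenu(v.tracks[i]);               // hypothetical helper
> for (var i = 0; i < v.audioTracks.length; i++)  // <audiotrack>s only
>   addToAudioMenu(v.audioTracks[i]);             // hypothetical helper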
>
>
> 3. Introduce a <par>-like element
>
> The fundamental challenge that we are facing is to find a way to
> synchronise multiple audio-visual media resources, be they in-band,
> where the overall timeline is clear, or separate external resources,
> where the overall timeline has to be defined. Then we are no longer
> talking about a master resource and auxiliary resources, but about
> audio-visual resources that are equals. This is more along the SMIL
> way of thinking, which is why I called this section the "<par>-like
> element".
>
> An example markup for synchronizing external audio description and
> sign language video with a main video could then be something like:
> <par>
>  <video id="v1" poster=“video.png” controls>
>    <source src=“video.ogv” type=”video/ogg”>
>    <source src=“video.mp4” type=”video/mp4”>
>    <track kind=”subtitles” srclang=”fr” src=”sub_fr.wsrt”>
>    <track kind=”subtitles” srclang=”ru” src=”sub_ru.wsrt”>
>    <track kind=”chapters” srclang=”en” src=”chapters.wsrt”>
>  </video>
>  <audio controls>
>    <source src="audesc.ogg" type="audio/ogg">
>    <source src="audesc.mp3" type="audio/mp3">
>  </audio>
>  <video controls>
>    <source src="signing.ogv" type="video/ogg">
>    <source src="signing.mp4" type="video/mp4">
>  </video>
> </par>
>
> This synchronisation element could of course be called something else:
> <mastertime>, <coordinator>, <sync>, <timeline>, <container>, <timemaster> etc.
>
> The synchronisation element needs to provide the main timeline. It
> would make sure that the elements play and seek in parallel.
>
> Audio and video elements can then be styled individually as their own
> CSS block elements and deactivated with "display: none".
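>
> For example, hiding the sign language video while it keeps playing
> in sync (hypothetical id):
>
> document.getElementById("signing").style.display = "none";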
>
> The sync element could have an attribute that decides whether to
> accept drop-outs in contained elements when the main timeline
> progresses but some of them starve, or whether to go into overall
> buffering mode when any one of the elements goes into buffering
> mode. It could also designate one element as the master, whose
> timeline must not be interrupted, and the others as slaves, whose
> buffering situations would be ignored. Something like
> @synchronize=[block/ignore] and @master="v1" attributes.
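>
> In script terms, @synchronize="block" with @master="v1" would
> roughly amount to the following (a sketch only; the slave ids are
> hypothetical):
>
> var master = document.getElementById("v1");
> var slaves = [document.getElementById("audesc"),
>               document.getElementById("signing")];
> var all = slaves.concat([master]);
>
> // @synchronize="block": if any element stalls, pause all of them;
> // once it can play through again, resume all of them.
> all.forEach(function (el) {
>   el.addEventListener("waiting", function () {
>     all.forEach(function (e) { e.pause(); });
>   }, false);
>   el.addEventListener("canplay", function () {
>     all.forEach(function (e) { e.play(); });
>   }, false);
> });
>
> // @master="v1": seeking the master is mirrored by the slaves, while
> // with @synchronize="ignore" slave stalls would simply be skipped.
> master.addEventListener("seeking", function () {
>   slaves.forEach(function (s) { s.currentTime = master.currentTime; });
> }, false);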
>
> Also, a decision would need to be made about what to do with
> @controls. Should there be a controls display on the first/master
> element if any of them has a @controls attribute? Should the slave
> elements not have controls displayed?
>
>
> 4. Nest media elements
>
> An alternative means of re-using <audio> and <video> elements for
> synchronisation is to put the "slave" elements inside the "master"
> element like so:
>
> <video id="v1" poster=“video.png” controls>
>  <source src=“video.ogv” type=”video/ogg”>
>  <source src=“video.mp4” type=”video/mp4”>
>  <track kind=”subtitles” srclang=”fr” src=”sub_fr.wsrt”>
>  <track kind=”subtitles” srclang=”ru” src=”sub_ru.wsrt”>
>  <track kind=”chapters” srclang=”en” src=”chapters.wsrt”>
>  <par>
>    <audio controls>
>      <source src="audesc.ogg" type="audio/ogg">
>      <source src="audesc.mp3" type="audio/mp3">
>    </audio>
>    <video controls>
>      <source src="signing.ogv" type="video/ogg">
>      <source src="signing.mp4" type="video/mp4">
>    </video>
>  </par>
> </video>
>
> This makes it clear whose timeline the slave elements are following.
> But it sure looks recursive, and we would have to define that
> elements inside a <par> cannot themselves contain another <par>, to
> stop the recursion.
>
> ===
>
> These are some of the thoughts I had on this topic. I have not yet
> decided which of the above proposals - or an alternative proposal -
> makes the most sense. I have a gut feeling that it is probably
> useful to be able to define both: a dominant container for
> synchronisation, and one where all containers are valued equally.
> So maybe the third approach would be the most flexible, but it
> certainly needs a bit more thinking.
>
> Cheers,
> Silvia.
>

Let me add to this list of markup alternatives (plus the one Daniel
proposed) that we will require a uniform JavaScript API across all of
these.

For now I am thinking of this:

* an extension to the main media element to have a list of media tracks:
interface HTMLMediaElement : HTMLElement {
  [..]
  // media tracks
  readonly attribute MediaTrack[] mtracks;
}

* a new MediaTrack interface to identify its relationship to the main
media element:
interface MediaTrack : HTMLMediaElement {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;

  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
           attribute unsigned short mode;
};
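
A script could then do things like the following (assuming the names
proposed above):

// Enable the English audio description, switch all other media
// tracks off:
var v = document.getElementById("v1");
for (var i = 0; i < v.mtracks.length; i++) {
  var t = v.mtracks[i];
  if (t.kind == "descriptions" && t.language == "en")
    t.mode = MediaTrack.SHOWING;
  else
    t.mode = MediaTrack.OFF;
}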

Cheers,
Silvia.

Received on Wednesday, 27 October 2010 23:50:41 UTC