Re: [media] handling multitrack audio / video from Geoff Freed on 2010-12-02 (public-html-a11y@w3.org from December 2010)

From: Geoff Freed <geoff_freed@wgbh.org>
Date: Wed, 1 Dec 2010 21:18:09 -0500
To: "public-html-a11y@w3.org" <public-html-a11y@w3.org>
CC: "Frank.Olivier@microsoft.com" <Frank.Olivier@microsoft.com>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Message-ID: <C91C6C11.136D3%geoff_freed@wgbh.org>

Apologies for a somewhat late response to this thread.

I'd just like to drive home the point that it's going to be very important to provide support for, and easy UI access to, embedded audio-description tracks.  Pre-recorded, human-narrated tracks are the norm now for described video, and having a mechanism for supporting and accessing those additional audio tracks is necessary in order to support described broadcast video that is moved to the Web.  Pre-recorded descriptions aren't going to go away any time soon, even when TTS support is possible.

One other thing to note: the current practice is to supply a single audio track that contains both the program audio and the descriptions.  This full-mix track was necessary in analog broadcasts and is necessary now on the Web because most multimedia players do not support the playback of two tracks simultaneously (one being the program audio and the other being only the descriptions).  Therefore, to support existing described broadcast video that is moved to the Web, we're going to need to provide the capability (or the option) of toggling between a full-mix track and an undescribed track, when both are available.  This will be in addition to the new option of playing two tracks (program audio + descriptions) simultaneously.
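
As a rough illustration of the toggling described above, here is a minimal JavaScript sketch.  The track objects, the "main" / "full-mix" / "descriptions" kind values and the selectAudioTracks name are all assumptions invented for this example; none of them come from the spec or from an existing player.

```javascript
// Hypothetical helper: decide which audio tracks to play for a described video.
// The `kind` values below are illustrative assumptions, not spec vocabulary.
function selectAudioTracks(tracks, { wantDescriptions, canMixTracks }) {
  const find = (kind) => tracks.find((t) => t.kind === kind);
  const main = find("main");                 // undescribed program audio
  const fullMix = find("full-mix");          // program audio + descriptions, pre-mixed
  const descriptions = find("descriptions"); // descriptions only

  if (!wantDescriptions) {
    // Prefer the undescribed track; fall back to the full mix if it is all we have.
    if (main) return [main];
    return fullMix ? [fullMix] : [];
  }
  // New option: play program audio and the description-only track together.
  if (canMixTracks && main && descriptions) {
    return [main, descriptions];
  }
  // Existing broadcast practice: switch to the pre-mixed track.
  if (fullMix) {
    return [fullMix];
  }
  // No described audio can be played; fall back to the program audio.
  return main ? [main] : [];
}
```

The sketch prefers simultaneous two-track playback when the player can do it; a player could just as reasonably prefer the full-mix track whenever one is supplied.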

Geoff Freed
WGBH/NCAM


========


If that is indeed the solution we favor, we should move fast before
any implementations of <track> are made and <track> is confirmed as a
content-less element, seeing as we will need <source> elements inside
<track> then.
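
To make the Option 1 model concrete, here is a small JavaScript sketch of a flat, kind-tagged track list in which at most one track of each kind is active at a time.  The TrackList class, its list() and activate() methods and the `enabled` flag are hypothetical names chosen for illustration; they are not a proposed or existing API.

```javascript
// Hypothetical model of a flat list of alternate tracks, each tagged with a kind.
class TrackList {
  constructor(tracks) {
    // Each entry is expected to carry at least { kind, label }; `enabled` is added here.
    this.tracks = tracks.map((t) => ({ ...t, enabled: false }));
  }

  // Enumerate the flat list, optionally filtered by kind.
  list(kind) {
    return kind === undefined
      ? this.tracks
      : this.tracks.filter((t) => t.kind === kind);
  }

  // Enable one track and disable every other track of the same kind.
  activate(index) {
    const chosen = this.tracks[index];
    if (!chosen) throw new RangeError(`no track at index ${index}`);
    for (const t of this.tracks) {
      if (t.kind === chosen.kind) t.enabled = t === chosen;
    }
    return chosen;
  }
}
```

Tracks of other kinds are left untouched by activate(), so a subtitle choice does not affect, say, an audio description that is already enabled.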

Also, a big problem with this approach is that we lose all the
functionality that is otherwise available to audio and video
resources, such as setting the volume, width, height, placement etc.
For example, did you have any thoughts on how to do the display of
sign language video in this scenario?

Cheers,
Silvia.

On Thu, Oct 28, 2010 at 11:37 AM, Frank Olivier
<Frank.Olivier@microsoft.com> wrote:
> Overall feedback from the Internet Explorer team:
>
> Option 1 - Overloading <track> and keeping the DOM parsing very simple would be the most elegant way to proceed, with every track having a 'kind' attribute. Generally, I expect to be able to activate a single alternate track of a particular type. This would also sync up well with a simple JavaScript API: I would expect that I (as an author) would be able to enumerate a flat list of alternate representations, with metadata that indicates the kind.
>
> Additionally, it would be prudent to leave exact conformance (when enabling/disabling alternate tracks) as a 'quality of implementation' issue - For example, with multiple alternate audio tracks, there will certainly be some (mobile) devices where the user agent is only able to play back one audio or video track at a time due to hardware constraints; some devices may only have enough screen real estate to display a single caption track.
>
> Text captioning and alternate audio tracks seem to be core requirements for HTML5 at this point. The user requirements document does do a good job of enumerating all issues that users face, but it will take significant time to fully spec and prototype all features - I expect that user agents will be able to implement the two areas mentioned; the features beyond this certainly require a lot more discussion and speccing that is best addressed in future versions of the HTML spec.
>
> Thanks
> Frank Olivier
>
>
> -----Original Message-----
> From: public-html-a11y-request@w3.org [mailto:public-html-a11y-request@w3.org] On Behalf Of Silvia Pfeiffer
> Sent: Tuesday, October 19, 2010 10:52 AM
> To: HTML Accessibility Task Force
> Subject: [media] handling multitrack audio / video
>
> Hi all,
>
> This is to start a technical discussion on how to solve the multitrack audio / video requirements in HTML5.
>
> We've got the following related bug and I want to make a start on discussing the advantages / disadvantages of different approaches:
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9452
>
> Ian's comment on this is quoted below, and I agree that his conclusion should be a general goal of the technical solution that we eventually propose:
>> The ability to control multiple internal media tracks (sign language
>> video overlays, alternate angles, dubbed audio, etc) seems like
>> something we'd want to do in a way consistent with handling of
>> multiple external tracks, much like how internal subtitle tracks and
>> external subtitle tracks should use the same mechanism so that they
>> can be enabled and disabled and generally manipulated in a consistent way.
>
> I can think of the following different mark-up approaches towards solving this issue:
>
>
> 1. Overload <track>
>
> For example synchronizing external audio description and sign language video with main video:
> <video id="v1" poster="video.png" controls>
>  <source src="video.ogv" type="video/ogg">
>  <source src="video.mp4" type="video/mp4">
>  <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>  <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>  <track kind="chapters" srclang="en" src="chapters.wsrt">
>  <track src="audesc.ogg" kind="descriptions" type="audio/ogg" srclang="en" label="English Audio Description">
>  <track src="signlang.ogv" kind="signings" type="video/ogg" srclang="asl" label="American Sign Language">
> </video>
>
> This adds a @type attribute to the <track> element, allowing it to also be used with audio and video and not just text tracks.
>
> There are a number of problems with such an approach:
>
> * How do we reference alternative encodings?
>   It would probably require the introduction of <source> elements inside <track>, making <track> more complex for selecting currentSrc etc. Also, if we needed different encodings for different devices, a @media attribute will be necessary.
>
> * How do we control synchronization issues?
>   The main resource would probably always be the one whose timeline dominates and for the others we do a best effort to keep in sync with that one. So, what happens if a user wants to not miss anything from one of the auxiliary tracks, e.g. wants the sign language track to be the time keeper? That's not possible with this approach.
>
> * How do we design the JavaScript API?
>  There are no cues, so the TimedTrack cues and activeCues lists would be empty and the cuechange event would never fire. The audio and video tracks would be in the same TimedTrack list as the text ones, possibly creating confusion, for example in an accessibility menu for track selection, in particular where the track @kind goes beyond mere accessibility, such as alternate viewing angles or director's commentary.
>
> * What about other a/v related features, such as width/height and placement of the sign language video or volume of the audio description?
>  Having control over such extra features would be rather difficult to specify, since the data is only regarded as an abstract alternative content to the main video. The rendering algorithm would become a lot more complex and attributes from audio and video elements may be necessary to introduce onto the <track> element, too. It seems that would lead to quite some duplication of functionality between different elements.
>
>
> 2. Introduce <audiotrack> and <videotrack>
>
> Instead of overloading <track>, one could consider creating new track elements for audio and video, such as <audiotrack> and <videotrack>.
>
> This allows keeping different attributes on these elements and having audio / video / text track lists separate in JavaScript.
>
> Also, it allows for <source> elements inside the new track elements more easily, e.g.:
> <video id="v1" poster="video.png" controls>
>  <source src="video.ogv" type="video/ogg">
>  <source src="video.mp4" type="video/mp4">
>  <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>  <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>  <track kind="chapters" srclang="en" src="chapters.wsrt">
>  <audiotrack kind="descriptions" srclang="en">
>    <source src="description.ogg" type="audio/ogg">
>    <source src="description.mp3" type="audio/mpeg">
>  </audiotrack>
> </video>
> But fundamentally we have the same issues as with approach 1, in particular the need to replicate some of the audio / video functionality from the <audio> and <video> elements.
>
>
> 3. Introduce a <par>-like element
>
> The fundamental challenge that we are facing is to find a way to synchronise multiple audio-visual media resources, be that from in-band where the overall timeline is clear or be that with separate external resources where the overall timeline has to be defined. Then we are suddenly not talking any more about a master resource and auxiliary resources, but audio-visual resources that are equals. This is more along the SMIL way of thinking, which is why I called this section the "<par>-like element".
>
> An example markup for synchronizing external audio description and sign language video with a main video could then be something like:
> <par>
>  <video id="v1" poster="video.png" controls>
>    <source src="video.ogv" type="video/ogg">
>    <source src="video.mp4" type="video/mp4">
>    <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>    <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>    <track kind="chapters" srclang="en" src="chapters.wsrt">
>  </video>
>  <audio controls>
>    <source src="audesc.ogg" type="audio/ogg">
>    <source src="audesc.mp3" type="audio/mpeg">
>  </audio>
>  <video controls>
>    <source src="signing.ogv" type="video/ogg">
>    <source src="signing.mp4" type="video/mp4">
>  </video>
> </par>
>
> This synchronisation element could of course be called something else:
> <mastertime>, <coordinator>, <sync>, <timeline>, <container>, <timemaster> etc.
>
> The synchronisation element needs to provide the main timeline. It would make sure that the elements play and seek in parallel.
>
> Audio and video elements can then be styled individually as their own CSS block elements and deactivated with "display: none".
>
> The sync element could have an attribute to decide whether to allow drop-outs in contained elements that are starved while the main timeline progresses, or whether to go into overall buffering mode if any one of the elements goes into buffering mode. It could also define one element as the main element whose timeline should not be ignored and the others as slaves for which buffering situations would be ignored. Something like @synchronize=[block/ignore] and @master="v1" attributes.
>
> Also, a decision would need to be made about what to do with @controls. Should there be a controls display on the first/master element if any of them has a @controls attribute? Should the slave elements not have controls displayed?
>
>
> 4. Nest media elements
>
> An alternative means of re-using <audio> and <video> elements for synchronisation is to put the "slave" elements inside the "master" element like so:
>
> <video id="v1" poster="video.png" controls>
>  <source src="video.ogv" type="video/ogg">
>  <source src="video.mp4" type="video/mp4">
>  <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>  <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>  <track kind="chapters" srclang="en" src="chapters.wsrt">
>  <par>
>    <audio controls>
>      <source src="audesc.ogg" type="audio/ogg">
>      <source src="audesc.mp3" type="audio/mpeg">
>    </audio>
>    <video controls>
>      <source src="signing.ogv" type="video/ogg">
>      <source src="signing.mp4" type="video/mp4">
>    </video>
>  </par>
> </video>
>
> This makes clear whose timeline the element is following. But it sure looks recursive and we would have to define that elements inside a <par> cannot have another <par> inside them to stop that.
>
> ===
>
> These are some of the thoughts I had on this topic. I am not yet decided on which of the above proposals - or an alternative proposal - makes the most sense. I have a gut feeling that it is probably useful to be able to define both, a dominant container for synchronization and one where all containers are valued the same. So, maybe the third approach would be the most flexible, but it certainly needs a bit more thinking.
>
> Cheers,
> Silvia.
>
>
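
For what it's worth, the buffering behaviour suggested for the sync element in approach 3 (@synchronize=[block/ignore] plus @master) could be sketched in JavaScript roughly as follows.  The shouldStallAll function, the shape of the element objects and the exact interplay of the two attributes are assumptions made for this sketch; the proposal itself leaves those details open.

```javascript
// Hypothetical decision function for a sync element wrapping several media elements.
// `synchronize` is "block" or "ignore"; `master` is the id of the element whose
// timeline must not be dropped, or null if no element is singled out.
function shouldStallAll(elements, { synchronize, master = null }) {
  const starved = elements.filter((e) => e.buffering);
  // Nothing is buffering, so the shared timeline can progress.
  if (starved.length === 0) return false;
  // "block": any starved element puts the whole group into buffering mode.
  if (synchronize === "block") return true;
  // "ignore": only a starved master stalls the group; the other elements
  // simply drop out until they catch up.
  return master !== null && starved.some((e) => e.id === master);
}
```

With synchronize set to "ignore" and no master, this sketch never stalls the group, which corresponds to a clock-driven timeline where every element may drop out.
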
Received on Thursday, 2 December 2010 02:21:32 UTC