- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Thu, 2 Dec 2010 14:18:17 +1100
- To: Geoff Freed <geoff_freed@wgbh.org>
- Cc: "public-html-a11y@w3.org" <public-html-a11y@w3.org>, "Frank.Olivier@microsoft.com" <Frank.Olivier@microsoft.com>
Hi Geoff,

In which way are "open" (full-mix) audio descriptions as you describe them currently typically delivered?

1. as an extra track on the resource, i.e. the main resource has three tracks: video, audio, full-mix audio;
2. as a separate audio resource, i.e. there are two resources, one of video & audio, and a separate full-mix audio;
3. as a separate video resource, i.e. there are two resources, one of video & audio, and a separate video & full-mix audio.

I am asking because the technical implications of these are rather different.

Cheers,
Silvia.

On Thu, Dec 2, 2010 at 1:18 PM, Geoff Freed <geoff_freed@wgbh.org> wrote:
>
> Apologies for a somewhat late response to this thread.
>
> I'd just like to drive home the point that it's going to be very important to provide support for, and easy UI access to, embedded audio-description tracks. Pre-recorded, human-narrated tracks are the norm now for described video, and having a mechanism for supporting and accessing those additional audio tracks is necessary in order to support described broadcast video that is moved to the Web. Pre-recorded descriptions aren't going to go away any time soon, even when TTS support is possible.
>
> One other thing to note: the current practice is to supply a single audio track that contains both the program audio and the descriptions. This full-mix track was necessary in analog broadcasts and is necessary now on the Web because most multimedia players do not support the playback of two tracks simultaneously (one being the program audio and the other being only the descriptions). Therefore, to support existing described broadcast video that is moved to the Web, we're going to need to provide the capability (or the option) of toggling between a full-mix track and an undescribed track, when both are available. (This will be in addition to the new option of playing two tracks (program audio + descriptions) simultaneously.)
>
> Geoff Freed
> WGBH/NCAM
>
>
> ========
>
>
> If that is indeed the solution we favor, we should move fast before any implementations of <track> are made and <track> is confirmed as a content-less element, seeing as we will need <source> elements inside <track> then.
>
> Also, a big problem with this approach is that we lose all the functionality that is otherwise available to audio and video resources, such as setting the volume, width, height, placement, etc. For example, did you have any thoughts on how to do the display of sign language video in this scenario?
>
> Cheers,
> Silvia.
>
> On Thu, Oct 28, 2010 at 11:37 AM, Frank Olivier <Frank.Olivier@microsoft.com> wrote:
>> Overall feedback from the Internet Explorer team:
>>
>> Option 1 - Overloading <track> and keeping the DOM parsing very simple would be the most elegant way to proceed, with every track having a 'kind' attribute. Generally, I expect to be able to activate a single alternate track of a particular type. (Also, this would sync up well with a simple JavaScript API - I would expect that I, as an author, would be able to enumerate a flat list of alternate representations, with metadata that indicates the kind.)
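[A rough sketch of the flat enumeration Frank describes might look as follows. The "tracks" collection and the per-entry kind/label/srclang properties are hypothetical here, used only for illustration; they are not a specced API.]

    // Group a flat list of alternate tracks by 'kind' so that a menu
    // can offer one choice per kind (hypothetical API, see note above).
    var video = document.getElementById('v1');
    var byKind = {};
    for (var i = 0; i < video.tracks.length; i++) {
      var t = video.tracks[i];   // could be a text, audio or video track
      if (!byKind[t.kind]) byKind[t.kind] = [];
      byKind[t.kind].push({ label: t.label, lang: t.srclang, index: i });
    }
    // e.g. byKind['descriptions'] lists the audio description alternatives,
    // byKind['signings'] the sign language alternatives, etc.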
>>
>> Additionally, it would be prudent to leave exact conformance (when enabling/disabling alternate tracks) as a 'quality of implementation' issue. For example, with multiple alternate audio tracks, there will certainly be some (mobile) devices where the user agent is only able to play back one audio or video track at a time due to hardware constraints; some devices may only have enough screen real estate to display a single caption track.
>>
>> Text captioning and alternate audio tracks seem to be core requirements for HTML5 at this point. The user requirements document does do a good job of enumerating all the issues that users face, but it will take significant time to fully spec and prototype all features. I expect that user agents will be able to implement the two areas mentioned; the features beyond this certainly require a lot more discussion and speccing, which is best addressed in future versions of the HTML spec.
>>
>> Thanks
>> Frank Olivier
>>
>>
>> -----Original Message-----
>> From: public-html-a11y-request@w3.org [mailto:public-html-a11y-request@w3.org] On Behalf Of Silvia Pfeiffer
>> Sent: Tuesday, October 19, 2010 10:52 AM
>> To: HTML Accessibility Task Force
>> Subject: [media] handling multitrack audio / video
>>
>> Hi all,
>>
>> This is to start a technical discussion on how to solve the multitrack audio / video requirements in HTML5.
>>
>> We've got the following related bug and I want to make a start on discussing the advantages / disadvantages of different approaches:
>> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9452
>>
>> Ian's comment on this was as follows - and I agree that his conclusion should be a general goal of the technical solution that we eventually propose:
>>> The ability to control multiple internal media tracks (sign language
>>> video overlays, alternate angles, dubbed audio, etc) seems like
>>> something we'd want to do in a way consistent with handling of
>>> multiple external tracks, much like how internal subtitle tracks and
>>> external subtitle tracks should use the same mechanism so that they
>>> can be enabled and disabled and generally manipulated in a consistent
>>> way.
>>
>> I can think of the following different mark-up approaches towards solving this issue:
>>
>>
>> 1. Overload <track>
>>
>> For example, synchronizing an external audio description and sign language video with the main video:
>>
>> <video id="v1" poster="video.png" controls>
>>   <source src="video.ogv" type="video/ogg">
>>   <source src="video.mp4" type="video/mp4">
>>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>>   <track src="audesc.ogg" kind="descriptions" type="audio/ogg" srclang="en" label="English Audio Description">
>>   <track src="signlang.ogv" kind="signings" type="video/ogg" srclang="asl" label="American Sign Language">
>> </video>
>>
>> This adds a @type attribute to the <track> element, allowing it to also be used with audio and video and not just text tracks.
>>
>> There are a number of problems with such an approach:
>>
>> * How do we reference alternative encodings?
>> It would probably require the introduction of <source> elements inside <track>, making <track> more complex for selecting currentSrc etc. Also, if we needed different encodings for different devices, a @media attribute would be necessary.
>>
>> * How do we control synchronization issues?
>> The main resource would probably always be the one whose timeline dominates, and for the others we make a best effort to keep in sync with it. So, what happens if a user wants to not miss anything from one of the auxiliary tracks, e.g. wants the sign language track to be the time keeper? That's not possible with this approach.
>>
>> * How do we design the JavaScript API?
>> There are no cues, so the TimedTrack cues and activeCues lists would be empty and the cuechange event would never be fired. The audio and video tracks would be in the same TimedTrack list as the text ones, possibly creating confusion, for example in an accessibility menu for track selection, in particular where the track @kind goes beyond mere accessibility, such as alternate viewing angles or director's comments.
>>
>> * What about other a/v related features, such as width/height and placement of the sign language video, or volume of the audio description?
>> Having control over such extra features would be rather difficult to specify, since the data is only regarded as abstract alternative content to the main video. The rendering algorithm would become a lot more complex, and attributes from the audio and video elements might need to be introduced on the <track> element, too. It seems that would lead to quite some duplication of functionality between different elements.
>>
>>
>> 2. Introduce <audiotrack> and <videotrack>
>>
>> Instead of overloading <track>, one could consider creating new track elements for audio and video, such as <audiotrack> and <videotrack>.
>>
>> This allows keeping different attributes on these elements and having separate audio / video / text track lists in JavaScript.
>>
>> Also, it allows for <source> elements inside these new track elements more easily, e.g.:
>>
>> <video id="v1" poster="video.png" controls>
>>   <source src="video.ogv" type="video/ogg">
>>   <source src="video.mp4" type="video/mp4">
>>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>>   <audiotrack kind="descriptions" srclang="en">
>>     <source src="description.ogg" type="audio/ogg">
>>     <source src="description.mp3" type="audio/mp3">
>>   </audiotrack>
>> </video>
>>
>> But fundamentally we have the same issues as with approach 1, in particular the need to replicate some of the audio / video functionality of the <audio> and <video> elements.
>>
>>
>> 3. Introduce a <par>-like element
>>
>> The fundamental challenge that we are facing is to find a way to synchronise multiple audio-visual media resources, be that in-band, where the overall timeline is clear, or be that with separate external resources, where the overall timeline has to be defined. Then we are suddenly no longer talking about a master resource and auxiliary resources, but about audio-visual resources that are equals. This is more along the SMIL way of thinking, which is why I called this section the "<par>-like element".
>>
>> An example markup for synchronizing an external audio description and a sign language video with a main video could then be something like:
>>
>> <par>
>>   <video id="v1" poster="video.png" controls>
>>     <source src="video.ogv" type="video/ogg">
>>     <source src="video.mp4" type="video/mp4">
>>     <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>     <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>     <track kind="chapters" srclang="en" src="chapters.wsrt">
>>   </video>
>>   <audio controls>
>>     <source src="audesc.ogg" type="audio/ogg">
>>     <source src="audesc.mp3" type="audio/mp3">
>>   </audio>
>>   <video controls>
>>     <source src="signing.ogv" type="video/ogg">
>>     <source src="signing.mp4" type="video/mp4">
>>   </video>
>> </par>
>>
>> This synchronisation element could of course be called something else: <mastertime>, <coordinator>, <sync>, <timeline>, <container>, <timemaster>, etc.
>>
>> The synchronisation element needs to provide the main timeline. It would make sure that the contained elements play and seek in parallel.
>>
>> Audio and video elements can then be styled individually as their own CSS block elements and deactivated with "display: none".
>>
>> The sync element could have an attribute to decide whether to allow drop-outs in contained elements that starve while the main timeline progresses, or whether to go into overall buffering mode if any one of the elements goes into buffering mode. It could also define one as the main element, whose timeline would not be ignored, and the others as slaves, for which buffering situations would be ignored. Something like @synchronize=[block/ignore] and @master="v1" attributes.
>>
>> Also, a decision would need to be made about what to do with @controls. Should there be a controls display on the first/master element if any of them has a @controls attribute? Should the slave elements not have controls displayed?
>>
>>
>> 4. Nest media elements
>>
>> An alternative means of re-using <audio> and <video> elements for synchronisation is to put the "slave" elements inside the "master" element, like so:
>>
>> <video id="v1" poster="video.png" controls>
>>   <source src="video.ogv" type="video/ogg">
>>   <source src="video.mp4" type="video/mp4">
>>   <track kind="subtitles" srclang="fr" src="sub_fr.wsrt">
>>   <track kind="subtitles" srclang="ru" src="sub_ru.wsrt">
>>   <track kind="chapters" srclang="en" src="chapters.wsrt">
>>   <par>
>>     <audio controls>
>>       <source src="audesc.ogg" type="audio/ogg">
>>       <source src="audesc.mp3" type="audio/mp3">
>>     </audio>
>>     <video controls>
>>       <source src="signing.ogv" type="video/ogg">
>>       <source src="signing.mp4" type="video/mp4">
>>     </video>
>>   </par>
>> </video>
>>
>> This makes clear whose timeline the nested elements are following. But it certainly looks recursive, and we would have to specify that elements inside a <par> cannot themselves contain another <par>, to stop that.
>>
>> ===
>>
>> These are some of the thoughts I had on this topic. I am not yet decided on which of the above proposals - or an alternative proposal - makes the most sense. I have a gut feeling that it is probably useful to be able to define both: a dominant container for synchronization, and one where all containers are valued equally. So maybe the third approach would be the most flexible, but it certainly needs a bit more thinking.
>>
>> Cheers,
>> Silvia.
>>
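[For comparison, below is a rough script-only sketch of the master/slave behaviour that approaches 3 and 4 would provide declaratively. The element ids and the drift threshold are made up for illustration; it roughly corresponds to the @synchronize="ignore" case, since buffering stalls in the slaves are simply ignored.]

    // Keep "slave" elements (audio description, sign language video)
    // roughly on the master video's timeline.
    var master = document.getElementById('v1');
    var slaves = [document.getElementById('audesc'),
                  document.getElementById('signing')];

    function syncSlaves() {
      slaves.forEach(function (s) {
        // Re-seek a slave only when it has drifted noticeably
        // (the 0.3 s threshold is arbitrary).
        if (Math.abs(s.currentTime - master.currentTime) > 0.3) {
          s.currentTime = master.currentTime;
        }
      });
    }

    master.addEventListener('play', function () {
      slaves.forEach(function (s) { s.play(); });
    }, false);
    master.addEventListener('pause', function () {
      slaves.forEach(function (s) { s.pause(); });
    }, false);
    master.addEventListener('seeked', syncSlaves, false);
    master.addEventListener('timeupdate', syncSlaves, false);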
Received on Thursday, 2 December 2010 03:19:13 UTC