Re: Meaning of audio track kind 'descriptions' from David Singer on 2011-06-20 (public-html-a11y@w3.org from June 2011)

From: David Singer <singer@apple.com>
Date: Mon, 20 Jun 2011 12:34:17 +0200
To: Mark Watson <watsonm@netflix.com>
Cc: Bob Lund <B.Lund@cablelabs.com>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-id: <7ABB4B5A-43CD-45E1-8553-8FDE0407B98B@apple.com>

On Jun 20, 2011, at 10:55, Mark Watson wrote:

> Is it not the case that when authoring audio descriptions you might want to make decisions to attenuate part of the main audio track to make the descriptions more audible?

Yes.  I think both cases arise:
a) the main audio has enough gaps in it that I can overlay the descriptions audio track on it without changing it;
b) the main audio has to be 'doctored' (level adjustments, maybe shifted around a bit) to leave room for the descriptions.

(a) is covered by an additional audio track.  (b) by a replacement.
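
To make the difference between (a) and (b) concrete, here is a minimal sketch of how a player might pick which tracks to enable. It is only an illustration under stated assumptions: the `AudioTrackInfo` shape, the `mode` field, and the `tracksToEnable` function are all hypothetical, and `mode` stands in for whatever marker ends up being used (a second kind value, a data-* attribute, or something else); nothing like it is defined today.

```typescript
// Hypothetical model of an audio track. `mode` is NOT part of any spec;
// it is a placeholder for whatever additional/replacement marker is chosen.
type DescriptionMode = "additional" | "replacement";

interface AudioTrackInfo {
  id: string;
  kind: "main" | "descriptions";
  mode?: DescriptionMode; // only meaningful when kind === "descriptions"
}

// Returns the ids of the tracks that should be enabled.
function tracksToEnable(
  tracks: AudioTrackInfo[],
  wantDescriptions: boolean
): string[] {
  const mainIds = tracks.filter((t) => t.kind === "main").map((t) => t.id);
  const desc = tracks.find((t) => t.kind === "descriptions");

  // No descriptions requested or available: play the main audio only.
  if (!wantDescriptions || desc === undefined) {
    return mainIds;
  }

  // Case (b): the descriptions track carries doctored main audio,
  // so it is played instead of the main track.
  if (desc.mode === "replacement") {
    return [desc.id];
  }

  // Case (a): the descriptions are overlaid on the unchanged main audio.
  // An unmarked descriptions track is treated as additional (an assumption).
  return [...mainIds, desc.id];
}
```

Without some such marker the player cannot tell which of the two branches applies, which is the point Mark raises below.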


> 
> That would be a reason other than legacy technical restrictions for creating "alternative" audio descriptions tracks.
> 
> Silvia: I didn't understand your comment about not seeing a need to distinguish between alternative and additional tracks: surely I need to know this so that I know whether to enable this track in addition to the main track or instead of the main track.
> 
> It seems like we have use-cases for both alternative and additional audio descriptions. I think the track should be explicitly marked as descriptions in both cases, so we can apply user preferences, include it in the right menu etc.
> 
> So, how should we distinguish the two cases? We could have two kind values or use data-* as Silvia suggested (Silvia, could you explain how that works?).
> 
> ...Mark
> 
> 
> 
> On Jun 17, 2011, at 5:04 PM, Bob Lund wrote:
> 
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
>>> Sent: Thursday, June 16, 2011 5:18 PM
>>> To: Bob Lund; Silvia Pfeiffer; Mark Watson; HTML Accessibility Task
>>> Force
>>> Subject: Re: Meaning of audio track kind 'descriptions'
>>> 
>>> I agree with the way that Janina describes the situation and the future.
>>> 
>>> I also understand Bob's current situation of having to deal with set-top
>>> boxes and TVs.
>> 
>> It's a fair point that legacy set-top boxes and TVs might not be the primary target for future browser-based clients. The important underlying issue, from what I hear, is that content owners would prefer not to re-author content, already being delivered to legacy devices, for browser-based clients.
>> 
>>> 
>>> Bob, I further wonder: when you get the described audio as mixed-in, do
>>> you get it as a separate audio track or is it actually a completely
>>> different video file? So do you deal with one video file (with original
>>> video track + original audio track) and one audio file (with mixed audio
>>> + descriptions) or do you deal with two video files (one with the
>>> original audio track and one with the mixed one)?
>> 
>> The video in question is delivered as MPEG-2 multi-program transport streams, rather than file based. The MPEG-2 MPTS has multiple programs, where each might have 1 video stream, 1 main dialogue audio track and a secondary audio track consisting of the main dialogue + audio description. MPEG-2 TS can also carry MPEG-4 elementary media streams. This same multiplexing structure (1 program with multiple audio tracks) can also be replicated in the MPEG-4 base file format and used with adaptive delivery.
>> 
>> Bob
>> 
>>> 
>>> Silvia.
>>> 
>>> On Fri, Jun 17, 2011 at 5:15 AM, Janina Sajka <janina@rednote.net>
>>> wrote:
>>>> Bob is correct, imho. But please note the reason human narrated audio
>>>> description is pre-mixed with the audio of the primary resource and
>>>> delivered in a single channel is the historical technology that first
>>>> brought such content to market. You just couldn't do anything else in
>>>> analog TV, where only the SAP channel was available for this content,
>>>> because the home premises equipment wasn't designed to do audio
>>>> mixing.
>>>> 
>>>> This history also condemned the described video to mono playback.
>>>> 
>>>> While this model will predominate early on, I'm by no means convinced
>>>> it describes the future. To start with, it would sure be nice to
>>>> have stereo sound. Oh, and it would be nice to be able to
>>>> independently adjust the volume of the video descriptions, and even to
>>>> direct them at specific audio devices while audio from the primary
>>>> resource is played through different audio devices.
>>>> 
>>>> 
>>>> Given that many people with disabilities have multiple disabilities
>>>> and hearing loss together with blindness is by no means uncommon,
>>>> separating the description track from the primary audio has
>>>> advantages. And, now we have a technology platform that can deliver
>>>> it, unlike the 1950's SAP specifications.
>>>> 
>>>> Janina
>>>> 
>>>> Bob Lund writes:
>>>>> The use case today in cable is descriptive video service where the
>>>>> description and main dialogue are pre-mixed and delivered as a single
>>>>> channel. The single channel is preferred because the majority of video
>>>>> receivers (set-top-boxes and TVs) do not have the capability to mix
>>>>> audio. There are emerging regulatory requirements to provide descriptive
>>>>> video service (audio descriptions) so in the short term the single
>>>>> channel approach might be more prevalent. Related to this is a desire by
>>>>> content owners to not have to re-author content to deliver it on the
>>>>> Web.
>>>>> 
>>>>> If pre-mixed description and dialogue is identified as @kind =
>>>>> alternative then there will need to be a way to distinguish it from
>>>>> other types of alternative audio. @label could be used but then it will
>>>>> be desirable to have some definition of @label string semantics so
>>>>> clients know how to interpret them.
>>>>> 
>>>>> Bob
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
>>>>>> Sent: Wednesday, June 15, 2011 10:56 PM
>>>>>> To: Mark Watson
>>>>>> Cc: Bob Lund; HTML Accessibility Task Force
>>>>>> Subject: Re: Meaning of audio track kind 'descriptions'
>>>>>> 
>>>>>> Note that I haven't yet seen a use case that absolutely requires us
>>>>>> to know if a track is additional or alternative. If we do, we can
>>>>>> always use a data-* attribute for this right now. If we see the
>>>>>> data-* attribute being required to solve use cases, then we can ask
>>>>>> for the introduction of an additional marker.
>>>>>> 
>>>>>> Bob: what was your use case?
>>>>>> 
>>>>>> Cheers,
>>>>>> Silvia.
>>>>>> 
>>>>>> On Thu, Jun 16, 2011 at 2:53 PM, Silvia Pfeiffer
>>>>>> <silviapfeiffer1@gmail.com> wrote:
>>>>>>> On Thu, Jun 16, 2011 at 1:07 PM, Mark Watson
>>>>>>> <watsonm@netflix.com> wrote:
>>>>>>>> I had a different understanding.
>>>>>>>> 
>>>>>>>> We keep coming back to these cases where we can imagine both
>>>>>>>> "alternative" and "additional" tracks as solutions to some problem.
>>>>>>>> 
>>>>>>>> I've argued at length before that it doesn't work to have a
>>>>>>>> blanket mechanism whereby any track can be labeled as either
>>>>>>>> "alternative" or "additional" - and indeed we have no such
>>>>>>>> mechanism: it's implicit in the track kind - you need to
>>>>>>>> understand the kind to know whether it is alternative or additional.
>>>>>>>> 
>>>>>>>> I actually thought that all our audio kinds were alternatives.
>>>>>>>> I'm no expert, but I would guess that it's hard to create a
>>>>>>>> descriptions track which can be freely mixed with the original audio.
>>>>>>> 
>>>>>>> 
>>>>>>> I've done so before. It's not hard at all. You listen to the
>>>>>>> original track and you speak into the microphone. It is easier to
>>>>>>> record it in this way because the quality of the original audio
>>>>>>> doesn't degrade. It is also the way in which, for example, the
>>>>>>> jwplayer works:
>>>>>>> http://www.longtailvideo.com/support/addons/audio-description/15136/audio-description-reference-guide
>>>>>>> 
>>>>>>> It would be bad if you had to mix in the original audio because
>>>>>>> that degrades the quality of that track, increases the
>>>>>>> required bandwidth (because compressed silence is smaller than
>>>>>>> compressed sound), requires re-recording the original content
>>>>>>> (which might end up in copyright trouble), and requires switching
>>>>>>> between tracks rather than just adding and removing a track.
>>>>>>> Switching between tracks will be a lot more perceptible than
>>>>>>> adding/removing a second track.
>>>>>>> 
>>>>>>> So, I can only see advantages to having an audio description
>>>>>>> provided as a separate track.
>>>>>>> 
>>>>>>> 
>>>>>>>> If both kinds exist (alternative descriptions and additive
>>>>>>>> descriptions), then we need two kind values. Given that it's an
>>>>>>>> accessibility requirement it would be nice for it to be explicit,
>>>>>>>> so I would expect to have two "descriptions" kinds, e.g.
>>>>>>>> descriptions-add and descriptions-alt.
>>>>>>> 
>>>>>>> I've only ever seen audio descriptions that come as separate
>>>>>>> tracks.
>>>>>>> In the TV case you would have had to mix it for transmission
>>>>>>> because there was only one channel available for transmission,
>>>>>>> but I believe that is the artificial case. The more natural case
>>>>>>> is to have them separate.
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Silvia.
>>>>>>> 
>>>> 
>>>> --
>>>> 
>>>> Janina Sajka,   Phone:  +1.443.300.2200
>>>>               sip:janina@asterisk.rednote.net
>>>> 
>>>> Chair, Open Accessibility       janina@a11y.org
>>>> Linux Foundation                http://a11y.org
>>>> 
>>>> Chair, Protocols & Formats
>>>> Web Accessibility Initiative    http://www.w3.org/wai/pf
>>>> World Wide Web Consortium (W3C)
>>>> 
>>>> 
>> 
> 
> 

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Monday, 20 June 2011 10:34:47 UTC