Re: Meaning of audio track kind 'descriptions'

On Tue, Jun 21, 2011 at 5:58 PM, Mark Watson <watsonm@netflix.com> wrote:
>
> On Jun 21, 2011, at 4:13 AM, Silvia Pfeiffer wrote:
>
>> On Mon, Jun 20, 2011 at 10:47 PM, David Singer <singer@apple.com> wrote:
>>> I'd like to return to a general question about the client software and the user.  It seems we should enable the client software to expose the choices to the user (or match them against user preferences), and allow the client software to enable or disable the right tracks to get that effect -- ideally, without having special-purpose decisions based on the actual adaptation.  So if, for example, we introduce a new accessibility adaptation in the future, old clients that don't recognize the keyword can still ask the user "do you want/need X?" and still 'do the right thing' with the content (in terms of enable/disable actions).
>>>
>>> So, here are two simple ways to achieve this.  I am sure there are others.  Both of these allow multiple tags in a 'kind' label.
>>>
>>> A) In the set of tags for a given track's kind, have either "+" or "-" before each tag.  "+X" means 'enable this track if the user wants X' and "-X" means 'disable this track if the user wants X'.  Disables override enables; that is, if a label says "+X -Y" and you want both X and Y, you disable the track.
>>>
>>> In both schemes (A above, B below) I think the labelling can be complete enough that the initial state is irrelevant - the labels and algorithm give a clear outcome for every track - but perhaps it could be said that the initial state has "main" tracks enabled and everything else disabled.
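Read as an algorithm, scheme A can be sketched as follows. This is a minimal, hypothetical sketch, not part of any specification: the `Track` type, the function names, reading an unsigned tag as "+tag", and treating "main" as always wanted by the user are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    track_id: str
    kind: str  # space-separated signed tags, e.g. "+main -captions"


def parse_tags(kind: str) -> tuple[set[str], set[str]]:
    """Split a kind label into (enable_tags, disable_tags)."""
    enable: set[str] = set()
    disable: set[str] = set()
    for token in kind.split():
        if token.startswith("-"):
            disable.add(token[1:])
        elif token.startswith("+"):
            enable.add(token[1:])
        else:
            # Assumption: an unsigned tag is read as "+tag".
            enable.add(token)
    return enable, disable


def resolve_scheme_a(tracks: list[Track], wants: set[str]) -> dict[str, bool]:
    """Map each track id to enabled/disabled under the "+X / -X" scheme."""
    # Assumption: every user implicitly wants the main content.
    wanted = set(wants) | {"main"}
    result: dict[str, bool] = {}
    for track in tracks:
        enable, disable = parse_tags(track.kind)
        enabled = bool(enable & wanted)
        if disable & wanted:
            enabled = False  # disables override enables
        result[track.track_id] = enabled
    return result


# Example 2A: burned-in captions in an alternative video track.
video = [Track("video-main", "+main -captions"), Track("video-captioned", "+captions")]
print(resolve_scheme_a(video, set()))         # {'video-main': True, 'video-captioned': False}
print(resolve_scheme_a(video, {"captions"}))  # {'video-main': False, 'video-captioned': True}

# Example 6A: clean audio by dropping the background-music track.
audio = [Track("audio-dialog", "+main"), Track("audio-background", "+main -cleanaudio")]
print(resolve_scheme_a(audio, {"cleanaudio"}))  # {'audio-dialog': True, 'audio-background': False}
```

The resolver never needs to know what "captions" or "cleanaudio" mean; an unrecognized tag only has to be offered to the user as a yes/no choice, which is the property asked for above.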
>>>
>>> [examples below]
>>>
>>> B) In the set of tags for a given track, say that the tag set "alternate X" means 'disable the main content of the same media type if you want X, and enable this track' and "X" means just 'enable this track if you want X'.
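Scheme B can be sketched the same way. Again this is a minimal, hypothetical sketch: the `Track` type and function name are illustrative, and accepting both "alternate" and "alternative" is an assumption, since the thread uses both spellings.

```python
from __future__ import annotations

from dataclasses import dataclass

# The thread spells this tag both "alternate" and "alternative"; accept either.
ALTERNATE_TAGS = {"alternate", "alternative"}


@dataclass(frozen=True)
class Track:
    track_id: str
    media_type: str  # e.g. "audio", "video", "text"
    kind: str        # space-separated tags, e.g. "alternative captions"


def resolve_scheme_b(tracks: list[Track], wants: set[str]) -> dict[str, bool]:
    """Map each track id to enabled/disabled under the "alternate X" scheme."""
    result: dict[str, bool] = {}
    displaced: set[str] = set()  # media types whose main track gets replaced
    for track in tracks:
        tags = set(track.kind.split())
        if "main" in tags:
            continue  # main tracks are decided after all add-ons/alternates
        purposes = tags - ALTERNATE_TAGS
        enabled = bool(purposes & wants)
        result[track.track_id] = enabled
        if enabled and tags & ALTERNATE_TAGS:
            # Implicit coupling: only main tracks of the same media type are disabled.
            displaced.add(track.media_type)
    for track in tracks:
        if "main" in track.kind.split():
            result[track.track_id] = track.media_type not in displaced
    # Report results in the original track order.
    return {track.track_id: result[track.track_id] for track in tracks}


tracks = [
    Track("video-main", "video", "main"),
    Track("video-captioned", "video", "alternative captions"),  # example 2B
    Track("text-captions", "text", "captions"),                 # example 1B
]
print(resolve_scheme_b(tracks, set()))
# {'video-main': True, 'video-captioned': False, 'text-captions': False}
print(resolve_scheme_b(tracks, {"captions"}))
# {'video-main': False, 'video-captioned': True, 'text-captions': True}
```

This sketch has to match on `media_type`, which is exactly the implicit coupling objected to below (captions delivered as an alpha-coded video track would break it), and it has no way to disable just one of two "main" tracks, which is why example 6 has no B form.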
>>>
>>>
>>> Examples:
>>>
>>> 1) text captions as an add-on text track
>>>  A) the text track has kind="+captions" (or whatever the word is)
>>>  B) the text track has kind="captions"
>>>
>>> 2) burned-in captions in an alternative video track
>>>  A) the main video has kind="+main -captions" and the alternative video with captions has kind="+captions"
>>>  B) the main video has kind="main" and the alternative video has kind="alternative captions"
>>>
>>> 3) audio description as an add-on to the main audio - just like example 1.
>>>
>>> 4) audio description as a replacement to the main audio - just like example 2.
>>>
>>> 5) clean audio as an alternative to the main audio - just like example 2.
>>>
>>> 6) clean audio, where the main audio is delivered in two tracks - the dialog and the background music separately - and the background music is disabled for the user needing clean audio:
>>>  A) the two tracks say kind="+main" and kind="+main -cleanaudio"
>>>  B) ... I don't see how to express this.
>>>
>>> 7) Repetitive stimulus avoidance as an alternative - just like example 2.
>>>
>>> 8) Repetitive stimulus avoidance as an overlay (e.g. a black square in front of the flashing light) - just like example 1.
>>>
>>>
>>>
>>> I really don't like the case where you have to recognize that a kind of "Q", when wanted, implicitly means disable the main content of the same media type, whereas "R" doesn't.
>>>
>>> Nor am I crazy about this implicit matching on media type - there are people (especially in Asia) that use alpha-coded images (aka a video track) to deliver captions on occasion, for example.
>>>
>>>
>>> Why is it hard to come up with a simple scheme to enable the client software to get out of the business of being *required* to understand the labels?
>>
>> What you are proposing above *is* hard: it takes the complexity of
>> announcing what is available in a track one step further towards a
>> language of how to compose tracks together. I really think that is
>> over-engineering for the 10% use case.
>>
>>
>> The way I look at things is that content is best delivered in
>> tracks that are always additions to the main track and can be
>> turned on/off.
>
> So, one thing is that we need to be clear in the definition of each kind whether the track is explicitly intended to be presented together with the "main" material, or instead of it.
>
>> Anything that is not an addition is legacy content and
>> should not get special treatment so as to discourage it.
>
> I disagree with this statement. Two concrete examples:
>
> 1) Alternative languages are better treated as an alternative, not an addition.
>
> 2) Cleanaudio would be difficult to do as an addition to the main audio track - both the main audio and the clean audio contain the dialog, so they need to be sample-aligned to ensure they can be mixed. The "efficient" way to do cleanaudio is that the main audio is split into two tracks - say dialog and background - which are mixed together at the client. The clean audio service is then achieved by adjusting the relative volume of these two. But I think cleanaudio is unlikely to be done this way any time soon.
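The "efficient" clean-audio approach described above amounts to a per-sample weighted sum of two sample-aligned tracks. A minimal sketch, assuming samples are floats in [-1.0, 1.0] and that the client exposes a single background gain; the function name and gain values are hypothetical.

```python
from __future__ import annotations


def mix_dialog_and_background(
    dialog: list[float], background: list[float], background_gain: float
) -> list[float]:
    """Mix sample-aligned dialog and background tracks at the client.

    background_gain = 1.0 reproduces the ordinary main mix; lowering it
    gives a clean-audio presentation with the dialog left untouched.
    """
    # Equal length is a necessary (not sufficient) check for sample alignment.
    if len(dialog) != len(background):
        raise ValueError("dialog and background must be sample-aligned")
    mixed = []
    for d, b in zip(dialog, background):
        sample = d + background_gain * b
        mixed.append(max(-1.0, min(1.0, sample)))  # clip to the valid range
    return mixed


dialog = [0.5, -0.5, 0.25]
background = [0.5, 0.5, 0.5]
print(mix_dialog_and_background(dialog, background, 1.0))  # [1.0, 0.0, 0.75]
print(mix_dialog_and_background(dialog, background, 0.0))  # [0.5, -0.5, 0.25]
```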


Yes, there are exceptions where a replacement is quite explicit and
necessary. I was just talking about those situations where you could
do either.


> Furthermore, even for the cases where delivering the track as an addition is likely to be common (e.g. descriptions or commentary), professional content authors are able to do a better job of ducking/modifying the original audio than the client will. There are audio formats which include instructions for mixing the tracks, in which case the descriptions/commentary can still be delivered as an additional stream with these detailed mixing instructions, but this still implies that there is no change to the audio mixing *within* the original audio (e.g. changing relative volume of dialog vs music).
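For comparison, client-side ducking of an additional descriptions stream can be as simple as attenuating the whole main mix while a description plays. A minimal sketch, assuming the client knows the description intervals (in seconds) and applies one user-chosen gain; the function name and parameters are hypothetical. It can only scale the main audio as a whole; it cannot change the dialog-to-music balance *within* that audio, which is the limitation pointed out above.

```python
from __future__ import annotations


def duck_and_mix(
    main: list[float],
    description: list[float],
    intervals: list[tuple[float, float]],
    sample_rate: int,
    duck_gain: float,
) -> list[float]:
    """Add a description track to the main audio, ducking the main audio
    by duck_gain during each [start, end) interval (in seconds)."""
    if len(main) != len(description):
        raise ValueError("main and description must be sample-aligned")
    out = []
    for i, (m, d) in enumerate(zip(main, description)):
        t = i / sample_rate  # time of this sample in seconds
        ducked = any(start <= t < end for start, end in intervals)
        gain = duck_gain if ducked else 1.0
        out.append(max(-1.0, min(1.0, gain * m + d)))  # clip to [-1, 1]
    return out


# 4 samples per second; a description plays from 0.5 s to 1.0 s.
main = [0.4] * 6
description = [0.0, 0.0, 0.3, 0.3, 0.0, 0.0]
print(duck_and_mix(main, description, [(0.5, 1.0)], sample_rate=4, duck_gain=0.5))
# [0.4, 0.4, 0.5, 0.5, 0.4, 0.4]
```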


See other email - I disagree that professional mixing is always better
at ducking than user-controlled mixing.


> This is not a "legacy" concern - authors may still want to construct descriptions or commentary as an alternative track.

That's what the "alternative" label is for.


>> There are
>> always means to deal with such content in JavaScript anyway, and we
>> have the catch-all label "alternate" for such content.
>
> We should always ensure that tracks provided for accessibility purposes are marked with their specific accessibility purpose (e.g. descriptions) for the reasons discussed in another mail. Otherwise we make these tracks more difficult to use when the exact purpose of the track is to make the service easier (or even just possible) to use for some people.

During a transition phase, we can probably use something like
kind="main+descriptions" as a marker for what the track provides. I
would be open to this, though I believe that we really need
implementation experience first, before we make any more changes
to those parts of the specification.
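If a combined value such as kind="main+descriptions" were adopted, a client could treat it as a set of purposes and pick the audio track that covers what the user asked for with the fewest extras. A minimal sketch; splitting on "+" and the tie-breaking rule are assumptions, and the function names are hypothetical, not from any specification.

```python
from __future__ import annotations


def kind_parts(kind: str) -> frozenset[str]:
    """Split a combined kind value such as "main+descriptions" into its parts."""
    return frozenset(part for part in kind.split("+") if part)


def pick_audio_track(tracks: dict[str, str], wants: set[str]) -> str | None:
    """Return the id of the track whose kind covers "main" plus everything
    in wants, preferring the fewest extra parts; None if no track fits."""
    needed = {"main"} | set(wants)
    candidates = [
        # (number of unrequested parts, original position, id) sorts as desired
        (len(kind_parts(kind) - needed), index, track_id)
        for index, (track_id, kind) in enumerate(tracks.items())
        if needed <= kind_parts(kind)
    ]
    return min(candidates)[2] if candidates else None


tracks = {"audio-1": "main+descriptions", "audio-2": "main"}
print(pick_audio_track(tracks, set()))             # audio-2
print(pick_audio_track(tracks, {"descriptions"}))  # audio-1
```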

Cheers,
Silvia.

Received on Tuesday, 21 June 2011 08:59:12 UTC