Re: Track kinds

On Apr 29, 2011, at 12:47 PM, David Singer wrote:

On Apr 29, 2011, at 15:33 , Mark Watson wrote:


I'm still not understanding....

Suppose I have two presentations (A and B), each with two audio tracks (A1, A2 and B1, B2) as follows:
A1: main audio track for presentation A
A2: audio description track for presentation A containing all the main audio, plus the descriptions

B1: main audio track for presentation B
B2: audio description track for presentation B containing only the descriptions

Two possible answers (1 is what I described):

1. A and B (the multiplexes) are both labeled as "capable of description" at the HTML level ("main + description"), and then there is no need for the HTML engine to 'open the kimono' and look at the tracks. If desired, media engines are told 'please provide descriptions' (we've been told you can), and they work out what that means.  e.g. it is left to DASH's xor/inc-or rules.

The only way the web page can tell the video element "please provide descriptions" is by enabling the track marked with "descriptions".

Should the page also disable the track marked "main" ?

One approach would be to define an answer to that question (in the definition of the "descriptions" tag) and let the framework underneath work out what to do. Is this what you mean ?

That would mean that the "alternative vs additional" property of the track would need to be visible in the container but would be hidden from the HTML layer.

It's a nice simplification for the HTML layer, but it is likely that the subtlety might be lost on container format designers, who may just provide support for the HTML kinds and ignore the "alternative vs additional" property.

2. A2 is labelled "main+description", not just "description", and B2 is labelled "alternate+description", not just "description"

This is what I suggested below although I think B2 would be labeled just "description", because it is not an alternative to B1 - it is intended to be presented together with B1.

This would expose the "alternative vs additional" property to HTML.

The remaining question would be syntactic: is "main+description" two tags, with a global convention that any number of tags can be joined with a "+" character, or is it a single tag defined separately from "main" and "description"?


And then the kinds are:

A1: main
A2: audiodesc

B1: main
B2: audiodesc

How does the media framework know that in case A, when audio descriptions are enabled, it should render only A2 but in case B it should render B1+B2 ? Or do you suggest:

A1: main
A2: main audiodesc

B1: main
B2: audiodesc
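
To make the second labelling concrete, here is a minimal Python sketch of how a media framework might decide what to render. It assumes one possible rule: a description track that also carries "main" is a complete alternative and is rendered alone, while a description track without "main" is mixed with the main track. The function name and the set-of-kinds representation are hypothetical and not taken from any specification.

```python
def tracks_to_render(tracks, want_descriptions):
    """Return the names of the audio tracks to render.

    tracks maps a track name to its set of kinds, e.g. {"main", "audiodesc"}.
    Assumed rule (not from any spec): "audiodesc" together with "main" means
    a self-contained alternative; "audiodesc" alone means an additive track.
    """
    plain_main = [n for n, k in tracks.items()
                  if "main" in k and "audiodesc" not in k]
    described = [n for n, k in tracks.items() if "audiodesc" in k]
    if not want_descriptions or not described:
        return plain_main
    alternatives = [n for n in described if "main" in tracks[n]]
    if alternatives:
        # Self-contained mix: render it instead of the main track.
        return alternatives[:1]
    # Descriptions only: render them together with the main track.
    return plain_main + described


presentation_a = {"A1": {"main"}, "A2": {"main", "audiodesc"}}
presentation_b = {"B1": {"main"}, "B2": {"audiodesc"}}

print(tracks_to_render(presentation_a, True))   # ['A2']
print(tracks_to_render(presentation_b, True))   # ['B1', 'B2']
```

With the first labelling (A2 and B2 both just "audiodesc") the two presentations are indistinguishable to this function, which is the problem the question above points at.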


Same question for clear audio. Repetitive Stimulus Safe is a different issue.

If we must be dogmatic, no, it is not.  A2 is an alternative suitable for someone with a need for descriptions.  The same can apply to clear media, and to repetitive stimulus safeness.  Don't invent distinctions where they are not needed.

The labels mean "this track meets the identified need".  Most people need main content. Some need additional content, and some need alternative content.  The only minor difference I see for rep. stimulus is that it's unlikely to be met with additional tracks (though, actually, a visual overlay might obscure a strobing light, for example), and more likely to be met with alternatives.


On Apr 29, 2011, at 11:13 AM, David Singer wrote:

On Apr 29, 2011, at 12:54 , Mark Watson wrote:

We may be talking at slightly crossed-purposes.

What I understand is that both clear audio and audio descriptions can be supplied in two different ways, either with or without the rest of the soundtrack mixed in.

In one case the player needs to render both the original and the accessibility audio track.

In the other case it's either-or.

Right.  And I am saying that audio mixing is out of scope for HTML5.  That's a media player feature.  If within, say, a movie file the author provides several audio tracks that need mixing, that's the QuickTime framework's problem.  DASH has the ability to do exclusive-or and inclusive-or, so I would (sketching wildly) look for tracks with media type audio labelled "main music", "main clear", "main background" -- which highlights that if we are to get into a language for elements, we'll need more terms.

If the client always chooses either-or then depending on how the content has been provided you may or may not get the non-dialog or non-description sounds.

So, at the HTML level:

a) if one source is already delivering, or can be configured to deliver, clear audio (or be rep. stim. safe), label it as "clear" (or safe)
b) if there is an alternative source that is safe, label that.

the requirement that the media engine responds to the same needs has to remain (as is being discussed in DASH).

If the accessibility track has been provided without the (non-dialog) or (non-description) sounds from the main track then you can mix the two and adjust the relative volume.

media engine issue, I hope.

If the accessibility track has been provided with the (non-dialog) or (non-description) sounds from the main track (i.e. as an alternative) then I think it would be brave to mix them and expect it to work. Whilst you might expect the timing to be precise and the result to be just a difference in relative volume I'm not sure you could rely on this unless the content was specifically authored with that intent. I'm not a sound engineer, but it seems if the alternative track has been independently mixed, edited, prepared - even a little bit - then things could go wrong if you mix it with another track.

Which approach is taken by the content provider can't be worked out by the client - it needs to be signaled.

But I think the existence of these two approaches is quite specific to these use-cases and so I don't see a need for a generic mechanism. Two kinds for each would be sufficient.


On Apr 29, 2011, at 6:52 AM, David Singer wrote:

On Apr 29, 2011, at 1:43 , Mark Watson wrote:

This is a more general question about how to signal whether a track is "additive" or "alternative". I think we discussed this before (or maybe that was a different list, I forget).

I don't think we are designing SMIL, or other mixing systems, in general.  I think we need to say at the HTML or DASH level 'pick this source if you need clearaudio or repetitivestimulussafe', and leave to the media engine underneath any necessary configuring of the media resource to provide that experience.

That is, it's a two-step process; select at the DASH/HTML level, configure at the media engine level.

Putting the tag on all sources means that the program content naturally has this characteristic.  Putting it on some indicates a choice.  Not mentioning it means the content author didn't think about it (typically).
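
The tag semantics and the first step of the two-step process can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the return labels, and the fallback to all sources when nothing is tagged are hypothetical choices, and the second step (configuring the media resource) is left to the media engine and not modelled here.

```python
def tag_coverage(sources, tag):
    """Classify how a need tag is distributed over the available sources.

    sources maps a source name to the set of tags placed on it.
    """
    tagged = [name for name, tags in sources.items() if tag in tags]
    if not tagged:
        return "unspecified"  # the author typically did not consider the need
    if len(tagged) == len(sources):
        return "inherent"     # the program content naturally has the property
    return "choice"           # only some sources meet the need


def select_sources(sources, need):
    """Step one: pick, at the DASH/HTML level, the sources tagged for a need.

    Assumption: if no source is tagged, every source stays eligible.
    """
    tagged = [name for name, tags in sources.items() if need in tags]
    return tagged if tagged else list(sources)


sources = {"s1": {"clearaudio"}, "s2": set()}
print(tag_coverage(sources, "clearaudio"))              # choice
print(tag_coverage(sources, "repetitivestimulussafe"))  # unspecified
print(select_sources(sources, "clearaudio"))            # ['s1']
```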

In the case of a built-up audio experience, it's the media engine's problem to adjust the build to respond to 'I need clearaudio' (perhaps by adjusting relative volumes, perhaps by disabling some audio completely).

In the context of audio descriptions I've seen it stated that sometimes an audio description track is a replacement for the main audio track and in other cases it's intended to be mixed in (i.e. the descriptions fit somehow into gaps in the main audio track.)

Does anyone on this list know whether that is true ?

As with the repetitive stimulus question there are three approaches:
(a) treat it as a new property (additive vs alternative)
(b) allow multiple kinds (so we would have "alternate clearaudio" and just "clearaudio")
(c) define separate kinds for the different cases, where it makes sense ("clearaudio-alt" and "clearaudio-mix")

Personally, I prefer (c) as I don't think the concept is universal enough to warrant a separate property.
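
For illustration, a client could recover the "alternative vs additional" distinction from option (c) kinds along these lines. The "-alt" and "-mix" suffixes follow the example names above; the function name and the "alternative"/"additive" labels are hypothetical.

```python
def parse_kind(kind):
    """Split an option (c) style kind into (need, mode).

    mode is "alternative", "additive", or None when the kind has no
    recognised suffix.
    """
    modes = {"alt": "alternative", "mix": "additive"}
    base, sep, suffix = kind.rpartition("-")
    if sep and suffix in modes:
        return base, modes[suffix]
    return kind, None


print(parse_kind("clearaudio-alt"))  # ('clearaudio', 'alternative')
print(parse_kind("clearaudio-mix"))  # ('clearaudio', 'additive')
print(parse_kind("main"))            # ('main', None)
```

Because the mode is part of the kind string itself, no separate property is needed, at the cost of defining two kinds for each need that can be met either way.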



David Singer
Multimedia and Software Standards, Apple Inc.


Received on Monday, 2 May 2011 17:57:48 UTC