Re: Track kinds from Mark Watson on 2011-04-29 (public-html-a11y@w3.org from April 2011)

From: Mark Watson <watsonm@netflix.com>
Date: Fri, 29 Apr 2011 12:33:04 -0700
To: David Singer <singer@apple.com>
CC: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <21ED66C3-2E89-456C-9AF7-E0A00CB770C5@netflix.com>

David,

I'm still not understanding....

Suppose I have two presentations (A and B), each with two audio tracks (A1, A2 and B1, B2) as follows:
A1: main audio track for presentation A
A2: audio description track for presentation A containing all the main audio, plus the descriptions

B1: main audio track for presentation B
B2: audio description track for presentation B containing only the descriptions

And then the kinds are:

A1: main
A2: audiodesc

B1: main
B2: audiodesc

How does the media framework know that in case A, when audio descriptions are enabled, it should render only A2, but in case B it should render B1+B2? Or do you suggest:

A1: main
A2: main audiodesc

B1: main
B2: audiodesc

Same question for clear audio. Repetitive Stimulus Safe is a different issue.
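To make the question concrete, here is a minimal sketch of how a player could decide under the second labelling (A2 carrying both "main" and "audiodesc"). The function name, the dictionary layout, and the rule that "main audiodesc" means "replacement" are assumptions made for illustration; none of this is defined by HTML5 or DASH.

```python
def tracks_to_render(tracks, want_descriptions):
    """Return the names of the audio tracks to play.

    tracks maps a track name to its set of kinds (hypothetical layout).
    """
    plain_main = sorted(n for n, kinds in tracks.items() if kinds == {"main"})
    described = sorted(n for n, kinds in tracks.items() if "audiodesc" in kinds)
    if not want_descriptions or not described:
        return plain_main
    # Assumed rule: "main audiodesc" means the track already contains the
    # main audio, so it replaces the main track instead of being mixed in.
    replacements = [n for n in described if "main" in tracks[n]]
    if replacements:
        return replacements
    # Otherwise the description track is additive: play main plus descriptions.
    return plain_main + described


presentation_a = {"A1": {"main"}, "A2": {"main", "audiodesc"}}
presentation_b = {"B1": {"main"}, "B2": {"audiodesc"}}

print(tracks_to_render(presentation_a, want_descriptions=True))  # ['A2']
print(tracks_to_render(presentation_b, want_descriptions=True))  # ['B1', 'B2']
```

With the first labelling (A2 marked only "audiodesc"), presentations A and B look identical to this function, and it would return both A1 and A2 for presentation A, playing the main audio twice.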

...Mark

On Apr 29, 2011, at 11:13 AM, David Singer wrote:

On Apr 29, 2011, at 12:54 , Mark Watson wrote:

We may be talking at slightly cross purposes.

What I understand is that both clear audio and audio descriptions can be supplied in two different ways, either with or without the rest of the soundtrack mixed in.

In one case the player needs to render both the original and the accessibility audio track.

In the other case it's either-or.

Right. And I am saying that audio mixing is out of scope for HTML5. That's a media player feature. If within, say, a movie file the author provides several audio tracks that need mixing, that's the QuickTime framework's problem. DASH has the ability to do exclusive-or and inclusive-or, so I would (sketching wildly) look for tracks with media type audio labelled "main music", "main clear", "main background" -- which highlights that if we are to get into a language for elements, we'll need more terms.

If the client always chooses either-or then depending on how the content has been provided you may or may not get the non-dialog or non-description sounds.

So, at the HTML level:

a) if one source is already delivering, or can be configured to deliver, clear audio (or be rep. stim. safe), label it as "clear" (or safe)
b) if there is an alternative source that is safe, label that.

the requirement that the media engine responds to the same needs has to remain (as is being discussed in DASH).

If the accessibility track has been provided without the (non-dialog) or (non-description) sounds from the main track then you can mix the two and adjust the relative volume.

media engine issue, I hope.

If the accessibility track has been provided with the (non-dialog) or (non-description) sounds from the main track (i.e. as an alternative) then I think it would be brave to mix them and expect it to work. Whilst you might expect the timing to be precise and the result to be just a difference in relative volume, I'm not sure you could rely on this unless the content was specifically authored with that intent. I'm not a sound engineer, but it seems that if the alternative track has been independently mixed, edited, prepared - even a little bit - then things could go wrong if you mix it with another track.

Which approach is taken by the content provider can't be worked out by the client - it needs to be signaled.

But I think the existence of these two approaches is quite specific to these use-cases and so I don't see a need for a generic mechanism. Two kinds for each would be sufficient.

...Mark

On Apr 29, 2011, at 6:52 AM, David Singer wrote:

On Apr 29, 2011, at 1:43 , Mark Watson wrote:

This is a more general question about how to signal whether a track is "additive" or "alternative". I think we discussed this before (or maybe that was a different list, I forget).

I don't think we are designing SMIL, or other mixing systems, in general. I think we need to say at the HTML or DASH level 'pick this source if you need clearaudio or repetitivestimulussafe', and leave to the media engine underneath any necessary configuring of the media resource to provide that experience.

That is, it's a two-step process; select at the DASH/HTML level, configure at the media engine level.

Putting the tag on all sources means that the program content naturally has this characteristic. Putting it on some indicates a choice. Not mentioning it means the content author didn't think about it (typically).

In the case of a built-up audio experience, it's the media engine's problem to adjust the build to respond to 'I need clearaudio' (perhaps by adjusting relative volumes, perhaps by disabling some audio completely).
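The selection step of this two-step process, and the reading of "tag on all sources / some sources / none", could be sketched roughly as follows. The source records, the function names, and the returned strings are hypothetical; only the labels "clearaudio" and "repetitivestimulussafe" come from the discussion, and configuring the chosen source is left to the media engine.

```python
def label_coverage(sources, label):
    """Interpret how widely a label is applied across the available sources."""
    tagged = [s for s in sources if label in s["labels"]]
    if not tagged:
        return "unspecified"   # the author (typically) did not think about it
    if len(tagged) == len(sources):
        return "inherent"      # the content naturally has this characteristic
    return "choice"            # only some sources provide it


def select_source(sources, needs):
    """Step one: pick the first source whose labels cover every user need.

    Returns None when no source is labelled for all the needs. Step two,
    configuring the chosen source, is the media engine's job and is not
    modelled here.
    """
    for source in sources:
        if needs <= source["labels"]:
            return source["name"]
    return None


sources = [
    {"name": "src1", "labels": set()},
    {"name": "src2", "labels": {"clearaudio", "repetitivestimulussafe"}},
]

print(label_coverage(sources, "clearaudio"))    # choice
print(select_source(sources, {"clearaudio"}))   # src2
```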

In the context of audio descriptions I've seen it stated that sometimes an audio description track is a replacement for the main audio track and in other cases it's intended to be mixed in (i.e. the descriptions fit somehow into gaps in the main audio track.)

Does anyone on this list know whether that is true ?

As with the repetitive stimulus question there are three approaches:
(a) treat it as a new property (additive vs alternative)
(b) allow multiple kinds (so we would have "alternate clearaudio" and just "clearaudio")
(c) define separate kinds for the different cases, where it makes sense ("clearaudio-alt" and "clearaudio-mix")

Personally, I prefer (c) as I don't think the concept is universal enough to warrant a separate property.
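As a rough illustration of option (c), a client could map the two kinds directly onto a render plan. This is only a sketch: the kind names "clearaudio-alt" and "clearaudio-mix" are the ones suggested above, while the function name and the track identifiers are made up for the example.

```python
def render_plan(main_track, accessibility_track, kind):
    """Decide which tracks to play when the accessibility feature is enabled."""
    if kind == "clearaudio-alt":
        # Alternative: the accessibility track replaces the main audio.
        return [accessibility_track]
    if kind == "clearaudio-mix":
        # Additive: the accessibility track is mixed with the main audio.
        return [main_track, accessibility_track]
    raise ValueError(f"unrecognised kind: {kind!r}")


print(render_plan("main", "clear", "clearaudio-alt"))  # ['clear']
print(render_plan("main", "clear", "clearaudio-mix"))  # ['main', 'clear']
```

The point of the design is that the either-or versus mix decision is signalled by the kind itself, so no separate "additive vs alternative" property is needed.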

...Mark

Cheers,
Silvia.

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Friday, 29 April 2011 19:33:34 UTC