Re: [media] handling multitrack audio / video from Geoff Freed on 2010-10-28 (public-html-a11y@w3.org from October 2010)

From: Geoff Freed <geoff_freed@wgbh.org>
Date: Thu, 28 Oct 2010 08:46:32 -0400
To: Philip Jägenstedt <philipj@opera.com>, "public-html-a11y@w3.org" <public-html-a11y@w3.org>
Message-ID: <C8EEE8E8.12568%geoff_freed@wgbh.org>

Hi, Philip:

Comments below.
Geoff/NCAM

======
<snip>


> How would you suggest we solve the issue of audio descriptions that
> are provided as separate audio files from the main video resource?
> They have to be synchronized to the main video resource since the
> spoken description is timed to be given at the exact times where the
> main audio resource has pauses. I have previously made an experiment
> at http://www.annodex.net/~silvia/itext/elephant_separate_audesc_dub.html
> to demonstrate some of the involved issues, in particular the
> synchronization problem. It can be solved with frequent re-sync-ing.

Assuming that the external audio track has the same timeline as the main
video file, then something like this:

<video id="v" src="video.webm"></video>
<audio sync="v" src="description.webm"></audio>

This wouldn't be particularly hard to implement, I think. Most of the
complexity seems to be in what the state of the video should be when the
audio has stopped for buffering, and vice versa.
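
Since sync is only a proposed attribute in this thread, one rough way to approximate it from script is to compare the two elements' currentTime values periodically and re-seek the audio when they drift apart. The sketch below isolates that decision as plain functions. The Clock interface, the function names, and the 0.1-second tolerance are assumptions made for illustration and do not come from any specification.

```typescript
// Minimal shape shared by <audio> and <video> elements; only currentTime is used.
interface Clock {
  currentTime: number; // playback position in seconds
}

// Returns the time the follower should seek to, or null if it is close enough.
// The 0.1 s default tolerance is an assumed value, not taken from the thread.
function driftCorrection(leader: Clock, follower: Clock, tolerance: number = 0.1): number | null {
  const drift = follower.currentTime - leader.currentTime;
  return Math.abs(drift) > tolerance ? leader.currentTime : null;
}

// Applies the correction once and reports whether a seek happened.
// In a page this could be called from a "timeupdate" listener on the video.
function resync(leader: Clock, follower: Clock, tolerance: number = 0.1): boolean {
  const target = driftCorrection(leader, follower, tolerance);
  if (target === null) return false;
  follower.currentTime = target;
  return true;
}
```

In a page, leader would be the <video> element and follower the <audio> element. This sketch does not address the buffering question raised above.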

> Is this not an acceptable use case for you?

Perhaps I was unclear. If we support syncing separate <audio>/<video>
elements, then the above is actually the very least we could do.

======
GF:  I agree, and it may be one way in which authors will want to sync multiple video descriptions that are all contained in a single audio file, with silence between descriptions.  For example, if you have a 60-second video clip playing in parallel with an audio clip containing three five-second video descriptions, and these descriptions are timed to play at 15s, 30s and 45s, the audio file would be structured like this:

-- silence from 0s-15s
-- video description #1 from 15s-20s
-- silence from 20s-30s
-- video description #2 from 30s-35s
-- silence from 35s-45s
-- video description #3 from 45s-50s
-- silence from 50s-60s
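
The layout above can be derived mechanically from the description start times. The following is a minimal sketch that assumes the start times are ascending and do not overlap. The Segment type and the layoutSegments function are hypothetical names introduced here for illustration.

```typescript
interface Segment {
  kind: "silence" | "description";
  start: number; // seconds from the start of the audio file
  end: number;   // seconds from the start of the audio file
}

// Builds the silence/description layout of a single audio file.
// starts: description start times in seconds (assumed ascending, non-overlapping).
// length: duration of each description in seconds.
function layoutSegments(total: number, starts: number[], length: number): Segment[] {
  const segments: Segment[] = [];
  let cursor = 0;
  for (const start of starts) {
    // Fill the gap before this description with silence.
    if (start > cursor) segments.push({ kind: "silence", start: cursor, end: start });
    segments.push({ kind: "description", start, end: start + length });
    cursor = start + length;
  }
  // Pad with silence up to the end of the video.
  if (cursor < total) segments.push({ kind: "silence", start: cursor, end: total });
  return segments;
}

// The 60-second clip with five-second descriptions at 15s, 30s and 45s.
const exampleLayout = layoutSegments(60, [15, 30, 45], 5);
```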
======

It's beyond this most basic case I'd like to understand the actual use cases.
To clarify, option 2 would allow things like this, borrowing SMIL syntax
as seen in SVG:

<video id="v" src="video.webm"></video>
<video begin="v.begin+10s" src="video2.webm"></video>
<!-- video and video2 should be synchronized with a 10s offset -->

or

<video id="v" src="video.webm"></video>
<video begin="v.end" src="video2.webm"></video>
<!-- video and video2 should play gapless back-to-back -->

Are there compelling reasons to complicate things to this extent? The last
example could be abused to achieve gapless playback between chunks in an
HTTP live streaming setup, but I'm not a fan of the solution myself.

======
GF:  I think there are compelling cases that are likely to occur in production environments because they are more efficient than the example I outlined above.  For example, an author could store the same three descriptions discretely, rather than in a single audio file, and then fire each one at the appropriate point in the timeline, in a manner similar to the one you've noted above:

<video id="v" src="video.webm"></video>
<audio sync="v.begin+15s" src="description1.webm"></audio>
<audio sync="v.begin+30s" src="description2.webm"></audio>
<audio sync="v.begin+45s" src="description3.webm"></audio>
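
Because the offset form of sync shown here is only a proposal, a script would have to decide which discrete description, if any, belongs at a given point on the video timeline. A minimal sketch of that lookup follows. The Cue type, the activeCue function, and the descriptionCues list are hypothetical names, and the five-second durations are carried over from the earlier example.

```typescript
interface Cue {
  src: string;      // audio file holding one description
  start: number;    // offset from the start of the video, in seconds
  duration: number; // length of the description, in seconds
}

// Returns the cue that should be audible at video time t, together with the
// playback position inside that clip, or null when t falls in a gap.
function activeCue(cues: Cue[], t: number): { src: string; offset: number } | null {
  for (const cue of cues) {
    if (t >= cue.start && t < cue.start + cue.duration) {
      return { src: cue.src, offset: t - cue.start };
    }
  }
  return null;
}

// The three discrete descriptions from the markup above.
const descriptionCues: Cue[] = [
  { src: "description1.webm", start: 15, duration: 5 },
  { src: "description2.webm", start: 30, duration: 5 },
  { src: "description3.webm", start: 45, duration: 5 },
];
```

After a seek, the returned offset tells the script where inside the clip to resume, which is the part a simple "fire at 15s" timer would miss.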

======

Received on Thursday, 28 October 2010 12:52:16 UTC