Re: Tech Discussions on the Multitrack Media (issue-152)

On Fri, Feb 25, 2011 at 10:24 AM, David Singer <singer@apple.com> wrote:
>
> On Feb 24, 2011, at 8:57 , Bob Lund wrote:
>
> [Bob Lund] I agree with your observation that timed text, audio and video
> all lay out a presentation along a timeline. In the context of HTML5, though,
> “Timed Text Tracks” expose “cues” with a start and end time, and data –
> either text or metadata. The proposed multi-track media APIs expose the
> presence of additional tracks along with the ability to denote whether the
> track is showing. There is no “cue” and there is no access to the data in
> the track.
>
>
> True, access to data in the track would be useful for any media type, not
> just text.  (Audio processing, for example, or extracting an image from
> video and painting it onto a Canvas).  So rather than treating text tracks
> as special, I'd prefer to see all tracks treated powerfully enough to meet
> the text and other needs.


That's not really possible.

The main feature of text tracks is that their cues are sparse chunks
along the timeline containing relatively little data. It is therefore
possible to parse all of this data into a cue list, keep it in memory,
and expose it as a TextTrackCueList to JS, as well as fire an event on
the track when cues change, and on the activated and deactivated cues
themselves. We need this kind of flexibility on the text cues to allow
people to build their own interfaces around them.
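
(Just to illustrate - this is only a sketch, assuming a TextTrack
object in a variable called "track", a DIV called "display" that the
page renders into, and that something like getCueAsSource() ends up
being the accessor for a cue's content:)

    // re-render the captions ourselves whenever the active cues change
    track.oncuechange = function() {
      var html = '';
      for (var i = 0; i < track.activeCues.length; i++) {
        html += track.activeCues[i].getCueAsSource() + '<br>';
      }
      display.innerHTML = html;
    };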

Introducing an API of this kind for audio and video tracks is,
however, not really possible. Assuming we take the concept of a "cue"
to mean a "group of samples" for audio and video, we'd have to hold a
large amount of data in memory for an AudioCueList or a VideoCueList.
And since the individual cues would typically relate to a small amount
of time (for video I would think they relate to a frame, for audio
maybe to 40ms), then as we play, events would fire constantly on both
the track and the cues, and we'd have to constantly update the
CueList, which makes it useless to the JS programmer anyway and would
probably blow up the browser.

So, basically, an audio and video API for data can only really be a
polling API, while for text it is, and totally should be, a push API.
Even with in-band streams: as soon as the browser finds a text cue, it
needs to parse it and add it to the TextTrackCueList, so that the JS
developer gets notice sufficiently early and can do something with it.
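
(The polling style is what you already do today if you want at the
video data, e.g. for David's example of extracting an image from the
video and painting it onto a Canvas - just a sketch, assuming "video"
and "canvas" are elements on the page:)

    // poll: copy the current video frame into a canvas every 40ms
    // and read back the pixels to process them
    var ctx = canvas.getContext('2d');
    setInterval(function() {
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      var frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
      // ... inspect frame.data here ...
    }, 40);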

The discontinuous and sparse nature of text tracks makes them a very
different beast to audio and video tracks. If we did want to deal with
audio and video tracks as well as text tracks through the TextTrack
API, I would think we could only do it by disabling this part of the
TextTrack API for audio and video tracks:
    readonly attribute TextTrackCueList cues;
    readonly attribute TextTrackCueList activeCues;
                   attribute Function oncuechange;

I don't know what the downsides of such an approach would be, but it
certainly feels a bit clunky.
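
(Part of the clunkiness is that scripts would then have to guard every
cue access - a sketch, assuming "disabled" means the attributes return
null and renderCues() is some hypothetical rendering function:)

    if (track.cues !== null) {
      // a text track: the cues have been parsed for us
      renderCues(track.cues);
    } else {
      // an audio or video track: no cue access, fall back to polling
    }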

Cheers,
Silvia.

Received on Friday, 25 February 2011 00:58:22 UTC