Re: Tech Discussions on the Multitrack Media (issue-152) from David Singer on 2011-02-28 (public-html@w3.org from February 2011)

From: David Singer <singer@apple.com>
Date: Mon, 28 Feb 2011 10:46:59 -0800
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Bob Lund <B.Lund@cablelabs.com>, Sean Hayes <Sean.Hayes@microsoft.com>, "public-html@w3.org" <public-html@w3.org>
Message-Id: <F8BCB9D8-D25B-44A4-BAA0-FB22A7DEE9CD@apple.com>
On Feb 27, 2011, at 16:39 , Silvia Pfeiffer wrote:

> For both of the below to work, the browser has to implement an
> extraction of the cues from the binary media resource into a TextTrack
> with TextTrackCues, which then exposes it to the browser and to
> JavaScript. Since by default only TextTracks of @kind=subtitles or
> @kind=captions are rendered and these can only contain raw text, it
> probably makes sense to declare both the content advisory descriptor
> and the image-based caption as @kind=metadata.
> 

Yes, I would have the media type of these be 'text', the coding format be something suitable, and the 'kind' (purpose, function, really) be something that indicated it's metadata of some sort.

I think a general API that can tell:

* the times at which the content of this track changes (in the case that the UA has a time-map, e.g. a complete caption or media file)
* when the content of this track changes (i.e. a frame-by-frame callback)
* what the content of this track is at a given (possibly the current) time

covers the uses, doesn't it?
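Here is a minimal sketch of those three queries in Python, assuming the complete cue list is already known (the time-map case). `Cue`, `TimedTrack`, `change_times`, `content_at`, `on_change` and `advance` are hypothetical names chosen for illustration. They are not any browser's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Content = Tuple[str, ...]
Listener = Callable[[float, Content], None]


@dataclass(frozen=True)
class Cue:
    start: float   # seconds, inclusive
    end: float     # seconds, exclusive
    content: str


class TimedTrack:
    """Hypothetical track object answering the three questions above."""

    def __init__(self, cues: List[Cue]) -> None:
        self._cues = sorted(cues, key=lambda c: (c.start, c.end))
        self._listeners: List[Listener] = []
        self._last: Content = ()

    def change_times(self) -> List[float]:
        """Every cue boundary (start or end) time; needs the complete time map."""
        return sorted({t for c in self._cues for t in (c.start, c.end)})

    def content_at(self, t: float) -> Content:
        """Content of all cues active at time t (start <= t < end)."""
        return tuple(c.content for c in self._cues if c.start <= t < c.end)

    def on_change(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def advance(self, t: float) -> None:
        """Call once per frame; listeners fire only when the content differs."""
        current = self.content_at(t)
        if current != self._last:
            self._last = current
            for listener in self._listeners:
                listener(t, current)
```

Without a time map (a live stream, say), only the per-frame `advance`/`on_change` hook and a query for the current time remain meaningful.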

Bob's message talks of the information being repeated -- carouseled -- but that is just refreshing what the terminal already knew.  I don't think that those should constitute a change of content.
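To illustrate why carouseled repeats need not count as changes, here is a sketch that collapses repeated samples into one cue per actual change. Each cue lasts until the next change. The function names are hypothetical, and the payloads in the test are made-up bytes, not a real descriptor encoding.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Sample = Tuple[float, bytes]   # (presentation time in seconds, descriptor payload)


def changes_only(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Drop carouseled repeats: yield a sample only when its payload differs."""
    previous: Optional[bytes] = None
    for time, payload in samples:
        if payload != previous:
            previous = payload
            yield time, payload


def to_cues(samples: Iterable[Sample], stream_end: float) -> List[Tuple[float, float, bytes]]:
    """Turn the changes into (start, end, payload) cues; each lasts until the next change."""
    changes = list(changes_only(samples))
    cues = []
    for i, (start, payload) in enumerate(changes):
        end = changes[i + 1][0] if i + 1 < len(changes) else stream_end
        cues.append((start, end, payload))
    return cues
```

`stream_end` is an assumption of the sketch. For a live stream, the end of the last cue would stay unknown until the next change arrives.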

> In the case of the content advisory descriptor, I would expect the
> browser to indeed only expose the change of the descriptor in a new
> cue. Anything else is just repeated information.
> 
> In the case of image-based captions, I am uncertain how the images can
> actually be exposed to the browser. One way would be to structure the
> cue's text content in such a way that there is a text bit and then
> there is a base64 encoded text bit that can be extracted in JavaScript
> and displayed. Or alternatively it might be possible to introduce a
> service on the server that does this extraction through a URL, e.g.
> http://example.com/example_video.mp4?get_caption_image#lang=en&t=20
> (or something more nicely engineered). Then this url can be inferred
> by the JavaScript parser of the related @kind=metadata track. Since
> this involves some sort of server-side service, I am not sure what the
> browser would expose in the actual TextTrack, but maybe it can just
> extract the metadata of the track and put that in.
> 
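Here is one possible reading of the base64 suggestion, as a sketch. The cue text carries the text equivalent, followed by a line that holds the image as a data: URL. This line-based layout and the `pack_cue`/`unpack_cue` helpers are hypothetical conventions, not part of WebVTT. The image sits on its own line rather than after a blank line because a blank line ends a WebVTT cue.

```python
import base64
from typing import List, Optional, Tuple

IMAGE_PREFIX = "data:image/png;base64,"   # hypothetical convention, not a standard


def pack_cue(text: str, png: bytes) -> str:
    """Build metadata cue text: text equivalent first, image as a data URL on its own line."""
    return text + "\n" + IMAGE_PREFIX + base64.b64encode(png).decode("ascii")


def unpack_cue(cue_text: str) -> Tuple[str, Optional[bytes]]:
    """Split cue text back into (text equivalent, PNG bytes or None)."""
    text_lines: List[str] = []
    image: Optional[bytes] = None
    for line in cue_text.split("\n"):
        if line.startswith(IMAGE_PREFIX):
            image = base64.b64decode(line[len(IMAGE_PREFIX):], validate=True)
        else:
            text_lines.append(line)
    return "\n".join(text_lines), image
```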
> Anyway, these two are special use cases which IIUC rely on special
> track types. I am not sure how much of these special track types
> browser vendors actually would want to implement support for. I expect
> that Web developers may instead consider just using external WebVTT
> files with @kind=metadata and their own special markup to make such
> things work across browsers and media formats.

I don't think that 'text' tracks are any more (or less) special than any others, except perhaps audio, where getting a single sample is remarkably unhelpful.

> 
> Cheers,
> Silvia.
> 
> 
> 
> On Mon, Feb 28, 2011 at 10:34 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
>> Content advisories are another type of metadata timed text track that is important for parental controls. The content advisory descriptor is embedded in various ways depending on the content transport mechanism - in MPEG-2 transport streams they are carried in a data PID. Other adaptive bit rate formats, e.g. DECE common container, can also carry these descriptors.
>> 
>> In an MPEG-2 transport stream these descriptors are sent many times a second, although the data changes infrequently - on the order of tens or hundreds of minutes. It would be desirable if the application were made aware only of changes in the descriptor. I guess this could be done by specifying such behavior in the user agent. A more general interface might be better if this situation occurs with other track data types.
>> 
>> Regards,
>> Bob Lund
>> 
>> -----Original Message-----
>> From: Sean Hayes [mailto:Sean.Hayes@microsoft.com]
>> Sent: Friday, February 25, 2011 3:46 AM
>> To: Silvia Pfeiffer; David Singer
>> Cc: Bob Lund; public-html@w3.org
>> Subject: RE: Tech Discussions on the Multitrack Media (issue-152)
>> 
>> I'm not sure it's 100% safe to assume that <track> elements are 'relatively little data' [1]; if the source was originally an image-based caption format (like DVB subtitles or DVD sub-pictures), then each caption may contain a fairly large image. When you convert that to HTML for getCueAsHTML(), it could contain <img src="data:image/png;base64,....">.
>> I guess it would be possible to treat image-based caption formats as sparse video, but then there would be nowhere to put a text equivalent.
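Here is a sketch of the getCueAsHTML() case. The `image_cue_as_html` helper is hypothetical. In HTML, the text equivalent can travel in the `alt` attribute of the inlined image, and a sparse video track would have no such place. Base64 inflates the bitmap by about a third (4 output bytes per 3 input bytes), which makes the size concern worse.

```python
import base64
import html


def image_cue_as_html(png: bytes, text_equivalent: str) -> str:
    """Render an image-based caption as an HTML fragment.

    The bitmap is inlined as a data: URL, and the text equivalent goes in
    the alt attribute so it stays available to assistive technology.
    """
    src = "data:image/png;base64," + base64.b64encode(png).decode("ascii")
    return '<img src="{}" alt="{}">'.format(src, html.escape(text_equivalent, quote=True))
```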
>> 
>> Sean.
>> 
>> [1] Silvia: "That's not really possible. The main feature of text tracks is that their data are sparse chunks along the timeline with relatively little data, therefore it is possible to parse all of this data into a cue list, keep it in memory and make it available as a TextTrackCueList to JS, as well as throw an event on the track when cues change, and on the activated and deactivated cues themselves. "
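For the model described in [1], here is a sketch of how activated and deactivated cues could be derived from an in-memory cue list when playback moves from one time to another. `TimedCue` and `cue_transitions` are hypothetical names. The sketch deliberately ignores cues that start and end entirely between two updates, which a real user agent also has to report.

```python
from typing import List, NamedTuple, Set, Tuple


class TimedCue(NamedTuple):
    cue_id: str
    start: float   # seconds, inclusive
    end: float     # seconds, exclusive


def active_ids(cues: List[TimedCue], t: float) -> Set[str]:
    """Ids of cues active at time t."""
    return {c.cue_id for c in cues if c.start <= t < c.end}


def cue_transitions(cues: List[TimedCue], previous_time: float,
                    current_time: float) -> Tuple[Set[str], Set[str]]:
    """Return (activated, deactivated) cue ids for a move between two times."""
    before = active_ids(cues, previous_time)
    after = active_ids(cues, current_time)
    return after - before, before - after
```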
>> 
>> 
> 

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Monday, 28 February 2011 18:48:33 UTC