Re: Tech Discussions on the Multitrack Media (issue-152) from Silvia Pfeiffer on 2011-02-28 (public-html@w3.org from February 2011)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Mon, 28 Feb 2011 11:39:03 +1100
To: Bob Lund <B.Lund@cablelabs.com>
Cc: Sean Hayes <Sean.Hayes@microsoft.com>, David Singer <singer@apple.com>, "public-html@w3.org" <public-html@w3.org>
Message-ID: <AANLkTimxyzLgY=VPcEbHFPqpyLohSu_om+pLYi7xj=Wz@mail.gmail.com>

For both of the use cases below to work, the browser has to extract
the cues from the binary media resource into a TextTrack of
TextTrackCues, which is then exposed both to the browser's rendering
and to JavaScript. Since by default only TextTracks of @kind=subtitles
or @kind=captions are rendered, and these can only contain raw text,
it probably makes sense to declare both the content advisory
descriptor and the image-based caption as @kind=metadata.
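As a rough sketch of the script side, assuming today's TextTrack API rather than the 2011 draft: a metadata track is switched to "hidden" so that the user agent keeps its activeCues current and fires cuechange, without rendering anything. The interfaces below (CueLike, CueListLike, TextTrackLike) are hand-written structural subsets with hypothetical names so the sketch runs outside a browser, and watchMetadataTrack is an illustrative helper, not part of any API.

```typescript
// Hand-written structural subsets of the DOM TextTrack / TextTrackCue
// interfaces (hypothetical names), so the sketch compiles without DOM typings.
interface CueLike {
  startTime: number; // seconds
  endTime: number; // seconds
  text: string; // raw payload of a metadata cue
}

interface CueListLike {
  readonly length: number;
  [index: number]: CueLike;
}

interface TextTrackLike {
  kind: string;
  mode: string; // "disabled" | "hidden" | "showing" in the current HTML spec
  activeCues: CueListLike | null;
  addEventListener(type: "cuechange", listener: () => void): void;
}

// Hypothetical helper: enable a metadata track without rendering it and
// report its active cues every time the set of active cues changes.
function watchMetadataTrack(
  track: TextTrackLike,
  onCues: (cues: CueLike[]) => void,
): void {
  if (track.kind !== "metadata") {
    throw new Error(`expected a metadata track, got kind="${track.kind}"`);
  }
  // "hidden": cues are tracked and events fire, but nothing is drawn.
  track.mode = "hidden";
  track.addEventListener("cuechange", () => {
    const active: CueLike[] = [];
    const list = track.activeCues;
    if (list !== null) {
      for (let i = 0; i < list.length; i++) active.push(list[i]);
    }
    onCues(active);
  });
}
```

In a page the track would come from the video element's textTracks list. Early drafts of the spec used numeric mode constants (OFF, HIDDEN, SHOWING) instead of strings, so treat the exact names as illustrative.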

In the case of the content advisory descriptor, I would expect the
browser to expose a new cue only when the descriptor changes.
Anything else is just repeated information.
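A minimal sketch of that change-only behaviour, assuming the descriptor samples have already been pulled out of the stream as (time, payload) pairs in presentation order. DescriptorSample, MetadataCue and samplesToCues are hypothetical names, and payloads are compared as opaque strings. Each run of identical samples becomes one cue that ends where the next different descriptor starts.

```typescript
// One descriptor sample as pulled from the stream; in an MPEG-2 transport
// stream the same descriptor repeats many times per second.
interface DescriptorSample {
  time: number; // presentation time in seconds
  payload: string; // descriptor contents, compared by value (hypothetical encoding)
}

interface MetadataCue {
  startTime: number;
  endTime: number;
  text: string;
}

// Collapse time-ordered samples into cues, opening a new cue only when the
// payload differs from the previous sample. Each cue ends where the next one
// starts; the last cue runs to endOfMedia.
function samplesToCues(samples: DescriptorSample[], endOfMedia: number): MetadataCue[] {
  const cues: MetadataCue[] = [];
  for (const s of samples) {
    const last = cues[cues.length - 1]; // undefined while cues is empty
    if (last !== undefined && last.text === s.payload) continue; // repeat: no new cue
    if (last !== undefined) last.endTime = s.time; // close previous cue at the change
    cues.push({ startTime: s.time, endTime: endOfMedia, text: s.payload });
  }
  return cues;
}
```

Whether this runs in the user agent or in a script that sees every raw sample, the application only ever sees one cue per distinct descriptor period.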

In the case of image-based captions, I am uncertain how the images can
actually be exposed to the browser. One way would be to structure the
cue's text content so that it contains a text part followed by a
base64-encoded image part that can be extracted in JavaScript and
displayed. Alternatively, it might be possible to introduce a service
on the server that performs this extraction through a URL, e.g.
http://example.com/example_video.mp4?get_caption_image#lang=en&t=20
(or something more nicely engineered). This URL could then be
constructed by the JavaScript parser of the related @kind=metadata
track. Since this involves some sort of server-side service, I am not
sure what the browser would expose in the actual TextTrack, but maybe
it can just extract the metadata of the track and put that in.
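A minimal sketch of both options, under a made-up convention: a cue line starting with "base64:" carries a PNG, and every other line is the text equivalent. The marker, parseImageCaption and captionImageUrl are hypothetical. One design note on the server variant: the URL fragment (everything after "#") is never sent in the HTTP request, so a server-side service could not see lang and t there; the sketch moves them into the query string instead.

```typescript
// Hypothetical convention for this sketch only: a cue line that starts with
// "base64:" carries the caption image; all other lines are its text equivalent.
const IMAGE_PREFIX = "base64:";
const BASE64_RE = /^[A-Za-z0-9+\/]*={0,2}$/;

interface ImageCaption {
  text: string; // text equivalent, for screen readers or text rendering
  imageUrl: string | null; // data: URL usable as an <img> src, or null
}

function parseImageCaption(cueText: string, mimeType = "image/png"): ImageCaption {
  const textLines: string[] = [];
  let imageUrl: string | null = null;
  for (const line of cueText.split("\n")) {
    if (line.indexOf(IMAGE_PREFIX) === 0) {
      const data = line.slice(IMAGE_PREFIX.length).trim();
      if (!BASE64_RE.test(data)) throw new Error("malformed base64 image payload");
      // No decoding needed here: the browser decodes the data: URL itself.
      imageUrl = `data:${mimeType};base64,${data}`;
    } else {
      textLines.push(line);
    }
  }
  return { text: textLines.join("\n"), imageUrl };
}

// Server-side variant: build a request URL for a hypothetical extraction
// service. lang and t go in the query string because the fragment (after "#")
// is never sent to the server.
function captionImageUrl(mediaUrl: string, lang: string, t: number): string {
  const sep = mediaUrl.indexOf("?") >= 0 ? "&" : "?";
  return `${mediaUrl}${sep}get_caption_image&lang=${encodeURIComponent(lang)}&t=${t}`;
}
```

The base64 alphabet contains neither "-" nor ">", so the payload can never accidentally contain the WebVTT "-->" separator; it only has to stay on one line, with no blank lines inside the cue.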

Anyway, these two are special use cases which IIUC rely on special
track types. I am not sure how many of these special track types
browser vendors would actually want to support. I expect that Web
developers may instead just use external WebVTT files with
@kind=metadata and their own special markup to make such things work
across browsers and media formats.
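For instance, a hypothetical advisories.vtt could carry one JSON object per cue as its own markup: the browser parses the WebVTT file, and the script only interprets each cue's text. The JSON field names and the parseAdvisoryCue helper are assumptions for illustration, and the rating strings merely imitate US TV Parental Guidelines labels.

```typescript
// A hypothetical advisories.vtt, loaded with
//   <track kind="metadata" src="advisories.vtt">
// The browser parses the WebVTT; each cue's text is our own JSON markup:
//
//   WEBVTT
//
//   00:00:00.000 --> 00:10:00.000
//   {"type": "advisory", "rating": "TV-PG", "reasons": ["L"]}
//
//   00:10:00.000 --> 00:45:00.000
//   {"type": "advisory", "rating": "TV-14", "reasons": ["V"]}

interface Advisory {
  rating: string;
  reasons: string[];
}

// Interpret one metadata cue's text; returns null for anything that is not
// an advisory in this made-up format, so other cue types can share the track.
function parseAdvisoryCue(cueText: string): Advisory | null {
  let data: unknown;
  try {
    data = JSON.parse(cueText);
  } catch {
    return null; // not JSON, so not ours
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const rating = record.type === "advisory" ? record.rating : undefined;
  if (typeof rating !== "string") return null;
  const rawReasons = record.reasons;
  const reasons = Array.isArray(rawReasons)
    ? rawReasons.filter((r): r is string => typeof r === "string")
    : [];
  return { rating, reasons };
}
```

In a page this would be called on each active cue's text from a cuechange handler; unrecognised cues are simply ignored, which keeps the approach independent of the media container.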

Cheers,
Silvia.

On Mon, Feb 28, 2011 at 10:34 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
> Content advisories are another type of metadata timed text track that is important for parental controls. The content advisory descriptor is embedded in various ways depending on the content transport mechanism - in MPEG-2 transport streams they are carried in a data PID. Other adaptive bit rate formats, e.g. DECE common container, can also carry these descriptors.
>
> In an MPEG-2 transport stream these descriptors are sent many times a second, although the data changes infrequently, on the order of tens or hundreds of minutes. It would be desirable if the application were made aware only of changes in the descriptor. I guess this could be done by specifying such behavior in the user agent. A more general interface might be better if this situation occurs with other track data types.
>
> Regards,
> Bob Lund
>
> -----Original Message-----
> From: Sean Hayes [mailto:Sean.Hayes@microsoft.com]
> Sent: Friday, February 25, 2011 3:46 AM
> To: Silvia Pfeiffer; David Singer
> Cc: Bob Lund; public-html@w3.org
> Subject: RE: Tech Discussions on the Multitrack Media (issue-152)
>
> I'm not sure it's 100% safe to assume that <track> elements are 'relatively little data' [1]. If the source was originally an image-based caption format (like DVB subtitles or DVD sub-pictures), then each caption may contain a fairly large image. When you convert that to HTML for getCueAsHTML(), it could contain <img src="data:image/png;base64,....">.
> I guess it would be possible to treat image-based caption formats as sparse video, but then there would be nowhere to put a text equivalent.
>
> Sean.
>
> [1] Silvia: "That's not really possible. The main feature of text tracks is that their data are sparse chunks along the timeline with relatively little data, therefore it is possible to parse all of this data into a cue list, keep it in memory and make it available as a TextTrackCueList to JS, as well as throw an event on the track when cues change, and on the activated and deactivated cues themselves. "
>
>

Received on Monday, 28 February 2011 00:39:55 UTC