Re: TextTrack API changes from Bob Lund on 2013-05-13 (public-html@w3.org from May 2013)

From: Bob Lund <B.Lund@CableLabs.com>
Date: Mon, 13 May 2013 20:06:33 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Glenn Adams <glenn@skynav.com>
CC: Simon Pieters <simonp@opera.com>, public-html <public-html@w3.org>, "Jerry Smith, (WINDOWS)" <jdsmith@microsoft.com>, "Mark Vickers @ Comcast" <mark_vickers@cable.comcast.com>
Message-ID: <CDB6A360.2C602%b.lund@cablelabs.com>

On 5/12/13 8:47 PM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

>On Mon, May 13, 2013 at 5:00 AM, Glenn Adams <glenn@skynav.com> wrote:
>>
>> First, I'm talking about the Media Type of a text track resource here,
>> not a specific @kind (usage) of a text track resource. For example,
>> "text/vtt", "application/ttml+xml", "application/x-mpeg2-psi" [I just
>> made that up], etc.
>
>OK, this is taking the discussion into a completely different and
>unrelated direction, because we were discussing TextTrackCue and not
>TextTrack types. Also, the changes you are proposing below are not
>possible because <track> is an empty element and we are not going to
>break backwards compatibility on the markup. But I'll entertain the
>discussion of the use cases that they imply rather than the particular
>specification proposal.
>
>I'm still curious about the one question I had before: are you or
>anyone else aware of any implementations of the
>inBandMetadataTrackDispatchType attribute? Since it's not even used in
>http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
>but instead @label is used, I don't know if it's satisfying its use
>case.

This document was written before the definition of the dispatch type
attribute. The spec will be rev'd to make use of this new attribute.

>
>
>> Now, let me try to be more concrete regarding uses:
>>
>> Ideally, <track> would use <source> in the same fashion as <video> and
>> <audio>, in order to allow use of the resource selection algorithm for
>> alternate track resources:
>>
>> <video src="video.mp2t">
>>   <!-- in- or out-of-band captions, three alternative sources -->
>>   <track kind="captions">
>>     <!-- out-of-band VTT -->
>>     <source src="video.vtt" type="text/vtt">
>>     <!-- out-of-band TTML -->
>>     <source src="video.ttml" type="application/ttml+xml">
>>     <!-- in-band 708 -->
>>     <source src="video.mp2t" type="application/x-cea-708">
>>   </track>
>>   <!-- in-band MPEG-2 PSI, only one source -->
>>   <track kind="metadata" src="video.mp2t" type="application/x-mpeg2-psi" />
>>   <!-- out-of-band custom metadata, two alternative sources -->
>>   <track kind="metadata">
>>     <!-- out-of-band custom metadata, type 1 -->
>>     <source src="video.md1" type="application/x-metadata-1">
>>     <!-- out-of-band custom metadata, type 2, in case type 1 not supported -->
>>     <source src="video.md2" type="application/x-metadata-2">
>>   </track>
>> </video>
>
>This is overly complicated and not necessary to do in markup, because
>you get all of this in the JavaScript TextTrack API. Plus you do not
>have to deal with special cases like the inband text tracks - markup
>is the wrong approach for this. As for the suggestion of doing
><source> inside <track> - that is not necessary, because all supported
>track formats are exposed in a track list to JS or even the user -
>unlike <video>, where only a single @src is ever active.
>You can, however, achieve all the use cases that you are trying to
>emulate in the complex markup above in JS right now.
>
>Here's how it's done with the current spec:
>
>Markup:
><video src="video.mp2t">
>  <track kind="captions" src="video.vtt">
>  <track kind="captions" src="video.ttml">
>  <track kind="metadata" src="video.md1">
>  <track kind="metadata" src="video.md2">
></video>
>
>JavaScript:
>
>Assuming the browser can parse the following file formats:
>* mp2t video file
>* VTT file
>* mp2t cea-708 inband track
>* mp2t mpeg2-psi inband track
>* md2 metadata file
>but is unable to parse:
>* TTML file
>* md1 metadata file
>
>The following objects are available in JavaScript:
>* for the WebVTT track:
>TextTrack(kind="captions", cues=TextTrackCueList,...)
>(the TextTrackCues in the TextTrackCueList are of type WebVTTCue)
>
>* for the TTML track (because there is no support for the format):
>TextTrack(kind="captions", cues=null,...)
>
>* for the mp2t cea-708 inband track:
>TextTrack(kind="captions", cues=TextTrackCueList,...)
>(the TextTrackCues in the TextTrackCueList are of type CEA708Cue)
>
>* for the mp2t mpeg2-psi inband track:
>TextTrack(kind="metadata", cues=TextTrackCueList,...)
>per spec, with an inBandMetadataTrackDispatchType containing the
>stream_type and the descriptor bytes, and likely accompanied by a
>label="program description" or something similar that tells the user
>what they will get when they choose this track
>(the TextTrackCues in the TextTrackCueList are generic, so just
>TextTrackCue objects, but could also be a more specific PSICue if the
>browser supports such)
>
>* for the md1 track (because there is no support for the format):
>TextTrack(kind="metadata", cues=null,...)
>
>* for the md2 track:
>TextTrack(kind="metadata", cues=TextTrackCueList,...)
>(the TextTrackCues in the TextTrackCueList are generic so just
>TextTrackCue objects)
>
>As a JS developer, you can now decide which of the tracks to expose to
>the user and could just loop through the video.textTracks list and
>remove those tracks that you don't want them to see. E.g. you can
>remove all those that have no cues, which still provides the users
>with a choice as to whether to see the 708 captions or the WebVTT
>captions. You would also parse the cues in the metadata according to
>what you know them to be.
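
A minimal sketch of the filtering step described above, in plain JavaScript. It assumes, as the message does, that a track whose format the UA cannot parse shows up with cues set to null. The helper names tracksWithCues and captionChoices and the exampleTracks data are made up for illustration; in a page, video.textTracks would be passed in instead.

```javascript
// Sketch only: `tracks` stands in for video.textTracks. Following the
// message above, a track the UA cannot parse is assumed to have
// cues === null.
function tracksWithCues(tracks) {
  return Array.from(tracks).filter(
    (track) => track.cues !== null && track.cues.length > 0
  );
}

// Hypothetical helper: the caption tracks worth offering to the user.
function captionChoices(tracks) {
  return tracksWithCues(tracks).filter((track) => track.kind === 'captions');
}

// Plain objects mirroring the six tracks in the example above.
const exampleTracks = [
  { kind: 'captions', label: 'WebVTT', cues: [{}] },
  { kind: 'captions', label: 'TTML', cues: null },
  { kind: 'captions', label: 'CEA-708', cues: [{}] },
  { kind: 'metadata', label: 'MPEG-2 PSI', cues: [{}] },
  { kind: 'metadata', label: 'md1', cues: null },
  { kind: 'metadata', label: 'md2', cues: [{}] },
];

// Logs the labels of the WebVTT and CEA-708 caption tracks.
console.log(captionChoices(exampleTracks).map((track) => track.label));
```

The track list itself is left untouched; only the list of choices built for the user is filtered.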
>
>So, this is how it currently goes. In all cases, the JS developer does
>not need to know what file format the text track is provided in,
>because if the UA can parse it, it will expose it in JS with cues and
>if it can't, then it can't expose cues anyway. So, I am not concerned
>about the file formats in which text tracks are provided.
>
>What concerns me, though, is the format of the individual cues.
>
>
>> Using this mechanism, the UA fetches track resources according to what
>> track media types it supports and what resources are actually resolvable.
>>
>> Once it has resolved a track's alternate source references to an actual
>> resource (whether out-of-band or in-band), the UA determines the actual
>> content type of the resource (when it sniffs/parses it).
>>
>> So, let's say that:
>>
>> (1) HTMLSourceElement.type (or HTMLTrackElement.type) returns the
>> advisory (hint) author supplied type (may or may not be the resolved
>> type); and
>
>The file format type? That's irrelevant as explained above.
>
>
>> (2) HTMLTrackElement.track.type returns the actual (sniffed/parsed) type
>> as determined by the UA and selected by the resource selection algorithm;
>>
>> Why is this useful? Because it could help the client JS code to
>> determine things like:
>>
>> what possible interface types are supported by a cue instance that the
>> UA constructs for that type;
>>
>> what possible different formats may be returned from TextTrackCue.text;
>
>Now you are arguing for cue format types and not file format types. I
>agree with providing a hint for these, which is why I suggested making
>inBandMetadataTrackDispatchType more generic and calling it cueType
>and having browsers expose these where available.
>
>
>> Now, for the case where client JS wants to construct a track, then
>> HTMLMediaElement.addTextTrack (possibly renamed to createTextTrack)
>> should support an optional type parameter which is used to initialize
>> TextTrack.type, and subsequently, TextTrack.type is used to constrain
>> the type(s) of cues constructed by a TextTrack.createCue method or
>> constrain the type(s) of cues that can be added via TextTrack.addCue.
>
>s/type/cueType/ and we are basically arguing for the same thing.
>Except, my proposal is for the browser to set the cueType to "generic
>text", which will be replaced with a more specific cue object (e.g.
>WebVTTCue or CEA708Cue) on the first addition of a cue of such type,
>after which only cues of that type are allowed to be added. The
>cueType is also a hint that the browser can set for metadata tracks if
>it knows some more about the content of the track but doesn't have an
>actual parser.
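
One possible reading of the cueType proposal above, sketched as a plain JavaScript model and not as any browser API. SketchTrack and the stand-in WebVTTCue and CEA708Cue classes are made up for illustration, and the rule that the first added cue fixes the type is only an interpretation of the description.

```javascript
// Stand-in cue classes, hypothetical and for illustration only.
class WebVTTCue {
  constructor(startTime, endTime, text) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;
  }
}
class CEA708Cue extends WebVTTCue {}

// Hypothetical model of a track with the proposed cueType attribute.
class SketchTrack {
  constructor(kind) {
    this.kind = kind;
    this.cueType = 'generic text'; // initial value named in the proposal
    this.cues = [];
  }

  addCue(cue) {
    const type = cue.constructor.name;
    if (this.cues.length === 0) {
      // The first cue added replaces the generic value with its own type.
      this.cueType = type;
    } else if (type !== this.cueType) {
      // After that, only cues of the same type are accepted.
      throw new TypeError(`track only accepts ${this.cueType} cues`);
    }
    this.cues.push(cue);
  }
}

const sketch = new SketchTrack('captions');
sketch.addCue(new WebVTTCue(0, 5, 'this is a caption'));
console.log(sketch.cueType); // "WebVTTCue"
```

Adding a CEA708Cue to the same track afterwards would throw under this model, which is the constraint the proposal describes.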
>
>Basically, what I'd like to make possible is:
>track = video.addTextTrack('metadata', 'myLabel', 'en');
>cue0 = new TextTrackCue(0, 5, '{cue: content}');
>track.addCue(cue0);
>
>and even
>
>track = video.addTextTrack('subtitles', 'myLabel', 'en');
>cue0 = new TextTrackCue(0, 5, 'this is a subtitle');
>track.addCue(cue0);
>
>This is not currently possible because the TextTrackCue constructor is
>gone, but I can see these as the use case to add it back.
>
>
>Cheers,
>Silvia.
Received on Monday, 13 May 2013 20:07:43 UTC