Re: TextTrack API changes from Silvia Pfeiffer on 2013-05-13 (public-html@w3.org from May 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Mon, 13 May 2013 12:47:47 +1000
To: Glenn Adams <glenn@skynav.com>
Cc: Simon Pieters <simonp@opera.com>, Bob Lund <B.Lund@cablelabs.com>, public-html <public-html@w3.org>, "Jerry Smith, (WINDOWS)" <jdsmith@microsoft.com>, "Mark Vickers @ Comcast" <mark_vickers@cable.comcast.com>
Message-ID: <CAHp8n2m03dsonbFL=5112en4OKWuVQPr82xe4ukG3PSpkCSMwg@mail.gmail.com>
On Mon, May 13, 2013 at 5:00 AM, Glenn Adams <glenn@skynav.com> wrote:
>
> First, I'm talking about the Media Type of a text track resource here, not a specific @kind (usage) of a text track resource. For example, "text/vtt",
> "application/ttml+xml", "application/x-mpeg2-psi" [I just made that up], etc.

OK, this is taking the discussion into a completely different and
unrelated direction, because we were discussing TextTrackCue and not
TextTrack types. Also, the changes you are proposing below are not
possible because <track> is an empty element and we are not going to
break backwards compatibility on the markup. But I'll entertain the
discussion of the use cases that they imply rather than the particular
specification proposal.

I'm still curious about the one question I had before: are you or
anyone else aware of any implementations of the
inBandMetadataTrackDispatchType attribute? Since it isn't even used in
http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
(@label is used instead), I don't know whether it satisfies its use
case.


> Now, let me try to be more concrete regarding uses:
>
> Ideally, <track> would use <source> in the same fashion as <video> and
> <audio>, in order to allow use of the resource selection algorithm for
> alternate track resources:
>
> <video src="video.mp2t">
>   <!-- in- or out-of-band captions, three alternative sources -->
>   <track kind="captions">
>     <!-- out-of-band VTT -->
>     <source src="video.vtt" type="text/vtt">
>     <!-- out-of-band TTML -->
>     <source src="video.ttml" type="application/ttml+xml">
>     <!-- in-band 708 -->
>     <source src="video.mp2t" type="application/x-cea-708">
>   </track>
>   <!-- in-band MPEG-2 PSI, only one source -->
>   <track kind="metadata" src="video.mp2t" type="application/x-mpeg2-psi" />
>   <!-- out-of-band custom metadata, two alternative sources -->
>   <track kind="metadata">
>     <!-- out-of-band custom metadata, type 1 -->
>     <source src="video.md1" type="application/x-metadata-1">
>     <!-- out-of-band custom metadata, type 2, in case type 1 not supported -->
>     <source src="video.md2" type="application/x-metadata-2">
>   </track>
> </video>

This is overly complicated and unnecessary to do in markup, because
you get all of this through the JavaScript TextTrack API. Plus, you do
not have to deal with special cases such as in-band text tracks, for
which markup is the wrong approach. As for the suggestion of putting
<source> inside <track>: that is not necessary, because every track in
a supported format is exposed in a track list to JS, or even to the
user. This is unlike <video>, where only one source is active at any
time. You can, however, already achieve in JS all the use cases that
the complex markup above is trying to emulate.

Here's how it's done with the current spec:

Markup:
<video src="video.mp2t">
  <track kind="captions" src="video.vtt">
  <track kind="captions" src="video.ttml">
  <track kind="metadata" src="video.md1">
  <track kind="metadata" src="video.md2">
</video>

JavaScript:

Assuming the browser can parse the following file formats:
* mp2t video file
* VTT file
* mp2t cea-708 inband track
* mp2t mpeg2-psi inband track
* md2 metadata file
But is unable to parse:
* TTML file
* md1 metadata file

The following objects are available in JavaScript:
* for the WebVTT track:
TextTrack(kind="captions", cues=TextTrackCueList,...)
(the TextTrackCues in the TextTrackCueList are of type WebVTTCue)

* for the TTML track (because there is no support for the format):
TextTrack(kind="captions", cues=null,...)

* for the mp2t cea-708 inband track:
TextTrack(kind="captions", cues=TextTrackCueList,...)
(the TextTrackCues in the TextTrackCueList are of type CEA708Cue)

* for the mp2t mpeg2-psi inband track:
TextTrack(kind="metadata", cues=TextTrackCueList,...)
with, per spec, an inBandMetadataTrackDispatchType containing the
stream_type and the descriptor bytes, and likely a
label="program description" or similar that tells the user what they
will get when they choose this track
(the TextTrackCues in the TextTrackCueList are generic, i.e. plain
TextTrackCue objects, but could also be more specific PSICue objects
if the browser supports them)

* for the md1 track (because there is no support for the format):
TextTrack(kind="metadata", cues=null,...)

* for the md2 track:
TextTrack(kind="metadata", cues=TextTrackCueList,...)
(the TextTrackCues in the TextTrackCueList are generic, i.e. plain
TextTrackCue objects)
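For the metadata tracks in this list, page script has to decide how to interpret the cues. Below is a minimal sketch of one way to do that, assuming the page keys its own parsers on inBandMetadataTrackDispatchType and falls back to @label (the approach the CableLabs document takes). Plain objects stand in for TextTrack so it runs outside a browser; the dispatch-type value "02AB", the parser tables and the JSON format for md2 are all hypothetical placeholders, and browser support for inBandMetadataTrackDispatchType should be checked rather than assumed.

```javascript
// Sketch: pick a cue parser for each usable metadata track.
// Plain objects stand in for TextTrack. The dispatch-type key "02AB" and
// the parser functions are hypothetical placeholders, not a real encoding.
const parsersByDispatchType = new Map([
  // Hypothetical stream_type + descriptor bytes for an MPEG-2 PSI track.
  ["02AB", (text) => ({ format: "psi", raw: text })],
]);
const parsersByLabel = new Map([
  // Assumption for illustration only: the md2 format is JSON.
  ["md2", (text) => JSON.parse(text)],
]);

function pickMetadataParsers(textTracks) {
  const picked = [];
  for (let i = 0; i < textTracks.length; i++) {
    const track = textTracks[i];
    // Skip non-metadata tracks and tracks whose format the UA could not parse.
    if (track.kind !== "metadata" || track.cues === null) continue;
    // Prefer the dispatch type; fall back to the label.
    const parse =
      parsersByDispatchType.get(track.inBandMetadataTrackDispatchType) ||
      parsersByLabel.get(track.label);
    if (parse) picked.push({ track, parse });
  }
  return picked;
}

// Stand-ins for the three metadata tracks listed above.
const metadataTracks = [
  { kind: "metadata", label: "program description",
    inBandMetadataTrackDispatchType: "02AB", cues: [] },
  { kind: "metadata", label: "md1", inBandMetadataTrackDispatchType: "", cues: null },
  { kind: "metadata", label: "md2", inBandMetadataTrackDispatchType: "", cues: [] },
];

const picked = pickMetadataParsers(metadataTracks);
console.log(picked.map((p) => p.track.label).join(", ")); // program description, md2
```

The md1 track is skipped because it exposes no cues; a track with cues but no matching parser would simply be left alone.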

As a JS developer, you can now decide which of the tracks to expose to
the user: just loop through the video.textTracks list and filter out
the tracks you don't want them to see. For example, you can drop all
tracks that have no cues, which still leaves users the choice between
the 708 captions and the WebVTT captions. You would also parse the
cues of the metadata tracks according to what you know their format
to be.
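A minimal sketch of that filtering, with plain objects standing in for the six TextTrack objects listed above so it runs outside a browser. Two caveats for real pages: TextTrackList has no method for removing tracks, so the sketch builds the list to offer in the page's own track menu instead; and a TextTrack's cues attribute is also null while its mode is "disabled", so a track must be enabled (e.g. mode set to "hidden") before a null check says anything about its format.

```javascript
// Sketch: build the list of text tracks to offer in a custom track menu,
// skipping tracks that expose no cues (formats the UA could not parse, in
// the model described above). Plain objects stand in for TextTrack.
function tracksToOffer(textTracks) {
  const offered = [];
  // Index-based loop so this also works on a TextTrackList.
  for (let i = 0; i < textTracks.length; i++) {
    const track = textTracks[i];
    if (track.cues !== null) offered.push(track);
  }
  return offered;
}

// Stand-ins for the six tracks listed above.
const exampleTracks = [
  { kind: "captions", label: "WebVTT", cues: [] },
  { kind: "captions", label: "TTML", cues: null },
  { kind: "captions", label: "CEA-708", cues: [] },
  { kind: "metadata", label: "program description", cues: [] },
  { kind: "metadata", label: "md1", cues: null },
  { kind: "metadata", label: "md2", cues: [] },
];

const captionChoices = tracksToOffer(exampleTracks)
  .filter((track) => track.kind === "captions")
  .map((track) => track.label);
console.log(captionChoices.join(", ")); // WebVTT, CEA-708
```

In a page, tracksToOffer(video.textTracks) works the same way, since TextTrackList supports length and indexed access.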

So, this is how it currently works. In all cases, the JS developer
does not need to know which file format a text track is provided in:
if the UA can parse it, it exposes the track in JS with cues, and if
it can't, it couldn't expose cues anyway. So, I am not concerned about
the file formats in which text tracks are provided.

What concerns me, though, is the format of the individual cues.


> Using this mechanism, the UA fetches track resources according to what track
> media types it supports and what resources are actually resolvable.
>
> Once it has resolved a track's alternate source references to an actual
> resource (whether out-of-band or in-band), the UA determines the actual
> content type of the resource (when it sniffs/parses it).
>
> So, let's say that:
>
> (1) HTMLSourceElement.type (or HTMLTrackElement.type) returns the advisory
> (hint) author supplied type (may or may not be the resolved type); and

The file format type? That's irrelevant, as explained above.


> (2) HTMLTrackElement.track.type returns the actual (sniffed/parsed) type as
> determined by the UA and selected by the resource selection algorithm;
>
> Why is this useful? Because it could help the client JS code to determine
> things like:
>
> what possible interface types are supported by a cue instance that the UA
> constructs for that type;
>
> what possible different formats may be returned from TextTrackCue.text;

Now you are arguing for cue format types, not file format types. I
agree with providing a hint for these, which is why I suggested making
inBandMetadataTrackDispatchType more generic, renaming it to cueType,
and having browsers expose it where available.


> Now, for the case where client JS wants to construct a track, then
> HTMLMediaElement.addTextTrack (possibly renamed to createTextTrack) should
> support an optional type parameter which is used to initialize
> TextTrack.type, and subsequently, TextTrack.type is used to constrain the
> type(s) of cues constructed by a TextTrack.createCue method or constrain the
> type(s) of cues that can be added via TextTrack.addCue.

s/type/cueType/ and we are basically arguing for the same thing.
The difference is that in my proposal the browser initially sets
cueType to "generic text"; on the first addition of a cue of a more
specific type (e.g. WebVTTCue or CEA708Cue), cueType is replaced with
that type, after which only cues of that type may be added. The
cueType is also a hint that the browser can set for metadata tracks
when it knows more about the content of the track but doesn't have an
actual parser for it.

Basically, what I'd like to make possible is:
track = video.addTextTrack('metadata', 'myLabel', 'en');
cue0 = new TextTrackCue(0, 5, '{cue: content}');
track.addCue(cue0);

and even

track = video.addTextTrack('subtitles', 'myLabel', 'en');
cue0 = new TextTrackCue(0, 5, 'this is a subtitle');
track.addCue(cue0);

This is not currently possible because the TextTrackCue constructor
has been removed, but I see these as the use cases for adding it back.
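The cueType rule proposed above can be modelled in a few lines. This is a minimal sketch, not any browser's API: cueType, GenericCue (standing in for a constructible TextTrackCue), WebVTTCue, CEA708Cue and ProposedTextTrack are hypothetical classes defined here. It also assumes one reading of the proposal: generic cues added before the type is fixed stay in the track, and every later cue must match the fixed type.

```javascript
// Hypothetical cue classes; GenericCue stands in for a constructible TextTrackCue.
class GenericCue {
  constructor(startTime, endTime, text) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;
  }
}
class WebVTTCue extends GenericCue {}
class CEA708Cue extends GenericCue {}

// Hypothetical track implementing the proposed cueType rule.
class ProposedTextTrack {
  constructor(kind, label, language) {
    this.kind = kind;
    this.label = label;
    this.language = language;
    this.cueType = "generic text"; // initial value set by the browser
    this.cues = [];
  }

  addCue(cue) {
    const isGeneric = cue.constructor === GenericCue;
    if (this.cueType === "generic text") {
      // The first cue of a more specific type fixes the track's cueType.
      if (!isGeneric) this.cueType = cue.constructor.name;
    } else if (cue.constructor.name !== this.cueType) {
      // Once fixed, only cues of that type may be added.
      throw new TypeError(`track only accepts ${this.cueType} cues`);
    }
    this.cues.push(cue);
  }
}

// Mirrors the examples above, with GenericCue in place of TextTrackCue.
const metadataTrack = new ProposedTextTrack("metadata", "myLabel", "en");
metadataTrack.addCue(new GenericCue(0, 5, "{cue: content}"));
console.log(metadataTrack.cueType); // generic text

const subtitleTrack = new ProposedTextTrack("subtitles", "myLabel", "en");
subtitleTrack.addCue(new GenericCue(0, 5, "this is a subtitle"));
subtitleTrack.addCue(new WebVTTCue(5, 8, "a WebVTT cue"));
console.log(subtitleTrack.cueType); // WebVTTCue

let rejected = false;
try {
  subtitleTrack.addCue(new CEA708Cue(8, 10, "a 708 cue"));
} catch (e) {
  rejected = e instanceof TypeError;
}
console.log(rejected); // true
```

Using the cue's class as the type keeps the check trivial; a real design might instead use registered type strings so that script-defined cue types can participate.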


Cheers,
Silvia.
Received on Monday, 13 May 2013 02:48:35 UTC