Re: In-band text track captions and subtitles

[Replying only on the HTML WG to avoid cross-posting.]

On Wed, Jun 11, 2014 at 7:26 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
> In-band Tracks CG and HTML WG members,
>
> “Sourcing In-band Media Resource Tracks from Media Containers into HTML” [1]
> defines a method for using DataCue to expose MPEG-2 Transport Stream
> captions (CEA 708 [2]) and subtitles (SCTE 27 [3]). This same approach could
> be used for exposing Text Track Cues for other media containers that don’t
> use VTTCue. Discussion during development of the definition raised some
> questions about TextTrack and DataCues that might benefit from discussion in
> these groups.
>
> - DataCue is currently defined in W3C HTML5 CR [4] for use on metadata text
> tracks. Does text need to be added to [4] to clarify that DataCue can be
> used for non-metadata text tracks?

DataCue could be defined on text tracks of any kind - in fact, we have
already stopped throwing errors when this happens:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25261 .


> - The sourcing spec [1] defines DataCue.data to contain the CEA 708 or SCTE
> 27 data. [2] and [3], respectively, define the rendering behavior required
> for these formats. Should there be a clarification in HTML specs that
> DataCue can be rendered by the UA as long as a rendering specification is
> referenced?

It would be possible to source CEA708 captions into DataCue objects,
set kind=captions, and have the UA render the captions according to
[2]. This would expose the raw cue content to JS, but JS developers
could not make use of the browser's CEA708 rendering capabilities.
In my opinion, in this case the browser should expose CEA708Cue
objects and the rendering abilities instead.


> - There may be the implication that since DataCue is currently specified for
> use with metadata text tracks, then “captions” and “subtitles” text tracks
> that use DataCue will never be rendered by the UA. Is language needed in
> HTML to clarify that non-metadata TextTracks using DataCue should be
> rendered according to @mode state?

I don't think there is anything unclear about the DataCue and its
rendering abilities. The spec already says:
"The rules for updating the text track rendering for a DataCue simply
state that there is no rendering, even when the cues are in showing
mode and the text track kind is one of subtitles or captions or
descriptions or chapters."

This just means that mode=showing will "overlay the cues as
appropriate", which in the case of DataCue means: showing nothing.
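
As a rough illustration of that rule, here is a simplified sketch (not
spec text). The type and function names are made up for this example,
and "rendered" glosses over the per-kind differences between overlay,
non-visual descriptions and chapter navigation:

```typescript
// Hypothetical names for illustration only; none of this is a browser API.
type TrackMode = "disabled" | "hidden" | "showing";
type CueType = "VTTCue" | "DataCue";

// Returns true when the UA itself would present the cue to the user.
function uaRendersCue(mode: TrackMode, kind: string, cueType: CueType): boolean {
  if (mode !== "showing") return false; // only showing tracks are rendered
  if (kind === "metadata") return false; // metadata is for scripts only
  // DataCue's rendering rules say "no rendering", whatever the kind is.
  return cueType !== "DataCue";
}
```

So a showing captions track made of DataCues is treated like any other
showing track, but the per-cue rendering rules produce nothing.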


> - The question arose whether it is ever the case where “captions”,
> “subtitles”, “descriptions” and “chapters” text tracks would NOT be rendered
> by the UA. The existing definition for UA behavior seems to imply that the
> UA must render these types of text tracks when TextTrack.mode is set to
> “showing” [5] . Does the HTML spec language need to be more explicit?

I do wonder what to do with CEA708 captions in browsers that neither
convert them to WebVTT (to expose as VTTCue) nor implement rendering
for them, but that can parse CEA708 chunks and hand them to
JavaScript. Would it make sense to use kind=captions but expose the
cues as DataCue, to indicate to the JS developer that they have to do
the rendering manually?
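
To make the idea concrete, here is a minimal sketch of how a page
script might detect that situation. It is an assumption about usage,
not something any browser ships: CueLike and TrackLike are structural
stand-ins for the DOM types so the code runs outside a browser, and
needsScriptRendering and activeCues are hypothetical helper names.

```typescript
// Structural stand-ins for TextTrackCue / TextTrack (illustration only).
interface CueLike {
  startTime: number;
  endTime: number;
  data?: ArrayBuffer; // DataCue-style payload, e.g. raw CEA708 bytes
  text?: string;      // VTTCue-style payload
}

interface TrackLike {
  kind: string;
  mode: "disabled" | "hidden" | "showing";
  cues: CueLike[];
}

// True when the track is meant for display but every cue carries only
// raw bytes, i.e. the script would have to decode and draw the captions.
function needsScriptRendering(track: TrackLike): boolean {
  const displayKind = track.kind === "captions" || track.kind === "subtitles";
  const rawOnly =
    track.cues.length > 0 &&
    track.cues.every(c => c.data instanceof ArrayBuffer && c.text === undefined);
  return displayKind && track.mode !== "disabled" && rawOnly;
}

// Cues whose interval covers the current playback time
// (start inclusive, end exclusive).
function activeCues(track: TrackLike, currentTime: number): CueLike[] {
  return track.cues.filter(
    c => c.startTime <= currentTime && currentTime < c.endTime
  );
}
```

A script would call needsScriptRendering once per track and, if it
returns true, decode the data of each active cue itself on every time
update.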


> - Is it OK to have a “captions” or “subtitles” text track that does not
> define a cue format, i.e. is only rendered by the UA?

I think that once the browser implements rendering, a specific cue
format should be defined, too.


> A couple of alternatives to the use of DataCue for “captions” and
> “subtitles” text tracks were discussed.
>
> Alternative #1: Format specific “captions” and “subtitles” cues. A CEA708Cue
> and SCTE27Cue could be defined that derives from DataCue.  These format
> specific cues would have @data attribute that would contain the raw CEA708
> and SCTE27 data. Is there any advantage to such a format specific cue
> definition over direct use of DataCue?

Since the browser actually renders a CEA708Cue, it would most
certainly have more properties parsed out from the CEA708 format than
just the plain data. It would, for example, know the Window and the
Pen attributes. A proper cue object should expose these properties.

> Alternative #2: Translate MPEG-2 “captions” and “subtitles” to WebVTT and use
> a derivative of VTTCue (derivative is necessary as you’d still want to make
> the raw, binary cue data available). CEA 708 captions could be exposed as a
> VTTCue derivative according to [6]. SCTE 27 subtitles are images and no
> mapping to VTTCue is defined (or possible?). DVB subtitles [7] also mostly
> use the image alternative and would need a mapping to WebVTT.

I don't actually mind this.

Cheers,
Silvia.

> Are there any other points to consider on this topic?
>
> Thanks,
> Bob Lund
>
> [1] http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html
> [2] Good explanation http://en.wikipedia.org/wiki/CEA-708. Non-free spec
> http://www.ce.org/Standards/Standard-Listings/R4-3-Television-Data-Systems-Subcommittee/CEA-708-D.aspx
> [3] http://www.scte.org/documents/pdf/standards/SCTE_27_2011.pdf
> [4]
> http://www.w3.org/TR/html5/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues
> [5] http://www.w3.org/TR/html5/embedded-content-0.html#text-track-model
> [6]
> https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html
> [7]
> http://www.etsi.org/deliver/etsi_en/300700_300799/300743/01.03.01_60/en_300743v010301p.pdf

Received on Sunday, 15 June 2014 23:33:28 UTC