Re: In-band text track captions and subtitles from Glenn Adams on 2014-06-16 (public-html@w3.org from June 2014)

From: Glenn Adams <glenn@skynav.com>
Date: Mon, 16 Jun 2014 17:07:51 -0600
To: Bob Lund <B.Lund@cablelabs.com>
Cc: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, "public-html@w3.org" <public-html@w3.org>
Message-ID: <CACQ=j+dgk0_M+UEybr3F-tew0uqZfiDKaFC0-O5Zp3ONhqVb_w@mail.gmail.com>
On Mon, Jun 16, 2014 at 3:23 PM, Bob Lund <B.Lund@cablelabs.com> wrote:

>
>
> On 6/15/14, 5:32 PM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:
>
> >[Replying only on the HTML WG to avoid cross-posting.]
> >
> >
> >
> >On Wed, Jun 11, 2014 at 7:26 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
> >> In-band Tracks CG and HTML WG members,
> >>
> >> "Sourcing In-band Media Resource Tracks from Media Containers into
> >> HTML" [1] defines a method for using DataCue to expose MPEG-2 Transport
> >> Stream captions (CEA 708 [2]) and subtitles (SCTE 27 [3]). This same
> >> approach could be used for exposing Text Track Cues for other media
> >> containers that don't use VTTCue. Discussion during development of the
> >> definition raised some questions about TextTrack and DataCues that might
> >> benefit from discussion in these groups.
> >>
> >> - DataCue is currently defined in W3C HTML5 CR [4] for use on metadata
> >> text tracks. Does text need to be added to [4] to clarify that DataCue
> >> can be used for non-metadata text tracks?
> >
> >DataCue could be defined on text tracks of any kind
>
> Then the spec language needs to be changed to reflect this.
>
> > - in fact, we have
> >already stopped throwing errors when this happens:
> >https://www.w3.org/Bugs/Public/show_bug.cgi?id=25261 .
> >
> >
> >> - The sourcing spec [1] defines DataCue.data to contain the CEA 708 or
> >> SCTE 27 data. [2] and [3], respectively, define the rendering behavior
> >> required for these formats. Should there be a clarification in HTML
> >> specs that DataCue can be rendered by the UA as long as a rendering
> >> specification is referenced?
> >
> >It would be possible to source CEA708 captions into DataCue objects
> >and have the kind=captions and the UA render the captions according to
> >[2].
>
> Good, it seems that way to me also.
>
> > This would expose the cue content to JS, but without JS
> >developers being able to make use of the CEA708 rendering capabilities
> >of the browser. In my opinion in this case the browser should expose
> >CEA708Cue objects and the rendering abilities instead.
>
> While this might be desirable, there are several factors that need to be
> taken into account:
>
> 1) What is most critical for accessibility and regulatory reasons is that
> a mechanism exist for 708 captions to be rendered with controls for
> caption tracks to be showing or hidden/disabled.
>
> 2) The only "spec" for 708 rendering is CEA708. So there is no defined set
> of higher level rendering capabilities. The 708 to VTT spec [6] could be
> used but I think that needs broader consensus before it becomes the
> 'standard' 708 cue representation. This can happen but it can be done as
> a second phase of this work.
>
> 3) JS might want access to the cue for non-rendering purposes, e.g.
> searching content based on keywords/phrases. Exposing the raw 708 data
> suffices for this.
>
> 4) It's presumed that there is some intermediate, higher level form the
> captions take, prior to rendering. This needn't be the case, for example
> if the UA contains a hardware 708 rendering capability.
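Point 3 can be made concrete. A minimal TypeScript sketch, assuming the cue payload is available as a Uint8Array (for instance `new Uint8Array(cue.data)` on a DataCue carrying 708 service-block bytes); `toG0Bytes` and `containsPhrase` are hypothetical helper names. It searches the raw bytes directly, relying on the CEA-708 G0 printable range 0x20-0x7E coinciding with ASCII. It is deliberately naive: control codes and command parameters are not decoded, so a phrase interrupted by a control code is missed.

```typescript
// Naive phrase search over raw CEA-708 service-block bytes (sketch only).
// Assumption: caption text uses the G0 code set, whose printable range
// 0x20-0x7E coincides with ASCII. Control codes and command parameters are
// not decoded, so a phrase split by a control code will not be found.
function toG0Bytes(phrase: string): Uint8Array {
  const out = new Uint8Array(phrase.length);
  for (let i = 0; i < phrase.length; i++) {
    const code = phrase.charCodeAt(i);
    if (code < 0x20 || code > 0x7e) {
      throw new RangeError(`character outside G0 printable range: ${phrase[i]}`);
    }
    out[i] = code;
  }
  return out;
}

function containsPhrase(data: Uint8Array, phrase: string): boolean {
  const needle = toG0Bytes(phrase);
  // Brute-force byte comparison at every starting offset.
  for (let start = 0; start + needle.length <= data.length; start++) {
    let matched = true;
    for (let j = 0; j < needle.length; j++) {
      if (data[start + j] !== needle[j]) {
        matched = false;
        break;
      }
    }
    if (matched) return true;
  }
  return false;
}

// Usage: header byte 0x25 (service 1, 5 bytes) followed by "HELLO".
const block = new Uint8Array([0x25, 0x48, 0x45, 0x4c, 0x4c, 0x4f]);
console.log(containsPhrase(block, "ELL"));   // true
console.log(containsPhrase(block, "WORLD")); // false
```

A real search would first decode the service block into text; the point here is only that the raw bytes already carry enough to support simple keyword matching.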
>
> IMO, we should expose 708 data as proposed (service blocks), either via
> DataCue or a 708Cue, ASAP. We can work on a more semantically rich cue
> format in parallel, if we want.
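Exposing "service blocks" means script receives bytes framed by the CEA-708 service-block syntax. A minimal TypeScript sketch of splitting such a buffer, assuming DataCue.data holds concatenated service blocks with any caption-channel packet header already removed (how [1] actually packages the bytes is not reproduced here); `parseServiceBlocks` and `ServiceBlock` are hypothetical names. The header layout is CEA-708's: a 3-bit service_number and a 5-bit block_size, with service_number 7 signalling an extended header byte whose low 6 bits carry the extended service number.

```typescript
// A single CEA-708 service block as exposed to script (hypothetical shape).
interface ServiceBlock {
  serviceNumber: number; // 1-6 directly, 7-63 via the extended header
  data: Uint8Array;      // block_size bytes of caption commands and text
}

// Split a buffer of concatenated service blocks.
// Header byte: service_number (high 3 bits) | block_size (low 5 bits).
// service_number 7 means the next byte holds null_fill (2 bits) |
// extended_service_number (6 bits).
function parseServiceBlocks(buf: Uint8Array): ServiceBlock[] {
  const blocks: ServiceBlock[] = [];
  let pos = 0;
  while (pos < buf.length) {
    const header = buf[pos++];
    let serviceNumber = header >> 5;
    const blockSize = header & 0x1f;
    // A header with service_number 0 is treated here as the null block
    // header: whatever follows is padding.
    if (serviceNumber === 0) break;
    if (serviceNumber === 7) {
      if (pos >= buf.length) throw new RangeError("truncated extended header");
      serviceNumber = buf[pos++] & 0x3f;
    }
    if (pos + blockSize > buf.length) throw new RangeError("truncated service block");
    blocks.push({ serviceNumber, data: buf.subarray(pos, pos + blockSize) });
    pos += blockSize;
  }
  return blocks;
}

// Usage: service 1 with "HELLO", service 2 with "AB", then a null header.
const packet = new Uint8Array([
  0x25, 0x48, 0x45, 0x4c, 0x4c, 0x4f,
  0x42, 0x41, 0x42,
  0x00,
]);
const parsed = parseServiceBlocks(packet);
for (const b of parsed) {
  console.log(`service ${b.serviceNumber}: ${b.data.length} bytes`);
}
```

Returning `subarray` views avoids copying, which matters little for caption-sized payloads but keeps each block tied to the original cue bytes.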
>
> >
> >
> >> - There may be the implication that since DataCue is currently
> >> specified for use with metadata text tracks, then "captions" and
> >> "subtitles" text tracks that use DataCue will never be rendered by the
> >> UA. Is language needed in HTML to clarify that non-metadata TextTracks
> >> using DataCue should be rendered according to @mode state?
> >
> >I don't think there is anything unclear about the DataCue and its
> >rendering abilities. The spec already says:
> >"The rules for updating the text track rendering for a DataCue simply
> >state that there is no rendering, even when the cues are in showing
> >mode and the text track kind is one of subtitles or captions or
> >descriptions or chapters."
> >
> >This just means that mode=showing will "overlay the cues as
> >appropriate", which in the case of DataCue means: showing nothing.
>
> I don't see this in either the latest HTML5 CR or ED. However, if
> there is consensus and spec language that precludes rendering tracks that
> expose data via DataCue, then we could define a format specific cue, e.g.
> CEA708Cue, that exposes the same data, i.e. service block binary data.
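The disagreement above turns on a single rule. A simplified TypeScript model of that rule, assuming the wording Silvia quotes is normative (Bob reports he cannot find it in the CR or ED, so this only captures the rule as quoted): the UA overlays cues for a showing track of kind subtitles or captions, and a DataCue contributes nothing to the rendering. The type and function names are hypothetical, and the model covers visual overlay only, so descriptions and chapters are left out.

```typescript
// Simplified model of when a UA overlays a cue on the video (sketch only).
// Names are hypothetical and chosen to avoid clashing with DOM type names.
type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";
type TrackMode = "disabled" | "hidden" | "showing";
type CueInterface = "VTTCue" | "DataCue";

function uaOverlaysCue(kind: TrackKind, mode: TrackMode, cue: CueInterface): boolean {
  // Only tracks in "showing" mode are rendered at all.
  if (mode !== "showing") return false;
  // Visual overlay applies to subtitles and captions only in this model.
  if (kind !== "subtitles" && kind !== "captions") return false;
  // Per the quoted rule, DataCue has no rendering even when showing.
  return cue !== "DataCue";
}

console.log(uaOverlaysCue("captions", "showing", "VTTCue"));  // true
console.log(uaOverlaysCue("captions", "showing", "DataCue")); // false
console.log(uaOverlaysCue("captions", "hidden", "VTTCue"));   // false
```

Under this model, getting 708 captions overlaid requires a cue interface other than DataCue, which is why a CEA708Cue is raised as the fallback.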
>
>
> >
> >
> >> - The question arose whether it is ever the case that "captions",
> >> "subtitles", "descriptions" and "chapters" text tracks would NOT be
> >> rendered by the UA. The existing definition for UA behavior seems to
> >> imply that the UA must render these types of text tracks when
> >> TextTrack.mode is set to "showing" [5]. Does the HTML spec language need
> >> to be more explicit?
> >
> >I do wonder what to do with CEA708 captions while browsers don't
> >convert them to WebVTT to expose as VTTCue, and while they don't have
> >rendering implemented for them,
>
> I think that tracks that don't have rendering implemented are, by
> definition, metadata text tracks.
>

I disagree. The semantic kind of a text track is (or should be) independent
of whether or not a UA can render the track. The semantic kind is a
property of the content of the track or the content format of the track.

I find it troubling that folks are mixing such semantics with
implementation status.
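The distinction can be stated as two independent functions: one derives the kind from the content format, the other reports whether a given UA can render that format. A minimal TypeScript sketch; the format labels, the mapping, and the function names are hypothetical and chosen only for illustration.

```typescript
// Sketch: the kind is a property of the content format; rendering support is
// a separate per-UA capability. Labels and mapping are illustrative only.
type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

const KIND_BY_FORMAT = new Map<string, TrackKind>([
  ["CEA-708", "captions"],
  ["SCTE-27", "subtitles"],
]);

// Depends only on the content format, never on UA capabilities.
function kindForFormat(format: string): TrackKind {
  return KIND_BY_FORMAT.get(format) ?? "metadata";
}

// Depends only on the UA; its answer does not feed back into the kind.
function uaCanRender(format: string, supportedFormats: ReadonlySet<string>): boolean {
  return supportedFormats.has(format);
}

// A UA without a 708 renderer still labels the track as captions.
const noRenderers: ReadonlySet<string> = new Set<string>();
console.log(kindForFormat("CEA-708"));            // captions
console.log(uaCanRender("CEA-708", noRenderers)); // false
```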


>
> A UA that supports MPEG-2 TS media resources should be capable of rendering
> 708 captions. It may expose cue data as 'service blocks', through DataCue.
> If DataCue is objectionable for some reason, then we should define a
> CEA708 Cue with a 'service_block' attribute.
>
> >but while they are able to parse
> >CEA708 chunks and throw them to JavaScript. Would it make sense to use
> >kind=captions but with cues being exposed as DataCue to indicate to
> >the JS developer that they have to do the rendering manually?
>
> It seems metadata tracks exist for this purpose. The
> 'inBandMetadataTrackDispatchType' can be set to identify the data as 708
> caption 'service blocks'.
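This approach relies on script recognising the track by its dispatch type. A minimal TypeScript sketch of that selection step, assuming a hypothetical dispatch-type string (the value [1] actually assigns for 708 service blocks is not reproduced here) and a minimal stand-in interface for the TextTrack attributes read; in a browser the input could be `Array.from(video.textTracks)`.

```typescript
// Minimal stand-in for the TextTrack attributes this sketch reads.
interface TrackLike {
  kind: string;
  inBandMetadataTrackDispatchType: string;
}

// Hypothetical value; the string actually used for 708 caption service
// blocks would be whatever the sourcing spec [1] defines.
const HYPOTHETICAL_708_DISPATCH_TYPE = "example-cea708-service-blocks";

// Keep metadata tracks whose dispatch type identifies 708 service blocks.
function find708Tracks<T extends TrackLike>(tracks: readonly T[]): T[] {
  return tracks.filter(
    (t) =>
      t.kind === "metadata" &&
      t.inBandMetadataTrackDispatchType === HYPOTHETICAL_708_DISPATCH_TYPE
  );
}

const tracks: TrackLike[] = [
  { kind: "metadata", inBandMetadataTrackDispatchType: HYPOTHETICAL_708_DISPATCH_TYPE },
  { kind: "metadata", inBandMetadataTrackDispatchType: "something-else" },
  { kind: "captions", inBandMetadataTrackDispatchType: "" },
];
console.log(find708Tracks(tracks).length); // 1
```

Note that this path leaves all rendering to script, which is the trade-off against exposing the same data on a captions track.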
>
> >
> >
> >> - Is it OK to have a "captions" or "subtitles" text track that does not
> >> define a cue format, i.e. is only rendered by the UA?
> >
> >I think that once the browser implements rendering, a specific cue
> >format should be defined, too.
> >
> >
> >> A couple of alternatives to the use of DataCue for "captions" and
> >> "subtitles" text tracks were discussed.
> >>
> >> Alternative #1: Format specific "captions" and "subtitles" cues. A
> >> CEA708Cue and SCTE27Cue could be defined that derive from DataCue. These
> >> format specific cues would have a @data attribute that would contain the
> >> raw CEA708 and SCTE27 data. Is there any advantage to such a format
> >> specific cue definition over direct use of DataCue?
> >
> >Since the browser actually renders a CEA708Cue, it would most
> >certainly have more properties parsed out from the CEA708 format than
> >just the plain data.
>
> This is implementation specific. A smart TV might use an embedded 708
> rendering capability that takes as input the 708 caption coding data.
>
> > It would, for example, know the Window and the
> >Pen attributes. A proper cue object should expose these properties
> >properly.
>
> What constitutes "proper"? CEA708 is the only "standard" that exists today
> so exposing cues using that syntax should be considered "proper". Your 708
> to WebVTT mapping definition would enable exposing 708 as a VTTCue. But, I
> think broader consensus on using this is needed. There are no other
> alternatives that I am aware of.
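Alternative #1 and the point about Window and Pen attributes can be combined in one shape: a format-specific cue that still derives from DataCue and keeps the raw bytes in `data`, while optionally exposing parsed fields. A minimal TypeScript sketch; `DataCueStandIn` replaces the real DataCue so the code runs outside a browser, and `Cea708CueSketch`, `Parsed708State`, `windowId` and `penItalics` are hypothetical names not taken from CEA-708 or HTML.

```typescript
// Stand-in for DataCue (HTML5 CR): timing plus an opaque binary payload.
class DataCueStandIn {
  startTime: number;
  endTime: number;
  data: ArrayBuffer;
  constructor(startTime: number, endTime: number, data: ArrayBuffer) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.data = data;
  }
}

// Hypothetical parsed state; field names are illustrative only.
interface Parsed708State {
  windowId: number;    // which caption window the text targets
  penItalics: boolean; // one example of a pen attribute
}

// Hypothetical format-specific cue (Alternative #1): raw bytes stay in
// `data`, so scripts that only need the payload can ignore `parsed`,
// while a UA or library that decodes the 708 commands can fill it in.
class Cea708CueSketch extends DataCueStandIn {
  serviceNumber: number;
  parsed: Parsed708State | null;
  constructor(
    startTime: number,
    endTime: number,
    data: ArrayBuffer,
    serviceNumber: number,
    parsed: Parsed708State | null = null
  ) {
    super(startTime, endTime, data);
    this.serviceNumber = serviceNumber;
    this.parsed = parsed;
  }
}

const raw = new ArrayBuffer(6);
new Uint8Array(raw).set([0x25, 0x48, 0x45, 0x4c, 0x4c, 0x4f]);
const cue = new Cea708CueSketch(1.0, 3.0, raw, 1);
console.log(cue instanceof DataCueStandIn, cue.data.byteLength); // true 6
```

Making `parsed` optional lets both positions coexist: an implementation that only passes bytes to a hardware renderer leaves it null, and one that parses the stream can populate it.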
>
> >
> >> Alternative #2: Translate MPEG-2 "captions" and "subtitles" to WebVTT
> >> and use a derivative of VTTCue (a derivative is necessary as you'd still
> >> want to make the raw, binary cue data available). CEA 708 captions could
> >> be exposed as a VTTCue derivative according to [6]. SCTE 27 subtitles
> >> are images and no mapping to VTTCue is defined (or possible?). DVB
> >> subtitles [7] also mostly use the image alternative and would need a
> >> mapping to WebVTT.
> >
> >I don't actually mind this.
>
> I think this is a possibility but IMO it's a longer term solution. We need
> a 708 captions solution before this longer term solution would be
> available. IMO, a reasonable one is to point to the CEA708 spec for
> rendering requirements and expose 708 'service block' data via DataCue or
> something similar, e.g. CEA708Cue.
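For Alternative #2, a translated cue carries WebVTT-style timing and text alongside the original bytes. A minimal TypeScript sketch of that pairing and of serialising it as a WebVTT cue block, assuming the caption text has already been produced by a 708-to-WebVTT mapping such as [6] (the mapping itself is not implemented here); `TranslatedCue`, `vttTimestamp` and `toVttCueBlock` are hypothetical names.

```typescript
// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function vttTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalS = Math.floor(totalMs / 1000);
  const s = totalS % 60;
  const m = Math.floor(totalS / 60) % 60;
  const h = Math.floor(totalS / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Hypothetical translated cue: WebVTT-style text plus the original bytes,
// which is what makes it a VTTCue *derivative* rather than a plain VTTCue.
interface TranslatedCue {
  startTime: number;
  endTime: number;
  text: string;        // assumed output of a 708-to-WebVTT mapping
  rawData: Uint8Array; // original 708 bytes, kept for script access
}

// Serialise the timing line and payload of one WebVTT cue block.
function toVttCueBlock(cue: TranslatedCue): string {
  return `${vttTimestamp(cue.startTime)} --> ${vttTimestamp(cue.endTime)}\n${cue.text}`;
}

console.log(vttTimestamp(3661.5)); // 01:01:01.500
console.log(toVttCueBlock({ startTime: 1, endTime: 3.25, text: "Hello", rawData: new Uint8Array() }));
```

The sketch sidesteps the hard part (positioning, windows and pen styles), which is exactly where the mapping in [6] would need the broader consensus discussed above.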
>
> Thanks,
> Bob
>
> >Cheers,
> >Silvia.
> >
> >> Are there any other points to consider on this topic?
> >>
> >> Thanks,
> >> Bob Lund
> >>
> >> [1] http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html
> >> [2] Good explanation: http://en.wikipedia.org/wiki/CEA-708. Non-free spec:
> >> http://www.ce.org/Standards/Standard-Listings/R4-3-Television-Data-Systems-Subcommittee/CEA-708-D.aspx
> >> [3] http://www.scte.org/documents/pdf/standards/SCTE_27_2011.pdf
> >> [4] http://www.w3.org/TR/html5/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues
> >> [5] http://www.w3.org/TR/html5/embedded-content-0.html#text-track-model
> >> [6] https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html
> >> [7] http://www.etsi.org/deliver/etsi_en/300700_300799/300743/01.03.01_60/en_300743v010301p.pdf
Received on Monday, 16 June 2014 23:08:40 UTC