Re: Resolving TextTrackCue issues from Bob Lund on 2013-09-04 (public-html@w3.org from September 2013)

From: Bob Lund <B.Lund@CableLabs.com>
Date: Wed, 4 Sep 2013 15:52:19 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Eric Carlson <eric.carlson@apple.com>
CC: Jer Noble <jer.noble@apple.com>, Glenn Adams <glenn@skynav.com>, Philip Jägenstedt <philipj@opera.com>, public-html <public-html@w3.org>, Ian Hickson <ian@hixie.ch>
Message-ID: <CE4CB285.33756%b.lund@cablelabs.com>
On 9/4/13 12:48 AM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

>On Wed, Sep 4, 2013 at 2:14 PM, Eric Carlson <eric.carlson@apple.com>
>wrote:
>>
>> On Sep 3, 2013, at 5:16 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
>>wrote:
>>
>>> On Wed, Sep 4, 2013 at 9:19 AM, Eric Carlson <eric.carlson@apple.com>
>>>wrote:
>>>>
>>>>> Do you expose the existence of these in-band captions to JS somehow?
>>>>>
>>>>> I'm concerned about the browser rendering captions automatically on
>>>>> top of video without the JS developer being able to find out about
>>>>> it. How would the JS developer know that there are captions, and so
>>>>> avoid rendering another set themselves, or rendering something else
>>>>> in the space of the captions?
>>>>
>>>>  On versions of the OS where it is possible for WebKit to "take over"
>>>> rendering of in-band captions from the media engine, they behave just
>>>> like out-of-band tracks: in-band tracks are part of the video.textTracks
>>>> and the cues are part of track.cues/activeCues (when appropriate).
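When in-band tracks show up in video.textTracks as Eric describes, a script can check which caption or subtitle tracks the UA is already showing before drawing its own. A minimal TypeScript sketch, assuming local structural stand-ins (TrackLike, TrackListLike) for the DOM's TextTrack and TextTrackList so it runs outside a browser; the function name showingCaptionTracks is hypothetical.

```typescript
// Local stand-ins for the DOM's TextTrack / TextTrackList. In a browser
// you would pass video.textTracks directly; these types only exist so the
// sketch compiles and runs on its own.
interface TrackLike {
  kind: string;     // "subtitles" | "captions" | "descriptions" | "chapters" | "metadata"
  label: string;
  language: string;
  mode: string;     // "disabled" | "hidden" | "showing"
}

interface TrackListLike {
  readonly length: number;
  [index: number]: TrackLike;
}

// Returns the caption/subtitle tracks the UA is currently rendering
// ("showing"), so a custom player can avoid drawing a second set on top.
function showingCaptionTracks(tracks: TrackListLike): TrackLike[] {
  const result: TrackLike[] = [];
  for (let i = 0; i < tracks.length; i++) {
    const t = tracks[i];
    if ((t.kind === "captions" || t.kind === "subtitles") && t.mode === "showing") {
      result.push(t);
    }
  }
  return result;
}
```

In a browser the call would be showingCaptionTracks(video.textTracks). In the second configuration Eric describes below, where the media engine renders the cues, the track itself would still be found this way even though its cue list stays empty.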
>>>
>>> So for the JS dev they are exposed as instances of the old
>>> TextTrackCue interface?
>>
>>   Yes.
>>
>>> The VTTCue interface sufficiently satisfies this use case then?
>>
>>   True, but the in-band cues may or may not have been WebVTT originally.
>> Is there an advantage to using VTTCue instead of the old TextTrackCue
>> interface?
>
>getCueAsHTML() of VTTCue interprets what's in .text as WebVTT content.
>Exposing content in another format that way would make little sense,
>just as running HTML through a Word document parser would make little
>sense. So, IMHO, neither VTTCue nor the old TextTrackCue interface is
>appropriate here, since you're not dealing with WebVTT content
>originally.
>
>
>>> What is the content of .text?
>>>
>>   The cue text.
>
>Just to clarify: is that a plain-text version of the original cue
>content? For example, for CEA608 content from an SCC file, would it just
>be the plain text of the cue, stripped of all the other characters? Or,
>in the case of SRT, are all tags stripped and just the plain text
>exposed?
>
>If you do that, then we're just pretending the in-band text track was
>actually WebVTT content.
>
>If you don't return plain text, but the actual original content of the
>cue, you get the wrong behaviour with getCueAsHTML() and the rendering
>algorithm of the old TextTrackCue or the new VTTCue interface.
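The plain-text option Silvia describes is lossy. A minimal TypeScript sketch of that flattening for SRT-style cue text, assuming the only markup is HTML-like tags; the helper name stripSrtTags is hypothetical. WebVTT cue text has its own tag set (c, i, b, u, ruby, rt, v, lang), so an SRT tag such as <font color="red"> has no WebVTT equivalent, and stripping is one way to end up with text that merely looks like WebVTT.

```typescript
// Hypothetical helper: flattens SRT-style cue text to plain text by
// dropping anything that looks like a markup tag, e.g. <i>, </b>,
// <font color="red">. Once the tags are gone, .text no longer carries
// the original formatting, which is the information loss at issue.
function stripSrtTags(cueText: string): string {
  return cueText.replace(/<[^>]*>/g, "");
}
```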
>
>
>>>>  On versions of the OS where the system frameworks do not have the
>>>> necessary API to override cue rendering, in-band tracks are part of
>>>> video.tracks so they can be enabled/disabled by script, but cues are
>>>> rendered by the media engine.
>>>
>>> In this case, I assume only the existence of the track, and not of
>>> the cues, is exposed to JS? I.e. track.cues/activeCues is empty? Or
>>> are you listing fully abstract TextTrackCue instances here to at
>>> least provide startTime/endTime to the JS devs?
>>>
>>   Correct, the cues are not available to WebKit at all in this case.
>
>OK. Would you consider using the abstract TextTrackCue interface of
>the WHATWG spec for exposing these cues, so JS developers can at least
>react to cue change events?
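Even timing-only cues would let script track which cues are active and when that set changes. A simplified TypeScript sketch, assuming cues carry only an id, startTime and endTime (the CueTiming type and both function names are hypothetical). It mirrors the HTML rule that a cue is active while its start time is at or before the playback position and its end time is after it, but ignores pauseOnExit and cues skipped over entirely during a seek, which the real algorithm also handles.

```typescript
// Timing-only cue, roughly what an abstract TextTrackCue would give
// script even when the media engine renders the content itself.
interface CueTiming {
  id: string;
  startTime: number; // seconds
  endTime: number;   // seconds
}

// Simplified activeCues model: active while startTime <= t < endTime.
function activeCueIds(cues: CueTiming[], t: number): string[] {
  return cues.filter((c) => c.startTime <= t && t < c.endTime).map((c) => c.id);
}

// Reports whether the active set differs between two playback positions,
// i.e. whether a cuechange-style notification would be warranted.
// Simplification: a cue lying entirely between the two positions is missed.
function activeSetChanged(cues: CueTiming[], before: number, after: number): boolean {
  const a = activeCueIds(cues, before);
  const b = activeCueIds(cues, after);
  return a.length !== b.length || a.some((id, i) => id !== b[i]);
}
```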
>
>
>>> And an orthogonal question: you've probably seen the Cablelabs spec
>>> for exposing MPEG-2 in-band text tracks of different types to HTML[1].
>>> Seeing as both Cablelabs and the HTML spec are trying to accommodate
>>> using in-band text tracks in HTML/JS, what do you suggest is the best
>>> way forward to specify this?
>>>
>>> * Program Map Table: would you suggest using VTTCue with
>>> @kind=metadata, GenericCue, or a new PMTCue interface?
>>>
>>   It seems to me that text track content that a UA does not render
>> itself is, by definition, metadata.
>
>In this case, that definition works, because no @kind value matches
>the semantic content of a PMT track.
>But caption data that is not rendered is still semantically of
>@kind=captions.
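Silvia's distinction can be stated as a rule: @kind follows what the data is, not whether the UA renders it. A minimal TypeScript sketch, assuming a hypothetical descriptor for in-band tracks; the payload labels are illustrative strings, not MPEG-2 stream_type codes, and chooseKind is a hypothetical name.

```typescript
// Hypothetical descriptor for an in-band track found in a transport stream.
// The payload values are illustrative labels, not MPEG-2 stream_type codes.
type InbandPayload = "pmt" | "cea608" | "cea708" | "unknown";

interface InbandTrackInfo {
  payload: InbandPayload;
  uaRenders: boolean; // whether the UA can render this payload itself
}

// Picks @kind from the semantics of the data. uaRenders is deliberately
// not consulted: caption data stays "captions" even when the UA cannot
// render it, while data with no matching kind (e.g. a PMT) is "metadata".
function chooseKind(info: InbandTrackInfo): "captions" | "metadata" {
  if (info.payload === "cea608" || info.payload === "cea708") {
    return "captions";
  }
  return "metadata";
}
```

Under Eric's rule of thumb, the unrendered 708 track would instead become metadata; the sketch encodes only Silvia's position.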
>
>> Again, it is not WebVTT data, so is there an advantage to VTTCue over
>> the old TextTrackCue interface?
>
>Not between those two, since they are identical.
>
>
>>> * CEA708 track: assuming we don't want to introduce CEA708Cue, how
>>> would that best be supported?
>>>
>>   Are any browsers planning to support 708 captions natively?
>
>What does "support" mean?
>Parse them out of an MPEG-2 file (as the Cablelabs spec suggests) and
>expose them to JS?
>Or render them like you do for formats that the QuickTime
>framework already supports, but without exposing the original data and
>pretending it's WebVTT?
>Or go all the way to exposing the format with its own features?
>
>Only the 3rd option requires specification of a new interface. The
>first option requires something like the GenericCue interface, since
>we need to give the original content of the cue to JS to parse.
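For the first option, script receives the original cue payload and parses it itself. A TypeScript sketch of what consuming such cues might look like, assuming a hypothetical GenericCue-like shape; RawPayloadCue, its field names, and parseRawCues are all assumptions, not a specified interface, and no actual 708 parser is shown.

```typescript
// Hypothetical GenericCue-like shape: timing plus the untouched in-band
// payload, which script is expected to parse itself. Field names are
// assumptions, not a specified interface.
interface RawPayloadCue {
  startTime: number;
  endTime: number;
  format: string;   // illustrative label, e.g. "cea708"
  data: Uint8Array; // original bytes from the stream
}

type PayloadParser = (data: Uint8Array) => string;

// Routes each raw cue to a script-side parser registered for its format;
// cues with no registered parser are skipped rather than guessed at.
function parseRawCues(cues: RawPayloadCue[], parsers: Map<string, PayloadParser>): string[] {
  const out: string[] = [];
  for (const cue of cues) {
    const parse = parsers.get(cue.format);
    if (parse) out.push(parse(cue.data));
  }
  return out;
}
```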
>
>It seems that Cablelabs expected there would be browsers that
>implement parsing, but not rendering, of text tracks in MPEG-2. It
>would be good to have a statement from browser vendors (or set-top
>box developers that use browser rendering engines or the like) on
>whether that is a realistic use case.

We've implemented UA rendering of 708 captions by converting 708 to
WebVTT and then taking advantage of the existing WebVTT rendering. The
CableLabs spec calls for making the caption data from the MPEG-2 TS
available to JS via .text.
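The last step of the 708-to-WebVTT approach can be sketched as serializing already-decoded captions into WebVTT. A minimal TypeScript sketch, assuming caption text and timing have already been extracted from the 708 service (that decoding is not shown); the DecodedCaption type and all function names are hypothetical, and an implementation could equally build cue objects in memory without producing WebVTT text at all.

```typescript
// Assumes 708 decoding already produced caption text and timing.
interface DecodedCaption {
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}

// Zero-pads n to width w.
function pad(n: number, w: number): string {
  let s = String(n);
  while (s.length < w) s = "0" + s;
  return s;
}

// Formats seconds as a WebVTT timestamp, hh:mm:ss.ttt.
function toVttTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const totalS = Math.floor(totalMs / 1000);
  const h = Math.floor(totalS / 3600);
  const m = Math.floor(totalS / 60) % 60;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(totalS % 60, 2)}.${pad(totalMs % 1000, 3)}`;
}

// Escapes characters WebVTT cue text reserves ("&", "<", and ">" so that
// "-->" cannot appear) and collapses blank lines, which would end the cue.
function escapeCueText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\n{2,}/g, "\n");
}

// Serializes decoded captions as a WebVTT file body.
function toWebVtt(captions: DecodedCaption[]): string {
  const blocks = captions.map(
    (c) => `${toVttTimestamp(c.startTime)} --> ${toVttTimestamp(c.endTime)}\n${escapeCueText(c.text)}`,
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}
```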

>
>Thanks,
>Silvia.
Received on Wednesday, 4 September 2013 15:53:30 UTC