Re: Resolving TextTrackCue issues from Silvia Pfeiffer on 2013-09-05 (public-html@w3.org from September 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 6 Sep 2013 00:22:50 +1000
To: Bob Lund <B.Lund@cablelabs.com>
Cc: Eric Carlson <eric.carlson@apple.com>, Jer Noble <jer.noble@apple.com>, Glenn Adams <glenn@skynav.com>, Philip Jägenstedt <philipj@opera.com>, public-html <public-html@w3.org>, Ian Hickson <ian@hixie.ch>
Message-ID: <CAHp8n2=uRaHQndomXvG7Cpg8dnXP5DpAJVrzP2MkjaxsoBNSZw@mail.gmail.com>
On Thu, Sep 5, 2013 at 1:52 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
>
>
> On 9/4/13 12:48 AM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:
>
>>On Wed, Sep 4, 2013 at 2:14 PM, Eric Carlson <eric.carlson@apple.com>
>>wrote:
>>>
>>> On Sep 3, 2013, at 5:16 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
>>>wrote:
>>>
>>>> On Wed, Sep 4, 2013 at 9:19 AM, Eric Carlson <eric.carlson@apple.com>
>>>>wrote:
>>>>>
>>>>>> Do you expose the existence of these in-band captions somehow to JS?
>>>>>>
>>>>>> I'm concerned that if the browser renders captions automatically on
>>>>>> top of video without the JS developer being able to find out about
>>>>>>it,
>>>>>> how would the JS developer know that there are captions and to avoid
>>>>>> rendering another lot themselves - or rendering something else in the
>>>>>> space of the captions?
>>>>>
>>>>>  On versions of the OS where it is possible for WebKit to "take over"
>>>>> rendering of in-band captions from the media engine, they behave just
>>>>>like
>>>>> out-of-band tracks: in-band tracks are part of the video.textTracks
>>>>>and the
>>>>> cues are part of track.cues/activeCues (when appropriate).
>>>>
>>>> So for the JS dev they are exposed as instances of the old
>>>> TextTrackCue interface?
>>>
>>>   Yes.
>>>
>>>> The VTTCue interface sufficiently satisfies this use case then?
>>>
>>>   True, but the in-band cues may or may not be WebVTT originally. Is
>>>there an advantage to using VTTCue instead of the old TextTrackCue
>>>interface?
>>
>>getCueAsHTML() of VTTCue interprets what's in .text as WebVTT content.
>>Exposing other format content in that way would not make much sense,
>>just as parsing HTML through a Word Doc format parser would not make
>>much sense. So, IMHO neither VTTCue nor the old TextTrackCue interface is
>>appropriate here since you're not dealing with WebVTT content
>>originally.
>>
>>
>>>> What is the content in .text ?
>>>>
>>>   The cue text.
>>
>>Just to clarify: Is that a plain text version of the original cue
>>content? For example, for CEA608 content from a SCC file it would just
>>be the plain text of the cue stripped of all the other characters? Or
>>in the case of SRT, all tags are stripped and just the plain text is
>>exposed?
>>
>>If you do that, then we're just pretending the in-band text track was
>>actually WebVTT content.
>>
>>If you don't return plain text, but the actual original content of the
>>cue, you get the wrong behaviour with getCueAsHTML() and the rendering
>>algorithm of the old TextTrackCue or the new VTTCue interface.
>>
>>
>>>>>  On versions of the OS where the system frameworks do not have the
>>>>> necessary API to override cue rendering, in-band tracks are part of
>>>>> video.textTracks so they can be enabled/disabled by script but cues are
>>>>>rendered
>>>>> by the media engine.
>>>>
>>>> In this case, I assume only the existence of the track, but not of the
>>>> cues is exposed to JS? I.e. track.cues/activeCues is empty? Or are you
>>>> listing fully-abstract TextTrackCue instances here to at least provide
>>>> startTime/endTime to the JS devs?
>>>>
>>>   Correct, the cues are not available to WebKit at all in this case.
>>
>>OK. Would you consider using the abstract TextTrackCue interface of
>>the WHATWG spec for exposing these cues, so JS developers can at least
>>react to cue change events?
>>
>>
>>>> And an orthogonal question: you've probably seen the Cablelabs spec
>>>> for exposing MPEG-2 in-band text tracks of different types to HTML[1].
>>>> Seeing as both Cablelabs and the HTML spec are trying to accommodate
>>>> using in-band text tracks in HTML/JS, what do you suggest is the best
>>>> way forward to specify this?
>>>>
>>>> * Program Map Table: would you suggest to use VTTCue with
>>>> @kind=metadata, GenericCue, or a new PMTCue interface?
>>>>
>>>   It seems to me that text track content that a UA does not render
>>>itself is, by definition, metadata.
>>
>>In this case, that definition works, because no @kind value matches
>>the semantic content of a PMT track.
>>But caption data that is not rendered is still semantically of
>>@kind=captions.
>>
>>> Again, it is not WebVTT data so is there an advantage to VTTCue versus
>>>old TextTrackCue interface?
>>
>>Not between those two, since they are identical.
>>
>>
>>>> * CEA708 track: assuming we don't want to introduce CEA708Cue, how
>>>> would that best be supported?
>>>>
>>>   Are any browsers planning to support 708 captions natively?
>>
>>What does "support" mean?
>>Parse them out of a MPEG-2 file (like the Cablelabs spec suggests) and
>>expose them to JS?
>>Or render them like you do for formats that the QuickTime
>>framework already supports, but without exposing the original data and
>>pretending it's WebVTT?
>>Or go all the way to exposing the format with its own features?
>>
>>Only the 3rd option requires specification of a new interface. The
>>first option requires something like the GenericCue interface, since
>>we need to give the original content of the cue to JS to parse.
>>
>>It seems that Cablelabs expected that there would be browsers that
>>implement parsing, but not rendering for text tracks in MPEG-2. It
>>would be good to have a statement from browser vendors (or set-top
>>box developers that use browser rendering engines or the like) on
>>whether that is a realistic use case.
>

> We've implemented UA rendering of 708 captions - this is done by
> converting 708 to WebVTT and then taking advantage of the existing WebVTT
> rendering.

Out of curiosity: In the .text attribute of VTTCue / the old
TextTrackCue interface, are you exposing the 708 cue content or the
converted WebVTT cue content?
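
To make the question concrete, here is a rough sketch of how a script
might look at a cue without assuming it is WebVTT. describeCue is a
made-up helper name, and the checks rely only on the attributes
discussed in this thread (startTime, endTime, .text, getCueAsHTML());
it is not taken from any implementation.

```javascript
// Hypothetical helper: summarise what a cue object exposes to script.
function describeCue(cue) {
  return {
    start: cue.startTime,
    end: cue.endTime,
    // A fully abstract cue may carry timing only and no .text at all.
    text: typeof cue.text === "string" ? cue.text : null,
    // True only if the cue offers the WebVTT-style HTML conversion.
    canRenderAsHTML: typeof cue.getCueAsHTML === "function",
  };
}
```

Note that canRenderAsHTML being true says nothing about what is in
.text: if .text held raw 708 data, getCueAsHTML() would still try to
parse it as WebVTT.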


> The spec calls for making the caption data from the MPEG-2 TS
> available to JS via .text.

Right. Is that on a track with @kind=captions or @kind=metadata?
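
The reason I ask is that a script would typically decide what to do
with a track based on its kind. A minimal sketch, assuming only that
the media element exposes textTracks with kind, label and cues as
discussed above; summariseTextTracks is a hypothetical name:

```javascript
// Hypothetical helper: list each text track with its kind and the
// number of cues currently exposed to script.
function summariseTextTracks(media) {
  const summary = [];
  for (let i = 0; i < media.textTracks.length; i++) {
    const track = media.textTracks[i];
    summary.push({
      kind: track.kind,
      label: track.label,
      // cues can be null, e.g. when the track is disabled or when the
      // UA does not expose the cues to script at all.
      cueCount: track.cues ? track.cues.length : 0,
    });
  }
  return summary;
}
```

A script could then leave kind=captions tracks to the UA and parse
only the kind=metadata ones itself.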

Thanks for your input!
Silvia.
Received on Thursday, 5 September 2013 14:23:40 UTC