Re: Resolving TextTrackCue issues

On Wed, Sep 4, 2013 at 2:14 PM, Eric Carlson <eric.carlson@apple.com> wrote:
>
> On Sep 3, 2013, at 5:16 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>
>> On Wed, Sep 4, 2013 at 9:19 AM, Eric Carlson <eric.carlson@apple.com> wrote:
>>>
>>>> Do you expose the existence of these in-band captions somehow to JS?
>>>>
>>>> I'm concerned that if the browser renders captions automatically on
>>>> top of video without the JS developer being able to find out about it,
>>>> how would the JS developer know that there are captions and to avoid
>>>> rendering another lot themselves - or rendering something else in the
>>>> space of the captions?
>>>
>>>  On versions of the OS where it is possible for WebKit to "take over"
>>> rendering of in-band captions from the media engine, they behave just like
>>> out-of-band tracks: in-band tracks are part of the video.textTracks and the
>>> cues are part of track.cues/activeCues (when appropriate).
>>
>> So for the JS dev they are exposed as instances of the old
>> TextTrackCue interface?
>
>   Yes.
>
>> The VTTCue interface sufficiently satisfies this use case then?
>
>   True, but the in-band cues may or may not be WebVTT originally. Is there an advantage to using VTTCue instead of the old TextTrackCue interface?
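Because in-band tracks then show up in video.textTracks just like out-of-band ones, a page can check whether the UA is already painting captions before drawing its own. Here is a minimal sketch that assumes that behaviour. `TrackLike`, `TrackListLike` and `renderedCaptionTracks` are hypothetical names. The two interfaces are structural stand-ins for `TextTrack`/`TextTrackList`, so the code also runs outside a browser.

```typescript
// Hypothetical structural stand-ins for the DOM TextTrack / TextTrackList
// interfaces, so this sketch also runs outside a browser.
interface TrackLike {
  kind: string; // "subtitles", "captions", "descriptions", "chapters" or "metadata"
  mode: string; // "disabled", "hidden" or "showing"
  label: string;
}

interface TrackListLike {
  readonly length: number;
  [index: number]: TrackLike;
}

// Returns the caption/subtitle tracks the user agent is currently rendering,
// regardless of whether they came from the media resource (in-band) or from
// <track> elements (out-of-band).
function renderedCaptionTracks(tracks: TrackListLike): TrackLike[] {
  const result: TrackLike[] = [];
  for (let i = 0; i < tracks.length; i++) {
    const t = tracks[i];
    if ((t.kind === "captions" || t.kind === "subtitles") && t.mode === "showing") {
      result.push(t);
    }
  }
  return result;
}
```

In a page this would be called as `renderedCaptionTracks(video.textTracks)`. A non-empty result means the UA is already rendering captions, so script-rendered ones would overlap them.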

getCueAsHTML() of VTTCue interprets what's in .text as WebVTT content.
Exposing content in other formats that way does not make much sense,
just as parsing HTML through a Word document parser does not make much
sense. So, IMHO, neither VTTCue nor the old TextTrackCue interface is
appropriate here, since you're not dealing with WebVTT content
originally.


>> What is the content in .text ?
>>
>   The cue text.

Just to clarify: is that a plain-text version of the original cue
content? For example, for CEA608 content from an SCC file, would it just
be the plain text of the cue with all the other characters stripped? Or,
in the case of SRT, are all tags stripped and just the plain text
exposed?

If you do that, then we're just pretending the in-band text track was
actually WebVTT content.

If you return the actual original content of the cue instead of plain
text, you get the wrong behaviour from getCueAsHTML() and from the
rendering algorithm of the old TextTrackCue or the new VTTCue interface.
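The plain-text option is easier to see with a concrete SRT cue. The `stripTags` helper below is hypothetical and deliberately naive. It strips anything that looks like a tag, which is roughly what exposing "just the plain text" amounts to.

```typescript
// Hypothetical, deliberately naive helper: removes anything that looks like a
// markup tag, approximating "expose only the plain text" for an SRT cue.
function stripTags(cueText: string): string {
  return cueText.replace(/<[^>]*>/g, "");
}

const srtCue = '<i>Hello</i> <font color="red">world</font>';
const plain = stripTags(srtCue);
// plain is "Hello world": the italics and the colour are gone.
```

Passing `srtCue` through unchanged does not help either. `<font>` is not a WebVTT cue span, so WebVTT-based parsing such as getCueAsHTML() would ignore the tag, and the colour would still be lost.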


>>>  On versions of the OS where the system frameworks do not have the
>>> necessary API to override cue rendering, in-band tracks are part of
>>> video.textTracks so they can be enabled/disabled by script but cues are rendered
>>> by the media engine.
>>
>> In this case, I assume only the existence of the track, but not of its
>> cues, is exposed to JS? I.e. track.cues/activeCues are empty? Or are you
>> listing fully abstract TextTrackCue instances here to at least provide
>> startTime/endTime to JS devs?
>>
>   Correct, the cues are not available to WebKit at all in this case.
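In that situation script can see the track but never any of its cues. The sketch below shows how a page might tell the cases apart using the `mode` and `cues` attributes of `TextTrack`. The `TrackState` values and the two interfaces are hypothetical. `cues` is `null` while a track's mode is `"disabled"`, so a track has to be at least `"hidden"` before an empty cue list tells you anything.

```typescript
// Hypothetical structural stand-ins for the parts of TextTrack used here.
interface CueListLike {
  readonly length: number;
}

interface TrackWithCues {
  mode: string; // "disabled" | "hidden" | "showing"
  cues: CueListLike | null; // null while mode is "disabled"
}

type TrackState = "disabled" | "no-cues-exposed" | "cues-exposed";

function cueExposure(track: TrackWithCues): TrackState {
  if (track.mode === "disabled" || track.cues === null) {
    return "disabled";
  }
  // An empty list may mean the media engine renders the cues itself, or
  // simply that no cues have been loaded yet; script cannot tell which.
  return track.cues.length === 0 ? "no-cues-exposed" : "cues-exposed";
}
```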

OK. Would you consider using the abstract TextTrackCue interface of
the WHATWG spec for exposing these cues, so JS developers can at least
react to cue change events?
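Here is what such abstract cues would enable, sketched with the existing `cuechange` event and `activeCues` attribute of `TextTrack`. The sketch reads only `startTime`/`endTime`, since an abstract cue would carry nothing more. `TimedCue`, `CueChangeSource` and `watchCueTimes` are hypothetical names, and the interfaces are structural stand-ins so the example also runs outside a browser.

```typescript
// Hypothetical structural stand-ins for TextTrackCue / TextTrack.
interface TimedCue {
  startTime: number; // seconds
  endTime: number;   // seconds
}

interface CueChangeSource {
  activeCues: ArrayLike<TimedCue> | null;
  addEventListener(type: "cuechange", listener: () => void): void;
}

// Calls onChange with the [start, end] interval of every active cue each
// time the set of active cues changes.
function watchCueTimes(
  track: CueChangeSource,
  onChange: (intervals: Array<[number, number]>) => void,
): void {
  track.addEventListener("cuechange", () => {
    const active = track.activeCues;
    const intervals: Array<[number, number]> = [];
    if (active) {
      for (let i = 0; i < active.length; i++) {
        intervals.push([active[i].startTime, active[i].endTime]);
      }
    }
    onChange(intervals);
  });
}
```

In a page the first argument would be a real `TextTrack`, for example one from `video.textTracks`. Its mode has to be `"hidden"` or `"showing"` for cuechange to fire.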


>> And an orthogonal question: you've probably seen the CableLabs spec
>> for exposing MPEG-2 in-band text tracks of different types to HTML[1].
>> Seeing as both CableLabs and the HTML spec are trying to accommodate
>> using in-band text tracks in HTML/JS, what do you suggest is the best
>> way forward to specify this?
>>
>> * Program Map Table: would you suggest using VTTCue with
>> @kind=metadata, GenericCue, or a new PMTCue interface?
>>
>   It seems to me that text track content that a UA does not render itself is, by definition, metadata.

In this case, that definition works, because no @kind value matches
the semantic content of a PMT track.
But caption data that is not rendered is still semantically of @kind=captions.
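In the existing API, what kind a track is and whether the UA renders it are already independent. A track can keep `kind` `"captions"` while its `mode` is `"hidden"`. The UA then keeps its active cues up to date and fires `cuechange`, but does not paint them. A sketch, assuming a page that wants to render captions itself. `takeOverCaptionRendering` and `ModeTrack` are hypothetical names.

```typescript
// Hypothetical structural stand-in for the parts of TextTrack used here.
interface ModeTrack {
  readonly kind: string;
  mode: string; // "disabled" | "hidden" | "showing"
}

// Switches every showing caption/subtitle track to "hidden": the cues stay
// active and cuechange keeps firing, but the UA stops painting them, so the
// page can render them itself. Returns how many tracks were changed.
function takeOverCaptionRendering(tracks: ArrayLike<ModeTrack>): number {
  let changed = 0;
  for (let i = 0; i < tracks.length; i++) {
    const t = tracks[i];
    if ((t.kind === "captions" || t.kind === "subtitles") && t.mode === "showing") {
      t.mode = "hidden"; // kind stays "captions": the semantics are unchanged
      changed++;
    }
  }
  return changed;
}
```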

> Again, it is not WebVTT data, so is there an advantage to VTTCue versus the old TextTrackCue interface?

Not between those two, since they are identical.


>> * CEA708 track: assuming we don't want to introduce CEA708Cue, how
>> would that best be supported?
>>
>   Are any browsers planning to support 708 captions natively?

What does "support" mean?
Parsing them out of an MPEG-2 file (as the CableLabs spec suggests) and
exposing them to JS?
Or rendering them, as you do for formats that the QuickTime
framework already supports, but without exposing the original data and
instead pretending it's WebVTT?
Or going all the way to exposing the format with its own features?

Only the third option requires specifying a new interface. The
first option requires something like the GenericCue interface, since
we need to give the original content of the cue to JS to parse.
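For the first option, the cue hands script its original payload plus some indication of its format, and leaves parsing to script. That is the essential difference from VTTCue. A rough sketch follows. `RawCue`, its `format` field, the format labels and the decoder registry are illustrative assumptions, not the GenericCue proposal itself.

```typescript
// Illustrative only: a cue that carries its original, unparsed payload.
interface RawCue {
  startTime: number;
  endTime: number;
  format: string; // hypothetical format label, e.g. "application/x-subrip"
  data: string;   // original cue content; binary formats such as CEA-708
                  // would more likely need an ArrayBuffer here
}

type Decoder = (data: string) => string;

// Script registers decoders for the formats it understands.
// Object.create(null) avoids matching inherited keys like "toString".
const decoders: Record<string, Decoder> = Object.create(null);

function registerDecoder(format: string, decoder: Decoder): void {
  decoders[format] = decoder;
}

// Returns the decoded text, or null if no decoder is registered for the
// cue's format, rather than guessing (e.g. running non-WebVTT data through
// a WebVTT parser).
function decodeCue(cue: RawCue): string | null {
  const decoder = decoders[cue.format];
  return decoder ? decoder(cue.data) : null;
}
```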

It seems that CableLabs expected that some browsers would implement
parsing, but not rendering, of text tracks in MPEG-2. It would be good
to get a statement from browser vendors (or set-top box developers that
use browser rendering engines or the like) on whether that is a
realistic use case.

Thanks,
Silvia.

Received on Wednesday, 4 September 2013 06:48:52 UTC