Re: Resolving TextTrackCue issues from Eric Carlson on 2013-09-04 (public-html@w3.org from September 2013)

From: Eric Carlson <eric.carlson@apple.com>
Date: Wed, 04 Sep 2013 09:56:48 -0700
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Jer Noble <jer.noble@apple.com>, Glenn Adams <glenn@skynav.com>, Philip Jägenstedt <philipj@opera.com>, public-html <public-html@w3.org>, Ian Hickson <ian@hixie.ch>, Bob Lund <B.Lund@cablelabs.com>
Message-id: <9B58A862-DA5E-4AEA-9487-BD019CD13D86@apple.com>

On Sep 3, 2013, at 11:48 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> On Wed, Sep 4, 2013 at 2:14 PM, Eric Carlson <eric.carlson@apple.com> wrote:
>> 
>> On Sep 3, 2013, at 5:16 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>> 
>>> What is the content in .text ?
>>> 
>>  The cue text.
> 
> Just to clarify: Is that a plain text version of the original cue
> content? For example, for CEA608 content from a SCC file it would just
> be the plain text of the cue stripped of all the other characters? Or
> in the case of SRT, all tags are stripped and just the plain text is
> exposed?
> 

> If you do that, then we're just pretending the in-band text track was
> actually WebVTT content.
> 
  AVFoundation converts the in-band cue data (CEA608, QTText, 3GPP timed text, etc.) to plain text, which sometimes has position and style information. WebKit converts that to WebVTT.

  This is "pretending" the in-band data is WebVTT, but does that matter? I think it is actually an advantage, both because it makes our implementation simpler and because it makes things simpler for developers.
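
  To illustrate the kind of conversion described above, here is a minimal sketch that serializes an already-decoded plain-text cue as a WebVTT cue block. It is not WebKit's actual code: the `InBandCue` shape and the function names are assumptions made up for this example, and the only style information carried over is an optional vertical position.

```typescript
// Hypothetical shape of a cue after the media framework has decoded it to
// plain text. None of these names come from WebKit or AVFoundation.
interface InBandCue {
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;      // plain cue text
  line?: number;     // optional vertical position, as a percentage
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(value: number, width: number): string {
  let out = String(value);
  while (out.length < width) {
    out = "0" + out;
  }
  return out;
}

// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function vttTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "." + pad(ms, 3);
}

// Escape the characters that have special meaning in WebVTT cue text.
function escapeCueText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Serialize one cue as a WebVTT cue block: a timing line with optional
// settings, followed by the escaped cue text.
function toWebVTTCueBlock(cue: InBandCue): string {
  const settings = cue.line !== undefined ? " line:" + cue.line + "%" : "";
  const timing =
    vttTimestamp(cue.startTime) + " --> " + vttTimestamp(cue.endTime);
  return timing + settings + "\n" + escapeCueText(cue.text);
}
```

  A script that receives cues in this form only ever has to deal with one cue format, whatever the in-band source was.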


> 
>>>> On versions of the OS where the system frameworks do not have the
>>>> necessary API to override cue rendering, in-band tracks are part of
>>>> video.tracks so they can be enabled/disabled by script but cues are rendered
>>>> by the media engine.
>>> 
>>> In this case, I assume only the existence of the track, but not of the
>>> cues is exposed to JS? I.e. track.cues/activeCues is empty? Or are you
>>> listing fully-abstract TextTrackCue instances here to at least provide
>>> startTime/endTime to the JS devs?
>>> 
>>  Correct, the cues are not available to WebKit at all in this case.
> 
> OK. Would you consider using the abstract TextTrackCue interface of
> the WHATWG spec for exposing these cues, so JS developers can at least
> react to cue change events?
> 
  I literally meant that the cues aren't available to WebKit - it is not possible to expose anything about them.


> 
>>> And an orthogonal question: you've probably seen the Cablelabs spec
>>> for exposing MPEG-2 in-band text tracks of different types to HTML[1].
>>> Seeing as both Cablelabs and the HTML spec are trying to accommodate
>>> using in-band text tracks in HTML/JS, what do you suggest is the best
>>> way forward to specify this?
>>> 
>>> * Program Map Table: would you suggest to use VTTCue with
>>> @kind=metadata, GenericCue, or a new PMTCue interface?
>>> 
>>  It seems to me that text track content that a UA does not render itself is, by definition, metadata.
> 
> In this case, that definition works, because no @kind value matches
> the semantic content of a PMT track.
> But caption data that is not rendered is still semantically of @kind=captions.
> 
  I disagree. A generic script that sees a track with kind=captions is going to expect the UA to render those "captions" when it makes the track visible. 

  It seems clear to me that "caption data" that a UA is not able to render is metadata. This matches the definition of metadata in the spec: "Tracks intended for use from script. Not displayed by the user agent."
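
  As a sketch of what such a generic script might do, the function below picks a track mode from the track's kind alone. The kind and mode values are the ones defined for HTML text tracks; the `TrackInfo` shape and the function name are assumptions made up for this example.

```typescript
// Text track kinds and modes as defined for HTML text tracks.
type TrackKind =
  | "subtitles"
  | "captions"
  | "descriptions"
  | "chapters"
  | "metadata";
type TrackMode = "disabled" | "hidden" | "showing";

// Hypothetical minimal view of a track, for illustration only.
interface TrackInfo {
  kind: TrackKind;
  label: string;
}

// Pick the mode a generic script would set when the user enables a track.
// Captions and subtitles are set to "showing" on the expectation that the
// UA renders them; every other kind is set to "hidden" so that cues stay
// active for script but nothing is drawn.
function modeForEnabledTrack(track: TrackInfo): TrackMode {
  if (track.kind === "captions" || track.kind === "subtitles") {
    return "showing";
  }
  return "hidden";
}
```

  If a track the UA cannot render is labelled kind=captions, a script like this sets it to "showing" and nothing appears on screen.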


>> Again, it is not WebVTT data, so is there an advantage to VTTCue versus the old TextTrackCue interface?
> 
> Not between those two, since they are identical.
> 
> 
>>> * CEA708 track: assuming we don't want to introduce CEA708Cue, how
>>> would that best be supported?
>>> 
>>  Are any browsers planning to support 708 captions natively?
> 
> What does "support" mean?
> Parse them out of an MPEG-2 file (like the Cablelabs spec suggests) and
> expose them to JS?
> Or ... render them like you do for formats that the QuickTime
> framework already supports, but without exposing the original data and
> pretending it's WebVTT?
> Or ... go all the way to exposing the format with its own features?
> 
  I meant parse and render natively.

eric

Received on Wednesday, 4 September 2013 16:57:53 UTC