Re: Resolving TextTrackCue issues from Silvia Pfeiffer on 2013-09-07 (public-html@w3.org from September 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Sat, 7 Sep 2013 15:23:48 +1000
To: Eric Carlson <eric.carlson@apple.com>
Cc: Jer Noble <jer.noble@apple.com>, Glenn Adams <glenn@skynav.com>, Philip Jägenstedt <philipj@opera.com>, public-html <public-html@w3.org>, Ian Hickson <ian@hixie.ch>, Bob Lund <B.Lund@cablelabs.com>
Message-ID: <CAHp8n2mBbn4RM8iXRPhtKP6_rMntZmTRCB-BzLd7RhGrDVtEEA@mail.gmail.com>

On Fri, Sep 6, 2013 at 2:08 AM, Eric Carlson <eric.carlson@apple.com> wrote:
>
> On Sep 5, 2013, at 7:39 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>
>> On Thu, Sep 5, 2013 at 2:56 AM, Eric Carlson <eric.carlson@apple.com> wrote:
>>>
>>> On Sep 3, 2013, at 11:48 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>>>
>>>> On Wed, Sep 4, 2013 at 2:14 PM, Eric Carlson <eric.carlson@apple.com> wrote:
>>>>>
>>>>> On Sep 3, 2013, at 5:16 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>>>>>
>>>>>> What is the content in .text ?
>>>>>>
>>>>> The cue text.
>>>>
>>>> Just to clarify: Is that a plain text version of the original cue
>>>> content? For example, for CEA608 content from a SCC file it would just
>>>> be the plain text of the cue stripped of all the other characters? Or
>>>> in the case of SRT, all tags are stripped and just the plain text is
>>>> exposed?
>>>>
>>>
>>>> If you do that, then we're just pretending the in-band text track was
>>>> actually WebVTT content.
>>>>
>>>  AVFoundation converts the in-band cue data (CEA608, QTText, 3GPP timed text, etc) to plain text, which sometimes has position and style information. WebKit converts that to WebVTT.
>>>
>>>  This is "pretending" the in-band data is WebVTT, but does that matter? I think this is actually an advantage, both because it makes our implementation simpler and because it makes it simpler for developers.
>>
>> I'd be happy if everything was exposed as WebVTT. However, that also
>> requires that the cue content (and not just the attributes) is
>> converted to WebVTT format, unless the track is @kind=metadata.
>>
>> Where you have styling information associated with the cue content
>> (italics, bold, color, etc), are you also converting the cue text to
>> WebVTT and thus exposing that in .text for rendered cues?
>>
>   Yes, where the styling information is one of the WebVTT cue components.
>
>
>>
>>
>>>>>> And an orthogonal question: you've probably seen the Cablelabs spec
>>>>>> for exposing MPEG-2 in-band text tracks of different types to HTML[1].
>>>>>> Seeing as both Cablelabs and the HTML spec are trying to accommodate
>>>>>> using in-band text tracks in HTML/JS, what do you suggest is the best
>>>>>> way forward to specify this?
>>>>>>
>>>>>> * Program Map Table: would you suggest to use VTTCue with
>>>>>> @kind=metadata, GenericCue, or a new PMTCue interface?
>>>>>>
>>>>> It seems to me that text track content that a UA does not render itself is, by definition, metadata.
>>>>
>>>> In this case, that definition works, because no @kind value matches
>>>> the semantic content of a PMT track.
>>>> But caption data that is not rendered is still semantically of @kind=captions.
>>>>
>>>  I disagree. A generic script that sees a track with kind=captions is going to expect the UA to render those "captions" when it makes the track visible.
>>>
>>>  It seems clear to me that "caption data" that a UA is not able to render is metadata. This matches the definition of metadata in the spec: "Tracks intended for use from script. Not displayed by the user agent"
>>
>> So would the "in-band metadata track dispatch type" attribute tell the
>> JS developer what is really in this track?
>>
>> For example, TTML in MP4, not supported by the browser, not rendered
>> by the browser, but able to be extracted from MP4 by the browser would
>> in your suggestion end up as VTTCue objects of @kind=metadata with
>> @inBandMetadataTrackDispatchType conveying some information about it
>> being captions?
>>
>   Yes, exactly.


That's not what
http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#steps-to-expose-a-media-resource-specific-text-track
says. It requires setting the new text track's kind based on the
semantics of the relevant data (step 2).
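The difference between the two readings can be shown in a minimal TypeScript sketch. It models an in-band track as a plain object rather than a real DOM TextTrack. The InBandTrackInfo shape and both function names are hypothetical. Falling back to "metadata" when no kind value matches the data (the PMT case discussed above) is an assumption, not text from the draft.

```typescript
// Values of the HTML TextTrack kind attribute.
type TextTrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

// Hypothetical description of an in-band track as a UA might see it
// after demuxing the media resource. This is NOT a browser API.
interface InBandTrackInfo {
  format: string;                  // e.g. "cea608", "ttml", "mpeg2-pmt"
  semantics: TextTrackKind | null; // what the data means; null if no kind fits
  uaCanRender: boolean;            // whether this UA can render the format itself
}

// Reading of the current draft (step 2): kind follows the semantics of the
// data. The "metadata" fallback for data with no matching kind is assumed.
function kindUnderCurrentSpec(t: InBandTrackInfo): TextTrackKind {
  return t.semantics ?? "metadata";
}

// Eric's proposal: anything the UA cannot render is exposed as "metadata",
// whatever the data means.
function kindUnderProposal(t: InBandTrackInfo): TextTrackKind {
  if (!t.uaCanRender) return "metadata";
  return t.semantics ?? "metadata";
}

// TTML captions in MP4 that the browser can extract but not render.
const ttmlInMp4: InBandTrackInfo = { format: "ttml", semantics: "captions", uaCanRender: false };
console.log(kindUnderCurrentSpec(ttmlInMp4)); // prints captions
console.log(kindUnderProposal(ttmlInMp4));    // prints metadata
```

The two functions only diverge for tracks whose data has a matching kind but which the UA cannot render. That divergence is the case the algorithm would have to decide.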

If we follow your argument that cues in a format the browser cannot
render should be exposed as @kind=metadata, with
@inBandMetadataTrackDispatchType conveying that they are semantically
caption data in format x (e.g. x=ttml-intermediate), then such tracks
could safely be supported by VTTCue alone (even if that's a lie from a
semantic POV), and we could remove GenericCue.
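Under that model, a script would find caption data in an unrendered format by inspecting the dispatch type rather than the kind. The following is a minimal sketch using plain objects in place of real TextTrack and VTTCue instances. The "<semantics>;<format>" encoding of the dispatch type (e.g. "captions;ttml-intermediate") is purely hypothetical and not defined anywhere. The TrackLike/CueLike shapes and both helper functions are also illustrative only.

```typescript
// Minimal stand-ins for the DOM TextTrack and VTTCue interfaces;
// only the attributes used below are modelled.
interface CueLike {
  startTime: number;
  endTime: number;
  text: string; // for an unrendered format, the raw cue payload
}

interface TrackLike {
  kind: string;
  inBandMetadataTrackDispatchType: string;
  cues: CueLike[];
}

// Hypothetical dispatch-type encoding "<semantics>;<format>".
// Returns null if the string does not follow that shape.
function parseDispatchType(dt: string): { semantics: string; format: string } | null {
  const [semantics, format] = dt.split(";");
  return semantics && format ? { semantics, format } : null;
}

// Collect the raw payloads of every metadata track whose dispatch type
// claims caption data in the requested format.
function findCaptionPayloads(tracks: TrackLike[], format: string): string[] {
  const payloads: string[] = [];
  for (const track of tracks) {
    if (track.kind !== "metadata") continue;
    const info = parseDispatchType(track.inBandMetadataTrackDispatchType);
    if (info === null || info.semantics !== "captions" || info.format !== format) continue;
    for (const cue of track.cues) payloads.push(cue.text);
  }
  return payloads;
}

// Usage: one unrendered TTML caption track exposed as metadata,
// plus an ordinary captions track that the filter ignores.
const tracks: TrackLike[] = [
  {
    kind: "metadata",
    inBandMetadataTrackDispatchType: "captions;ttml-intermediate",
    cues: [{ startTime: 0, endTime: 2, text: "<p>Hello</p>" }],
  },
  { kind: "captions", inBandMetadataTrackDispatchType: "", cues: [] },
];
console.log(findCaptionPayloads(tracks, "ttml-intermediate").join("\n")); // prints <p>Hello</p>
```

The script, not the UA, is then responsible for parsing and rendering the payload. That is why the semantic information has to travel in the dispatch type once the kind no longer carries it.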

Is that what all browser devs want?

Note that this would require updating the algorithm at
http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#steps-to-expose-a-media-resource-specific-text-track
.

Cheers,
Silvia.

Received on Saturday, 7 September 2013 05:24:35 UTC