Re: A new proposal for how to deal with text track cues

On Fri, 14 Jun 2013 09:30:51 +0200, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> On Fri, Jun 14, 2013 at 4:40 PM, Philip Jägenstedt <philipj@opera.com>  
> wrote:
>> On Fri, 14 Jun 2013 06:45:08 +0200, Silvia Pfeiffer
>> <silviapfeiffer1@gmail.com> wrote:
>>
>>> On Thu, Jun 13, 2013 at 6:07 PM, Philip Jägenstedt <philipj@opera.com>
>>> wrote:
>>>>
>>>> On Thu, 13 Jun 2013 09:50:57 +0200, Silvia Pfeiffer
>>>> <silviapfeiffer1@gmail.com> wrote:
>>>>
>>>>> On Thu, Jun 13, 2013 at 5:02 PM, Philip Jägenstedt  
>>>>> <philipj@opera.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I don't think trying to split the interfaces for WebVTT cues makes
>>>>>> sense, since the kind can be changed dynamically,
>>>>>
>>>>>
>>>>>
>>>>> The @kind is a readonly attribute - I don't think it can be changed
>>>>> dynamically. If you know of a way to change the @kind dynamically, I
>>>>> think we should stop that possibility. I don't think it makes sense
>>>>> to, e.g., have cues with image content be able to dynamically convert
>>>>> to descriptions or other types of cue content.
>>>>
>>>>
>>>>
>>>> HTMLTrackElement.track is writable, and TextTrack.kind "must return the
>>>> text track kind of the text track that the TextTrack object represents."
>>>> so in effect TextTrack.kind can change at any time.
>>>
>>>
>>> Right, that's a problem. I suggest making it readonly, or
>>> alternatively have that re-parse the whole file and re-build a new
>>> TextTrack object.
>>
>>
>> It's not a problem for implementors, in fact it's simpler.
>
> I'm confused: are you saying what I'm suggesting would be simpler to
> implement or what's currently there is simpler?

First, some sloppy reasoning on my part. I assumed that the solution would  
be to redefine TextTrack.kind to something like "must return the text  
track kind COPIED AT PARSE TIME from the text track that the TextTrack  
object represents" but you never said that.

Letting TextTrack.kind return HTMLTrackElement.kind (I typo'd that above
as HTMLTrackElement.track) is not a problem. I was too quick to say that
it is simpler, because I didn't realize precisely what your suggestion was.
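
To make the two readings concrete, here is a minimal TypeScript sketch. The names ModelTrackElement, LiveTextTrack and FrozenTextTrack are hypothetical stand-ins invented for illustration; they are not the real DOM interfaces, and the spec does not define either behaviour in these terms.

```typescript
// Hypothetical stand-ins for HTMLTrackElement and TextTrack; not the real DOM types.
type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

class ModelTrackElement {
  // Writable, like the kind attribute on a <track> element.
  kind: TrackKind = "subtitles";
}

// Reading 1: the track's kind always reflects the element's current kind.
class LiveTextTrack {
  private element: ModelTrackElement;

  constructor(element: ModelTrackElement) {
    this.element = element;
  }

  get kind(): TrackKind {
    return this.element.kind;
  }
}

// Reading 2: the kind is copied once (for example at parse time) and never updated.
class FrozenTextTrack {
  readonly kind: TrackKind;

  constructor(element: ModelTrackElement) {
    this.kind = element.kind;
  }
}

const trackElement = new ModelTrackElement();
const liveTrack = new LiveTextTrack(trackElement);
const frozenTrack = new FrozenTextTrack(trackElement);

// Changing the element's kind is visible through the live track only.
trackElement.kind = "chapters";
console.log(liveTrack.kind, frozenTrack.kind);
```

In the second reading the element and its track can disagree after the assignment, which is the potential confusion discussed below.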

>> Having TextTrack.kind frozen at parse time and potentially in
>> disagreement with the associated HTMLTrackElement.kind also seems like a
>> potential source of confusion, albeit not a huge one.
>
> What I'm suggesting would solve that, right?

Currently TextTrack.kind and HTMLTrackElement.kind will always be in sync,  
which is good. Adding kind to "Whenever a track element has its src  
attribute set, changed, or removed, the user agent must synchronously  
empty the element's text track's text track list of cues." is a simple  
change and would also keep them in sync. However...

>> Making the parser depend on attributes on the track element is
>> unnecessary coupling, and requiring later re-parsing means that the
>> WebVTT file must be pinned in cache, even if the HTTP cache headers don't
>> approve. Note also that re-parsing would throw away all existing cues
>> together with any modifications made by scripts.
>
> I think those are all positive consequences: changing the @kind on a
> <track> should not become something that programmers frequently use -
> I don't see a common use case for it. If it requires re-fetching the
> WebVTT file, then so be it. And re-parsing makes sense, because you
> may have made changes because you thought the cues were of a
> particular type, but they are not, so it's better to reset that.

The way I see it, re-parsing serves no purpose: the WebVTT file is still
the same and will be parsed into the same result. Only the interpretation
of the resulting cues differs between kinds. This looks like clean
layering to me. Is it unsightly from some other perspective?
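
A rough TypeScript sketch of that layering, under stated assumptions: parseCues, interpretCues and ParsedCue are hypothetical names, and the parser handles only a tiny, unvalidated subset of WebVTT syntax. The point is only that parsing never consults the kind, so a kind change needs no second parse.

```typescript
// Hypothetical model: a parsed cue keeps everything the file contained.
interface ParsedCue {
  startTime: number;
  endTime: number;
  text: string;
  settings: Record<string, string>; // for example { line: "0" }
}

type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

// Accepts mm:ss.ttt or hh:mm:ss.ttt; no validation, for illustration only.
function parseTimestamp(stamp: string): number {
  return stamp.split(":").map(Number).reduce((total, part) => total * 60 + part, 0);
}

// Parsing layer: takes only the file text and never looks at the track kind.
function parseCues(source: string): ParsedCue[] {
  const cues: ParsedCue[] = [];
  for (const block of source.trim().split(/\n\s*\n/)) {
    const lines = block.split("\n");
    const timing = lines[0].trim().split(/\s+/);
    if (timing[1] !== "-->") continue; // skips the WEBVTT header and unrecognized blocks
    const settings: Record<string, string> = {};
    for (const setting of timing.slice(3)) {
      const [name, value] = setting.split(":");
      settings[name] = value;
    }
    cues.push({
      startTime: parseTimestamp(timing[0]),
      endTime: parseTimestamp(timing[2]),
      text: lines.slice(1).join("\n"),
      settings,
    });
  }
  return cues;
}

// Interpretation layer: the kind decides what is done with the same cues.
function interpretCues(cues: ParsedCue[], kind: TrackKind): string[] {
  if (kind === "chapters") {
    return cues.map((cue) => `chapter "${cue.text}" at ${cue.startTime}s`);
  }
  return cues.map((cue) => `show "${cue.text}" from ${cue.startTime}s to ${cue.endTime}s`);
}

const vttSource =
  "WEBVTT\n\n00:00.000 --> 00:05.000 line:0\nIntroduction\n\n00:05.000 --> 00:10.000\nFirst topic\n";
const parsedCues = parseCues(vttSource); // parsed once
console.log(interpretCues(parsedCues, "subtitles"));
console.log(interpretCues(parsedCues, "chapters")); // same cues, different kind, no re-parse
```

Script modifications to the cues would also survive a kind change in this model, since the cue list is never rebuilt.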

>>>>>> and the actual content of a WebVTT file can be the same regardless of
>>>>>> the kind. For example, what would happen if the kind is changed while
>>>>>> the WebVTT file is being received and parsed? And if the kind is
>>>>>> changed later, should the file be re-parsed?
>>>>>
>>>>>
>>>>>
>>>>> If you wanted cues to end up on a track of a different kind, you'd
>>>>> have to copy them to a different cue type first and then add them to
>>>>> that track. I think that's a reasonable requirement given the vastly
>>>>> different types of content that can end up in a cue.
>>>>
>>>>
>>>>
>>>> But they're not vastly different, every WebVTT kind can contain the
>>>> exact same markup. Adding new interfaces just hides some of the
>>>> information, right?
>>>
>>>
>>> If you are not authoring captions/subtitles, you would not use any of
>>> the caption-specific cue settings. It's not just about hiding
>>> information - it's about providing accurate, useful attributes for the
>>> type of object that is being handled.
>>
>>
>> Unless you're suggesting preventing (by means of parsing errors) any of
>> the WebVTT syntax from being used depending on the kind, it's a certainty
>> that people *will* use all settings and syntax for all kinds, and hiding
>> that just doesn't seem useful. In fact there's even a use case for
>> positioning chapter cues: they can then be debugged by changing kind
>> without interfering with the caption/subtitle cues.
>
> It's easier to simply turn off all other tracks when debugging a
> specific track than having to edit each cue of a WebVTT file just to
> debug its content.

True. Still, the settings can be there and will be parsed, so it's just
a matter of hiding them in the interface.

> Note also that we're about to write a rendering algorithm for
> chapters, so there's no need to turn them into captions/subtitles just
> to make them visible.

Can you tell me more about this? Aren't chapters used only in the UI?

>>> The alternative would be to just add attributes to a common Cue object
>>> every time we define a new cue format. That's like saying: why don't we
>>> just throw all the attributes that any HTMLxxxElement ever needs onto
>>> the Element object.
>>>
>>> It's simply a poor way to deal with structured data - by ignoring
>>> structure completely.
>>
>>
>> I think a better alternative is to leave WebVTTCue as it is and to add
>> interfaces for other formats that represent the details of those formats
>> well.
>
> I think we're in agreement - I was indeed suggesting that we need to
> add interfaces for other cue formats than the WebVTT caption cue
> format.
>
>> (I disagree that this is like the HTMLxxxElement example, I think it's
>> more like putting HTMLImageElement and SVGImageElement behind a common
>> interface even though they differ in important ways.)
>
> You're confusing me - are you supporting the introduction of other
> interfaces for other cue formats?

I think that for each sufficiently different serialization format for
which there is implementor interest, a cue interface that represents the
underlying format well should be added.

> Don't get me wrong, though: I still believe that TTMLCaptionCue will
> get created, because it follows a different
> caption model than VTTCaptionCue. However, VTTChapterCue and
> TTMLChapterCue should not be different and should instead just result
> in a ChapterCue object, because we want chapters represented the same
> way independent of what serialisation introduced them into the
> browser.

OK, so I guess this is the crux of the matter: unifying the representation  
of chapter cues. What formats other than WebVTT are able to represent  
chapters? I can't find anything in the TTML spec. If TTML chapters look  
like normal TTML cues, I think it would make more sense to just use a  
common TTMLCue interface for all TTML cues, like for WebVTT. Unifying the  
processing of chapters can be layered on top of that, simply by letting  
each cue format define how to extract a chapter name and whatever other  
information is needed. Would that not be simpler?
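
As a sketch of what I mean, in TypeScript: every name below (ChapterInfo, ChapterSource, ModelVTTCue, ModelTTMLCue, buildChapterList) is hypothetical, and the shape of the TTML cue in particular is a pure assumption, since I don't know how TTML would carry chapters. Each format keeps a single cue type and only defines how to read itself as a chapter.

```typescript
// All names below are hypothetical; none of them are real DOM interfaces.
interface ChapterInfo {
  title: string;
  startTime: number;
  endTime: number;
}

// Each format-specific cue type defines how it is read as a chapter.
interface ChapterSource {
  toChapter(): ChapterInfo;
}

class ModelVTTCue implements ChapterSource {
  startTime: number;
  endTime: number;
  text: string; // raw cue text, possibly containing WebVTT tags such as <b>

  constructor(startTime: number, endTime: number, text: string) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;
  }

  toChapter(): ChapterInfo {
    // Assumption: the chapter title is the cue text with tags stripped.
    const title = this.text.replace(/<[^>]*>/g, "").trim();
    return { title, startTime: this.startTime, endTime: this.endTime };
  }
}

class ModelTTMLCue implements ChapterSource {
  startTime: number;
  endTime: number;
  paragraphs: string[]; // assumed simplification of TTML content

  constructor(startTime: number, endTime: number, paragraphs: string[]) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.paragraphs = paragraphs;
  }

  toChapter(): ChapterInfo {
    // Assumption: the chapter title is the paragraph texts joined by spaces.
    return { title: this.paragraphs.join(" "), startTime: this.startTime, endTime: this.endTime };
  }
}

// Shared chapter processing, layered on top of the format-specific cues.
function buildChapterList(cues: ChapterSource[]): ChapterInfo[] {
  return cues.map((cue) => cue.toChapter()).sort((a, b) => a.startTime - b.startTime);
}

const chapterList = buildChapterList([
  new ModelTTMLCue(60, 120, ["Part", "two"]),
  new ModelVTTCue(0, 60, "<b>Intro</b>"),
]);
console.log(chapterList.map((chapter) => chapter.title));
```

The chapter UI would then only ever see ChapterInfo values, without a separate ChapterCue interface per format.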

-- 
Philip Jägenstedt
Opera Software

Received on Friday, 14 June 2013 08:20:43 UTC