Re: A new proposal for how to deal with text track cues from Pierre-Anthony Lemieux on 2013-06-14 (public-texttracks@w3.org from June 2013)

From: Pierre-Anthony Lemieux <pal@sandflow.com>
Date: Fri, 14 Jun 2013 09:21:07 -0700
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Philip Jägenstedt <philipj@opera.com>, "public-texttracks@w3.org" <public-texttracks@w3.org>
Message-ID: <CAF_7JxAM_E2mmLC2LhrK6Bi9Vyi7TOqUkb9UrM60Avy63-qQng@mail.gmail.com>
>I was told that TTML indeed supports chapters, though I haven't seen
>any TTML files in use for that purpose. They would also just be timed
>cues with plain text, I was told.

Right, based on the algorithm specified in the section "Text tracks
describing chapters".
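
As a rough illustration only (a sketch with hypothetical names, not taken
from the HTML, WebVTT or TTML specs, and not an implementation of the
"Text tracks describing chapters" algorithm), the idea quoted below that
chapters from any format parse down to a start time, an optional end time
and a plain text string could look like this:

```typescript
// Hypothetical common shape: every chapter reduces to times plus plain text.
interface ChapterCue {
  startTime: number;            // seconds
  endTime: number | undefined;  // optional in some source formats
  text: string;
}

// Hypothetical, simplified stand-ins for format-specific cues.
interface VttLikeCue { start: number; end: number; payload: string; }
interface TtmlLikeCue { begin: number; end?: number; content: string; }

// Naive assumption: a chapter title is the cue text with any markup removed.
function plainText(s: string): string {
  return s.replace(/<[^>]*>/g, "").trim();
}

// Each format defines its own mapping onto the shared ChapterCue shape.
function chapterFromVttLike(c: VttLikeCue): ChapterCue {
  return { startTime: c.start, endTime: c.end, text: plainText(c.payload) };
}

function chapterFromTtmlLike(c: TtmlLikeCue): ChapterCue {
  return { startTime: c.begin, endTime: c.end, text: plainText(c.content) };
}

const chapters: ChapterCue[] = [
  chapterFromVttLike({ start: 0, end: 60, payload: "<b>Introduction</b>" }),
  chapterFromTtmlLike({ begin: 60, content: "Main topic" }),
];
```

Whether the mapping lives in a shared ChapterCue or in per-format cue
interfaces is exactly the open question in this thread; the sketch takes
no side on that.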

-- Pierre

On Fri, Jun 14, 2013 at 1:57 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:
> On Fri, Jun 14, 2013 at 6:20 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>>
>>>> Making the parser depend on attributes on the track element is
>>>> unnecessary
>>>> coupling, and requiring later re-parsing means that the WebVTT file must
>>>> be
>>>> pinned in cache, even if the HTTP cache headers don't approve. Note also
>>>> that re-parsing would throw away all existing cues together with any
>>>> modifications made by scripts.
>>>
>>>
>>> I think those are all positive consequences: changing the @kind on a
>>> <track> should not become something that programmers frequently use -
>>> I don't see a common use case for it. If it requires re-fetching the
>>> WebVTT file, then so be it. And re-parsing makes sense, because you
>>> may have made changes because you thought the cues were of a
>>> particular type, but they are not, so it's better to reset that.
>>
>>
>> The way I see it, re-parsing serves no purpose, because the WebVTT file is
>> still the same and will be parsed into the same result, it's just the
>> interpretation of the resulting cues that is different between kinds. This
>> looks like clean layering to me, is it unsightly from some other
>> perspective?
>
> Re-parsing the cues will have to happen anyway, because the parsing
> and the rendering algorithm both depend on what the cues are being
> interpreted as. For example, a kind=descriptions cue that has SSML
> markup, in contrast to a kind=captions cue that has WebVTT caption cue
> markup. When rendering the first one, an SSML parser will be activated
> and then an SSML descriptions renderer. When rendering the second one,
> the WebVTT caption parser will be activated and then the WebVTT
> caption renderer.
>
> The difference is that right now we shove all this into a single
> object and attach all the different parsing and rendering algorithms
> that are possible with the same object. This is bound to eventually
> end up in a complicated mess with statements such as "these attributes
> and these parsing and rendering algorithms are to be used when the cue
> is interpreted as a caption cue, these other ones for interpretation
> as descriptions, etc etc". Doesn't look like clean layering to me.
>
>
>
>>> It's easier to simply turn off all other tracks when debugging a
>>> specific track than having to edit each cue of a WebVTT file just to
>>> debug its content.
>>
>>
>> True. But the settings can still be there and will be parsed, so it's just a
>> matter of hiding them in the interface.
>
> How do you hide them in the interface?
>
>
>>> Note also that we're about to write a rendering algorithm for
>>> chapters, so there's no need to turn them into captions/subtitles just
>>> to make them visible.
>>
>> Can you tell me more about this? Aren't chapters used only in the UI?
>
> We have to write a rendering algorithm for chapters at
> http://localhost/~silvia/html5/text-tracks/webvtt/webvtt.html#cues-in-isolation
>  so we get interoperable display of chapters.
>
> I'm going to propose to add them as a list into a menu on the video
> controls. But it is possible to introduce other displays like the
> chapter markers in the examples here:
> http://wiki.whatwg.org/wiki/Use_cases_for_API-level_access_to_timed_tracks#Chapter_Markers
> . We should discuss this separately.
>
>
>>> You're confusing me - are you supporting the introduction of other
>>> interfaces for other cue formats?
>>
>>
>> I think that for each sufficiently different serialization format for which
>> there is implementor interest, a cue interface able to well represent the
>> underlying format should be added.
>>
>>
>>> Don't get me wrong, though: I still believe that TTMLCaptionCue will
>>> get created, because it follows a different
>>> caption model than VTTCaptionCue. However, VTTChapterCue and
>>> TTMLChapterCue should not be different and should instead just result
>>> in a ChapterCue object, because we want chapters represented the same
>>> way independent of what serialisation introduced them into the
>>> browser.
>>
>>
>> OK, so I guess this is the crux of the matter: unifying the representation
>> of chapter cues. What formats other than WebVTT are able to represent
>> chapters?
>
> Plenty of others, including DVD chapters, chapters in QuickTime files, in
> MP2, or in MP4 files. But they all parse down to a start time, an
> optional end time, and a plain text string.
>
>> I can't find anything in the TTML spec.
>
> I was told that TTML indeed supports chapters, though I haven't seen
> any TTML files in use for that purpose. They would also just be timed
> cues with plain text, I was told.
>
>> If TTML chapters look like
>> normal TTML cues, I think it would make more sense to just use a common
>> TTMLCue interface for all TTML cues, like for WebVTT. Unifying the
>> processing of chapters can be layered on top of that, simply by letting each
>> cue format define how to extract a chapter name and whatever other
>> information is needed. Would that not be simpler?
>
> I don't think so. I think we should distinguish between Cue formats
> based on semantics and not based on the name of the serialisation file
> format that provides it, because there are many file formats that will
> provide the same information to the browser.
>
> Captions are indeed a bit more complicated than all the other timed
> cue formats, which is why I think there will be a TTMLCaptionCue
> object that will be substantially different from a WebVTTCaptionCue.
> It would, though, be nice if we were able to define a CaptionCue
> object that can be filled either from a TTML or a WebVTT or from a
> CEA708 file or other caption format (unfortunately, WebVTTCue isn't it
> - it has too much WebVTT specifics in it).
>
> Silvia.
>
Received on Friday, 14 June 2013 16:21:57 UTC