Re: A new proposal for how to deal with text track cues from Silvia Pfeiffer on 2013-06-14 (public-texttracks@w3.org from June 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 14 Jun 2013 18:57:17 +1000
To: Philip Jägenstedt <philipj@opera.com>
Cc: public-texttracks@w3.org
Message-ID: <CAHp8n2m_3KJ-jZ3H-cgWbTp2dFXv4PSXxbqm0GnsL8C3-xBN-Q@mail.gmail.com>
On Fri, Jun 14, 2013 at 6:20 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>
>>> Making the parser depend on attributes on the track element is unnecessary
>>> coupling, and requiring later re-parsing means that the WebVTT file must be
>>> pinned in cache, even if the HTTP cache headers don't approve. Note also
>>> that re-parsing would throw away all existing cues together with any
>>> modifications made by scripts.
>>
>>
>> I think those are all positive consequences: changing the @kind on a
>> <track> should not become something that programmers frequently use -
>> I don't see a common use case for it. If it requires re-fetching the
>> WebVTT file, then so be it. And re-parsing makes sense, because you
>> may have made changes because you thought the cues were of a
>> particular type, but they are not, so it's better to reset that.
>
>
> The way I see it, re-parsing serves no purpose, because the WebVTT file is
> still the same and will be parsed into the same result, it's just the
> interpretation of the resulting cues that is different between kinds. This
> looks like clean layering to me, is it unsightly from some other
> perspective?

Re-parsing the cues will have to happen anyway, because the parsing
and the rendering algorithm both depend on what the cues are being
interpreted as. For example, compare a kind=descriptions cue that has
SSML markup with a kind=captions cue that has WebVTT caption cue
markup. When rendering the first one, an SSML parser will be activated
and then an SSML descriptions renderer. When rendering the second one,
the WebVTT caption parser will be activated and then the WebVTT
caption renderer.
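
As a rough illustration of that dependency, here is a minimal sketch
in TypeScript. It is not taken from any spec: TrackKind, ParsedCue,
cueParsers and parseCuePayload are hypothetical names, and stripTags
is a crude stand-in for a real SSML or WebVTT cue text parser. It only
shows that the same payload goes down a different path depending on
the kind.

```typescript
// Hypothetical sketch: none of these names come from a spec.
type TrackKind = "captions" | "descriptions";

interface ParsedCue {
  kind: TrackKind;
  parser: string; // label of the parser that handled the payload
  text: string;   // payload with markup removed
}

// Crude tag stripper standing in for a real markup parser.
function stripTags(payload: string): string {
  return payload.replace(/<[^>]*>/g, "");
}

// One parser per interpretation; the track kind selects which one runs.
const cueParsers: Record<TrackKind, (payload: string) => ParsedCue> = {
  captions: (payload) => ({
    kind: "captions",
    parser: "webvtt-caption",
    text: stripTags(payload),
  }),
  descriptions: (payload) => ({
    kind: "descriptions",
    parser: "ssml",
    text: stripTags(payload),
  }),
};

function parseCuePayload(kind: TrackKind, payload: string): ParsedCue {
  return cueParsers[kind](payload);
}
```

Calling parseCuePayload("descriptions", ...) and
parseCuePayload("captions", ...) on the same text yields results
tagged with different parsers, which is the coupling under discussion.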

The difference is that right now we shove all this into a single
object and attach all the different parsing and rendering algorithms
that are possible with the same object. This is bound to eventually
end up in a complicated mess with statements such as "these attributes
and these parsing and rendering algorithms are to be used when the cue
is interpreted as a caption cue, these other ones for interpretation
as descriptions, etc etc". Doesn't look like clean layering to me.



>> It's easier to simply turn off all other tracks when debugging a
>> specific track than having to edit each cue of a WebVTT file just to
>> debug its content.
>
>
> True. Still, the settings can still be there, will be parsed, so it's just a
> matter of hiding them in the interface.

How do you hide them in the interface?


>> Note also that we're about to write a rendering algorithm for
>> chapters, so there's no need to turn them into captions/subtitles just
>> to make them visible.
>
> Can you tell me more about this? Aren't chapters used only in the UI?

We have to write a rendering algorithm for chapters at
http://localhost/~silvia/html5/text-tracks/webvtt/webvtt.html#cues-in-isolation
 so we get interoperable display of chapters.

I'm going to propose to add them as a list into a menu on the video
controls. But it is possible to introduce other displays like the
chapter markers in the examples here:
http://wiki.whatwg.org/wiki/Use_cases_for_API-level_access_to_timed_tracks#Chapter_Markers
. We should discuss this separately.


>> You're confusing me - are you supporting the introduction of other
>> interfaces for other cue formats?
>
>
> I think that for each sufficiently different serialization format for which
> there is implementor interest, a cue interface able to well represent the
> underlying format should be added.
>
>
>> Don't get me wrong, though: I still believe that TTMLCaptionCue will
>> get created, because it follows a different
>> caption model than VTTCaptionCue. However, VTTChapterCue and
>> TTMLChapterCue should not be different and should instead just result
>> in a ChapterCue object, because we want chapters represented the same
>> way independent of what serialisation introduced them into the
>> browser.
>
>
> OK, so I guess this is the crux of the matter: unifying the representation
> of chapter cues. What formats other than WebVTT are able to represent
> chapters?

Plenty of others, including DVD chapters, chapters in QuickTime files,
in MP2, or in MP4 files. But they all parse down to a start time, an
optional end time, and a plain text string.
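
To make that concrete, a minimal TypeScript sketch follows. The
ChapterCue shape here is only my illustration of "start time, optional
end time, plain text", and the two input shapes (VttLikeCue,
ContainerChapter) are hypothetical, not real parser outputs.

```typescript
// Hypothetical common representation of a chapter.
interface ChapterCue {
  startTime: number; // seconds
  endTime?: number;  // seconds; optional
  text: string;      // plain text chapter title
}

// Hypothetical input: a timed text cue with explicit start and end.
interface VttLikeCue {
  start: number;
  end: number;
  payload: string;
}

// Hypothetical input: a container-level chapter mark with only an offset.
interface ContainerChapter {
  offsetMs: number;
  title: string;
}

function chapterFromVtt(cue: VttLikeCue): ChapterCue {
  return { startTime: cue.start, endTime: cue.end, text: cue.payload };
}

function chapterFromContainer(chapter: ContainerChapter): ChapterCue {
  // No end time is available here, so the field is simply left out.
  return { startTime: chapter.offsetMs / 1000, text: chapter.title };
}
```

Both functions return the same ChapterCue shape, so code that builds a
chapter menu would not need to know where a chapter came from.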

> I can't find anything in the TTML spec.

I was told that TTML indeed supports chapters, though I haven't seen
any TTML files in use for that purpose. They would also just be timed
cues with plain text, I was told.

> If TTML chapters look like
> normal TTML cues, I think it would make more sense to just use a common
> TTMLCue interface for all TTML cues, like for WebVTT. Unifying the
> processing of chapters can be layered on top of that, simply by letting each
> cue format define how to extract a chapter name and whatever other
> information is needed. Would that not be simpler?

I don't think so. I think we should distinguish between Cue formats
based on semantics and not based on the name of the serialisation file
format that provides it, because there are many file formats that will
provide the same information to the browser.
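
A small TypeScript sketch of what I mean, with the caveat that every
name in it (SourceFormat, the semantics and source fields,
describeCue) is made up for illustration and is not a proposed API:
the cue type is chosen by semantics, and the serialisation is at most
a piece of metadata that consumers can ignore.

```typescript
// Hypothetical: where the cue came from; consumers need not look at it.
type SourceFormat = "webvtt" | "ttml" | "mp4";

interface ChapterCue {
  semantics: "chapter";
  source: SourceFormat;
  startTime: number;
  text: string;
}

interface CaptionCue {
  semantics: "caption";
  source: SourceFormat;
  startTime: number;
  endTime: number;
  text: string;
}

type Cue = ChapterCue | CaptionCue;

// Dispatches on what the cue means, never on the file format behind it.
function describeCue(cue: Cue): string {
  switch (cue.semantics) {
    case "chapter":
      return `chapter "${cue.text}" at ${cue.startTime}s`;
    case "caption":
      return `caption "${cue.text}" from ${cue.startTime}s to ${cue.endTime}s`;
  }
}
```

A chapter that arrived via WebVTT and one that arrived via MP4 are
handled identically by describeCue, because only the semantics field
is inspected.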

Captions are indeed a bit more complicated than all the other timed
cue formats, which is why I think there will be a TTMLCaptionCue
object that will be substantially different from a WebVTTCaptionCue.
It would, though, be nice if we were able to define a CaptionCue
object that can be filled either from a TTML or a WebVTT or from a
CEA708 file or other caption format (unfortunately, WebVTTCue isn't it
- it has too many WebVTT specifics in it).

Silvia.
Received on Friday, 14 June 2013 08:58:06 UTC