Re: [blink-dev] WebVTT vs TTML Features

On Wed, Dec 11, 2013 at 10:33 AM, Glenn Adams <glenn@chromium.org> wrote:
>
>
> On Wed, Dec 11, 2013 at 7:08 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
> wrote:
>>
>> On Wed, Dec 11, 2013 at 8:09 AM, Glenn Adams <glenn@chromium.org> wrote:
>> >
>> > On Wed, Dec 11, 2013 at 2:50 AM, David Singer <singer@apple.com> wrote:
>> >>
>> >>
>> >> On Dec 9, 2013, at 11:31 , Glenn Adams <glenn@chromium.org> wrote:
>> >>
>> >> > Another significant design difference between TTML and WebVTT comes
>> >> > into
>> >> > play here as well: TTML was designed for smart authoring systems and
>> >> > dumb
>> >> > clients, while WebVTT was designed for dumb authoring systems and
>> >> > smart
>> >> > clients.
>> >>
>> >> I don’t think this can possibly be true.  The client-side
>> >> implementation
>> >> of VTT is vastly simpler than TTML, and indeed does not require
>> >> profiles, or
>> >> complicated specification.
>> >
>> >
>> > This is speculation, since we don't have a reasonable open-source client
>> > implementation of TTML to compare against. I do know that VTT requires a
>> > number of things that TTML does not require, including:
>> >
>> > a parser
>> >
>> > TTML would reuse an existing XML or generic HTML markup parser, while
>> > VTT
>> > requires a new parser
>> >
>> > logic to perform overlap avoidance and other automatic functions
>> > expected to
>> > be performed by VTT
>> >
>> > TTML assigns this responsibility to the authoring system, not the client
>>
>> VTT assigns this responsibility to both the authoring system AND the
>> client, since the authoring system cannot foresee everything. If,
>> e.g., the client changes the fontsize or the size of the video element
>> substantially over what the author expected, the client has to deal
>> with overlap no matter what.
>>
>>
>> > A TTML client does not have to process any profile information, e.g., it
>> > can
>> > be built to support one or more specific, pre-defined profiles (of
>> > feature
>> > sets).
>> >
>> > It's too early to say how complicated the VTT specification will be. In
>> > fact, if you review the number of algorithmic steps specified in the
>> > current
>> > VTT draft, it vastly exceeds the number of algorithmic steps specified
>> > in
>> > TTML.
>> >
>> >>
>> >> > That these design choices are very different will continue to stymy
>> >> > efforts to unify the two intentionally different expressions of timed
>> >> > text
>> >> > content.
>> >>
>> >> Since TTML’s initial “raison d’etre” was as a flexible authoring
>> >> system,
>> >> this would seem to be a problem.  I doubt that it’s true.
>> >
>> >
>> > It is certainly true that one could extend TTML by defining new style
>> > extension properties that semantically map to the VTT style semantics
>> > that
>> > support automatic region overlap avoidance, etc., but the opposite isn't
>> > true, i.e., VTT doesn't yet support an authoring defined region
>> > placement
>> > model,
>>
>> It most certainly does! VTT regions are exactly that: explicitly
>> placed regions by the author.
>
>
> I should have been more specific. VTT does not presently support pixel
> addressing for region placement.

None of the widely used caption formats use pixel addressing for
caption placement - neither CEA608/708 nor Teletext, AFAIK. Pixels are
ephemeral, in particular when your video is rendered at different
resolutions. It really doesn't make sense to me to attach cues at
pixel offsets.
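
To illustrate why relative placement survives a change of resolution, here
is a minimal sketch. The function name and the sample sizes are made up for
illustration and are not taken from any spec. It resolves a position given
as percentages of the video viewport to pixels at render time:

```python
def percent_to_pixels(x_percent: float, y_percent: float,
                      width_px: int, height_px: int) -> tuple[int, int]:
    """Resolve a percentage-based position against a concrete viewport size.

    Hypothetical helper: the author states the position once, relative to
    the video, and the renderer computes pixels for whatever size it has.
    """
    x = round(width_px * x_percent / 100)
    y = round(height_px * y_percent / 100)
    return x, y


# The same authored position (10% from the left, 90% from the top)
# lands at the same relative spot at two different rendering sizes.
hd = percent_to_pixels(10, 90, 1280, 720)
sd = percent_to_pixels(10, 90, 640, 360)
```

A fixed pixel offset authored against the 1280x720 rendering would sit in
the wrong place, or outside the viewport altogether, at 640x360.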


>> BTW: even normal cues can be explicitly
>> placed by the author, but the rendering engine is allowed to move them
>> somewhat in case of overlap. So, VTT has a dual model for region
>> placement: one where the author is 100% in control of placement and
>> one where the author makes an informed decision, but the final word
>> stays with the browser when it sees problems with the author's
>> decision.
>>
>>
>> > and doesn't support a number of other stylistic,
>>
>> Anything major that is missing in this respect should indeed be fixed.
>>
>> > or timing functions.
>>
>> What do you mean by that? I am not aware of any missing features wrt
>> "timing functions".
>
>
> Here I was referring to support for:
>
> SMPTE code time expressions
> wall clock time expressions
> frame addressing in time expressions
> sub-frame addressing in time expressions
>
> VTT supports only a media time base model with time expressions that map to
> NPT.

Right. All the others can be mapped to that. It's less error-prone if
the mapping is done in the conversion system, which references a
particular instance and encoding format of a video file. The resulting
VTT is then applicable to all transcoded versions of that file, too.
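
As a rough sketch of what such a conversion step could look like, here is
a mapping from a SMPTE-style frame-based time code to a VTT timestamp. The
function names are hypothetical, and the sketch assumes non-drop-frame time
code with an integer frame rate that the converter knows for the specific
source file. Drop-frame and fractional rates would need more care:

```python
from fractions import Fraction


def smpte_to_seconds(timecode: str, fps: int) -> Fraction:
    """Map a non-drop-frame HH:MM:SS:FF time code to media time in seconds.

    Assumes an integer frame rate supplied by the conversion system.
    """
    hours, minutes, seconds, frames = (int(p) for p in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame number {frames} out of range for {fps} fps")
    return Fraction(hours * 3600 + minutes * 60 + seconds) + Fraction(frames, fps)


def to_vtt_timestamp(media_time: Fraction) -> str:
    """Format media time in seconds as a VTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(media_time * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


# Frame 12 of a 24 fps source is half a second into the second.
cue_start = to_vtt_timestamp(smpte_to_seconds("00:01:02:12", 24))
```

The frame rate is only needed here, at conversion time. The output carries
plain media time, so it does not depend on how a later transcode is framed.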

Silvia.

Received on Tuesday, 10 December 2013 23:40:31 UTC