Re: [blink-dev] WebVTT vs TTML Features from Silvia Pfeiffer on 2013-12-11 (public-texttracks@w3.org from December 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 11 Dec 2013 12:11:05 +1100
To: Glenn Adams <glenn@chromium.org>
Cc: David Singer <singer@apple.com>, Victor Carbune <vcarbune@chromium.org>, Silvia Pfeiffer <silviapf@chromium.org>, "public-texttracks@w3.org" <public-texttracks@w3.org>, Nigel Megitt <nigel.megitt@bbc.co.uk>
Message-ID: <CAHp8n2kmmZ5qm2GGUj8NYBmxfsHLW1Ge44TSV=4mrA7E_eJt8g@mail.gmail.com>

On Wed, Dec 11, 2013 at 10:49 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:
> On Wed, Dec 11, 2013 at 10:46 AM, Glenn Adams <glenn@chromium.org> wrote:
>>
>>
>>
>> On Wed, Dec 11, 2013 at 7:43 AM, David Singer <singer@apple.com> wrote:
>>>
>>>
>>> On Dec 10, 2013, at 15:33 , Glenn Adams <glenn@chromium.org> wrote:
>>>
>>> > I should have been more specific. VTT does not presently support pixel
>>> > addressing for region placement.
>>>
>>> What is a pixel?  OK, so that sounds flip, but text does not have pixels
>>> in it.  They are ephemeral aspects of the encoding, or (increasingly
>>> irrelevant) aspects of the client device.
>>>
>>> On Dec 10, 2013, at 15:33 , Glenn Adams <glenn@chromium.org> wrote:
>>>
>>> > Here I was referring to support for:
>>> >       • SMPTE code time expressions
>>> >       • wall clock time expressions
>>> >       • frame addressing in time expressions
>>> >       • sub-frame addressing in time expressions
>>> > VTT supports only a media time base model with time expressions that map
>>> > to NPT.
>>>
>>> SMPTE time-code is useful for addressing specific sub-sections of a video
>>> that is labeled with them, and being frame accurate with it. However, since
>>> they are labels, they can be discontinuous, and this is problematic.
>>>
>>> Wall-clock times are useful if you want something done at, or recorded as
>>> having been done at, a specific date/time.  This rarely, if ever, arises in
>>> captioning; one is more interested in having the content express “<this>
>>> caption should be shown for <this> time interval over <that> video” than
>>> “show this caption at 2pm on Thursday”.
>>>
>>> Frame addressing is relevant if you need to extract a specific frame, but
>>> the temporal sampling structure of video is sometimes an artifact of its
>>> delivery environment.  If video is re-sampled from 60i to 30p, or from 60p
>>> to 120p, or 3:2 pulldown is introduced or reversed, frame expressions become
>>> fragile.
>>>
>>> I am not sure what you mean by sub-frame, unless you mean the ability to
>>> time something at other than a frame time, which clearly can be done.
>>>
>>> On Dec 10, 2013, at 15:33 , Glenn Adams <glenn@chromium.org> wrote:
>>>
>>> > Are you saying we should not even try coming up with a mapping?
>>> >
>>> > No. I think various folks (including me) are already working on such a
>>> > mapping. The question is to what extent it will be complete and remain
>>> > complete.
>>> >
>>>
>>> TTML is supposed to be an authoring language.  If there are aspects of VTT
>>> that cannot be represented in TTML, we may need to look at extending TTML.
>>> But just as it covers aspects of 3GPP Timed Text, SMIL text, SAMI, and other
>>> systems that we looked at way back, as well as x08, full TTML is and should
>>> be a superset of almost any delivery format.
>>>
>>> David Singer
>>> Multimedia and Software Standards, Apple Inc.
>>>
>>
>> Whether or not pixels, SMPTE time codes, frames, etc. are useful is
>> irrelevant to this discussion. TTML supports them. So any discussion of
>> mapping TTML to VTT or HTML or anything else for that matter has to deal
>> with these at some level.
>>
>> This thread is not about second guessing or redesigning TTML.
>>
>
> Correct. TTML is for authoring, VTT for rendering.

Glenn asked me to clarify this. My point on this is that TTML was
created as a format that encapsulates all features of all caption
formats so that anything that is authored anywhere can be transcoded
to TTML without loss. That is a grand goal and has been the driving
force of TTML. In contrast, VTT was created specifically to render
captions and other time-aligned text in browsers. That is what
motivated this statement. It is against this background that the feature
sets of the two formats have been developed, which explains a lot of
the differences and their respective strengths. It also explains why
we are not interested in some things in VTT.

Hope that clarifies that.

Regards,
Silvia.

> At the point in
> time that you are converting a TTML file with SMPTE time code to VTT,
> you have to reference a video file or at minimum the interpretation of
> the SMPTE time code for making the conversion.
>
> Silvia.
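
To illustrate the point about needing an interpretation of the SMPTE time
code, here is a minimal Python sketch of the mapping from a SMPTE label to
a WebVTT timestamp. It is not taken from any converter discussed in this
thread. The function name smpte_to_vtt and its parameters are hypothetical,
and it assumes non-drop-frame labels that run continuously from a known
start label. The frame rate and the start label are exactly the external
information that has to come from the video file or from the author.

```python
from fractions import Fraction


def smpte_to_vtt(timecode, fps, start="00:00:00:00"):
    """Map a non-drop-frame SMPTE label (HH:MM:SS:FF) to a WebVTT timestamp.

    Assumptions (not from the thread): labels are continuous, `fps` is the
    true frame rate, and `start` is the label of the first frame of the media.
    """
    nominal = round(fps)  # frames counted per labelled second, e.g. 30 for 29.97

    def frame_count(label):
        hh, mm, ss, ff = (int(part) for part in label.split(":"))
        if ff >= nominal:
            raise ValueError("frame number %d out of range in %r" % (ff, label))
        return ((hh * 60 + mm) * 60 + ss) * nominal + ff

    elapsed = frame_count(timecode) - frame_count(start)
    if elapsed < 0:
        raise ValueError("time code precedes the start label")

    # Convert elapsed frames to milliseconds of media time.
    total_ms = round(Fraction(elapsed) / fps * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, seconds, millis)


# The same kind of label maps to different media times at different rates.
print(smpte_to_vtt("10:00:01:12", Fraction(25), start="10:00:00:00"))
print(smpte_to_vtt("00:00:01:00", Fraction(30000, 1001)))
```

Without the frame rate and the start label the function has nothing to
compute with, and discontinuous labels would break the subtraction, which
is the fragility David describes above.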

Received on Wednesday, 11 December 2013 01:11:59 UTC