Re: Issue-270 and Issue-335 from Glenn Adams on 2014-09-23 (public-tt@w3.org from September 2014)

From: Glenn Adams <glenn@skynav.com>
Date: Tue, 23 Sep 2014 12:25:48 -0600
To: Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: Timed Text Working Group <public-tt@w3.org>
Message-ID: <CACQ=j+ek=Jx8=ySuRyEG8cMT4r1p5MX1Q4bW4wnVbNuGi929Hw@mail.gmail.com>

On Tue, Sep 23, 2014 at 4:15 AM, Nigel Megitt <nigel.megitt@bbc.co.uk>
wrote:

>   Glenn Adams <glenn@skynav.com>, Monday, 22 September 2014 22:14 wrote:
>
>   On Mon, Sep 22, 2014 at 8:38 AM, Nigel Megitt <nigel.megitt@bbc.co.uk>
> wrote:
>
>>  Glenn, Courtney, all,
>>
>>  The edit to TTML2 ascribed to issue-270 and issue-335 (
>> https://dvcs.w3.org/hg/ttml/rev/3cbc109b90bd) is causing me some
>> concern. I have added notes to both those issues, and additionally I have a
>> number of queries to raise for discussion:
>>
>>  *Concerns*
>>
>>  1. it appears to define an addition/subtraction operation on SMPTE time
>> values even if they're discontinuous. The processing of these seems to be
>> undefined, so they should be disallowed, shouldn't they?
>>
>
>  I had intended to add material to deal with the discontinuous smpte
> mode, but it didn't get into the edit. Will add.
>
>
>>
>>  2. It blurs the layers of interpretation of time values from documents
>> up into any external context. For example it opens up the ambiguity that,
>> when a sequence of TTML documents is wrapped e.g. in ISOBMFF, there are
>> media time offsets available both in TTML and in the wrapper, and authors
>> may be unclear whether they are intended as independent (additive) offsets
>> or as duplicate offsets in which one may be considered not for processing,
>> i.e. metadata.
>>
>
>  Since TTML doesn't know anything about external wrapper metadata, it
> isn't the right place to deal with such possible ambiguity (e.g., in
> different offset values internal and external). The correct place to deal
> with this is in the external spec.
>
>
>
>  Since those external specs already exist we should work in sympathy with
> them rather than redefining what's already there and creating confusion.
> Can we avoid redefining TTML so that it invalidates external wrappers that
> should be independent?
>

It depends on specifics. I need to know the exact text in an external spec
that may intersect with this feature. It may also require that external
spec to add a note to avoid confusion. In any case, I have not seen a
worked out example of how this proposed feature would invalidate an
external spec.



>
>
>>  3. It is actually the opposite proposal to the one I made in Issue-335:
>> I've added a note there and re-opened it.
>>
>>  4. If clock time is prohibited from using media offset because the
>> discontinuityOffset can not be derived in the absence of a date, then I
>> would certainly be happy to propose the addition of a date value. A use
>> case for this is when a TTML document is created as an archive artefact by
>> a processor that observes some real world timed events and converts them
>> into TTML.
>>
>
>  My reason for excluding clock mode is because it doesn't have a related
> media object.
>
>
>  Ah, right. There may in fact be a related media object, but the temporal
> relationship would be indirect, and mediated by the clock rather than some
> other time embedded in the media.
>

Yes, that is a better way of saying what I intended.


>
>
>>  5. It does nothing to address the scenario where the media time
>> corresponding to the beginning of the related media object is known at
>> authoring time, and is non-zero. This media begin time is distinct from,
>> and possibly earlier than, the beginning of the contents of the TTML
>> document.
>>
>
>  I don't understand this statement, since this is precisely what
> ttp:mediaOffset does: allow the beginning of the root temporal extent to be
> offset either before or after the beginning of the related media object.
>
>
>  ttp:mediaOffset doesn't do that though: it merely allows for times in
> the document to be offset prior to processing. It doesn't extend the root
> temporal extent beyond the document's contents.
>

Correct, since it isn't intended to do that.


>
>  I'm puzzled by this: in your ISD generation use case, if the TTML
> document were untimed but you knew ttp:mediaOffset then how would you
> derive the begin time of the first ISD?
>

SMIL semantics dictates that an unspecified begin time resolve to 0, for
both par and seq parents. ttp:mediaOffset doesn't have any role in
resolving active begin/end for document elements. It only comes into play
when synchronizing document time coordinates with media time coordinates.


> ttp:mediaBegin would define the begin time of the first possible ISD
> without further calculation, unless you also want to map the times into
> another time base.
>

But that isn't something I'm trying to do here. Indeed, I'm saying we can't
do that without changing SMIL semantics. "the begin time of the first
possible ISD without further calculation [in the document time base]" is
always 0.


>
>           There are two distinct one-dimensional temporal coordinate
> spaces here that are potentially related:
>
>    - document's temporal coordinate space, call this TIME(document)
>       - origin is at ORIGIN(document), which is always ZERO (0)
>       - has begin time BEGIN(document)
>       - has explicit or implied duration of DUR(document)
>       - so root temporal extent is always the half-open interval:
>          - [ 0, DUR(document) )
>    - related media object's temporal coordinate space, call this
>    TIME(media)
>       - origin is at ORIGIN(media)
>       - has begin time BEGIN(media)
>       - has explicit or implied duration of DUR(media)
>       - so media temporal extent is always the half-open interval:
>          - [ BEGIN(media), BEGIN(media) + DUR(media) )
>
>    Since TIME(media) may have a different play rate or frame rate to
> TIME(document) I think we need to introduce the concept of evaluation time
> of this parameter, since conversion between the document time base and the
> media time base may only be achievable by a simple addition at one instant.
>

I agree that the play rate of TIME(media) and TIME(document) could be
different, a point mentioned in a few notes in the current spec text:

*6.2.1 ttp:timeBase*

*Note:*

When using a media time base, if that time base is paused or scaled
positively or negatively, i.e., the media play rate is not unity, then it
is expected that the presentation of associated Timed Text content will be
similarly paused, accelerated, or decelerated, respectively. The means for
controlling an external media time base is outside the scope of this
specification.

*Appendix N Time Expression Semantics*

*Note:*

The phrase *play rate* as used below is intended to model a (possibly
variable) parameter in the document processing context wherein the rate of
playback (or interpretation) of time may be artificially dilated or
narrowed, for example, when slowing down or speeding up the rate of
playback of a related media object. Without loss of generality, the
following discussion assumes a fixed play(back) rate. In the case of
variable play rates, appropriate adjustments may need to be made to the
resulting computations.

*Appendix N.1 Clock Time Base*

*Note:*

That is to say, timing is disconnected from (not necessarily proportional
to) media time when the clock time base is used. For example, if the media
play rate is zero (0), media playback is suspended; however, timing
coordinates will continue to advance according to the natural progression
of clock time in direct proportion to the reference clock base.
Furthermore, if the media play rate changes during playback, presentation
timing is not affected.

However, at present, this text basically states (informatively) or, in the
smpte case, assumes:

   - for clock time base, RATE(TIME(document)) is fixed as 1X real time,
   independently of RATE(TIME(media))
   - for media time base, RATE(TIME(document)) = RATE(TIME(media))
   - for smpte time base, it doesn't say anything special, but one can
   infer that the same interpretation applies as for media time base (in
   either continuous or discontinuous modes)

In any case, I don't want the interpretation of the proposed parameter to
depend upon differences in play rates.


>
>  However, the play rate of the media may not be known, so I've assumed
> that any time base mapping must be external to the document, and that what
> we need to do to ensure that BEGIN(document) aligns with the right point in
> the media's temporal coordinate space is to define a known fixed datum in
> the media, in the document's time base, and require the processor to map
> the temporal coordinate spaces.
>
>     The intent of ttp:mediaOffset is to express the delta between
> BEGIN(document) and BEGIN(media):
>
>
>  That's not what I expect from a parameter called mediaOffset – I'd
> certainly been reading it as ORIGIN(document) - ORIGIN(media).
>

The problem with this is that BEGIN(media) - ORIGIN(media) is unknown and
arbitrary, and, further, shouldn't affect synchronization IMO. It certainly
wouldn't affect synchronization in clock time base, media time base, or
continuous smpte time base. However, in the case of discontinuous smpte
time base, special treatment is needed for using/interpreting
ttp:mediaOffset. It is the same special treatment that is required for
converting a discontinuous smpte time base document to an ISD sequence,
which I have not yet documented in the spec. The basic approach I am
thinking of is as follows:

*Convert Discontinuous SMPTE Time Base Document to Media Time Base Document*

(1) reset MEDIATIMER to 0; initialize MAPPINGS to empty set;
(2) simultaneously start playback of related media object at 1X play rate
and start MEDIATIMER at 1X real time;
(3) when encountering a SMPTE time label in related media object, record
the current value of MEDIATIMER and save the pair <SMPTE time label,
MEDIATIMER value> in MAPPINGS;
(4) if playback is not complete, go to (3);
(5) visit each time expression T in document, performing following steps:
(6) if T is in MAPPINGS, then rewrite T (in document) to MAPPINGS.get(T)
and continue at (5);
(7) otherwise (T has no mapping), either abort due to mapping error or use
a fallback mapping (TBD), e.g., mapping of "closest" label that does map;
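
As a rough illustration only, steps (1) through (7) could be sketched in
Python as below. The function names, the list of observed <label, timer>
pairs, and the choice to keep the first occurrence of a repeated label are
all assumptions made for this sketch; none of them come from the spec, and
the fallback mapping in step (7) is left as an abort because it is still
TBD.

```python
from typing import Dict, Iterable, List, Tuple


def build_mappings(observed: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Steps (1)-(4): collect <SMPTE time label, MEDIATIMER value> pairs.

    `observed` stands in for the labels encountered during 1X playback,
    each paired with the MEDIATIMER value (in seconds) at that moment.
    """
    mappings: Dict[str, float] = {}
    for label, media_timer in observed:
        # Assumption: if a label repeats, keep the first time it was seen.
        mappings.setdefault(label, media_timer)
    return mappings


def rewrite_time_expressions(expressions: List[str],
                             mappings: Dict[str, float]) -> List[float]:
    """Steps (5)-(7): rewrite each SMPTE time expression to a media time."""
    rewritten: List[float] = []
    for t in expressions:
        if t not in mappings:
            # Step (7), abort variant: no fallback mapping is attempted here.
            raise KeyError(f"no media time recorded for SMPTE label {t!r}")
        rewritten.append(mappings[t])
    return rewritten


# Hypothetical playback with a label discontinuity after the second frame.
observed = [("10:00:00:00", 0.0), ("10:00:00:01", 0.04), ("11:30:00:00", 0.08)]
mappings = build_mappings(observed)
print(rewrite_time_expressions(["10:00:00:01", "11:30:00:00"], mappings))
```

With the hypothetical playback above, the label after the discontinuity
(11:30:00:00) is rewritten to 0.08s of media time, i.e., the result is
continuous even though the labels are not.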


>
>    - if ttp:mediaOffset > 0, then BEGIN(document) temporally follows
>    BEGIN(media)
>    - if ttp:mediaOffset < 0, then BEGIN(document) temporally precedes
>    BEGIN(media)
>
> Note that this definition is arbitrary: we could invert the meaning if we
> wish. In any case, the current language decodes as follows:
>
>  Given ttp:mediaOffset = +10s, then <body begin="5s"/> means that body
> starts at 15s after BEGIN(media).
>
>
>  That seems to be an offset of ORIGIN(document) - ORIGIN(media)
>

Let's work out the example using *your* interpretation of "offset" where we
choose an arbitrary BEGIN(media) of 7s in TIME(media), *and further
assuming that media and document play rates match*:

Given

(1) BEGIN(media) = 7s in TIME(media)
(2) BEGIN(body) = 5s in TIME(document)
(3) mediaOffset = 10s = *ORIGIN(document) - ORIGIN(media)*

Yields

BEGIN(body) in TIME(media) = ORIGIN(media) + mediaOffset + BEGIN(body) = 0s
+ 10s + 5s = 15s in TIME(media), which is the same as *BEGIN(media) + 8s*

and, now, let's *change BEGIN(media) to another value, say 13s*, so we end
up with 0s + 10s + 5s = 15s in TIME(media), which is the same as *BEGIN(media)
+ 2s*

However, applying *my* interpretation of "offset" to the same info, we have:

Given

(1) BEGIN(media) = 7s in TIME(media)
(2) BEGIN(body) = 5s in TIME(document)
(3) mediaOffset = 10s = *BEGIN(document) - BEGIN(media)*

Yields

BEGIN(body) in TIME(media) = BEGIN(media) + mediaOffset + BEGIN(body) = 7s
+ 10s + 5s = 22s in TIME(media), which is the same as *BEGIN(media) + 15s*

and, now, let's *change BEGIN(media) to another value, say 13s*, we end up
with 13s + 10s + 5s = 28s, which is the same as *BEGIN(media) + 15s* (still)

So, given your interpretation, the position of BEGIN(body) relative to
BEGIN(media) depends on BEGIN(media) (8s versus 2s above), while in my
interpretation it remains constant (15s in both cases) with respect to
BEGIN(media). When reasoning about timing as an author, I would
clearly want to use BEGIN(media) and not ORIGIN(media) as the fixed datum.
But this preference is based on using time expressions related to
BEGIN(media) as opposed to ORIGIN(media), about which see more below.
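
The arithmetic of the two worked examples can be checked with a small
Python sketch. The function names are hypothetical and only restate the two
formulas used above; matching play rates and ORIGIN(media) = 0s are assumed,
as in the examples.

```python
def body_begin_origin_based(origin_media: float, media_offset: float,
                            body_begin: float) -> float:
    """mediaOffset read as ORIGIN(document) - ORIGIN(media)."""
    return origin_media + media_offset + body_begin


def body_begin_begin_based(begin_media: float, media_offset: float,
                           body_begin: float) -> float:
    """mediaOffset read as BEGIN(document) - BEGIN(media)."""
    return begin_media + media_offset + body_begin


MEDIA_OFFSET = 10  # seconds
BODY_BEGIN = 5     # seconds, in TIME(document)

for begin_media in (7, 13):
    origin_based = body_begin_origin_based(0, MEDIA_OFFSET, BODY_BEGIN)
    begin_based = body_begin_begin_based(begin_media, MEDIA_OFFSET, BODY_BEGIN)
    # Print BEGIN(media), then BEGIN(body) - BEGIN(media) for each reading.
    print(begin_media, origin_based - begin_media, begin_based - begin_media)
# prints "7 8 15" then "13 2 15"
```

The second column (origin-based reading) changes with BEGIN(media), while
the third column (begin-based reading) stays at 15s.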

> that must be evaluated one time only, at BEGIN(document) or 5s in the
> document.
>

Not if play rates match or if you use my interpretation (see more below).


>
>  This is still problematic, since it's content dependent. Consider that
> two videos Va and Vb both have continuous timecode where the beginning of
> the programme is at 10:00:00.
>

I interpret this, for Va and Vb, as BEGIN(media) = 36000s in TIME(media)


> Va has dialogue and a corresponding TTML document Ta such that BEGIN(Ta) =
> 10:01:00
>

I interpret this as mediaOffset = -36000s and BEGIN(Ta) = 36060s in
TIME(document), which maps to BEGIN(media) + mediaOffset + BEGIN(body) =
36060s in TIME(media)


> and Vb has Tb where BEGIN(Tb)=10:05:00.
>

I interpret this as mediaOffset = -36000s and BEGIN(Tb) = 36300s in
TIME(document), which maps to BEGIN(media) + mediaOffset + BEGIN(body) =
36300s in TIME(media)
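
For reference, the Va/Vb numbers can be reproduced with the following
Python sketch. The helper names are hypothetical, and the hh:mm:ss parsing
ignores frames and fractions for simplicity.

```python
def to_seconds(clock: str) -> int:
    """Convert an hh:mm:ss string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def in_media_time(begin_media: int, media_offset: int, doc_time: int) -> int:
    """BEGIN(media) + mediaOffset + a time expressed in TIME(document)."""
    return begin_media + media_offset + doc_time


begin_media = to_seconds("10:00:00")  # 36000s, same for Va and Vb
media_offset = -begin_media           # mediaOffset = -36000s

for name, begin in (("Ta", "10:01:00"), ("Tb", "10:05:00")):
    print(name, in_media_time(begin_media, media_offset, to_seconds(begin)))
# prints "Ta 36060" then "Tb 36300"
```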


> I would state that the more useful parameter would be identical in both
> documents, i.e. mediaBegin="10:00:00", so that any processor can start the
> effective clock (e.g. a frame counter) ticking at the same point, rather
> than having to evaluate at the arbitrary point that is BEGIN(document).
>

*The problem here is that this document is authored such that time
expressions are not related to BEGIN(media), but rather, related to
ORIGIN(media).* TTML, being based on SMIL, basically assumes that time is
expressed in relation to BEGIN(related media object), and not
ORIGIN(related media object). This follows from how time expressions on
children of a par time container are relative to the begin time of their
parent par container, and not the origin of the time base of their parent.

Our different positions on this issue appear to relate to which mode we
think of as being normal. For me, the example you describe is abnormal from
a SMIL timing perspective, whereas apparently the converse is true for you.

In my mental model, time expressions in TTML stay fixed with respect to
BEGIN(media), while in yours, they apparently stay fixed with respect to
ORIGIN(media). In my model, the use of mediaOffset is independent of
BEGIN(media) while in your model it is dependent on BEGIN(media).

So, to translate between our mental models, we have:

*From BEGIN(media)-relative to ORIGIN(media)-relative time expressions:*

add BEGIN(media)

*From ORIGIN(media)-relative to BEGIN(media)-relative time expressions:*

subtract BEGIN(media)
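
Stated as code, the two translations are simply inverse operations. This is
a minimal Python sketch with hypothetical function names, assuming all
values are in seconds in the same time base.

```python
def to_origin_relative(t_begin_relative: float, begin_media: float) -> float:
    """BEGIN(media)-relative time -> ORIGIN(media)-relative time."""
    return t_begin_relative + begin_media


def to_begin_relative(t_origin_relative: float, begin_media: float) -> float:
    """ORIGIN(media)-relative time -> BEGIN(media)-relative time."""
    return t_origin_relative - begin_media


BEGIN_MEDIA = 36000  # 10:00:00 expressed in seconds

# 60s after BEGIN(media) is 36060s after ORIGIN(media), and back again.
print(to_origin_relative(60, BEGIN_MEDIA))
print(to_begin_relative(36060, BEGIN_MEDIA))
```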

*Now, however, let's look at the situation when play rates differ*, i.e.,
RATE(TIME(document)) != RATE(TIME(media)). As an example, let's say that
RATE(TIME(media)) / RATE(TIME(document)) is 2, i.e., we run media time at
twice the rate of document time. So, going back to my earlier numbers:

Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) =
EPOCH(TIME(document))
(2) BEGIN(media) = 7s in TIME(media), or 3.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = *ORIGIN(document) in TIME(real) - ORIGIN(media) in
TIME(real)*

Yields

BEGIN(body) in TIME(real) = ORIGIN(media) + mediaOffset + BEGIN(body) = 0s
+ 10s + 5s = 15s in TIME(real), which is the same as *BEGIN(media) + 11.5s
in TIME(real)*

Now, let's *change BEGIN(media) to another value, say 13s*:

Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) =
EPOCH(TIME(document))
(2) BEGIN(media) = 13s in TIME(media), or 6.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = *ORIGIN(document) in TIME(real) - ORIGIN(media) in
TIME(real)*

Yields

BEGIN(body) in TIME(real) = ORIGIN(media) + mediaOffset + BEGIN(body) = 0s
+ 10s + 5s = 15s in TIME(real), which is the same as *BEGIN(media) + 8.5s
in TIME(real)*

However, applying my interpretation of "offset" to the same info, we have:

Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) =
EPOCH(TIME(document))
(2) BEGIN(media) = 7s in TIME(media), or 3.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = *BEGIN(document) in TIME(real) - BEGIN(media) in
TIME(real)*

Yields

BEGIN(body) in TIME(real) = BEGIN(media) + mediaOffset + BEGIN(body) = 3.5s
+ 10s + 5s = 18.5s in TIME(real), which is the same as *BEGIN(media) + 15s
in TIME(real)*

Now, let's change BEGIN(media) to another value, say 13s:

Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) =
EPOCH(TIME(document))
(2) BEGIN(media) = 13s in TIME(media), or 6.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = *BEGIN(document) in TIME(real) - BEGIN(media) in
TIME(real)*

Yields

BEGIN(body) in TIME(real) = BEGIN(media) + mediaOffset + BEGIN(body) = 6.5s
+ 10s + 5s = 21.5s in TIME(real), which is the same as *BEGIN(media) + 15s
in TIME(real)*

Notice that by using my interpretation of mediaOffset, differing play rates
do not affect the relationship between BEGIN(body) and BEGIN(media), which
stays constant.
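
The four differing-rate cases above can be recomputed with a short Python
sketch. The function names and default arguments are hypothetical; the
sketch assumes, as in the examples, a shared epoch, ORIGIN(media) = 0s, and
a mediaOffset already expressed in TIME(real).

```python
def to_real(t: float, rate: float) -> float:
    """Convert a time in a rate-scaled coordinate space to TIME(real)."""
    return t / rate


def delta_origin_based(begin_media: float, media_rate: float,
                       media_offset: float = 10.0, body_begin: float = 5.0,
                       document_rate: float = 1.0) -> float:
    """BEGIN(body) - BEGIN(media) in TIME(real), origin-based reading."""
    begin_media_real = to_real(begin_media, media_rate)
    body_real = 0.0 + media_offset + to_real(body_begin, document_rate)
    return body_real - begin_media_real


def delta_begin_based(begin_media: float, media_rate: float,
                      media_offset: float = 10.0, body_begin: float = 5.0,
                      document_rate: float = 1.0) -> float:
    """BEGIN(body) - BEGIN(media) in TIME(real), begin-based reading."""
    begin_media_real = to_real(begin_media, media_rate)
    body_real = (begin_media_real + media_offset
                 + to_real(body_begin, document_rate))
    return body_real - begin_media_real


for begin_media in (7.0, 13.0):
    print(begin_media,
          delta_origin_based(begin_media, media_rate=2.0),
          delta_begin_based(begin_media, media_rate=2.0))
# prints "7.0 11.5 15.0" then "13.0 8.5 15.0"
```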


>
>  My proposed ttp:mediaBegin would have the value "10:00:00"
>

I agree from your example that an expression of 10:00:00 describes the
delta between BEGIN(media) and ORIGIN(media), but this is only useful in
cases where time expressions are related to ORIGIN(media) and not
BEGIN(media). In the mediaOffset formalism I defined, this value, i.e.,
BEGIN(media) - ORIGIN(media), is of no utility. Namely, if mediaOffset is
specified as I defined it, i.e., with your example as mediaOffset="-36000s"
or mediaOffset="-10h", then you don't have to worry about either play rate
differences or changes in actual BEGIN(media), since the result is time
expressions always related to BEGIN(media).


> in these cases, and not mix in the concept of mapping between the
> document's temporal coordinates and the related media's temporal
> coordinates.
>



> The play rate in the document's time base is well defined as now. It's
> reasonable to assume that any media playback device knows when the related
> media begins and what its play rate is.
>
>     Or, given ttp:mediaOffset = -5s, then <body begin="5s"/> means that
> body starts at BEGIN(media).
>
>  Given this formalism, we don't really care about BEGIN(media) -
> ORIGIN(media).
>
>
>  Agreed. What we care about is BEGIN(media) in the temporal coordinate
> space of the document, or in your useful terminology, in TIME(document).
>
>
>  Now, if you are suggesting an alternative use case where
> ORIGIN(document) != 0 in the TIME(document) coordinate space, then that is
> something I haven't considered, and certainly did not intend to address.
> Indeed, doing so would be problematic since SMIL timing semantics assumes
> that unspecified begin defaults to 0s, and further, that 0s corresponds to
> ORIGIN(document).
>
>
>  I'm not suggesting that ORIGIN(document) !=0 in TIME(document), since
> that would as you say create a whole bunch of other problems.
>
>
>  My response to such a proposed use case would probably be: we don't
> support it, you don't need to do it anyway, so don't do it.
>
>  Note that the above considerations assume that time base is media, or
> that time base is smpte continuous mode, or that time base is smpte
> discontinuous mode and that all smpte time events have been converted to
> equivalent smpte continuous mode values, e.g., by playing back a media
> object in 1X normal play mode and recording the PTS time that corresponds
> with each frame associated with a smpte time label.
>
>
> Just for completeness (at the expense of being repetitious), did you also
> assume that the media play rate is identical to the document's play rate,
> i.e. that the only difference between TIME(media) and TIME(document) is an
> additive offset?
>

See above.


>
>
>>      *Proposals*
>>
>
>>  I would propose a resolution to points 1, 2, 3 and 5 that is to remove
>> mediaOffset and add a ttp:mediaBegin parameter, expressed in the same time
>> base as the document's ttp:timeBase parameter. This also fits better with
>> ttp:mediaDuration.
>>
>
>  Hmmm. I'm not inclined to make this change, because mentally I see
> mediaOffset as expressing a difference/delta/offset between two points in
> two different one-dimensional coordinate spaces both representing linear
> time (at 1X play rate). Calling it mediaBegin implies in my mind
> BEGIN(media), i.e., the delta between BEGIN(media) and ORIGIN(media), and
> not the delta between BEGIN(document) and BEGIN(media).
>
>
>  If this is just about the name we choose for the parameter then we're
> right to choose carefully, but it shouldn't prevent us from agreeing the
> semantics. To my mind mediaBegin does suggest the delta between
> BEGIN(document) and BEGIN(media), both in TIME(document). Whereas to me
> mediaOffset suggests the delta between ORIGIN(document) in TIME(document)
> and ORIGIN(media) in TIME(??? - this is not clear), which if I understand
> correctly isn't what you intend. Or if it is what you intend it doesn't
> seem to be a complete solution for the problem.
>
>     I would additionally propose allowing dates to be specified to use in
>> relation to clock times to resolve point 4, perhaps with a ttp:date
>> parameter, valid only when ttp:timeBase="clock". Note that this does not
>> resolve any time comparison issues caused by documents whose times cross
>> midnight and wrap back round to a smaller number of hours.
>>
>
>  Again, I'm wondering what is the related media object? To my
> recollection, ttp:timeBase="clock" was added to TTML to handle timed text
> cases that don't have a related media object.
>
>
>  It would be a media object that had also been captured with reference to
> a clock.
>
>
>
>>
>>
>>  Are there other related use cases or requirements not met by these
>> proposals?
>>
>>  Kind regards,
>>
>>  Nigel
>>
>>
>
Received on Tuesday, 23 September 2014 18:26:39 UTC