Re: Issue-270 and Issue-335 from Nigel Megitt on 2014-09-24 (public-tt@w3.org from September 2014)

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Wed, 24 Sep 2014 15:30:02 +0000
To: Nigel Megitt <nigel.megitt@bbc.co.uk>, Glenn Adams <glenn@skynav.com>
CC: Timed Text Working Group <public-tt@w3.org>
Message-ID: <D048A155.11E8E%nigel.megitt@bbc.co.uk>
Sorry, forgot the links:

EBU-TT, EBU Tech 3350: https://tech.ebu.ch/docs/tech/tech3350.pdf

CARRIAGE OF EBU-TT-D IN ISOBMFF, EBU Tech3381: https://tech.ebu.ch/docs/tech/tech3381.pdf


There's no direct link to ISO/IEC 14496-12:2012, as you have to buy it from a shop :-(



From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Wednesday, 24 September 2014 17:24
To: Glenn Adams <glenn@skynav.com>
Cc: Timed Text Working Group <public-tt@w3.org>
Subject: Re: Issue-270 and Issue-335
Resent-From: <public-tt@w3.org>
Resent-Date: Wednesday, 24 September 2014 17:24

Glenn Adams <glenn@skynav.com>, Tuesday, 23 September 2014 20:25 wrote:
On Tue, Sep 23, 2014 at 4:15 AM, Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
Glenn Adams <glenn@skynav.com>, Monday, 22 September 2014 22:14 wrote:
On Mon, Sep 22, 2014 at 8:38 AM, Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
Glenn, Courtney, all,

The edit to TTML2 ascribed to issue-270 and issue-335 (https://dvcs.w3.org/hg/ttml/rev/3cbc109b90bd) is causing me some concern. I have added notes to both those issues, and additionally I have a number of queries to raise for discussion:

Concerns

1. it appears to define an addition/subtraction operation on SMPTE time values even if they're discontinuous. The processing of these seems to be undefined, so they should be disallowed, shouldn't they?

I had intended to add material to deal with the discontinuous smpte mode, but it didn't get into the edit. Will add.


2. It blurs the layers of interpretation of time values from documents up into any external context. For example it opens up the ambiguity that, when a sequence of TTML documents is wrapped e.g. in ISOBMFF, there are media time offsets available both in TTML and in the wrapper, and authors may be unclear whether they are intended as independent (additive) offsets or as duplicate offsets in which one may be considered not for processing, i.e. metadata.

Since TTML doesn't know anything about external wrapper metadata, it isn't the right place to deal with such possible ambiguity (e.g., in different offset values internal and external). The correct place to deal with this is in the external spec.


Since those external specs already exist we should work in sympathy with them rather than redefining what's already there and creating confusion. Can we avoid redefining TTML so that it invalidates external wrappers that should be independent?

It depends on specifics. I need to know the exact text in an external spec that may intersect with this feature. It may also require that external spec to add a note to avoid confusion. In any case, I have not seen a worked out example of how this proposed feature would invalidate an external spec.

I suggest reviewing the definitions of MPEG 4, for example ISOBMFF in ISO/IEC 14496-12:2012, which specifies a range of generic timing constructs for aligning media in different formats, including composition time of samples (8.6.1.3), and specific mappings of the presentation time-line to the media time-line using the Edit List Box as defined in 8.6.6. This is referenced by EBU-TT-D in Tech3381 where the only additional constraint required is to define the behaviour when the contents of a document extend outside the sample period.

These constructs effectively define a media timeline, so that the only requirement on a processor is to map the time expressions in a document to the timeline defined in the wrapper. No further offsets are required in the document because they're in the wrapper.



3. It is actually the opposite proposal to the one I made in Issue-335: I've added a note there and re-opened it.

4. If clock time is prohibited from using media offset because the discontinuityOffset can not be derived in the absence of a date, then I would certainly be happy to propose the addition of a date value. A use case for this is when a TTML document is created as an archive artefact by a processor that observes some real world timed events and converts them into TTML.

My reason for excluding clock mode is because it doesn't have a related media object.

Ah, right. There may in fact be a related media object, but the temporal relationship would be indirect, and mediated by the clock rather than some other time embedded in the media.

Yes, that is a better way of saying what I intended.



5. It does nothing to address the scenario where the media time corresponding to the beginning of the related media object is known at authoring time, and is non-zero. This media begin time is distinct from, and possibly earlier than, the beginning of the contents of the TTML document.

I don't understand this statement, since this is precisely what ttp:mediaOffset does: allow the beginning of the root temporal extent to be offset either before or after the beginning of the related media object.

ttp:mediaOffset doesn't do that though: it merely allows for times in the document to be offset prior to processing. It doesn't extend the root temporal extent beyond the document's contents.

Correct, since it isn't intended to do that.

Okay, so it doesn't extend the root temporal extent, but it offsets it. Looking back once more at the wording in the TTML2 draft spec, it appears to specify the period between BEGIN(media) in TIME(document) and BEGIN(document), and nowhere in the spec is a processor required to perform time calculations using it. I'm struggling to see the utility of this; can you explain the use case further?

As you've suggested there appears to be a simple relationship between the mediaBegin that I've proposed and your mediaOffset:

mediaOffset = BEGIN(document) - mediaBegin

where mediaBegin is in TIME(document).

There are a couple of limitations on this:
1. You can only perform the calculation when the time base permits it, i.e. excluding SMPTE discontinuous.
2. The maximum size of mediaOffset is limited to BEGIN(document) unless you permit mediaBegin to be negative.

However on the positive side, mediaBegin could be used as the starting point in your algorithm for mapping SMPTE discontinuous markers into continuous times. Similarly, if you replace mediaDuration with mediaEnd, then:

mediaDuration = mediaEnd - mediaBegin

or if mediaEnd is not specified or is indefinite then mediaDuration resolves to indefinite as currently defined.

And mediaEnd would be usable as the end marker for the mapping of SMPTE discontinuous markers into continuous times.
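The relationships above can be sketched as two small functions. This is only an illustration, assuming all values are seconds already expressed in TIME(document), and using math.inf to stand for "indefinite"; the function and parameter names are hypothetical, not TTML2 syntax.

```python
import math
from typing import Optional


def media_offset(document_begin: float, media_begin: float, time_base: str,
                 smpte_mode: str = "continuous",
                 allow_negative_media_begin: bool = False) -> float:
    """mediaOffset = BEGIN(document) - mediaBegin, both in TIME(document)."""
    # Limitation 1: no arithmetic is defined on smpte discontinuous labels.
    if time_base == "smpte" and smpte_mode == "discontinuous":
        raise ValueError("cannot derive mediaOffset in smpte discontinuous mode")
    # Limitation 2: with mediaBegin >= 0, mediaOffset is at most BEGIN(document).
    if media_begin < 0 and not allow_negative_media_begin:
        raise ValueError("negative mediaBegin not permitted")
    return document_begin - media_begin


def media_duration(media_begin: float, media_end: Optional[float]) -> float:
    """mediaDuration = mediaEnd - mediaBegin; indefinite if mediaEnd is absent or indefinite."""
    if media_end is None or math.isinf(media_end):
        return math.inf  # stands for "indefinite"
    return media_end - media_begin
```

For instance, with mediaBegin at 10:00:00 (36000s) and the first subtitle at 10:01:00 (36060s), media_offset(36060.0, 36000.0, "media") gives 60s.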

Since mediaBegin and mediaEnd aren't required for general presentation processing, and appear to have no effect on any computed time values within the document, it may be appropriate to make them metadata rather than parameters. There's prior work here: EBU-TT (Tech 3350) and its predecessor binary format STL both support start-of-programme timecode metadata (ebuttm:documentStartOfProgramme in EBU-TT).




I'm puzzled by this: in your ISD generation use case, if the TTML document were untimed but you knew ttp:mediaOffset then how would you derive the begin time of the first ISD?

SMIL semantics dictates that an unspecified begin time resolve to 0, for both par and seq parents. ttp:mediaOffset doesn't have any role in resolving active begin/end for document elements. It only comes into play when synchronizing document time coordinates with media time coordinates.

I'm unclear from the current spec wording what exactly a presentation processor should do with the value, even when it does come into play.

ttp:mediaBegin would define the begin time of the first possible ISD without further calculation, unless you also want to map the times into another time base.

But that isn't something I'm trying to do here. Indeed, I'm saying we can't do that without changing SMIL semantics. "the begin time of the first possible ISD without further calculation [in the document time base]" is always 0.

Surely we're free to define an extra constraint in the special case that the author has extra knowledge about the media, to set the first possible ISD begin time to a later point. It's extremely similar in principle to permitting:

<body begin="100s" timeContainer="par" …>
<div begin="5s" …> … </div>
</body>

to generate an empty ISD from 100s to 105s. That would be an additional feature compared with current behaviour, in which such an ISD does not exist if no content is flowed into it.
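A minimal sketch of the difference for the body/div example above, assuming SMIL par semantics (the div's begin is relative to the body's begin). The function first_isd_begin and its emit_leading_empty flag are hypothetical; real ISD generation considers every significant time, not only the first child.

```python
def first_isd_begin(body_begin: float, first_child_begin: float,
                    emit_leading_empty: bool) -> float:
    """Begin time, in TIME(document), of the first ISD for a par body.

    first_child_begin is relative to the body's begin, as in SMIL par timing.
    """
    content_begin = body_begin + first_child_begin  # 100s + 5s = 105s
    if emit_leading_empty:
        # Proposed behaviour: an empty ISD spans [body_begin, content_begin).
        return body_begin
    # Current behaviour: no ISD exists until content is flowed into it.
    return content_begin


print(first_isd_begin(100.0, 5.0, emit_leading_empty=False))  # 105.0
print(first_isd_begin(100.0, 5.0, emit_leading_empty=True))   # 100.0
```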

There are two distinct one-dimensional temporal coordinate spaces here that are potentially related:

  *   document's temporal coordinate space, call this TIME(document)
     *   origin is at ORIGIN(document), which is always ZERO (0)
     *   has begin time BEGIN(document)
     *   has explicit or implied duration of DUR(document)
     *   so the root temporal extent is always the half-open interval:
        *   [ 0, DUR(document) )
  *   related media object's temporal coordinate space, call this TIME(media)
     *   origin is at ORIGIN(media)
     *   has begin time BEGIN(media)
     *   has explicit or implied duration of DUR(media)
     *   so the media temporal extent is always the half-open interval:
        *   [ BEGIN(media), BEGIN(media) + DUR(media) )
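The two coordinate spaces can be modelled roughly as follows. This is a sketch only, assuming seconds as the unit and math.inf for an indefinite duration; the class and function names are hypothetical. Note that the two extents live in different coordinate spaces, so comparing them requires some mapping between TIME(document) and TIME(media), which is exactly what is under discussion.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # half-open [begin, end)


@dataclass(frozen=True)
class DocumentTimeSpace:
    """TIME(document): ORIGIN(document) is always 0."""
    dur: float  # DUR(document); math.inf if indefinite

    def root_temporal_extent(self) -> Interval:
        return (0.0, self.dur)


@dataclass(frozen=True)
class MediaTimeSpace:
    """TIME(media): BEGIN(media) is measured from ORIGIN(media)."""
    begin: float  # BEGIN(media)
    dur: float    # DUR(media)

    def media_temporal_extent(self) -> Interval:
        return (self.begin, self.begin + self.dur)


def contains(extent: Interval, t: float) -> bool:
    begin, end = extent
    return begin <= t < end  # includes begin, excludes end
```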

Since TIME(media) may have a different play rate or frame rate to TIME(document) I think we need to introduce the concept of evaluation time of this parameter, since conversion between the document time base and the media time base may only be achievable by a simple addition at one instant.

I agree that the play rate of TIME(media) and TIME(document) could be different, a point mentioned in a few notes in the current spec text:

Yes, the current spec text was my reference for the term play rate.


6.2.1 ttp:timeBase

Note:

When using a media time base, if that time base is paused or scaled positively or negatively, i.e., the media play rate is not unity, then it is expected that the presentation of associated Timed Text content will be similarly paused, accelerated, or decelerated, respectively. The means for controlling an external media time base is outside the scope of this specification.

Appendix N Time Expression Semantics

Note:

The phrase play rate as used below is intended to model a (possibly variable) parameter in the document processing context wherein the rate of playback (or interpretation) of time may be artificially dilated or narrowed, for example, when slowing down or speeding up the rate of playback of a related media object. Without loss of generality, the following discussion assumes a fixed play(back) rate. In the case of variable play rates, appropriate adjustments may need to be made to the resulting computations.

Appendix N.1 Clock Time Base

Note:

That is to say, timing is disconnected from (not necessarily proportional to) media time when the clock time base is used. For example, if the media play rate is zero (0), media playback is suspended; however, timing coordinates will continue to advance according to the natural progression of clock time in direct proportion to the reference clock base. Furthermore, if the media play rate changes during playback, presentation timing is not affected.

However, at present, this text basically states (informatively) or in the smpte case assumes:

  *   for clock time base, RATE(TIME(document)) is fixed as 1X real time, independently of RATE(TIME(media))
  *   for media time base, RATE(TIME(document)) = RATE(TIME(media))
  *   for smpte time base, it doesn't say anything special, but one can infer that the same interpretation applies as for media time base (in either continuous or discontinuous modes)

In any case, I don't want the interpretation of the proposed parameter to depend upon differences in play rates.

Agreed.

However, the play rate of the media may not be known, so I've assumed that any time base mapping must be external to the document, and that what we need to do to ensure that BEGIN(document) aligns with the right point in the media's temporal coordinate space is to define a known fixed datum in the media, in the document's time base, and require the processor to map the temporal coordinate spaces.

The intent of ttp:mediaOffset is to express the delta between BEGIN(document) and BEGIN(media):

That's not what I expect from a parameter called mediaOffset – I'd certainly been reading it as ORIGIN(document) - ORIGIN(media).

The problem with this is that BEGIN(media) - ORIGIN(media) is unknown and arbitrary, and, further, shouldn't affect synchronization IMO. It certainly wouldn't affect synchronization in clock time base, media time base, or continuous smpte time base. However, in the case of discontinuous smpte time base, special treatment is needed for using/interpreting ttp:mediaOffset, the same special treatment that is required for converting a discontinuous smpte time base document to an ISD sequence, something I have not yet documented in the spec, for which the basic approach I am thinking of is as follows:

Convert Discontinuous SMPTE Time Base Document to Media Time Base Document

(1) reset MEDIATIMER to 0; initialize MAPPINGS to empty set;
(2) simultaneously start playback of related media object at 1X play rate and start MEDIATIMER at 1X real time;

As an alternative, start playback of the related media object, and start MEDIATIMER when the mediaBegin marker is observed in the related media's timecode. This allows material that is likely to be present in the media, such as clock and bars, to be ignored reliably.

(3) when encountering a SMPTE time label in related media object, record the current value of MEDIATIMER and save the pair <SMPTE time label, MEDIATIMER value> in MAPPINGS;
(4) if playback is not complete, go to (3);

Or if the mediaEnd marker has not been observed and there's more media remaining, go to (3).

(5) visit each time expression T in document, performing following steps:
(6) if T is in MAPPINGS, then rewrite T (in document) to MAPPINGS.get(T) and continue at (5);
(7) otherwise (T has no mapping), either abort due to mapping error or use a fallback mapping (TBD), e.g., mapping of "closest" label that does map;
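The steps above, including the optional mediaBegin/mediaEnd markers, can be sketched as follows. This is a hypothetical illustration, not spec text: real playback is simulated by a list holding the SMPTE label of each successive frame at a known frame rate, so MEDIATIMER advances by 1/frame_rate per frame. Keeping the first occurrence of a repeated label is an assumption, and the fallback mapping of step (7) is left as a caller-supplied function because it is still TBD.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Optional, Sequence

Mappings = Dict[str, Fraction]


def build_mappings(frame_labels: Sequence[str], frame_rate: Fraction,
                   media_begin: Optional[str] = None,
                   media_end: Optional[str] = None) -> Mappings:
    """Steps (1)-(4): record MEDIATIMER (seconds) at which each SMPTE label is seen."""
    mappings: Mappings = {}                    # (1) MAPPINGS = empty set
    start_index: Optional[int] = None if media_begin is not None else 0
    for index, label in enumerate(frame_labels):
        if start_index is None:
            if label != media_begin:
                continue                       # ignore clock, bars, etc.
            start_index = index                # (2) start MEDIATIMER at mediaBegin
        mediatimer = Fraction(index - start_index) / frame_rate
        mappings.setdefault(label, mediatimer)  # (3) first occurrence wins (assumption)
        if media_end is not None and label == media_end:
            break                              # (4) stop once mediaEnd is observed
    return mappings


def rewrite_time_expressions(
        times: Sequence[str], mappings: Mappings,
        fallback: Optional[Callable[[str, Mappings], Fraction]] = None
) -> List[Fraction]:
    """Steps (5)-(7): rewrite each SMPTE label T to MAPPINGS.get(T)."""
    rewritten: List[Fraction] = []
    for t in times:                            # (5)
        if t in mappings:
            rewritten.append(mappings[t])      # (6)
        elif fallback is not None:
            rewritten.append(fallback(t, mappings))  # (7) fallback mapping (TBD)
        else:
            raise KeyError(f"no mapping for SMPTE label {t}")  # (7) abort


    return rewritten


labels = ["00:59:59:24", "10:00:00:00", "10:00:00:01", "10:30:00:00", "10:30:00:01"]
m = build_mappings(labels, Fraction(25), media_begin="10:00:00:00")
print(rewrite_time_expressions(["10:00:00:01", "10:30:00:00"], m))
# [Fraction(1, 25), Fraction(2, 25)]
```

The jump from 10:00:00:01 to 10:30:00:00 is the discontinuity: the two labels are one frame apart in the media, so they map to consecutive MEDIATIMER values.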


  *   if ttp:mediaOffset > 0, then BEGIN(document) temporally follows BEGIN(media)
  *   if ttp:mediaOffset < 0, then BEGIN(document) temporally precedes BEGIN(media)

Note that this definition is arbitrary: we could invert the meaning if we wish. In any case, the current language decodes as follows:

Given ttp:mediaOffset = +10s, then <body begin="5s"/> means that body starts at 15s after BEGIN(media).

That seems to be an offset of ORIGIN(document) - ORIGIN(media)

Let's work out the example using your interpretation of "offset" where we choose an arbitrary BEGIN(media) of 7s in TIME(media), and further assuming that media and document play rates match:

Given

(1) BEGIN(media) = 7s in TIME(media)
(2) BEGIN(body) = 5s in TIME(document)
(3) mediaOffset = 10s = ORIGIN(document) - ORIGIN(media)

Yields

BEGIN(body) in TIME(media) = ORIGIN(media) + mediaOffset + BEGIN(body) = 0s + 10s + 5s = 15s in TIME(media), which is the same as BEGIN(media) + 8s

and, now, let's change BEGIN(media) to another value, say 13s, so we end up with 0s + 10s + 5s = 15s in TIME(media), which is the same as BEGIN(media) + 2s

Hmmm possibly we're interpreting BEGIN(media) differently. I had thought you meant the time of start of media playback in TIME(media) but on thinking it through more I now think you mean the time of start of media regardless of start of playback. So when you say 'let's change BEGIN(media) to another value' am I right in thinking you mean 'consider another piece of media whose BEGIN(media) is another value' rather than 'consider playing back the same media from a different start point'?

If you mean start of playback of the same media, then this is exactly the right behaviour: you started the media 6s later and the body therefore began 6s earlier, relatively. But it's a bit contrived, since the normal workflow is to start with the media and create the captions/subtitles, and it's common in delivery standards for different media to start with a similar timecode (e.g. "10:00:00" as per my example) and for the captions to start at different times relative to that, dependent on when the dialogue commences.

It would be more likely that BEGIN(body) varies for different media assets even if BEGIN(media) is the same for each of those media assets. Anyhow…


However, using my interpretation of "offset" using the same info, we have:

Given

(1) BEGIN(media) = 7s in TIME(media)
(2) BEGIN(body) = 5s in TIME(document)
(3) mediaOffset = 10s = BEGIN(document) - BEGIN(media)

Just to check I've understood step 3, you mean:
BEGIN(document) = BEGIN(body) in TIME(document), and
BEGIN(media) = BEGIN(media) in TIME(media)?



Yields

BEGIN(body) in TIME(media) = BEGIN(media) + mediaOffset + BEGIN(body) = 7s + 10s + 5s = 22s in TIME(media), which is the same as BEGIN(media) + 15s

and, now, let's change BEGIN(media) to another value, say 13s, we end up with 13s + 10s + 5s = 28s, which is the same as BEGIN(media) + 15s (still)

This "change BEGIN(media) to another value" has given me pause for thought. I can think of three possible meanings:

1. If you mean start playback at another time [and call it BEGIN(media)], this would be highly undesirable: the consequence of starting playback at a different place in the media is that the timings of all the captions/subtitles move, and presumably are no longer aligned. So whereas the first one matched up at authoring time it is now 6s later relative to the media.

2. If you mean 'there's another piece of media with a different start time in TIME(media)' then okay, you've ended up with the same value, but what's the advantage of that? Is it important to have this consistent across different media and documents that have the same mediaOffset?

3. If you mean that a new rendition of the same media is created, but with a different BEGIN(media) time, and this way the same TTML document can be used to play back captions for it, without any change in the TTML document, but with an externally specified BEGIN(media) that may vary? In this use case it's not meaningful for BEGIN(document) to be later than BEGIN(media) (comparing in the same time base) so the offset is always positive or zero (or negative or zero if you define it the other way around). In TTML1 it's assumed to be zero. Again, I'm struggling to see the benefit of permitting other offset values. Plus, there's no guarantee that in creating a new rendition the same time base has been used – the rate of playback and eventual duration may have been tweaked for example, as would happen if a 30fps video were played back at 25fps without changing the number of frames. There are more unknowns than BEGIN(media).

However, I can see the benefit of omitting both mediaOffset and mediaBegin if you have some external knowledge of BEGIN(media) and you need the same TTML document to play back against renditions that have been given different values for BEGIN(media), e.g. because they've been striped with different timecode or the opening sequence has had some more material prepended to it, or some unwanted material removed. It would be much easier to keep the same TTML document and include the media begin/offset information in a wrapper in this scenario.


So, given your interpretation, BEGIN(body) in TIME(media) is dependent on BEGIN(media), while in my interpretation, BEGIN(body) remains constant with respect to BEGIN(media). When reasoning about timing as an author, I would clearly want to use BEGIN(media) and not ORIGIN(media) as the fixed datum. But this preference is based on using time expressions related to BEGIN(media) as opposed to ORIGIN(media), about which see more below.
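The two worked examples above (play rates equal, ORIGIN(media) = 0s, mediaOffset = 10s, BEGIN(body) = 5s) can be reproduced with a short sketch. The function names are hypothetical labels for the two readings of mediaOffset being compared, not proposed spec terms.

```python
def body_begin_origin_relative(origin_media: float, media_offset: float,
                               body_begin: float) -> float:
    """Reading A: mediaOffset = ORIGIN(document) - ORIGIN(media)."""
    return origin_media + media_offset + body_begin


def body_begin_begin_relative(begin_media: float, media_offset: float,
                              body_begin: float) -> float:
    """Reading B: mediaOffset = BEGIN(document) - BEGIN(media)."""
    return begin_media + media_offset + body_begin


for begin_media in (7.0, 13.0):
    a = body_begin_origin_relative(0.0, 10.0, 5.0)
    b = body_begin_begin_relative(begin_media, 10.0, 5.0)
    print(f"BEGIN(media)={begin_media}: "
          f"A -> {a} (BEGIN(media)+{a - begin_media}), "
          f"B -> {b} (BEGIN(media)+{b - begin_media})")
```

Under reading A the body stays at 15s in TIME(media), so its position relative to BEGIN(media) moves (8s, then 2s); under reading B it stays 15s after BEGIN(media), matching the statement that mediaOffset = +10s with begin="5s" starts the body 15s after BEGIN(media).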

Agreed, having a fixed known document time that will be related to BEGIN(media) at playback makes sense.


that must be evaluated one time only, at BEGIN(document) or 5s in the document.

Not if play rates match or if you use my interpretation (see more below).


This is still problematic, since it's content dependent. Consider that two videos Va and Vb both have continuous timecode where the beginning of the programme is at 10:00:00.

I interpret this, for Va and Vb, as BEGIN(media) = 36000s in TIME(media)

Yes, if you convert to s that's what you get.


Va has dialogue and a corresponding TTML document Ta such that BEGIN(Ta) = 10:01:00

I interpret this as mediaOffset = -36000s and BEGIN(Ta) = 36060s in TIME(document), which maps to BEGIN(media) + mediaOffset + BEGIN(body) = 36060s in TIME(media)

How about if we define it as:

mediaOffset = BEGIN(Va) in TIME(Ta) - BEGIN(Ta) = -60s, and BEGIN(Ta) = 36060s in TIME(Ta).

Mapping BEGIN(Ta) to TIME(Va) is BEGIN(Va) in TIME(Va) - mediaOffset, which happens to be 36000 - -60 = 36060s.
If BEGIN(Va) in TIME(Va) happened to be reset to, let's say, 0, then BEGIN(Ta) in TIME(Va) would be 0 - -60 = 60s, which is still the expected result.
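The alternative definition above can be checked numerically. This sketch assumes, as in the examples, that document and media play rates are equal so only an additive offset separates the time bases; the function names are hypothetical.

```python
def media_offset_from_begins(begin_media_in_doc_time: float,
                             begin_doc: float) -> float:
    """mediaOffset = BEGIN(media) in TIME(document) - BEGIN(document)."""
    return begin_media_in_doc_time - begin_doc


def begin_doc_in_media_time(begin_media_in_media_time: float,
                            media_offset: float) -> float:
    """BEGIN(document) in TIME(media) = BEGIN(media) in TIME(media) - mediaOffset."""
    return begin_media_in_media_time - media_offset


offset_a = media_offset_from_begins(36000.0, 36060.0)  # Va/Ta: 10:00:00 vs 10:01:00
offset_b = media_offset_from_begins(36000.0, 36300.0)  # Vb/Tb: 10:00:00 vs 10:05:00
print(offset_a, begin_doc_in_media_time(36000.0, offset_a))
print(offset_b, begin_doc_in_media_time(0.0, offset_b))
```

Because only the offset between the two begins is stored, re-striping the media (e.g. resetting BEGIN(Va) to 0) still places the first subtitle 60s after the media begins.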


and Vb has Tb where BEGIN(Tb)=10:05:00.

I interpret this as mediaOffset = -36000s and BEGIN(Tb) = 36300s in TIME(document), which maps to BEGIN(media) + mediaOffset + BEGIN(body) = 36300s in TIME(media)

As previous example, I'd have expected mediaOffset = -300s.


I would state that the more useful parameter would be identical in both documents, i.e. mediaBegin="10:00:00", so that any processor can start the effective clock (e.g. a frame counter) ticking at the same point, rather than having to evaluate at the arbitrary point that is BEGIN(document).

The problem here is that this document is authored such that time expressions are not related to BEGIN(media), but rather, related to ORIGIN(media). TTML, being based on SMIL, basically assumes that time is expressed in relation to BEGIN(related media object), and not ORIGIN(related media object). This follows from how time expressions on children of a par time container are relative to the begin time of their parent par container, and not the origin of the time base of their parent.

Our different positions on this issue appear to relate to which mode we think of as being normal. For me, the example you describe is abnormal from a SMIL timing perspective, whereas apparently the converse is true for you.

I'm not sure I agree here: I think it's more to do with where we think the mapping between TIME(media) and TIME(document) should occur – inside the TTML processor or externally. You seem to want to be able to do it internally whereas I think that it can (or maybe should) only be done externally, albeit by a media player that also happens to contain a TTML processor.

Since the document time is expressed on a timeline internal to TTML/SMIL but the media time may use any other timeline, it's hard to make any assumption about the mapping other than that we require the media and document play rate to be identical in real terms.

In my mental model, time expressions in TTML stay fixed with respect to BEGIN(media), while in yours, they apparently stay fixed with respect to ORIGIN(media). In my model, the use of timeOffset is independent of BEGIN(media) while in your model it is dependent on BEGIN(media).

In TTML N.3:

"S = (countedFrames - droppedFrames + (subFrames / subFrameRate)) / effectiveFrameRate "

This doesn't include any reference begin time other than the origin. So yes, time expressions are related to ORIGIN(document) in TTML.
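To illustrate that point, the quoted formula can be evaluated directly. This is a sketch: the helper counted_frames assumes non-drop-frame counting with an integer frameRate, and the exact definitions of countedFrames and droppedFrames should be taken from the spec rather than from this code.

```python
from fractions import Fraction


def counted_frames(hours: int, minutes: int, seconds: int, frames: int,
                   frame_rate: int) -> int:
    # Assumption: non-drop-frame counting with an integer frameRate.
    return (3600 * hours + 60 * minutes + seconds) * frame_rate + frames


def smpte_seconds(counted: int, dropped: int, sub_frames: int,
                  sub_frame_rate: int, effective_frame_rate: Fraction) -> Fraction:
    """S = (countedFrames - droppedFrames + subFrames / subFrameRate) / effectiveFrameRate."""
    return (counted - dropped + Fraction(sub_frames, sub_frame_rate)) / Fraction(effective_frame_rate)


# 10:00:00:00 at 25 fps resolves to 36000s measured from ORIGIN(document), not to 0s.
s = smpte_seconds(counted_frames(10, 0, 0, 0, 25), 0, 0, 1, Fraction(25))
print(s)  # 36000
```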

And I would agree that when mapped to TIME(media) any timings relative to BEGIN(media) need to stay constant. But the value mappings for achieving this aren't within our control since we don't in general know about TIME(media).

So, to translate between our mental models, we have:

From BEGIN(media) relative to ORIGIN(media) relative time expressions:

add BEGIN(media)

From ORIGIN(media) relative to BEGIN(media) relative time expressions:

subtract BEGIN(media)

Now, however, let's look at the situation when play rates differ, i.e., RATE(TIME(document)) != RATE(TIME(media)). As an example, let's say that RATE(TIME(media)) / RATE(TIME(document)) is 2, i.e., we run media time at twice the rate of document time. So, going back to my earlier numbers:

Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) = EPOCH(TIME(document))
(2) BEGIN(media) = 7s in TIME(media), or 3.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = ORIGIN(document) in TIME(real) - ORIGIN(media) in TIME(real)

Yields

BEGIN(body) in TIME(real) = ORIGIN(media) + mediaOffset + BEGIN(body) = 0s + 10s + 5s = 15s in TIME(real), which is the same as BEGIN(media) + 11.5s in TIME(real)

Now, let's change BEGIN(media) to another value, say 13s:

As above, I don't know what this change of BEGIN(media) is supposed to signify, and it seems like it might be important.


Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) = EPOCH(TIME(document))
(2) BEGIN(media) = 13s in TIME(media), or 6.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = ORIGIN(document) in TIME(real) - ORIGIN(media) in TIME(real)

Yields

BEGIN(body) in TIME(real) = ORIGIN(media) + mediaOffset + BEGIN(body) = 0s + 10s + 5s = 15s in TIME(real), which is the same as BEGIN(media) + 8.5s in TIME(real)

I agree it would be undesirable for the mapped begin time of the document relative to the media not to scale linearly, but instead to depend on the magnitude of the time values used, e.g. if the difference between 90 and 100 were not equal in real terms to the difference between 1090 and 1100.


However, using my interpretation of "offset" using the same info, we have:

Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) = EPOCH(TIME(document))
(2) BEGIN(media) = 7s in TIME(media), or 3.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = BEGIN(document) in TIME(real) - BEGIN(media) in TIME(real)

Yields

BEGIN(body) in TIME(real) = BEGIN(media) + mediaOffset + BEGIN(body) = 3.5s + 10s + 5s = 18.5s in TIME(real), which is the same as BEGIN(media) + 15s in TIME(real)

Now, let's change BEGIN(media) to another value, say 13s:

Given

(1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1, EPOCH(TIME(media)) = EPOCH(TIME(document))
(2) BEGIN(media) = 13s in TIME(media), or 6.5s in TIME(real)
(3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
(4) mediaOffset = 10s = BEGIN(document) in TIME(real) - BEGIN(media) in TIME(real)

Yields

BEGIN(body) in TIME(real) = BEGIN(media) + mediaOffset + BEGIN(body) = 6.5s + 10s + 5s = 21.5s in TIME(real), which is the same as BEGIN(media) + 15s in TIME(real)

Notice that by using my interpretation of mediaOffset, differing play rates do not affect the relationship between BEGIN(body) and BEGIN(media), which stays constant.
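The play-rate examples above can be reproduced with exact fractions. This sketch assumes, as the examples do, that EPOCH(TIME(media)) = EPOCH(TIME(document)), that RATE(TIME(document)) is 1 so document seconds equal real seconds, and that the media rate is a constant 2; all names are hypothetical.

```python
from fractions import Fraction

RATE_MEDIA = Fraction(2)     # media time runs at twice document (= real) time
MEDIA_OFFSET = Fraction(10)  # seconds, in TIME(real)
BODY_BEGIN = Fraction(5)     # 5s in TIME(document) = 5s in TIME(real)


def media_to_real(t_media: Fraction, rate_media: Fraction) -> Fraction:
    # Assumes shared epochs and a constant media play rate.
    return t_media / rate_media


def offset_from_media_begin_a(begin_media: Fraction) -> Fraction:
    """ORIGIN reading: BEGIN(body) - BEGIN(media), both in TIME(real)."""
    body_real = Fraction(0) + MEDIA_OFFSET + BODY_BEGIN  # ORIGIN(media) = 0s
    return body_real - media_to_real(begin_media, RATE_MEDIA)


def offset_from_media_begin_b(begin_media: Fraction) -> Fraction:
    """BEGIN reading: BEGIN(body) - BEGIN(media), both in TIME(real)."""
    begin_real = media_to_real(begin_media, RATE_MEDIA)
    body_real = begin_real + MEDIA_OFFSET + BODY_BEGIN
    return body_real - begin_real


for bm in (Fraction(7), Fraction(13)):
    print(bm, offset_from_media_begin_a(bm), offset_from_media_begin_b(bm))
```

Reading A gives 11.5s and then 8.5s after BEGIN(media); reading B gives 15s in both cases, which is the invariance described above.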


My proposed ttp:mediaBegin would have the value "10:00:00"

I agree from your example that an expression of 10:00:00 describes the delta between BEGIN(media) and ORIGIN(media), but this is only useful in cases where time expressions are related to ORIGIN(media) and not BEGIN(media).

Since I defined it in TIME(document) not in TIME(media) the time expressions are related to the origin.

In the mediaOffset formalism I defined, this value, i.e., BEGIN(media) - ORIGIN(media), is of no utility. Namely, if mediaOffset is specified as I defined it, i.e., with your example as mediaOffset="-36000s" or mediaOffset="-10h", then you don't have to worry about either play rate differences or changes in actual BEGIN(media), since the result is time expressions always related to BEGIN(media).

I think we're trying to achieve the same end result, i.e. that the text appears at the right time relative to the media despite some (as yet unstated) set of transformations. My approach also relates time expressions to BEGIN(media), except in TIME(document) and therefore unavoidably with reference to ORIGIN(document). Yours goes further, including a translation into TIME(media), which I don't believe we should do.


in these cases, and not mix in the concept of mapping between the document's temporal coordinates and the related media's temporal coordinates.


The play rate in the document's time base is well defined, as now. It's reasonable to assume that any media playback device knows when the related media begins and what its play rate is.

Or, given ttp:mediaOffset = -5s, then <body begin="5s"/> means that body starts at BEGIN(media).

Given this formalism, we don't really care about BEGIN(media) - ORIGIN(media).

Agreed. What we care about is BEGIN(media) in the temporal coordinate space of the document, or in your useful terminology, in TIME(document).


Now, if you are suggesting an alternative use case where ORIGIN(document) != 0 in the TIME(document) coordinate space, then that is something I haven't considered, and certainly did not intend to address. Indeed, doing so would be problematic since SMIL timing semantics assumes that unspecified begin defaults to 0s, and further, that 0s corresponds to ORIGIN(document).

I'm not suggesting that ORIGIN(document) !=0 in TIME(document), since that would as you say create a whole bunch of other problems.


My response to such a proposed use case would probably be: we don't support it, you don't need to do it anyway, so don't do it.

Note that the above considerations assume that time base is media, or that time base is smpte continuous mode, or that time base is smpte discontinuous mode and that all smpte time events have been converted to equivalent smpte continuous mode values, e.g., by playing back a media object in 1X normal play mode and recording the PTS time that corresponds with each frame associated with a smpte time label.

Just for completeness (at the expense of being repetitious), did you also assume that the media play rate is identical to the document's play rate, i.e. that the only difference between TIME(media) and TIME(document) is an additive offset?

See above.



Proposals

I would propose a resolution to points 1, 2, 3 and 5 that is to remove mediaOffset and add a ttp:mediaBegin parameter, expressed in the same time base as the document's ttp:timeBase parameter. This also fits better with ttp:mediaDuration.

Hmmm. I'm not inclined to make this change, because mentally I see mediaOffset as expressing a difference/delta/offset between two points in two different one-dimensional coordinate spaces both representing linear time (at 1X play rate). Calling it mediaBegin implies in my mind BEGIN(media), i.e., the delta between BEGIN(media) and ORIGIN(media), and not the delta between BEGIN(document) and BEGIN(media).

If this is just about the name we choose for the parameter then we're right to choose carefully, but it shouldn't prevent us from agreeing the semantics. To my mind mediaBegin does suggest the delta between BEGIN(document) and BEGIN(media), both in TIME(document). Whereas to me mediaOffset suggests the delta between ORIGIN(document) in TIME(document) and ORIGIN(media) in TIME(??? - this is not clear), which if I understand correctly isn't what you intend. Or if it is what you intend it doesn't seem to be a complete solution for the problem.

I would additionally propose allowing dates to be specified to use in relation to clock times to resolve point 4, perhaps with a ttp:date parameter, valid only when ttp:timeBase="clock". Note that this does not resolve any time comparison issues caused by documents whose times cross midnight and wrap back round to a smaller number of hours.
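A sketch of how such a date might combine with clock times, and of the midnight limitation noted above. ttp:date is only a proposal here, so the function below is hypothetical and merely uses Python's standard datetime.combine.

```python
from datetime import date, datetime, time


def resolve_clock_time(document_date: date, clock_time: time) -> datetime:
    # Hypothetical ttp:date value combined with a clock-time expression.
    return datetime.combine(document_date, clock_time)


d = date(2014, 9, 24)
before_midnight = resolve_clock_time(d, time(23, 59, 0))
after_midnight = resolve_clock_time(d, time(0, 1, 0))
# A single document date does not resolve wrap-around: the later event sorts earlier.
print(after_midnight < before_midnight)  # True
```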

Again, I'm wondering what is the related media object? To my recollection, ttp:timeBase="clock" was added to TTML to handle timed text cases that don't have a related media object.

It would be a media object that had also been captured with reference to a clock.




Are there other related use cases or requirements not met by these proposals?

Kind regards,

Nigel
Received on Wednesday, 24 September 2014 15:30:38 UTC