Re: Issue-270 and Issue-335 from Cyril Concolato on 2014-09-24 (public-tt@w3.org from September 2014)

From: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
Date: Wed, 24 Sep 2014 17:50:23 +0200
To: public-tt@w3.org
Message-ID: <5422E83F.1030709@telecom-paristech.fr>
On 24/09/2014 17:30, Nigel Megitt wrote:
> Sorry, forgot the links:
>
> EBU-TT, EBU Tech 3350: https://tech.ebu.ch/docs/tech/tech3350.pdf
> CARRIAGE OF EBU-TT-D IN ISOBMFF, EBU Tech3381: 
> https://tech.ebu.ch/docs/tech/tech3381.pdf
>
> There's no straight link to ISO/IEC14496-12:2012 as you have to buy it 
> from a shop :-(
This is not correct. ISO/IEC14496-12:2012 as well as the first 
corrigendum and the 2 amendments, including the one useful for the 
carriage of timed text are freely available from here 
http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html .

Unfortunately, the spec for the carriage of timed text itself 
(14496-30:2014) is not freely available, but I'm working on making it 
publicly available.

Cyril

>
>
>
> From: Nigel Megitt <nigel.megitt@bbc.co.uk>
> Date: Wednesday, 24 September 2014 17:24
> To: Glenn Adams <glenn@skynav.com>
> Cc: Timed Text Working Group <public-tt@w3.org>
> Subject: Re: Issue-270 and Issue-335
> Resent-From: <public-tt@w3.org>
> Resent-Date: Wednesday, 24 September 2014 17:24
> Resent-Date: Wednesday, 24 September 2014 17:24
>
>     Glenn Adams <glenn@skynav.com>, Tuesday, 23 September 2014
>     20:25 wrote:
>
>         On Tue, Sep 23, 2014 at 4:15 AM, Nigel Megitt
>         <nigel.megitt@bbc.co.uk> wrote:
>
>             Glenn Adams <glenn@skynav.com>, Monday, 22 September
>             2014 22:14 wrote:
>
>                 On Mon, Sep 22, 2014 at 8:38 AM, Nigel Megitt
>                 <nigel.megitt@bbc.co.uk> wrote:
>
>                     Glenn, Courtney, all,
>
>                     The edit to TTML2 ascribed to issue-270 and
>                     issue-335
>                     (https://dvcs.w3.org/hg/ttml/rev/3cbc109b90bd) is
>                     causing me some concern. I have added notes to
>                     both those issues, and additionally I have a
>                     number of queries to raise for discussion:
>
>                     _Concerns_
>
>                     1. it appears to define an addition/subtraction
>                     operation on SMPTE time values even if they're
>                     discontinuous. The processing of these seems to be
>                     undefined, so they should be disallowed, shouldn't
>                     they?
>
>
>                 I had intended to add material to deal with the
>                 discontinuous smpte mode, but it didn't get into the
>                 edit. Will add.
>
>
>                     2. It blurs the layers of interpretation of time
>                     values from documents up into any external
>                     context. For example it opens up the ambiguity
>                     that, when a sequence of TTML documents is wrapped
>                     e.g. in ISOBMFF, there are media time offsets
>                     available both in TTML and in the wrapper, and
>                     authors may be unclear whether they are intended
>                     as independent (additive) offsets or as duplicate
>                     offsets in which one may be considered not for
>                     processing, i.e. metadata.
>
>
>                 Since TTML doesn't know anything about external
>                 wrapper metadata, it isn't the right place to deal
>                 with such possible ambiguity (e.g., in different
>                 offset values internal and external). The correct
>                 place to deal with this is in the external spec.
>
>
>             Since those external specs already exist we should work in
>             sympathy with them rather than redefining what's already
>             there and creating confusion. Can we avoid redefining TTML
>             so that it invalidates external wrappers that should be
>             independent?
>
>
>         It depends on specifics. I need to know the exact text in an
>         external spec that may intersect with this feature. It may
>         also require that external spec to add a note to avoid
>         confusion. In any case, I have not seen a worked out example
>         of how this proposed feature would invalidate an external spec.
>
>
>     I suggest reviewing the definitions of MPEG 4, for example ISOBMFF
>     in ISO/IEC 14496-12:2012, which specifies a range of generic
>     timing constructs for aligning media in different formats,
>     including composition time of samples (8.6.1.3), and specific
>     mappings of the presentation time-line to the media time-line
>     using the Edit List Box as defined in 8.6.6. This is referenced by
>     EBU-TT-D in Tech3381 where the only additional constraint required
>     is to define the behaviour when the contents of a document extend
>     outside the sample period.
>
>     These constructs effectively define a media timeline, so that the
>     only requirement on a processor is to map the time expressions in
>     a document to the timeline defined in the wrapper. No further
>     offsets are required in the document because they're in the wrapper.
>
>
>
>                     3. It is actually the opposite proposal to the one
>                     I made in Issue-335: I've added a note there and
>                     re-opened it.
>
>                     4. If clock time is prohibited from using media
>                     offset because the discontinuityOffset can not be
>                     derived in the absence of a date, then I would
>                     certainly be happy to propose the addition of a
>                     date value. A use case for this is when a TTML
>                     document is created as an archive artefact by a
>                     processor that observes some real world timed
>                     events and converts them into TTML.
>
>
>                 My reason for excluding clock mode is because it
>                 doesn't have a related media object.
>
>
>             Ah, right. There may in fact be a related media object,
>             but the temporal relationship would be indirect, and
>             mediated by the clock rather than some other time embedded
>             in the media.
>
>
>         Yes, that is a better way of saying what I intended.
>
>
>
>                     5. It does nothing to address the scenario where
>                     the media time corresponding to the beginning of
>                     the related media object is known at authoring
>                     time, and is non-zero. This media begin time is
>                     distinct from, and possibly earlier than, the
>                     beginning of the contents of the TTML document.
>
>
>                 I don't understand this statement, since this is
>                 precisely what ttp:mediaOffset does: allow the
>                 beginning of the root temporal extent to be offset
>                 either before or after the beginning of the related
>                 media object.
>
>
>             ttp:mediaOffset doesn't do that though: it merely allows
>             for times in the document to be offset prior to
>             processing. It doesn't extend the root temporal extent
>             beyond the document's contents.
>
>
>         Correct, since it isn't intended to do that.
>
>
>     Okay, so it doesn't extend the root temporal extent but it offsets
>     it. Looking back once more at the wording in the TTML2 draft spec
>     it appears to specify the period between BEGIN(media) in
>     TIME(document) and BEGIN(document). And nowhere in the spec is it
>     required that a processor perform time calculations using it.  I'm
>     struggling to see what the utility of this is – can you explain
>     the use case more?
>
>     As you've suggested there appears to be a simple relationship
>     between the mediaBegin that I've proposed and your mediaOffset:
>
>     mediaOffset = BEGIN(document) - mediaBegin
>
>     where mediaBegin is in TIME(document).
>
>     There are a couple of limitations on this:
>     1. You can only perform the calculation when the time base permits
>     it, i.e. excluding SMPTE discontinuous.
>     2. The maximum size of mediaOffset is limited to BEGIN(document)
>     unless you permit mediaBegin to be negative.
>
>     However on the positive side, mediaBegin could be used as the
>     starting point in your algorithm for mapping SMPTE discontinuous
>     markers into continuous times. Similarly, if you replace
>     mediaDuration with mediaEnd, then:
>
>     mediaDuration = mediaEnd - mediaBegin
>
>     or if mediaEnd is not specified or is indefinite then
>     mediaDuration resolves to indefinite as currently defined.
>
>     And mediaEnd would be usable as the end marker for the mapping of
>     SMPTE discontinuous markers into continuous times.
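The two relationships proposed above can be sketched in Python. This is only an illustration: the function names are hypothetical, all values are assumed to be seconds in TIME(document), and `None` is an assumed stand-in for an "indefinite" mediaEnd.

```python
from typing import Optional


def media_offset(document_begin: float, media_begin: float) -> float:
    """mediaOffset = BEGIN(document) - mediaBegin, both in TIME(document)."""
    return document_begin - media_begin


def media_duration(media_begin: float, media_end: Optional[float]) -> Optional[float]:
    """mediaDuration = mediaEnd - mediaBegin; None models 'indefinite'."""
    if media_end is None:
        return None
    return media_end - media_begin


# Illustrative values only: media begins at 36000s, document at 36060s.
print(media_offset(36060.0, 36000.0))    # 60.0
print(media_duration(36000.0, 37800.0))  # 1800.0
print(media_duration(36000.0, None))     # None
```

The sketch assumes a continuous time base; as noted above, the subtraction is not meaningful for discontinuous SMPTE values.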
>
>     Since mediaBegin and mediaEnd aren't required for general
>     presentation processing, and appear to have no effect on any
>     computed time values within the document, it may be appropriate to
>     make them metadata rather than parameters. There's prior work
>     here: EBU-TT (Tech3350) and its predecessor binary format STL
>     both support metadata ebuttm:documentStartOfProgramme as time code.
>
>
>
>             I'm puzzled by this: in your ISD generation use case, if
>             the TTML document were untimed but you knew
>             ttp:mediaOffset then how would you derive the begin time
>             of the first ISD?
>
>
>         SMIL semantics dictates that an unspecified begin time resolve
>         to 0, for both par and seq parents. ttp:mediaOffset doesn't
>         have any role in resolving active begin/end for document
>         elements. It only comes into play when synchronizing document
>         time coordinates with media time coordinates.
>
>
>     I'm unclear from the current spec wording what exactly a
>     presentation processor should do with the value, even when it does
>     come into play.
>
>             ttp:mediaBegin would define the begin time of the first
>             possible ISD without further calculation, unless you also
>             want to map the times into another time base.
>
>
>         But that isn't something I'm trying to do here. Indeed, I'm
>         saying we can't do that without changing SMIL semantics. "the
>         begin time of the first possible ISD without further
>         calculation [in the document time base]" is always 0.
>
>
>     Surely we're free to define an extra constraint in the special
>     case that the author has extra knowledge about the media, to set
>     the first possible ISD begin time to a later point. It's extremely
>     similar in principle to permitting:
>
>     <body begin="100s" timeContainer="par" …>
>     <div begin="5s" …> … </div>
>     </body>
>
>     to generate an empty ISD from 100s to 105s, which would be an
>     additional feature compared to now, when if there's no content
>     flowed into such an ISD then it would not exist.
>
>                 There are two distinct one-dimensional temporal
>                 coordinate spaces here that are potentially related:
>
>                   * document's temporal coordinate space, call this
>                     TIME(document)
>                       o origin is at ORIGIN(document), which is always
>                         ZERO (0)
>                       o has begin time BEGIN(document)
>                       o has explicit or implied duration of DUR(document)
>                       o so root temporal extent is always the
>                         half-open interval:
>                           + [ 0, DUR(document) )
>                   * related media object's temporal coordinate space,
>                     call this TIME(media)
>                       o origin is at ORIGIN(media)
>                       o has begin time BEGIN(media)
>                       o has explicit or implied duration of DUR(media)
>                       o so media temporal extent is always the
>                         half-open interval:
>                           + [ BEGIN(media), BEGIN(media) + DUR(media) )
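As a rough illustration of the two coordinate spaces listed above, the following Python sketch models each temporal extent as an interval that is closed at its start and open at its end. The `Extent` class and the numeric values are assumptions made for this example, not anything defined by TTML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Temporal extent [begin, begin + dur) in its own coordinate space."""
    begin: float
    dur: float

    def contains(self, t: float) -> bool:
        # Closed at the start, open at the end.
        return self.begin <= t < self.begin + self.dur


# Root temporal extent: [0, DUR(document)) in TIME(document).
root = Extent(begin=0.0, dur=60.0)
# Media temporal extent: [BEGIN(media), BEGIN(media) + DUR(media)) in TIME(media).
media = Extent(begin=7.0, dur=60.0)

print(root.contains(0.0), root.contains(60.0))     # True False
print(media.contains(66.9), media.contains(67.0))  # True False
```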
>
>             Since TIME(media) may have a different play rate or frame
>             rate to TIME(document) I think we need to introduce the
>             concept of evaluation time of this parameter, since
>             conversion between the document time base and the media
>             time base may only be achievable by a simple addition at
>             one instant.
>
>
>         I agree that the play rate of TIME(media) and TIME(document)
>         could be different, a point mentioned in a few notes in the
>         current spec text:
>
>
>     Yes, the current spec text was my reference for the term play rate.
>
>
>         *6.2.1 ttp:timeBase*
>
>         *Note:*
>
>         When using a media time base, if that time base is paused or
>         scaled positively or negatively, i.e., the media play rate is
>         not unity, then it is expected that the presentation of
>         associated Timed Text content will be similarly paused,
>         accelerated, or decelerated, respectively. The means for
>         controlling an external media time base is outside the scope
>         of this specification.
>
>         *Appendix N Time Expression Semantics*
>
>         *Note:*
>
>         The phrase /play rate/ as used below is intended to model a
>         (possibly variable) parameter in the document processing
>         context wherein the rate of playback (or interpretation) of
>         time may be artificially dilated or narrowed, for example, when
>         slowing down or speeding up the rate of playback of a related
>         media object.
>         Without loss of generality, the following discussion assumes a
>         fixed play(back) rate. In the case of variable play rates,
>         appropriate adjustments may need to be made to the resulting
>         computations.
>
>         *Appendix N.1 Clock Time Base*
>
>         *Note:*
>
>         That is to say, timing is disconnected from (not necessarily
>         proportional to) media time when the |clock| time base is
>         used. For example, if the media play rate is zero (0), media
>         playback is suspended; however, timing coordinates will
>         continue to advance according to the natural progression of
>         clock time in direct proportion to the reference clock base.
>         Furthermore, if the media play rate changes during playback,
>         presentation timing is not affected.
>
>         However, at present, this text basically states
>         (informatively) or in the smpte case assumes:
>
>           * for clock time base, RATE(TIME(document)) is fixed as 1X
>             real time, independently of RATE(TIME(media))
>           * for media time base, RATE(TIME(document)) = RATE(TIME(media))
>           * for smpte time base, it doesn't say anything special, but
>             one can infer that the same interpretation applies as for
>             media time base (in either continuous or discontinuous modes)
>
>         In any case, I don't want the interpretation of the proposed
>         parameter to depend upon differences in play rates.
>
>
>     Agreed.
>
>             However, the play rate of the media may not be known, so
>             I've assumed that any time base mapping must be external
>             to the document, and that what we need to do to ensure
>             that BEGIN(document) aligns with the right point in the
>             media's temporal coordinate space is to define a known
>             fixed datum in the media, in the document's time base, and
>             require the processor to map the temporal coordinate spaces.
>
>                 The intent of ttp:mediaOffset is to express the delta
>                 between BEGIN(document) and BEGIN(media):
>
>
>             That's not what I expect from a parameter called
>             mediaOffset – I'd certainly been reading it as
>             ORIGIN(document) - ORIGIN(media).
>
>
>         The problem with this is that BEGIN(media) - ORIGIN(media) is
>         unknown and arbitrary, and, further, shouldn't affect
>         synchronization IMO. It certainly wouldn't affect
>         synchronization in clock time base, media time base, or
>         continuous smpte time base. However, in the case of
>         discontinuous smpte time base, special treatment is needed for
>         using/interpreting ttp:mediaOffset, the same special treatment
>         that is required for converting a discontinuous smpte time
>         base document to an ISD sequence, something I have not yet
>         documented in the spec, for which the basic approach I am
>         thinking of is as follows:
>
>         *Convert Discontinuous SMPTE Time Base Document to Media Time
>         Base Document*
>
>         (1) reset MEDIATIMER to 0; initialize MAPPINGS to empty set;
>         (2) simultaneously start playback of related media object at
>         1X play rate and start MEDIATIMER at 1X real time;
>
>
>     As an alternative, start playback of related media object, and
>     start MEDIATIMER when the mediaBegin marker is observed in the
>     related media's timecode. This allows for material such as clock,
>     bars etc that are likely to be present in the media to be ignored
>     reliably.
>
>         (3) when encountering a SMPTE time label in related media
>         object, record the current value of MEDIATIMER and save the
>         pair <SMPTE time label, MEDIATIMER value> in MAPPINGS;
>         (4) if playback is not complete, go to (3);
>
>
>     Or if the mediaEnd marker has not been observed and there's more
>     media remaining, go to (3).
>
>         (5) visit each time expression T in document, performing
>         following steps:
>         (6) if T is in MAPPINGS, then rewrite T (in document) to
>         MAPPINGS.get(T) and continue at (5);
>         (7) otherwise (T has no mapping), either abort due to mapping
>         error or use a fallback mapping (TBD), e.g., mapping of
>         "closest" label that does map;
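The conversion steps above could be sketched in Python roughly as follows. This is an illustration under several assumptions: playback is replaced by a pre-recorded list of (SMPTE label, MEDIATIMER value) observations, labels are non-drop "HH:MM:SS:FF" strings at an assumed 25 fps, the first observation of a repeated label wins, and the "closest label" fallback (marked TBD in the steps) is just one possible choice. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

FPS = 25  # assumed non-drop frame rate, used only by the "closest label" fallback


def label_to_frames(label: str, fps: int = FPS) -> int:
    """Convert an 'HH:MM:SS:FF' label to a frame count for distance comparisons."""
    hh, mm, ss, ff = (int(part) for part in label.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def build_mappings(observed: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Steps (1)-(4): save <SMPTE time label, MEDIATIMER value> pairs."""
    mappings: Dict[str, float] = {}
    for label, timer_value in observed:
        # Assumption: if a label recurs, keep the first MEDIATIMER value seen.
        mappings.setdefault(label, timer_value)
    return mappings


def rewrite_times(times: List[str], mappings: Dict[str, float],
                  use_fallback: bool = False) -> List[float]:
    """Steps (5)-(7): rewrite each time expression T using MAPPINGS."""
    rewritten = []
    for t in times:
        if t in mappings:
            rewritten.append(mappings[t])
        elif use_fallback and mappings:
            # One possible fallback: use the mapping of the closest observed label.
            closest = min(mappings,
                          key=lambda k: abs(label_to_frames(k) - label_to_frames(t)))
            rewritten.append(mappings[closest])
        else:
            raise ValueError(f"no mapping for time expression {t!r}")
    return rewritten


# Simulated observations with a timecode jump from 10:00:01:00 to 10:05:00:00.
observed = [("10:00:00:00", 0.0), ("10:00:01:00", 1.0), ("10:05:00:00", 2.0)]
mappings = build_mappings(observed)
print(rewrite_times(["10:00:01:00", "10:05:00:00"], mappings))      # [1.0, 2.0]
print(rewrite_times(["10:00:01:05"], mappings, use_fallback=True))  # [1.0]
```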
>
>                   * if ttp:mediaOffset > 0, then BEGIN(document)
>                     temporally follows BEGIN(media)
>                   * if ttp:mediaOffset < 0, then BEGIN(document)
>                     temporally precedes BEGIN(media)
>
>                 Note that this definition is arbitrary: we could
>                 invert the meaning if we wish. In any case, the
>                 current language decodes as follows:
>
>                 Given ttp:mediaOffset = +10s, then <body begin="5s"/>
>                 means that body starts at 15s after BEGIN(media).
>
>
>             That seems to be an offset of ORIGIN(document) - ORIGIN(media)
>
>
>         Let's work out the example using *your* interpretation of
>         "offset" where we choose an arbitrary BEGIN(media) of 7s in
>         TIME(media), */and further assuming that media and document
>         play rates match/*:
>
>         Given
>
>         (1) BEGIN(media) = 7s in TIME(media)
>         (2) BEGIN(body) = 5s in TIME(document)
>         (3) mediaOffset = 10s = *ORIGIN(document) - ORIGIN(media)*
>
>         Yields
>
>         BEGIN(body) in TIME(media) = ORIGIN(media) + mediaOffset +
>         BEGIN(body) = 0s + 10s + 5s = 15s in TIME(media), which is the
>         same as *BEGIN(media) + 8s*
>
>         and, now, let's *change BEGIN(media) to another value, say
>         13s*, so we end up with 0s + 10s + 5s = 15s in TIME(media),
>         which is the same as *BEGIN(media) + 2s*
>
>
>     Hmmm possibly we're interpreting BEGIN(media) differently. I had
>     thought you meant the time of start of media playback in
>     TIME(media) but on thinking it through more I now think you mean
>     the time of start of media regardless of start of playback. So
>     when you say 'let's change BEGIN(media) to another value' am I
>     right in thinking you mean 'consider another piece of media whose
>     BEGIN(media) is another value' rather than 'consider playing back
>     the same media from a different start point'?
>
>     If you mean start of playback of the same media, then this is
>     exactly the right behaviour: you started the media 6s later and
>     the body therefore began 6s earlier, relatively. But it's a bit
>     contrived, since the normal workflow is to start with the media
>     and create the captions/subtitles, and it's common in delivery
>     standards for different media to start with a similar timecode
>     (e.g. "10:00:00" as per my example) and for the captions to start
>     at different times relative to that, dependent on when the
>     dialogue commences.
>
>     It would be more likely that BEGIN(body) varies for different
>     media assets even if BEGIN(media) is the same for each of those
>     media assets. Anyhow…
>
>
>         However, using *my* interpretation of "offset" using the same
>         info, we have:
>
>         Given
>
>         (1) BEGIN(media) = 7s in TIME(media)
>         (2) BEGIN(body) = 5s in TIME(document)
>         (3) mediaOffset = 10s = *BEGIN(document) - BEGIN(media)*
>
>
>     Just to check I've understood step 3, you mean:
>     *BEGIN(document)* = BEGIN(body) in TIME(document), and
>     *BEGIN(media)* = BEGIN(media) in TIME(media)?
>
>
>
>         Yields
>
>         BEGIN(body) in TIME(media) = BEGIN(media) + mediaOffset +
>         BEGIN(body) = 7s + 10s + 5s = 22s in TIME(media), which is the
>         same as *BEGIN(media) + 15s*
>
>         and, now, let's *change BEGIN(media) to another value, say
>         13s*, we end up with 13s + 10s + 5s = 28s, which is the same
>         as *BEGIN(media) + 15s* (still)
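The two worked examples above can be reproduced with a short Python sketch. It assumes matching play rates, ORIGIN(media) = 0s, and seconds as the unit throughout; the function names are hypothetical.

```python
def body_begin_origin_based(media_offset: float, body_begin: float,
                            origin_media: float = 0.0) -> float:
    """mediaOffset read as ORIGIN(document) - ORIGIN(media)."""
    return origin_media + media_offset + body_begin


def body_begin_begin_based(begin_media: float, media_offset: float,
                           body_begin: float) -> float:
    """mediaOffset read as BEGIN(document) - BEGIN(media)."""
    return begin_media + media_offset + body_begin


MEDIA_OFFSET = 10.0  # seconds
BODY_BEGIN = 5.0     # seconds in TIME(document)

for begin_media in (7.0, 13.0):
    a = body_begin_origin_based(MEDIA_OFFSET, BODY_BEGIN)
    b = body_begin_begin_based(begin_media, MEDIA_OFFSET, BODY_BEGIN)
    # Print BEGIN(body) relative to BEGIN(media) under each reading.
    print(begin_media, a - begin_media, b - begin_media)
# 7.0 8.0 15.0
# 13.0 2.0 15.0
```

Under the ORIGIN-based reading the distance from BEGIN(media) to BEGIN(body) changes with BEGIN(media); under the BEGIN-based reading it stays at 15s.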
>
>
>     This "change BEGIN(media) to another value" has given me pause for
>     thought. I can think of three possible meanings:
>
>     1. If you mean start playback at another time [and call it
>     BEGIN(media)], this would be highly undesirable: the consequence
>     of starting playback at a different place in the media is that the
>     timings of all the captions/subtitles move, and presumably are no
>     longer aligned. So whereas the first one matched up at authoring
>     time it is now 6s later relative to the media.
>
>     2. If you mean 'there's another piece of media with a different
>     start time in TIME(media)' then okay, you've ended up with the
>     same value, but what's the advantage of that? Is it important to
>     have this consistent across different media and documents that
>     have the same mediaOffset?
>
>     3. If you mean that a new rendition of the same media is created
>     with a different BEGIN(media) time, so that the same TTML
>     document can be used unchanged to play back captions for it,
>     given an externally specified BEGIN(media) that may vary, then
>     the following applies: in this use case it's not meaningful
>     for BEGIN(document) to be later than BEGIN(media) (comparing in
>     the same time base) so the offset is always positive or zero (or
>     negative or zero if you define it the other way around). In TTML1
>     it's assumed to be zero. Again, I'm struggling to see the benefit
>     of permitting other offset values. Plus, there's no guarantee that
>     in creating a new rendition the same time base has been used – the
>     rate of playback and eventual duration may have been tweaked for
>     example, as would happen if a 30fps video were played back at
>     25fps without changing the number of frames. There are more
>     unknowns than BEGIN(media).
>
>     However I can see the benefit of /omitting/ both mediaOffset and
>     mediaBegin if you have some external knowledge of BEGIN(media) and
>     you need the same TTML document to play back against renditions
>     that have been given different values for BEGIN(media), e.g.
>     because they've been striped with different timecode or the
>     opening sequence has had some more material prepended to it, or
>     some unwanted material removed. It would be much easier to keep
>     the same TTML document and include the media begin/offset
>     information in a wrapper in this scenario.
>
>
>         So, given your interpretation, BEGIN(body) in TIME(media) is
>         dependent on BEGIN(media), while in my interpretation,
>         BEGIN(body) remains constant with respect to BEGIN(media).
>         When reasoning about timing as an author, I would clearly want
>         to use BEGIN(media) and not ORIGIN(media) as the fixed datum.
>         But this preference is based on using time expressions related
>         to BEGIN(media) as opposed to ORIGIN(media), about which see
>         more below.
>
>
>     Agreed, having a fixed known document time that will be related to
>     BEGIN(media) at playback makes sense.
>
>
>             that must be evaluated one time only, at BEGIN(document)
>             or 5s in the document.
>
>
>         Not if play rates match or if you use my interpretation (see
>         more below).
>
>
>             This is still problematic, since it's content dependent.
>             Consider that two videos Va and Vb both have continuous
>             timecode where the beginning of the programme is at 10:00:00.
>
>
>         I interpret this, for Va and Vb, as BEGIN(media) = 36000s in
>         TIME(media)
>
>
>     Yes, if you convert to s that's what you get.
>
>             Va has dialogue and a corresponding TTML document Ta such
>             that BEGIN(Ta) = 10:01:00
>
>
>         I interpret this as mediaOffset = -36000s and BEGIN(Ta) =
>         36060s in TIME(document), which maps to BEGIN(media) +
>         mediaOffset + BEGIN(body) = 36060s in TIME(media)
>
>
>     How about if we define it as:
>
>     mediaOffset = BEGIN(Va) in TIME(Ta) - BEGIN(Ta) = -60s, and
>     BEGIN(Ta) = 36060s in TIME(Ta).
>
>     Mapping BEGIN(Ta) to TIME(Va) is BEGIN(Va) in TIME(Va) -
>     mediaOffset, which happens to be 36000 - -60 = 36060s.
>     If BEGIN(Va) in TIME(Va) happened to be reset to, let's say, 0,
>     then BEGIN(Ta) in TIME(Va) would be 0 - -60 = 60s, which is still
>     the expected result.
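The alternative definition just given can be checked with a small Python sketch. The function names are hypothetical, "HH:MM:SS" values are converted to whole seconds, and the values are those of the Va/Ta example.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def offset_from_media_begin(begin_media_in_doc: int, begin_doc: int) -> int:
    """mediaOffset = BEGIN(Va) in TIME(Ta) - BEGIN(Ta)."""
    return begin_media_in_doc - begin_doc


def doc_begin_in_media(begin_media_in_media: int, offset: int) -> int:
    """BEGIN(Ta) in TIME(Va) = BEGIN(Va) in TIME(Va) - mediaOffset."""
    return begin_media_in_media - offset


begin_va = hms_to_seconds("10:00:00")  # 36000
begin_ta = hms_to_seconds("10:01:00")  # 36060
offset = offset_from_media_begin(begin_va, begin_ta)

print(offset)                                # -60
print(doc_begin_in_media(begin_va, offset))  # 36060
print(doc_begin_in_media(0, offset))         # 60
```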
>
>             and Vb has Tb where BEGIN(Tb)=10:05:00.
>
>
>         I interpret this as mediaOffset = -36000s and BEGIN(Tb) =
>         36300s in TIME(document), which maps to BEGIN(media) +
>         mediaOffset + BEGIN(body) = 36300s in TIME(media)
>
>
>     As previous example, I'd have expected mediaOffset = -300s.
>
>             I would state that the more useful parameter would be
>             identical in both documents, i.e. mediaBegin="10:00:00",
>             so that any processor can start the effective clock (e.g.
>             a frame counter) ticking at the same point, rather than
>             having to evaluate at the arbitrary point that is
>             BEGIN(document).
>
>
>         *The problem here is that this document is authored such that
>         time expressions are not related to BEGIN(media), but rather,
>         related to ORIGIN(media).* TTML, being based on SMIL,
>         basically assumes that time is expressed in relation to
>         BEGIN(related media object), and not ORIGIN(related media
>         object). This follows from how time expressions on children of
>         a par time container are relative to the begin time of their
>         parent par container, and not the origin of the time base of
>         their parent.
>
>         Our different positions on this issue appear to relate to
>         which mode we think of as being normal. For me, the example
>         you describe is abnormal from a SMIL timing perspective,
>         whereas apparently the converse is true for you.
>
>
>     I'm not sure I agree here: I think it's more to do with where we
>     think the mapping between TIME(media) and TIME(document) should
>     occur – inside the TTML processor or externally. You seem to want
>     to be able to do it internally whereas I think that it can (or
>     maybe should) only be done externally, albeit by a media player
>     that also happens to contain a TTML processor.
>
>     Since the document time is expressed on a timeline internal to
>     TTML/SMIL but the media time may use any other timeline, it's hard
>     to make any assumption about the mapping other than that we
>     require the media and document play rate to be identical in real
>     terms.
>
>         In my mental model, time expressions in TTML stay fixed with
>         respect to BEGIN(media), while in yours, they apparently stay
>         fixed with respect to ORIGIN(media). In my model, the use of
>         mediaOffset is independent of BEGIN(media) while in your model
>         it is dependent on BEGIN(media).
>
>
>     In TTML N.3:
>
>     "|S = (countedFrames - droppedFrames + (subFrames / subFrameRate))
>     / effectiveFrameRate|"
>
>     This doesn't include any reference begin time other than the
>     origin. So yes, time expressions are related to ORIGIN(document)
>     in TTML.
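For reference, the N.3 expression quoted above translates directly into Python. The parameter names mirror the terms in the formula; the input values are illustrative assumptions (a non-drop 25 fps count of 900000 frames, no sub-frames).

```python
def smpte_seconds(counted_frames: int, dropped_frames: int, sub_frames: int,
                  sub_frame_rate: int, effective_frame_rate: float) -> float:
    """S = (countedFrames - droppedFrames + (subFrames / subFrameRate))
           / effectiveFrameRate"""
    return (counted_frames - dropped_frames
            + sub_frames / sub_frame_rate) / effective_frame_rate


# 900000 counted frames at an effective 25 fps: measured from the origin only.
print(smpte_seconds(900000, 0, 0, 1, 25.0))  # 36000.0
```

No begin time appears among the inputs, which is the point being made: the result is measured from the origin of the time base.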
>
>     And I would agree that when mapped to TIME(media) any timings
>     relative to BEGIN(media) need to stay constant. But the value
>     mappings for achieving this aren't within our control since we
>     don't in general know about TIME(media).
>
>         So, to translate between our mental models, we have:
>
>         *From BEGIN(media) relative to ORIGIN(media) relative time
>         expressions:*
>
>         add BEGIN(media)
>
>         *From ORIGIN(media) relative to BEGIN(media) relative time
>         expressions:*
>
>         subtract BEGIN(media)
>
>         *Now, however, let's look at the situation when play rates
>         differ*, i.e., RATE(TIME(document)) != RATE(TIME(media)). As
>         an example, let's say that RATE(TIME(media)) /
>         RATE(TIME(document)) is 2, i.e., we run media time at twice
>         the rate of document time. So, going back to my earlier numbers:
>
>         Given
>
>         (1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1,
>         EPOCH(TIME(media)) = EPOCH(TIME(document))
>         (2) BEGIN(media) = 7s in TIME(media), or 3.5s in TIME(real)
>         (3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
>         (4) mediaOffset = 10s = *ORIGIN(document) in TIME(real) -
>         ORIGIN(media) in TIME(real)*
>
>         Yields
>
>         BEGIN(body) in TIME(real) = ORIGIN(media) + mediaOffset +
>         BEGIN(body) = 0s + 10s + 5s = 15s in TIME(real), which is the
>         same as *BEGIN(media) + 11.5s in TIME(real)*
>
>         Now, let's *change BEGIN(media) to another value, say 13s*:
>
>
>     As above, I don't know what this change of BEGIN(media) is
>     supposed to signify, and it seems like it might be important.
>
>
>         Given
>
>         (1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1,
>         EPOCH(TIME(media)) = EPOCH(TIME(document))
>         (2) BEGIN(media) = 13s in TIME(media), or 6.5s in TIME(real)
>         (3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
>         (4) mediaOffset = 10s = *ORIGIN(document) in TIME(real) -
>         ORIGIN(media) in TIME(real)*
>
>         Yields
>
>         BEGIN(body) in TIME(real) = ORIGIN(media) + mediaOffset +
>         BEGIN(body) = 0s + 10s + 5s = 15s in TIME(real), which is the
>         same as *BEGIN(media) + 8.5s in TIME(real)*
>
>
>     I agree it would certainly be undesirable for the mapped begin
>     time of the document relative to the media to fail to scale
>     linearly and instead depend on the size of the time value used,
>     e.g. if the difference between 90 and 100 were not equal in real
>     terms to the difference between 1090 and 1100.
>
>
>         However, using my interpretation of "offset" using the same
>         info, we have:
>
>         Given
>
>         (1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1,
>         EPOCH(TIME(media)) = EPOCH(TIME(document))
>         (2) BEGIN(media) = 7s in TIME(media), or 3.5s in TIME(real)
>         (3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
>         (4) mediaOffset = 10s = *BEGIN(document) in TIME(real) -
>         BEGIN(media) in TIME(real)*
>
>         Yields
>
>         BEGIN(body) in TIME(real) = BEGIN(media) + mediaOffset +
>         BEGIN(body) = 3.5s + 10s + 5s = 18.5s in TIME(real), which is
>         the same as *BEGIN(media) + 15s in TIME(real)*
>
>         Now, let's change BEGIN(media) to another value, say 13s:
>
>         Given
>
>         (1) RATE(TIME(media)) = 2, RATE(TIME(document)) = 1,
>         EPOCH(TIME(media)) = EPOCH(TIME(document))
>         (2) BEGIN(media) = 13s in TIME(media), or 6.5s in TIME(real)
>         (3) BEGIN(body) = 5s in TIME(document), or 5s in TIME(real)
>         (4) mediaOffset = 10s = *BEGIN(document) in TIME(real) -
>         BEGIN(media) in TIME(real)*
>
>         Yields
>
>         BEGIN(body) in TIME(real) = BEGIN(media) + mediaOffset +
>         BEGIN(body) = 6.5s + 10s + 5s = 21.5s in TIME(real), which is
>         the same as *BEGIN(media) + 15s in TIME(real)*
>
>         Notice that by using my interpretation of mediaOffset,
>         differing play rates do not affect the relationship between
>         BEGIN(body) and BEGIN(media), which stays constant.
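
The four worked cases above can be reproduced with a short sketch. It assumes ORIGIN(media) = 0s in TIME(real), a document rate of 1, and a media rate of 2, as in the numbers given. The class and function names are hypothetical and exist only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    media_rate: float          # RATE(TIME(media)) relative to TIME(real)
    begin_media: float         # BEGIN(media), seconds in TIME(media)
    begin_body: float          # BEGIN(body), seconds in TIME(document), rate 1
    media_offset: float        # mediaOffset, seconds in TIME(real)
    origin_media: float = 0.0  # ORIGIN(media), seconds in TIME(real)

    def begin_media_real(self) -> float:
        # Convert BEGIN(media) from TIME(media) to TIME(real).
        return self.origin_media + self.begin_media / self.media_rate

    def body_real_origin_based(self) -> float:
        # mediaOffset read as ORIGIN(document) - ORIGIN(media).
        return self.origin_media + self.media_offset + self.begin_body

    def body_real_begin_based(self) -> float:
        # mediaOffset read as BEGIN(document) - BEGIN(media).
        return self.begin_media_real() + self.media_offset + self.begin_body


def gaps(begin_media: float) -> tuple[float, float]:
    """Return BEGIN(body) - BEGIN(media) in TIME(real) for both readings."""
    s = Scenario(media_rate=2.0, begin_media=begin_media,
                 begin_body=5.0, media_offset=10.0)
    start = s.begin_media_real()
    return (s.body_real_origin_based() - start,
            s.body_real_begin_based() - start)


for begin_media in (7.0, 13.0):
    origin_gap, begin_gap = gaps(begin_media)
    print(begin_media, origin_gap, begin_gap)
```

With BEGIN(media) at 7s and then 13s, the origin-based reading gives gaps of 11.5s and 8.5s, while the begin-based reading gives 15s both times, matching the figures above.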
>
>
>             My proposed ttp:mediaBegin would have the value "10:00:00"
>
>
>         I agree from your example that an expression of 10:00:00
>         describes the delta between BEGIN(media) and ORIGIN(media),
>         but this is only useful in cases where time expressions are
>         related to ORIGIN(media) and not BEGIN(media).
>
>
>     Since I defined it in TIME(document), not in TIME(media), the time
>     expressions are related to the origin.
>
>         In the mediaOffset formalism I defined, this value, i.e.,
>         BEGIN(media) - ORIGIN(media), is of no utility. Namely, if
>         mediaOffset is specified as I defined it, i.e., with your
>         example as mediaOffset="-36000s" or mediaOffset="-10h", then
>         you don't have to worry about either play rate differences or
>         changes in actual BEGIN(media), since the result is time
>         expressions always related to BEGIN(media).
>
>
>     I think we're trying to achieve the same end result, i.e. that the
>     text appears at the right time relative to the media despite some
>     (as yet unstated) set of transformations. My approach also relates
>     time expressions to BEGIN(media), except in TIME(document) and
>     therefore unavoidably with reference to ORIGIN(document). Yours
>     goes further, including a translation into TIME(media), which I
>     don't believe we should do.
>
>             in these cases, and not mix in the concept of mapping
>             between the document's temporal coordinates and the
>             related media's temporal coordinates.
>
>
>             The play rate in the document's time base is well defined
>             as now. It's reasonable to assume that any media playback
>             device knows when the related media begins and what its
>             play rate is.
>
>                 Or, given ttp:mediaOffset = -5s, then <body
>                 begin="5s"/> means that body starts at BEGIN(media).
>
>                 Given this formalism, we don't really care about
>                 BEGIN(media) - ORIGIN(media).
>
>
>             Agreed. What we care about is BEGIN(media) in the temporal
>             coordinate space of the document, or in your useful
>             terminology, in TIME(document).
>
>
>                 Now, if you are suggesting an alternative use case
>                 where ORIGIN(document) != 0 in the TIME(document)
>                 coordinate space, then that is something I haven't
>                 considered, and certainly did not intend to address.
>                 Indeed, doing so would be problematic since SMIL
>                 timing semantics assumes that unspecified begin
>                 defaults to 0s, and further, that 0s corresponds to
>                 ORIGIN(document).
>
>
>             I'm not suggesting that ORIGIN(document) != 0 in
>             TIME(document), since that would as you say create a whole
>             bunch of other problems.
>
>
>                 My response to such a proposed use case would probably
>                 be: we don't support it, you don't need to do it
>                 anyway, so don't do it.
>
>                 Note that the above considerations assume that time
>                 base is media, or that time base is smpte continuous
>                 mode, or that time base is smpte discontinuous mode
>                 and that all smpte time events have been converted to
>                 equivalent smpte continuous mode values, e.g., by
>                 playing back a media object in 1X normal play mode and
>                 recording the PTS time that corresponds with each
>                 frame associated with a smpte time label.
>
>             Just for completeness (at the expense of being
>             repetitious), did you also assume that the media play rate
>             is identical to the document's play rate, i.e. that the
>             only difference between TIME(media) and TIME(document) is
>             an additive offset?
>
>
>         See above.
>
>
>
>                         _Proposals_
>
>                     I would propose a resolution to points 1, 2, 3 and
>                     5 that is to remove mediaOffset and add a
>                     ttp:mediaBegin parameter, expressed in the same
>                     time base as the document's ttp:timeBase
>                     parameter. This also fits better with
>                     ttp:mediaDuration.
>
>
>                 Hmmm. I'm not inclined to make this change, because
>                 mentally I see mediaOffset as expressing a
>                 difference/delta/offset between two points in two
>                 different one-dimensional coordinate spaces both
>                 representing linear time (at 1X play rate). Calling it
>                 mediaBegin implies in my mind BEGIN(media), i.e., the
>                 delta between BEGIN(media) and ORIGIN(media), and not
>                 the delta between BEGIN(document) and BEGIN(media).
>
>
>             If this is just about the name we choose for the parameter
>             then we're right to choose carefully, but it shouldn't
>             prevent us from agreeing the semantics. To my mind
>             mediaBegin does suggest the delta between BEGIN(document)
>             and BEGIN(media), both in TIME(document). Whereas to me
>             mediaOffset suggests the delta between ORIGIN(document) in
>             TIME(document) and ORIGIN(media) in TIME(??? - this is not
>             clear), which if I understand correctly isn't what you
>             intend. Or if it is what you intend it doesn't seem to be
>             a complete solution for the problem.
>
>                     I would additionally propose allowing dates to be
>                     specified for use with clock times, to resolve
>                     point 4, perhaps with a ttp:date parameter that
>                     is valid only when ttp:timeBase="clock".
>                     Note that this does not resolve any time
>                     comparison issues caused by documents whose times
>                     cross midnight and wrap back round to a smaller
>                     number of hours.
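
The midnight wrap mentioned above can be illustrated with a small sketch. The ttp:date parameter is only a proposal in this thread, and the rollover rule used here (times arrive in presentation order, and a backwards jump means one midnight was crossed) is an assumption made for illustration, not something the proposal defines:

```python
from datetime import date, datetime, time, timedelta


def anchor_clock_times(times, start_date):
    """Combine wall-clock times with a date, adding a day at each wrap."""
    anchored = []
    day = start_date
    previous = None
    for t in times:
        # Assumption: times are in presentation order, so a value smaller
        # than its predecessor means midnight has been crossed once.
        if previous is not None and t < previous:
            day += timedelta(days=1)
        anchored.append(datetime.combine(day, t))
        previous = t
    return anchored


# Two cues 15 seconds apart, on either side of midnight.
cues = [time(23, 59, 50), time(0, 0, 5)]

# Comparing bare clock times puts the second cue before the first.
naive_order_ok = cues[0] < cues[1]

# Anchoring to a date (illustrative value) restores the intended order.
anchored = anchor_clock_times(cues, date(2014, 9, 24))
anchored_order_ok = anchored[0] < anchored[1]
```

A date alone fixes only the first time value; the ordering of later values still depends on some extra rule such as the one assumed here, which is the unresolved comparison issue noted above.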
>
>
>                 Again, I'm wondering what is the related media object?
>                 To my recollection, ttp:timeBase="clock" was added to
>                 TTML to handle timed text cases that don't have a
>                 related media object.
>
>
>             It would be a media object that had also been captured
>             with reference to a clock.
>
>
>
>                     Are there other related use cases or requirements
>                     not met by these proposals?
>
>                     Kind regards,
>
>                     Nigel
>
>
>


-- 
Cyril Concolato
Multimedia Group / Telecom ParisTech
http://concolato.wp.mines-telecom.fr/
@cconcolato
Received on Wednesday, 24 September 2014 15:50:57 UTC