Re: ISSUE-317 (IMSC should not require frame alignment): IMSC should not require frame alignment [TTML IMSC 1.0] from Nigel Megitt on 2014-05-23 (public-tt@w3.org from May 2014)

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Fri, 23 May 2014 10:13:08 +0000
To: Michael Dolan <mdolan@newtbt.com>, "'Timed Text Working Group'" <public-tt@w3.org>
Message-ID: <CFA4CE93.1E317%nigel.megitt@bbc.co.uk>
CIL

On 22/05/2014 17:39, "Michael Dolan" <mdolan@newtbt.com> wrote:

>CIL
>
>-----Original Message-----
>From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
>Sent: Thursday, May 22, 2014 8:50 AM
>To: Michael Dolan; 'Timed Text Working Group'
>Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC
>should not require frame alignment [TTML IMSC 1.0]
>
>You can achieve unambiguous frame alignment in the current IMSC draft spec
>by specifying frames in the time expression, so I think your requirement
>is already met.
>
>MD>>  This is simply not true, especially for non-integer frame rates.
>There is a *lengthy* reflector exchange on this topic, maybe a year ago,
>which resulted in changes to TTML1SE.  Perhaps that will help.  You'll be
>glad to hear it is simpler for European (integer) frame rates. :-)

Thanks for the reminder - I see what you mean. By the way, it was
ISSUE-199 and the resolution was included in Appendix N of TTML1SE: we
chose not to treat seconds as an integral number of frames, giving rise to
the result you describe.
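To make the ambiguity concrete, here is a minimal Python sketch. It assumes the reading of Appendix N described above: the hh:mm:ss part of a clock time with frames counts real seconds rather than a whole number of frames, and the ff part is assumed to add ff divided by the effective frame rate (ttp:frameRate × ttp:frameRateMultiplier). The helper name clock_time_to_seconds is hypothetical.

```python
from fractions import Fraction


def clock_time_to_seconds(expr: str, frame_rate: int,
                          multiplier: Fraction = Fraction(1)) -> Fraction:
    """Convert an hh:mm:ss:ff expression to exact media seconds.

    Hypothetical helper. Assumption: seconds are NOT treated as an integral
    number of frames; the ff component adds ff / effective frame rate.
    """
    hh, mm, ss, ff = (int(part) for part in expr.split(":"))
    effective_rate = frame_rate * multiplier  # e.g. 30 * 1000/1001
    return Fraction(hh * 3600 + mm * 60 + ss) + Fraction(ff) / effective_rate


ntsc = Fraction(1000, 1001)
t = clock_time_to_seconds("00:00:01:00", 30, ntsc)
frame_position = t * 30 * ntsc  # the same instant measured in coded frames

print(t)                      # exactly 1 second
print(float(frame_position))  # about 29.97: not a whole frame index
```

At 30 × 1000/1001 fps, "00:00:01:00" resolves to exactly 1 s, which falls about 29.97 frames into the media, between two coded frames, so a further rounding rule is needed to pick a frame.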

That takes us to a calculation of where in the media timeline the time
expression points to - selecting the nearest quantum of time that's
available in the system is the next step needed in an implementation. I
agree that we should state a rounding algorithm, but enforcing that the
quantum is the encoded video frame wouldn't be general enough for all
uses. I could perhaps accept a compromise that the quantum is the frame in
the special case where frame-based time expressions are used, but I would
still prefer a less restrictive rule to allow for very low encoded frame
rates.
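A rounding rule of the kind suggested here could be as simple as snapping to the nearest multiple of a quantum. This is a sketch only: the helper name snap_to_quantum and the choice to round exact ties upwards are assumptions, and the quantum is left as a parameter so it can be either the encoded frame period or a display refresh period.

```python
import math
from fractions import Fraction


def snap_to_quantum(t: Fraction, quantum: Fraction) -> Fraction:
    """Return the multiple of `quantum` nearest to `t` (ties round up).

    Hypothetical helper; the tie-breaking rule is an assumption.
    """
    n = math.floor(t / quantum + Fraction(1, 2))
    return n * quantum


frame_period = Fraction(1001, 30000)  # one frame at 30 * 1000/1001 fps
refresh_period = Fraction(1, 60)      # one refresh of a 60 Hz display

t = Fraction(1)  # a media time of exactly 1 s
print(float(snap_to_quantum(t, frame_period)))    # lands on frame 30
print(float(snap_to_quantum(t, refresh_period)))  # stays at 1 s
```

With the frame period as the quantum, 1 s moves to 1.001 s (frame 30 at 29.97 fps); with a 60 Hz refresh period it stays at 1 s. Either way the error is bounded by half a quantum: about 16.7 ms for 29.97 fps frames versus about 8.3 ms for a 60 Hz refresh.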


>I disagree that removing the requirement to align unambiguously with
>frames takes the timings outside the same time space domain as the
>related video. In fact they are both in a time domain arbitrarily defined
>as media time that the presentation system manages. They're just
>expressing times with different levels of precision.
>
>MD>>  I am afraid it is not that simple. You seem to be missing: 1)
>(above) the problem that alignment to specific frames of coded video is
>ambiguous (without the added spec language in TTML1SE);

Happily we have the added spec language in TTML1SE so the ambiguity is
removed. If your point is that TTML1SE Appendix N is non-normative then
how about adding a statement in IMSC 1 that TTML1SE Appendix N is
normative for IMSC?

>and 2) text is only visible in today's decoders for the integral period
>in which the coded video frame(s) are visible - that is, you cannot in the
>general case actually make the text appear and disappear on other than
>coded video frame boundaries.  I suppose in some futuristic decoder/display
>architecture (and a connector other than HDMI), it might be possible to
>truly have text appear and disappear unrelated to the coded video object
>frame alignment, but I'm not aware of any equipment today where that is
>possible.

This is not futuristic at all - it's possible to implement a decoder on
the same device as the display today without being constrained by a
frame-based intermediary such as HDMI. Obvious example devices include
laptop computers and tablets. If IMSC is intended to be constrained by the
requirements of HDMI it needs to say so in the Scope section.

> I'm not saying we should forbid sub-frame sync (and we do not, as far as
>I know).  I am saying at a minimum we need to crisply support alignment
>with the coded video frame. If one wants to venture into the world of
>subframe visual sync, that's perfectly fine, but it doesn't help the
>existing designs where that is not possible.
>
>Since I don't recognise the requirement for text display time alignment
>with encoded video frames, please could you describe where this
>requirement comes from?
>
>MD>> See above.  Is the problem that IMSC *requires* that it work only
>this way and forbids subframe sync?  If so, we should correct it.

Yes, that's exactly it - assuming that by "subframe sync" you mean general
sync to quanta shorter than the frame, rather than syncing precisely to
sub-frames according to ttp:subFrameRate, which is explicitly prohibited
in IMSC 1 ("#subFrameRate SHALL not be used"). I agree we should correct
it.

>
>Kind regards,
>
>Nigel
>
>
>On 22/05/2014 16:42, "Michael Dolan" <mdolan@newtbt.com> wrote:
>
>>No, that's not sufficient.  It must be possible to composite the text
>>in the exact same time space domain as the related (coded) video.
>>Unambiguous frame alignment is absolutely required.  What happens after
>>that is a decoder problem.
>>
>>If you also want to attempt to provide hints about alignment to display
>>formats, or in other applications video frame sync is not important,
>>that's OK.  But that does not relax the requirement for the ability to
>>align with the coded video. And in order to do that, the math must be
>>prescribed.
>>
>>	Mike
>>
>>-----Original Message-----
>>From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
>>Sent: Thursday, May 22, 2014 8:34 AM
>>To: Michael Dolan; 'Timed Text Working Group'
>>Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC
>>should not require frame alignment [TTML IMSC 1.0]
>>
>>On 22/05/2014 15:40, "Michael Dolan" <mdolan@newtbt.com> wrote:
>>
>>>This is a complex topic and absolutely required to provide coded
>>>frame-level text/video sync.
>>
>>I don't believe that frame-level text/video sync is the requirement
>>though - the text needs to be synced against media time, and so does the
>>video, and so does the audio.
>>
>>> 
>>>
>>>It is, I believe, impossible for an author to enable sync to display
>>>frames.
>>
>>I think that's an academic point - what's needed is for the author to
>>specify times as precisely as she/he is able to, and the processor to
>>honour those as closely as it can. The frame rate of the video that the
>>author is creating captions for cannot always be guaranteed in the
>>workflow to be the same as the frame rate of the video being played
>>back with those captions.
>>I'm arguing that the processor and display combination should try to
>>honour the authored times as accurately as possible independently of
>>the encoded video frame rate for playback.
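As an illustration of why the authored time and the playback frame rate are independent, the sketch below maps one time expressed in seconds to the nearest coded frame at several playback frame rates. The helper name nearest_frame_index and the 12.345 s value are hypothetical, and it assumes frame k starts at k / fps with exact ties rounded up.

```python
import math
from fractions import Fraction


def nearest_frame_index(t: Fraction, fps: Fraction) -> int:
    """Index of the coded frame starting nearest to media time t.

    Hypothetical helper: assumes frame k starts at k / fps and that
    exact ties round up.
    """
    return math.floor(t * fps + Fraction(1, 2))


authored = Fraction(12345, 1000)  # e.g. begin="12.345s" (illustrative value)

for fps in (Fraction(25), Fraction(30000, 1001), Fraction(50)):
    k = nearest_frame_index(authored, fps)
    error_ms = float((k / fps - authored) * 1000)  # displayed minus authored
    print(f"{float(fps):6.3f} fps: frame {k}, error {error_ms:+.1f} ms")
```

The authored time is unchanged; only the frame it lands on, and the size of the rounding error, depend on which encoding is played back.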
>>
>>Nigel
>>
>>>
>>>	Mike
>>>
>>>-----Original Message-----
>>>From: Timed Text Working Group Issue Tracker
>>>[mailto:sysbot+tracker@w3.org]
>>>Sent: Thursday, May 22, 2014 3:20 AM
>>>To: public-tt@w3.org
>>>Subject: ISSUE-317 (IMSC should not require frame alignment): IMSC
>>>should not require frame alignment [TTML IMSC 1.0]
>>>
>>>ISSUE-317 (IMSC should not require frame alignment): IMSC should not
>>>require frame alignment [TTML IMSC 1.0]
>>>
>>>http://www.w3.org/AudioVideo/TT/tracker/issues/317
>>>
>>>Raised by: Nigel Megitt
>>>On product: TTML IMSC 1.0
>>>
>>>IMSC 1.0 §4.4 [1] currently requires temporal quantisation of media
>>>times to frame display times. This rule comes into play when times are
>>>not expressed in frames, and therefore the same document may apply to
>>>a range of related media objects covering different frame rates. Where
>>>frames are used, the document can only be displayed alongside media of
>>>the same frame rate, so there's no need for the frame alignment
>>>expression.
>>>
>>>This approach prevents implementations from changing caption display
>>>at screen refresh rate quantisation and enforces quantisation based on
>>>the encoded video frame rate. This means that if a low frame rate
>>>video is provided, e.g. quarter rate which could be around 6 frames
>>>per second, the effective word reading rate may be increased to the
>>>point where text becomes hard to read.
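A rough illustration of the reading-rate effect, assuming nearest-quantum rounding with ties rounded up (the helper names and caption times are hypothetical): a 400 ms caption snapped to a 6.25 fps frame grid (a quarter of 25 fps) versus a 60 Hz display refresh grid.

```python
import math
from fractions import Fraction


def snap(t: Fraction, quantum: Fraction) -> Fraction:
    """Nearest multiple of quantum to t; ties round up (assumed rule)."""
    return math.floor(t / quantum + Fraction(1, 2)) * quantum


def displayed_duration(begin: Fraction, end: Fraction,
                       quantum: Fraction) -> Fraction:
    """How long a caption is on screen once both edges are snapped."""
    return snap(end, quantum) - snap(begin, quantum)


quarter_rate_period = Fraction(4, 25)  # 25 fps / 4 = 6.25 fps: 160 ms frames
refresh_period = Fraction(1, 60)       # 60 Hz display: ~16.7 ms refreshes

# Hypothetical caption authored from 10.17 s to 10.57 s (400 ms).
begin, end = Fraction(1017, 100), Fraction(1057, 100)

for label, q in (("6.25 fps frames", quarter_rate_period),
                 ("60 Hz refresh", refresh_period)):
    ms = float(displayed_duration(begin, end, q) * 1000)
    print(f"{label}: on screen for {ms:.0f} ms (authored 400 ms)")
```

Here the quarter-rate grid shows the caption for 320 ms instead of 400 ms, so the same words must be read 25% faster. In general nearest rounding can shorten a caption by up to one frame period: 160 ms at 6.25 fps, but only about 16.7 ms at 60 Hz.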
>>>
>>>Consider a streaming environment in which there is enough network
>>>capacity to provide audio and captions but the video experience is
>>>badly impacted: in this case it must be permitted that the
>>>implementation continue to present captions alongside the audio
>>>regardless of the frames of video that are displayed.
>>>
>>>I propose, as a solution to this problem, that implementations SHALL
>>>display captions as temporally close to the media time specified as
>>>the display device permits, independent of video frame rate.
>>>
>>>Note that where frames are used in media time expressions this reduces
>>>to exactly the current behaviour.
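The note about frame-based expressions can be checked with a small sketch. It assumes the display refresh rate is an integer multiple of the video frame rate (25 fps video on 50 Hz or 100 Hz displays here) and that "as close as the device permits" means the nearest refresh, ties rounded up. present_time is a hypothetical helper that depends only on the refresh rate, not on the video frame rate.

```python
import math
from fractions import Fraction


def present_time(media_time: Fraction, refresh_hz: Fraction) -> Fraction:
    """Nearest display refresh instant to media_time (ties round up).

    Hypothetical helper: uses only the display refresh rate, never the
    video frame rate, as the proposal above suggests.
    """
    period = 1 / refresh_hz
    return math.floor(media_time / period + Fraction(1, 2)) * period


video_fps = Fraction(25)

# Frame-based time expressions place every begin/end on a boundary k / fps.
for refresh_hz in (Fraction(50), Fraction(100)):
    unchanged = all(present_time(k / video_fps, refresh_hz) == k / video_fps
                    for k in range(1000))
    print(f"{refresh_hz} Hz: frame-aligned times unchanged = {unchanged}")

# A time not expressed in frames may be presented between video frames.
print(float(present_time(Fraction(1234, 1000), Fraction(100))))
```

Every frame-aligned time is presented unchanged, i.e. on its frame, while a time such as 1.234 s that is not expressed in frames is presented at 1.23 s on a 100 Hz display, between frames 30 and 31.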
>>>
>>>[1]
>>>https://dvcs.w3.org/hg/ttml/raw-file/ea1a92310a27/ttml-ww-profiles/ttml-ww-profiles.html#synchronization
Received on Friday, 23 May 2014 10:13:42 UTC