RE: ISSUE-317 (IMSC should not require frame alignment): IMSC should not require frame alignment [TTML IMSC 1.0] from Michael Dolan on 2014-05-23 (public-tt@w3.org from May 2014)

From: Michael Dolan <mdolan@newtbt.com>
Date: Fri, 23 May 2014 07:32:05 -0700
To: "'Timed Text Working Group'" <public-tt@w3.org>
Message-ID: <008b01cf7693$c6a4e880$53eeb980$@newtbt.com>
We did not *choose* to make seconds non-frame-aligned.  That was done for us
in NTSC countries half a century ago.

#subFrameRate time representation is not remotely the problem.
Non-frame-aligned times exist with only seconds (and fractional seconds).

Appendix N was agreed to be made normative in TTML2. It could not be made so
in TTML1SE since it was obviously substantive.

Again, I think we have a fundamental disconnect about how TTML (not IMSC)
relates to coded video objects.  Until we agree on the model, we should set
aside these issues; otherwise we will agree neither on the constraints nor
on where to codify them.

	Mike

-----Original Message-----
From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk] 
Sent: Friday, May 23, 2014 3:13 AM
To: Michael Dolan; 'Timed Text Working Group'
Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC
should not require frame alignment [TTML IMSC 1.0]

CIL

On 22/05/2014 17:39, "Michael Dolan" <mdolan@newtbt.com> wrote:

>CIL
>
>-----Original Message-----
>From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
>Sent: Thursday, May 22, 2014 8:50 AM
>To: Michael Dolan; 'Timed Text Working Group'
>Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC 
>should not require frame alignment [TTML IMSC 1.0]
>
>You can achieve unambiguous frame alignment in the current IMSC draft 
>spec by specifying frames in the time expression, so I think your 
>requirement is already met.
>
>MD>>  This is simply not true, especially for non-integer frame rates. There
>is a *lengthy* reflector exchange on this topic, maybe a year ago, which
>resulted in changes to TTML1SE.  Perhaps that will help.  You'll be
>glad to hear it is simpler for European (integer) frame rates. :-)

Thanks for the reminder - I see what you mean. By the way, it was
ISSUE-199 and the resolution was included in Appendix N of TTML1SE: we chose
not to treat seconds as an integral number of frames, giving rise to the
result you describe.
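A short Python sketch shows the effect. It assumes the reading of Appendix N described here: the seconds field of an HH:MM:SS:FF expression counts real seconds, and the frames field counts frames at an effective rate of ttp:frameRate multiplied by ttp:frameRateMultiplier. The helper names are hypothetical and the code is an illustration, not the normative text.

```python
from fractions import Fraction


def clock_time_to_seconds(expr, frame_rate, multiplier=Fraction(1)):
    """Convert an "HH:MM:SS:FF" expression to exact seconds (hypothetical helper).

    Assumption: seconds are real seconds; frames are counted at the
    effective rate frame_rate * multiplier.
    """
    hours, minutes, seconds, frames = (int(part) for part in expr.split(":"))
    effective_rate = Fraction(frame_rate) * multiplier
    return hours * 3600 + minutes * 60 + seconds + Fraction(frames) / effective_rate


def frame_position(t, frame_rate, multiplier=Fraction(1)):
    """Express media time t (seconds) as a count of coded frames."""
    return Fraction(t) * Fraction(frame_rate) * multiplier


ntsc = Fraction(1000, 1001)
t = clock_time_to_seconds("00:00:01:15", 30, ntsc)
print(t)                              # 3001/2000 (1.5005 s)
print(frame_position(t, 30, ntsc))    # 45015/1001 (about 44.97, not a whole frame)

t25 = clock_time_to_seconds("00:00:01:15", 25)
print(frame_position(t25, 25))        # 40
```

At 25 fps the same expression lands exactly on frame 40, which is why the integer-rate case is simpler.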

That takes us to a calculation of where in the media timeline the time
expression points; selecting the nearest quantum of time that's available
in the system is the next step needed in an implementation. I agree that
we should state a rounding algorithm, but requiring the quantum to be the
encoded video frame wouldn't be general enough for all uses. I could
perhaps accept a compromise in which the quantum is the frame in the
special case that frame-based time expressions are used, but I would still
prefer a less restrictive rule to allow for very low encoded frame rates.
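One way to state such a rounding rule is sketched below in Python. It is hypothetical: the function name, the "ties go to the later quantum" convention and the two candidate quanta (one coded frame at 30000/1001 frames per second, or one 60 Hz display refresh) are illustration choices, not wording from any IMSC draft.

```python
from fractions import Fraction
import math


def nearest_quantum(t, quantum):
    """Return the multiple of quantum closest to media time t (seconds).

    Exact rational arithmetic keeps NTSC-style rates such as 30000/1001
    free of float error. A value exactly half-way between two quanta is
    assigned to the later one (an assumed tie-break).
    """
    t, quantum = Fraction(t), Fraction(quantum)
    return math.floor(t / quantum + Fraction(1, 2)) * quantum


t = Fraction(3001, 2000)                  # 1.5005 s, an arbitrary media time
coded_frame = Fraction(1001, 30000)       # one frame at 29.97 fps
refresh = Fraction(1, 60)                 # one 60 Hz display refresh

print(nearest_quantum(t, coded_frame))    # 3003/2000 (frame 45, 1.5015 s)
print(nearest_quantum(t, refresh))        # 3/2 (refresh 90, 1.5 s)
```

The same media time resolves 1.5 ms apart on the two grids; the function is agnostic about which grid applies, which is the policy question under discussion.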


>I disagree that removing the requirement to align unambiguously with
>frames takes the timings outside the same time space domain as the related
>video. In fact they are both in a time domain, arbitrarily defined as
>media time, that the presentation system manages. They're just
>expressing times with different levels of precision.
>
>MD>>  I am afraid it is not that simple. You seem to be missing: 1) (above)
>the problem that alignment to specific frames of coded video is
>ambiguous (without the added spec language in TTML1SE);

Happily we have the added spec language in TTML1SE so the ambiguity is
removed. If your point is that TTML1SE Appendix N is non-normative then how
about adding a statement in IMSC 1 that TTML1SE Appendix N is normative for
IMSC?

>and 2) text is only visible in
>today's decoders for the integral period in which the coded video 
>frame(s) are visible - that is, you cannot in the general case actually 
>make the text appear and disappear on other than coded video frame 
>boundaries.  I suppose in some futuristic decoder/display architecture 
>(and a connector other than HDMI), it might be possible to truly have 
>text appear and disappear unrelated to the coded video object frame 
>alignment, but I'm not aware of any equipment today where that is 
>possible.

This is not futuristic at all - it's possible to implement a decoder on the
same device as the display today without being constrained by a frame-based
intermediary such as HDMI. Obvious example devices include laptop computers
and tablets. If IMSC is intended to be constrained by the requirements of
HDMI it needs to say so in the Scope section.
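Mike's point 2) can be modelled as frame gating: text is composited onto whole coded frames, so what the viewer sees starts and stops on frame boundaries. The Python sketch below assumes frame n is displayed over [n/fps, (n+1)/fps) and that text active over [begin, end) appears on every frame whose start time falls in that interval; both the convention and the function names are hypothetical.

```python
from fractions import Fraction
import math


def frames_showing_text(begin, end, fps):
    """Range of coded frame indices that carry the caption (assumed model).

    Frame n starts at n / fps; it carries the text when
    begin <= n / fps < end, i.e. ceil(begin * fps) <= n < ceil(end * fps).
    """
    fps = Fraction(fps)
    first = math.ceil(Fraction(begin) * fps)
    last_exclusive = math.ceil(Fraction(end) * fps)
    return range(first, last_exclusive)


def visible_interval(begin, end, fps):
    """Span of time during which the caption is actually on screen."""
    frames = frames_showing_text(begin, end, fps)
    if not frames:
        return None  # the caption falls entirely between two frame starts
    fps = Fraction(fps)
    return frames.start / fps, frames.stop / fps


ntsc = Fraction(30000, 1001)
print(frames_showing_text(1, 2, ntsc))   # range(30, 60)
print(visible_interval(1, 2, ntsc))      # (Fraction(1001, 1000), Fraction(1001, 500))
print(visible_interval(1, 2, 25))        # (Fraction(1, 1), Fraction(2, 1))
```

Under this model an authored 1 s to 2 s caption is seen from 1.001 s to 2.002 s at 29.97 fps. A device that composites captions at its display refresh rate, as on the laptops and tablets mentioned above, is not bound to that grid.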

> I'm not saying we should forbid
>sub-frame sync (and we do not as far as I know).  I am saying at a 
>minimum we need to crisply support alignment with the coded video 
>frame. If one wants to venture into the world of subframe visual sync, 
>that's perfectly fine, but doesn't help the existing designs where that is
>not possible.
>
>Since I don't recognise the requirement for text display time alignment 
>with encoded video frames please could you describe where this 
>requirement comes from?
>
>MD>> See above.  Is the problem that IMSC *requires* that it work only this
>way and forbids subframe sync?  If so, we should correct it.

Yes, that's exactly it - assuming that by "subframe sync" you mean general
sync to quanta shorter than the frame, rather than syncing precisely to
sub-frames according to ttp:subFrameRate, which is explicitly prohibited in
IMSC 1 ("#subFrameRate SHALL not be used"). I agree we should correct it.

>
>Kind regards,
>
>Nigel
>
>
>On 22/05/2014 16:42, "Michael Dolan" <mdolan@newtbt.com> wrote:
>
>>No, that's not sufficient.  It must be possible to composite the text 
>>in the exact same time space domain as the related (coded) video.
>>Unambiguous frame alignment is absolutely required.  What happens 
>>after that is a decoder problem.
>>
>>If you also want to attempt to provide hints about alignment to 
>>display formats, or in other applications video frame sync is not 
>>important, that's OK.  But that does not relax the requirement for the 
>>ability to align with the coded video. And in order to do that, the 
>>math must be prescribed.
>>
>>	Mike
>>
>>-----Original Message-----
>>From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
>>Sent: Thursday, May 22, 2014 8:34 AM
>>To: Michael Dolan; 'Timed Text Working Group'
>>Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC 
>>should not require frame alignment [TTML IMSC 1.0]
>>
>>On 22/05/2014 15:40, "Michael Dolan" <mdolan@newtbt.com> wrote:
>>
>>>This is a complex topic and absolutely required to provide coded 
>>>frame-level text/video sync.
>>
>>I don't believe that frame-level text/video sync is the requirement
>>though - the text needs to be synced against media time, and so does the
>>video, and so does the audio.
>>
>>> 
>>>
>>>It is, I believe, impossible for an author to enable sync to display 
>>>frames.
>>
>>I think that's an academic point - what's needed is for the author to 
>>specify times as precisely as she/he is able to, and the processor to 
>>honour those as closely as it can. The frame rate of the video that 
>>the author is creating captions for cannot always be guaranteed in
>>the workflow to be the same as the frame rate of the video being 
>>played back with those captions.
>>I'm arguing that the processor and display combination should try to 
>>honour the authored times as accurately as possible independently of 
>>the encoded video frame rate for playback.
>>
>>Nigel
>>
>>>
>>>	Mike
>>>
>>>-----Original Message-----
>>>From: Timed Text Working Group Issue Tracker 
>>>[mailto:sysbot+tracker@w3.org]
>>>Sent: Thursday, May 22, 2014 3:20 AM
>>>To: public-tt@w3.org
>>>Subject: ISSUE-317 (IMSC should not require frame alignment): IMSC 
>>>should not require frame alignment [TTML IMSC 1.0]
>>>
>>>ISSUE-317 (IMSC should not require frame alignment): IMSC should not 
>>>require frame alignment [TTML IMSC 1.0]
>>>
>>>http://www.w3.org/AudioVideo/TT/tracker/issues/317
>>>
>>>Raised by: Nigel Megitt
>>>On product: TTML IMSC 1.0
>>>
>>>IMSC 1.0 §4.4 [1] currently requires temporal quantisation of media 
>>>times to frame display times. This rule comes into play when times 
>>>are not expressed in frames, and therefore the same document may 
>>>apply to a range of related media objects covering different frame 
>>>rates. In the case when frames are used, the document can only be
>>>displayed alongside media of the same frame rate, so there's no need
>>>for the frame alignment expression.
>>>
>>>This approach prevents implementations from changing caption display 
>>>at screen refresh rate quantisation and enforces quantisation based 
>>>on the encoded video frame rate. This means that if a low frame rate 
>>>video is provided, e.g. quarter rate which could be around 6 frames 
>>>per second, the effective word reading rate may be increased to the 
>>>point where text becomes hard to read.
>>>
>>>Consider a streaming environment in which there is enough network
>>>capacity to provide audio and captions but the video experience is
>>>badly impacted: in this case the implementation must be permitted to
>>>continue presenting captions alongside the audio regardless of the
>>>frames of video that are displayed.
>>>
>>>I propose as a solution to this problem that implementations SHALL
>>>display captions as temporally close to the specified media time as
>>>the display device permits, independent of video frame rate.
>>>
>>>Note that where frames are used in media time expressions this 
>>>reduces to exactly the current behaviour.
>>>
>>>[1]
>>>https://dvcs.w3.org/hg/ttml/raw-file/ea1a92310a27/ttml-ww-profiles/ttml-ww-profiles.html#synchronization
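The low-frame-rate concern in ISSUE-317 can be quantified with a small Python sketch. It assumes (hypothetically) that begin and end times are both snapped to the nearest quantum, ties going later, and compares a coded-frame quantum at quarter rate of 25 fps (6.25 frames per second) with a 60 Hz display refresh. The caption times and word count are made-up illustration values, not data from the thread.

```python
from fractions import Fraction
import math


def snap(t, quantum):
    """Nearest multiple of quantum to t; ties go to the later one (assumed)."""
    quantum = Fraction(quantum)
    return math.floor(Fraction(t) / quantum + Fraction(1, 2)) * quantum


def reading_rate(words, begin, end, quantum=None):
    """Words per second actually available to the viewer.

    With quantum=None the authored times are used unchanged; otherwise
    begin and end are both snapped to the quantum grid first.
    """
    if quantum is not None:
        begin, end = snap(begin, quantum), snap(end, quantum)
    return Fraction(words) / (Fraction(end) - Fraction(begin))


# Hypothetical caption: 6 words authored from 1.05 s to 1.95 s.
begin, end, words = Fraction("1.05"), Fraction("1.95"), 6

print(reading_rate(words, begin, end))                    # 20/3 (authored)
print(reading_rate(words, begin, end, Fraction(4, 25)))   # 15/2 (6.25 fps frames)
print(reading_rate(words, begin, end, Fraction(1, 60)))   # 20/3 (60 Hz refresh)
```

On the 6.25 fps grid the 0.9 s caption shrinks to 0.8 s, a 12.5% higher reading rate, while the refresh grid leaves it unchanged here. Rounding can also lengthen a caption: each edge may move by up to half a quantum, so the coarser the quantum, the larger the possible distortion.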
Received on Friday, 23 May 2014 14:32:42 UTC