RE: ISSUE-317 (IMSC should not require frame alignment): IMSC should not require frame alignment [TTML IMSC 1.0] from Michael Dolan on 2014-05-22 (public-tt@w3.org from May 2014)

From: Michael Dolan <mdolan@newtbt.com>
Date: Thu, 22 May 2014 09:39:09 -0700
To: "'Timed Text Working Group'" <public-tt@w3.org>
Message-ID: <012701cf75dc$5cee74e0$16cb5ea0$@newtbt.com>
Comments in line (CIL), marked MD>>.

-----Original Message-----
From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk] 
Sent: Thursday, May 22, 2014 8:50 AM
To: Michael Dolan; 'Timed Text Working Group'
Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC
should not require frame alignment [TTML IMSC 1.0]

You can achieve unambiguous frame alignment in the current IMSC draft spec
by specifying frames in the time expression, so I think your requirement is
already met.

MD>>  This is simply not true, especially for non-integer framerates.  There
was a *lengthy* reflector exchange on this topic, maybe a year ago, which
resulted in changes to TTML1SE.  Perhaps that will help.  You'll be glad to
hear it is simpler for European (integer) framerates. :-)
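
To make the non-integer case concrete, here is a minimal Python sketch of two plausible readings of an hh:mm:ss:ff time expression at 30000/1001 fps. The function names are hypothetical, and neither reading is claimed to be the rule that TTML1SE adopted; the sketch only shows that the readings diverge unless the math is prescribed.

```python
from fractions import Fraction

# Assumed rates for illustration: nominal 30 fps with a 1000/1001 multiplier.
NOMINAL_RATE = 30
EFFECTIVE_RATE = Fraction(30000, 1001)


def seconds_plus_frame_fraction(h, m, s, f, effective_rate=EFFECTIVE_RATE):
    """Reading A (hypothetical): hh:mm:ss are real seconds and the frames
    field adds f / effective_rate seconds on top."""
    return Fraction(3600 * h + 60 * m + s) + Fraction(f) / effective_rate


def frame_count_times_duration(h, m, s, f,
                               nominal_rate=NOMINAL_RATE,
                               effective_rate=EFFECTIVE_RATE):
    """Reading B (hypothetical): the expression labels a frame number counted
    at the nominal rate, and each frame lasts 1 / effective_rate seconds."""
    frame_number = (3600 * h + 60 * m + s) * nominal_rate + f
    return Fraction(frame_number) / effective_rate


if __name__ == "__main__":
    a = seconds_plus_frame_fraction(0, 0, 10, 0)
    b = frame_count_times_duration(0, 0, 10, 0)
    print(float(a), float(b))  # 10.0 10.01
```

Under these assumptions the two readings already differ by 10 ms at the ten-second mark, and by 3.6 seconds after one hour. With an integer frame rate the two functions return the same value.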

I disagree that removing the requirement to align unambiguously with frames
takes the timings outside the same time space domain as the related video. In
fact they are both in a time domain arbitrarily defined as media time that
the presentation system manages. They're just expressing times with
different levels of precision.

MD>>  I am afraid it is not that simple. You seem to be missing: 1) (above)
the problem that alignment to specific frames of coded video is ambiguous
(without the added spec language in TTML1SE); and 2) text is only visible in
today's decoders for the integral period in which the coded video frame(s)
are visible - that is, you cannot in the general case actually make the text
appear and disappear on other than coded video frame boundaries.  I suppose
in some futuristic decoder/display architecture (and a connector other than
HDMI), it might be possible to truly have text appear and disappear
unrelated to the coded video object frame alignment, but I'm not aware of
any equipment today where that is possible.  I'm not saying we should forbid
sub-frame sync (and we do not as far as I know).  I am saying at a minimum
we need to crisply support alignment with the coded video frame. If one
wants to venture into the world of subframe visual sync, that's perfectly
fine, but doesn't help the existing designs where that is not possible.
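
As a rough illustration of point 2, the Python sketch below snaps an authored begin/end pair to coded frame boundaries. It assumes a "first frame at or after the media time" (ceiling) rule purely for illustration; the actual rule is whatever the spec prescribes, and the function names are hypothetical.

```python
import math
from fractions import Fraction

# Assumed coded video frame rate for the example.
NTSC_30 = Fraction(30000, 1001)


def first_frame_at_or_after(media_time, fps):
    """Index of the first coded frame whose presentation time is not earlier
    than media_time (ceiling rule, an assumption of this sketch)."""
    return math.ceil(Fraction(media_time) * fps)


def frame_presentation_time(index, fps):
    """Presentation time in seconds of the coded frame with this index."""
    return Fraction(index) / fps


def on_screen_interval(begin, end, fps):
    """Interval during which text is actually visible when it can only change
    on coded frame boundaries."""
    shown_from = frame_presentation_time(first_frame_at_or_after(begin, fps), fps)
    shown_until = frame_presentation_time(first_frame_at_or_after(end, fps), fps)
    return shown_from, shown_until


if __name__ == "__main__":
    start, stop = on_screen_interval(1, Fraction(5, 2), NTSC_30)
    print(float(start), float(stop))  # 1.001 2.5025
```

An authored interval of 1.0 s to 2.5 s therefore lands on frames 30 and 75, and the text is on screen from 1.001 s to 2.5025 s.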

Since I don't recognise the requirement for text display time alignment with
encoded video frames please could you describe where this requirement comes
from?

MD>> See above.  Is the problem that IMSC *requires* that it work only this
way and forbids subframe sync?  If so, we should correct it.

Kind regards,

Nigel


On 22/05/2014 16:42, "Michael Dolan" <mdolan@newtbt.com> wrote:

>No, that's not sufficient.  It must be possible to composite the text 
>in the exact same time space domain as the related (coded) video. 
>Unambiguous frame alignment is absolutely required.  What happens after 
>that is a decoder problem.
>
>If you also want to attempt to provide hints about alignment to display 
>formats, or in other applications video frame sync is not important, 
>that's OK.  But that does not relax the requirement for the ability to 
>align with the coded video. And in order to do that, the math must be 
>prescribed.
>
>	Mike
>
>-----Original Message-----
>From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
>Sent: Thursday, May 22, 2014 8:34 AM
>To: Michael Dolan; 'Timed Text Working Group'
>Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC 
>should not require frame alignment [TTML IMSC 1.0]
>
>On 22/05/2014 15:40, "Michael Dolan" <mdolan@newtbt.com> wrote:
>
>>This is a complex topic and absolutely required to provide coded 
>>frame-level text/video sync.
>
>I don't believe that frame-level text/video sync is the requirement 
>though - the text needs to be synced against media time, and so does 
>the video, and so does the audio.
>
>> 
>>
>>It is, I believe, impossible for an author to enable sync to display 
>>frames.
>
>I think that's an academic point - what's needed is for the author to 
>specify times as precisely as she/he is able to, and the processor to 
>honour those as closely as it can. The frame rate of the video that the 
>author is creating captions for cannot always be guaranteed in the 
>workflow to be the same as the frame rate of the video being played 
>back with those captions.
>I'm arguing that the processor and display combination should try to 
>honour the authored times as accurately as possible independently of 
>the encoded video frame rate for playback.
>
>Nigel
>
>>
>>	Mike
>>
>>-----Original Message-----
>>From: Timed Text Working Group Issue Tracker 
>>[mailto:sysbot+tracker@w3.org]
>>Sent: Thursday, May 22, 2014 3:20 AM
>>To: public-tt@w3.org
>>Subject: ISSUE-317 (IMSC should not require frame alignment): IMSC 
>>should not require frame alignment [TTML IMSC 1.0]
>>
>>ISSUE-317 (IMSC should not require frame alignment): IMSC should not 
>>require frame alignment [TTML IMSC 1.0]
>>
>>http://www.w3.org/AudioVideo/TT/tracker/issues/317
>>
>>Raised by: Nigel Megitt
>>On product: TTML IMSC 1.0
>>
>>IMSC 1.0 §4.4 [1] currently requires temporal quantisation of media 
>>times to frame display times. This rule comes into play when times are 
>>not expressed in frames, and therefore the same document may apply to 
>>a range of related media objects covering different frame rates. In 
>>the case when frames are used the document can only be displayed 
>>alongside media of the same frame rate so there's no need for the 
>>frame alignment expression.
>>
>>This approach prevents implementations from changing caption display 
>>at screen refresh rate quantisation and enforces quantisation based on 
>>the encoded video frame rate. This means that if a low frame rate 
>>video is provided, e.g. quarter rate which could be around 6 frames 
>>per second, the effective word reading rate may be increased to the 
>>point where text becomes hard to read.
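
The effect described above can be sketched in Python by snapping the same authored caption to a 6 fps video grid and to a 60 Hz display grid. The ceiling rule, the rates, and the function names are assumptions made for illustration, not text from the IMSC draft.

```python
import math
from fractions import Fraction


def snap_up(t, rate):
    """Snap time t (seconds) to the next tick of a grid running at `rate`
    ticks per second (ceiling rule, an assumption of this sketch)."""
    return Fraction(math.ceil(t * rate)) / rate


def displayed_duration(begin, end, rate):
    """Duration for which the caption is shown after both ends are snapped."""
    return snap_up(end, rate) - snap_up(begin, rate)


if __name__ == "__main__":
    begin, end = Fraction(101, 100), Fraction(149, 100)  # authored 1.01 s to 1.49 s
    print(float(end - begin))                        # 0.48 s authored
    print(float(displayed_duration(begin, end, 6)))   # about 0.333 s at 6 fps
    print(float(displayed_duration(begin, end, 60)))  # about 0.483 s at 60 Hz
```

In this example the caption authored for 0.48 s is shown for only one third of a second on the 6 fps grid, while the 60 Hz grid preserves almost all of the authored duration.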
>>
>>Consider a streaming environment in which there is enough network 
>>capacity to provide audio and captions but the video experience is 
>>badly impacted: in this case it must be permitted that the 
>>implementation continue to present captions alongside the audio 
>>regardless of the frames of video that are displayed.
>>
>>I propose, as a solution to this problem, that implementations SHALL 
>>display captions as temporally close to the media time specified as 
>>the display device permits, independent of video frame rate.
>>
>>Note that where frames are used in media time expressions this reduces 
>>to exactly the current behaviour.
>>
>>[1]
>>https://dvcs.w3.org/hg/ttml/raw-file/ea1a92310a27/ttml-ww-profiles/ttml-ww-profiles.html#synchronization
>>
>>
>>
>>
>
>
Received on Thursday, 22 May 2014 16:39:45 UTC