Re: ISSUE-317 (IMSC should not require frame alignment): IMSC should not require frame alignment [TTML IMSC 1.0] from Nigel Megitt on 2014-05-22 (public-tt@w3.org from May 2014)

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Thu, 22 May 2014 15:49:57 +0000
To: Michael Dolan <mdolan@newtbt.com>, "'Timed Text Working Group'" <public-tt@w3.org>
Message-ID: <CFA3DA34.1E2C3%nigel.megitt@bbc.co.uk>
You can achieve unambiguous frame alignment in the current IMSC draft spec
by specifying frames in the time expression, so I think your requirement
is already met.

I disagree that removing the requirement to align unambiguously with
frames takes the timings outside the same time space domain as the related
video. In fact they are both in a time domain arbitrarily defined as media
time that the presentation system manages. They're just expressing times
with different levels of precision.
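
As a rough illustration of that last point (a sketch only, not text from
the IMSC draft), both a frame-based and a fractional-seconds time
expression reduce to a value on the same media time line. The helper
names below are hypothetical, and the sketch assumes an integer frame
rate with no frame rate multiplier or drop-frame handling:

```python
from fractions import Fraction


def frames_to_media_time(h, m, s, f, frame_rate):
    """Media time in seconds for an hh:mm:ss:ff expression.

    Assumes a plain integer frame rate (hypothetical simplification).
    """
    return Fraction(h * 3600 + m * 60 + s) + Fraction(f, frame_rate)


def clock_to_media_time(h, m, s_decimal):
    """Media time in seconds for an hh:mm:ss.fraction expression.

    s_decimal is the seconds field as a decimal string, e.g. "10.5".
    """
    return Fraction(h * 3600 + m * 60) + Fraction(s_decimal)


# Both results are points on the same media time line; only the
# granularity available to the author differs.
t_frames = frames_to_media_time(0, 0, 10, 12, 25)  # 10 s + 12 frames at 25 fps
t_clock = clock_to_media_time(0, 0, "10.5")        # 10.5 s
```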

Since I don't recognise the requirement for text display time alignment
with encoded video frames, please could you describe where this
requirement comes from?

Kind regards,

Nigel


On 22/05/2014 16:42, "Michael Dolan" <mdolan@newtbt.com> wrote:

>No, that's not sufficient.  It must be possible to composite the text in
>the exact same time space domain as the related (coded) video.
>Unambiguous frame alignment is absolutely required.  What happens after
>that is a decoder problem.
>
>If you also want to attempt to provide hints about alignment to display
>formats, or in other applications video frame sync is not important,
>that's OK.  But that does not relax the requirement for the ability to
>align with the coded video. And in order to do that, the math must be
>prescribed.
>
>	Mike
>
>-----Original Message-----
>From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
>Sent: Thursday, May 22, 2014 8:34 AM
>To: Michael Dolan; 'Timed Text Working Group'
>Subject: Re: ISSUE-317 (IMSC should not require frame alignment): IMSC
>should not require frame alignment [TTML IMSC 1.0]
>
>On 22/05/2014 15:40, "Michael Dolan" <mdolan@newtbt.com> wrote:
>
>>This is a complex topic and absolutely required to provide coded
>>frame-level text/video sync.
>
>I don't believe that frame-level text/video sync is the requirement
>though - the text needs to be synced against media time, and so does the
>video, and so does the audio.
>
>> 
>>
>>It is, I believe, impossible for an author to enable sync to display
>>frames.
>
>I think that's an academic point - what's needed is for the author to
>specify times as precisely as she/he is able to, and the processor to
>honour those as closely as it can. The frame rate of the video that the
>author is creating captions for cannot always be guaranteed in the
>workflow to be the same as the frame rate of the video being played back
>with those captions. I'm arguing that the processor and display
>combination should try to honour the authored times as accurately as
>possible independently of the encoded video frame rate for playback.
>
>Nigel
>
>>
>>	Mike
>>
>>-----Original Message-----
>>From: Timed Text Working Group Issue Tracker
>>[mailto:sysbot+tracker@w3.org]
>>Sent: Thursday, May 22, 2014 3:20 AM
>>To: public-tt@w3.org
>>Subject: ISSUE-317 (IMSC should not require frame alignment): IMSC
>>should not require frame alignment [TTML IMSC 1.0]
>>
>>ISSUE-317 (IMSC should not require frame alignment): IMSC should not
>>require frame alignment [TTML IMSC 1.0]
>>
>>http://www.w3.org/AudioVideo/TT/tracker/issues/317
>>
>>Raised by: Nigel Megitt
>>On product: TTML IMSC 1.0
>>
>>IMSC 1.0 §4.4 [1] currently requires temporal quantisation of media
>>times to frame display times. This rule comes into play when times are
>>not expressed in frames, and therefore the same document may apply to a
>>range of related media objects covering different frame rates. In the
>>case when frames are used the document can only be displayed alongside
>>media of the same frame rate so there's no need for the frame alignment
>>expression.
>>
>>This approach prevents implementations from changing caption display at
>>screen refresh rate quantisation and enforces quantisation based on the
>>encoded video frame rate. This means that if a low frame rate video is
>>provided, e.g. quarter rate which could be around 6 frames per second,
>>the effective word reading rate may be increased to the point where
>>text becomes hard to read.
>>
>>Consider a streaming environment in which there is enough network
>>capacity to provide audio and captions but the video experience is
>>badly impacted: in this case it must be permitted that the
>>implementation continue to present captions alongside the audio
>>regardless of the frames of video that are displayed.
>>
>>I propose as a solution to this problem that implementations SHALL
>>display captions as temporally close to the specified media time as the
>>display device permits, independent of video frame rate.
>>
>>Note that where frames are used in media time expressions this reduces
>>to exactly the current behaviour.
>>
>>[1]
>>https://dvcs.w3.org/hg/ttml/raw-file/ea1a92310a27/ttml-ww-profiles/ttml-ww-profiles.html#synchronization
>>
>>
>>
>>
>
>
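
To make the quantisation concern in the quoted ISSUE-317 text concrete,
here is a small sketch comparing the same authored media time snapped to
a low video frame rate and to a display refresh rate. The function name,
the "first tick at or after" rounding, and the example rates (6 fps video,
60 Hz display) are assumptions chosen for illustration and are not taken
from the IMSC draft:

```python
import math
from fractions import Fraction


def quantise(media_time, ticks_per_second):
    """Snap media_time (seconds) to the first grid tick at or after it.

    The ceiling rounding here is an assumption for illustration only.
    """
    return Fraction(math.ceil(media_time * ticks_per_second)) / ticks_per_second


authored = Fraction("10.04")               # authored caption begin time, in seconds

on_video_grid = quantise(authored, 6)      # quarter-rate video, about 6 fps
on_display_grid = quantise(authored, 60)   # display refreshing at 60 Hz

# Offset from the authored time introduced by each grid.
video_offset = on_video_grid - authored      # 19/150 s, roughly 127 ms
display_offset = on_display_grid - authored  # 1/100 s, 10 ms
```

With these example numbers the display-rate grid stays much closer to the
authored time than the video-frame grid does.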
Received on Thursday, 22 May 2014 15:50:29 UTC