RE: EBU Segmentation document from Michael Dolan on 2013-12-19 (public-tt@w3.org from December 2013)

From: Michael Dolan <mdolan@newtbt.com>
Date: Thu, 19 Dec 2013 08:31:52 -0800
To: "'Timed Text Working Group'" <public-tt@w3.org>
Message-ID: <010e01cefcd7$d48b7300$7da25900$@newtbt.com>
[merging 2 independent responses into a single post]

 

From: John Birch [mailto:John.Birch@screensystems.tv] 
Sent: Monday, December 09, 2013 7:58 AM
To: Michael Dolan; 'Timed Text Working Group'
Subject: RE: EBU Segmentation document

 

Hi Michael,

 

I’m interested in why you are unhappy with times in the document that would lie outside the sample epoch?

 

Most container formats state something along the lines of… “content is not valid outside the sample duration”.  Which is fine, and IMHO having e.g. begin and end times that exceed the sample epoch does not violate this concept. Having a begin and end time outside the epoch does not necessarily mean a decoder has to activate the content before the sample epoch (or after it).

 

MD>> The standalone documents cannot be properly presented since the actual document duration is now constrained by external means only. There is nothing in the document itself that constrains the intended decoder behaviour.  Thus, a normal decoder would in fact use the end/dur times in the document itself to determine the temporal extent. A design principle of TTML is that, with very few exceptions, they are self-contained. Enabling more parameters defined only by external means is to be avoided if possible.

 

Further I’m interested in how other track types are handled… for example audio is often sampled at a different rate to video… consequently audio and video samples are unlikely to align. E.g. How do containers handle the overlap of audio segments across sample boundaries?

 

MD>> I don’t understand the relevance of inter-track alignment to this discussion.  No tracks are aligned in practice today and none of the discussion here alters that as far as I can tell.

 

I much fear we might be discussing and designing a solution to a ‘non-problem’.

 

MD>> Well, my initial reply was that the use cases were in fact a non-problem. :-)  Documents can repeat styles and align times and accomplish the same thing.  Decoders can be designed well to not flash when temporally adjacent content is identical. For production workflow of splitting/recombining documents, this can also be solved by the production equipment without changing the fundamental definition of temporal extent.  However, my response below is accepting these use cases and suggesting a less disruptive (in my view) solution.

 

I have reservations about using IDs to link documents… there are any number of potential issues there.

 

MD>> I don’t see a problem, but we can discuss that after we conclude a discussion of the basic design issue.

 

Allowing temporal overlap does at least mean that each document can be independent, whilst retaining some ability for a decoder to optimise for continuity of presentation, regardless of play direction. Further, it reinforces the concept that the subtitle durations are independent of the video (and audio) samples… E.g. subtitle temporal accuracy and continuation of presentation should ideally occur at the specified time points regardless of video playback frame rates or stutters due to adaptive adjustments etc.

 

Best regards,

John

 

From: Andreas Tai [mailto:tai@irt.de] 
Sent: Monday, December 09, 2013 10:32 AM
To: Michael Dolan; 'Timed Text Working Group'
Subject: Re: EBU Segmentation document

 

Hi Mike,

Some comments: 

1) Intermediate Synchronic Document (ISD) and Segmentation

The use of the concept of Intermediate Synchronic Document in this context can be confusing. IMO the main problem is unrelated.

Let's take a source document with only two subtitles/p elements that shall be broken into smaller target documents:

<p begin="00:00:00" end="00:00:10" xml:id="sub1" region="r1">Foo</p>
<p begin="00:00:01" end="00:00:03" xml:id="sub2" region="r2">Bar</p>

You would generate three ISDs:

ISD 1 = [0s,1s]
ISD 2 = [1s,3s]
ISD 3 = [3s,10s]

Even if you make 3 document samples of the exact duration of the ISDs (1s, 2s and 7s) the concept of the ISD does not give a hint about a “continued” subtitle. In this case the subtitle with the xml:id "sub1" will be shown in all three samples.

If you generate samples with a fixed length of 1s (for example) subtitle with the xml:id  "sub1" will continue over 9 documents (not counting the first one).
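
A minimal Python sketch of the interval arithmetic above, under stated assumptions: times are whole seconds, the two p elements are labelled by their text ("Foo" and "Bar"), and the function names isd_intervals and sample_indices are invented here for illustration. It is not taken from TTML or EBU-TT.

```python
def parse_clock(value):
    """Convert an "hh:mm:ss" clock string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def isd_intervals(subtitles):
    """Return (begin, end, active_labels) for each span with a constant active set."""
    # Every begin or end time is a point where the active set may change.
    boundaries = sorted({t for begin, end, _ in subtitles for t in (begin, end)})
    intervals = []
    for start, stop in zip(boundaries, boundaries[1:]):
        active = [label for begin, end, label in subtitles
                  if begin <= start and stop <= end]
        if active:  # skip gaps in which nothing is displayed
            intervals.append((start, stop, active))
    return intervals


def sample_indices(begin, end, sample_length):
    """Indices of the fixed-length samples that overlap [begin, end)."""
    first = begin // sample_length
    last = -(-end // sample_length)  # ceiling division
    return list(range(first, last))


# The two p elements from the example; labels are illustrative only.
subtitles = [
    (parse_clock("00:00:00"), parse_clock("00:00:10"), "Foo"),
    (parse_clock("00:00:01"), parse_clock("00:00:03"), "Bar"),
]

isds = isd_intervals(subtitles)
for start, stop, active in isds:
    print(f"[{start}s,{stop}s] {active}")

# With fixed 1s samples, "Foo" appears in 10 samples: the first one plus 9 continuations.
foo_samples = sample_indices(0, 10, 1)
print(len(foo_samples) - 1, "continuation documents")
```

"Foo" is active in every interval, yet nothing in the interval list itself records that it is the same subtitle carried across each boundary.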

 

MD>> Sorry if adding the ISD to the discussion complicated things. We can drop it if it simplifies discussion of the more fundamental design issue.

2) xml:id versus new metadata
It can be discussed why we should not take xml:id as the unique identifier instead of two new metadata elements. I think it is possible to overload the semantics of xml:id in a specific context. For example, one could add the constraint that an xml:id in a stream of subtitle documents always has to identify the same subtitle.

We had this discussion in the EBU-TT context and there was more than one opinion that it would be difficult (or not desired) to manage identity over more than one document.

You could argue that a sample is in this case not stateless anymore because you need all previously sent documents to check this constraint.

 

MD>> I do not believe that xml:id can be used since its scope is explicitly defined by W3C to be within a single document.  But let’s discuss the id mechanism after we resolve the basic design.

3) Validation and fallback behavior
Regardless of which approach is taken, the rendering client is still responsible for checking whether two "subtitles" in different documents are the "same". The region and the computed style set must be the same. An identifier is just a hint.

If you give a hint for a “continued” subtitle: what happens if it is not correct, e.g. a subtitle in document n is marked as the same as one in document n-1 but, although it has the same content, it does not have the same computed style set (e.g. another background color)?

 

MD>> I believe we could define the metadata to be authoritative if we want to.  Ideally, the two computed styles match, but indeed we would need to define which of the metadata and computed style “wins” in the event of a conflict. Making the metadata authoritative reduces decoder computation.
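
To make the "which one wins" question concrete, here is a small Python sketch of the two possible policies. Everything in it is hypothetical: the Subtitle record, its field names, and the metadata_authoritative flag are invented for illustration, and neither policy is defined by TTML or EBU-TT.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Subtitle:
    text: str
    region: str
    computed_style: Tuple[Tuple[str, str], ...]  # sorted (property, value) pairs
    continued_from: Optional[str] = None         # hypothetical continuation hint


def treat_as_continuation(previous, current, metadata_authoritative):
    """Decide whether `current` continues `previous` across a document boundary."""
    if current.continued_from is None:
        return False
    if metadata_authoritative:
        # The hint wins: no comparison of content or computed style is needed.
        return True
    # Otherwise the hint is only a hint and the decoder verifies it.
    return (previous.text, previous.region, previous.computed_style) == (
        current.text, current.region, current.computed_style)


# Same content and region, but a different background colour in document n.
previous = Subtitle("Foo", "r1", (("backgroundColor", "black"),))
current = Subtitle("Foo", "r1", (("backgroundColor", "blue"),), continued_from="sub1")

print(treat_as_continuation(previous, current, metadata_authoritative=True))
print(treat_as_continuation(previous, current, metadata_authoritative=False))
```

With authoritative metadata the mismatched background colour is never inspected, which is where the saving in decoder computation would come from; with the verifying policy the same pair is rejected as a continuation.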


Best regards,

Andreas



From: Michael Dolan [mailto:mdolan@newtbt.com] 
Sent: 06 December 2013 00:48
To: 'Timed Text Working Group'
Subject: RE: EBU Segmentation document

 

ACTION-250: https://www.w3.org/AudioVideo/TT/tracker/actions/250 

 

As I suggested below, I think the use cases in the BBC/EBU contribution can be met without relaxing the temporal constraint as proposed.  It involves two metadata attributes that are used to signal that there exists a temporal continuity between the last synchronic document in one instance document and the first synchronic document in another instance document.

 

ttm:continues=[xs:string]  This indicates that the identified content may be continued in another document with later, adjacent temporal extent.  The value of the attribute is an identifier that uniquely identifies the content in both documents. There may be more than one of these per document, but the string values must be unique. The scope of uniqueness of the identifier value is the current instance document plus the immediately following (temporally) instance document.

 

ttm:continuedFrom=[xs:string]  This indicates that the identified content may have been continued from another document with earlier, adjacent temporal extent.  And further, the content that this attribute applies to is identical in every way in both documents. The presentation engine may ignore the content in the current document. This does not mean the two synchronic documents are identical; just the identified content. The value of the attribute is an identifier that uniquely identifies the content from an earlier, temporally adjacent document. There may be more than one of these per document. The scope of uniqueness of the identifier value is the current instance document plus the immediately preceding (temporally) instance document.

 

For example:

 

Doc #1:

<p begin="00:07:51:00" end="00:07:52:01" ttm:continues="theLastLine">text that spans documents</p>

 

Doc #2:

<p begin="00:07:52:01" end="00:07:53:00" ttm:continuedFrom="theLastLine">text that spans documents</p>
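
A rough Python sketch of how a decoder might pair the two fragments above. The attributes ttm:continues and ttm:continuedFrom are only the proposal made in this message, not part of any published TTML vocabulary; the helper names are invented for illustration, and adjacency is checked by plain string equality of the end and begin values, which is a simplification.

```python
import xml.etree.ElementTree as ET

TTM_NS = "http://www.w3.org/ns/ttml#metadata"
# Proposed (not standardised) attribute names in ElementTree's {namespace}local form.
CONTINUES = f"{{{TTM_NS}}}continues"
CONTINUED_FROM = f"{{{TTM_NS}}}continuedFrom"

DOC1 = (
    f'<p xmlns:ttm="{TTM_NS}" begin="00:07:51:00" end="00:07:52:01" '
    'ttm:continues="theLastLine">text that spans documents</p>'
)
DOC2 = (
    f'<p xmlns:ttm="{TTM_NS}" begin="00:07:52:01" end="00:07:53:00" '
    'ttm:continuedFrom="theLastLine">text that spans documents</p>'
)


def continued_pairs(earlier_root, later_root):
    """Pair elements of two adjacent documents that share a continuation identifier."""
    outgoing = {el.get(CONTINUES): el
                for el in earlier_root.iter() if el.get(CONTINUES)}
    pairs = []
    for el in later_root.iter():
        key = el.get(CONTINUED_FROM)
        if key in outgoing:
            pairs.append((outgoing[key], el))
    return pairs


def temporally_adjacent(earlier_el, later_el):
    """True when the earlier element ends exactly where the later one begins."""
    return earlier_el.get("end") == later_el.get("begin")


pairs = continued_pairs(ET.fromstring(DOC1), ET.fromstring(DOC2))
for earlier_el, later_el in pairs:
    if temporally_adjacent(earlier_el, later_el):
        print("keep on screen across the boundary:", later_el.text)
```

The lookup needs only the current document and its immediate neighbour, which matches the scope of uniqueness described for the identifier values.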

 

Let me know if I have missed some use case or if this won’t work.

 

Regards,

                Mike

 

From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk] 
Sent: Thursday, November 21, 2013 8:46 AM
To: John Birch; Michael Dolan; 'Timed Text Working Group'
Subject: Re: EBU Segmentation document

 

I also agree!

 

Although… a use case we should consider is bi-directional playing/scanning of content – this was mentioned to me last week (apologies I forget who but I have a pretty good idea it was one of 2 people I'm thinking of!). To facilitate this any forward-facing information should be mirrored as backwards-facing, i.e. the 'continuation of previous' and begin times preceding sample begin time should be permitted.

 

In the previous discussions was there any consideration of the scope of xml:id within the set of [continued] documents? Normally xml:id is unique per document, but there would be an argument here for extending the uniqueness within the set of connected documents.

 

Kind regards,

 

Nigel

 

 

On 21/11/2013 16:34, "John Birch" <John.Birch@screensystems.tv> wrote:

 

I very much agree…

 

Although… *even* in a perfect decoder world I would suggest that ‘a priori’ knowledge of the end time of content (i.e. an end time that is effective in a subsequent document) would be useful in improving decoder efficiency. I.e. knowing that display of content spans the next sample boundary can be used to optimise the decoder. The indication that current content was a continuation of previous content is arguably perhaps less relevant.

 

Best regards,

John

 

John Birch | Strategic Partnerships Manager | Screen
Main Line : +44 1473 831700 | Ext : 270 | Direct Dial : +44 1473 834532 
Mobile : +44 7919 558380 | Fax : +44 1473 830078 
John.Birch@screensystems.tv | www.screensystems.tv | https://twitter.com/screensystems

 

From: Michael Dolan [mailto:mdolan@newtbt.com] 
Sent: 21 November 2013 16:26
To: 'Timed Text Working Group'
Subject: RE: EBU Segmentation document

 

This general topic has come and gone over the years, and a remnant of an early discussion remains in draft TTML2, Appendix L [1].

 

This proposal is different in that the segments are all valid and complete TTML documents. This is good since more recent analysis suggests that creating document “pieces” doesn’t actually solve a problem. We should consider removing Appendix L in TTML2.

 

The focus of this appears to be the management of the temporal extent.  I recommend the title and introduction be clarified as “segment” is misleading.

 

This issue was also discussed in the past.  A suggestion at the time was to add two attributes that: 1) indicated the content is continued (repeated) in a subsequent document; and 2) indicated that the content is a continuation of an earlier document. This preserves the contained temporal extent of each document.

 

In a perfect decoder world, this should not be necessary if the end/start times of adjacent documents are identical.  The resulting synchronic documents at the temporal boundary would be identical and the decoder should not glitch and remove the content and flash.  The attributes would “help” the decoder do the right thing.

 

Regards,

 

                Mike

 

[1] https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml2/spec/ttml2.html#streaming 

 

 

From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk] 
Sent: Monday, November 11, 2013 6:14 AM
To: Timed Text Working Group
Subject: EBU Segmentation document

 

All,

 

As per Action-235 please see this draft document on segmentation of EBU-TT, for reference with respect to Issue-288. Note that it is not finalised within EBU and may be published as a purely informative document rather than a normative one. There is therefore no formal errata document, however some areas have been commented on and may change, including the substantive change described below.

 

Section 4.1.5 states that content outside a sample's temporal extent shall not be displayed. However, the draft EBU-TT-D specification permits processors, in fault scenarios only, to display subtitles that extend outside the sample's temporal extent as a 'graceful recovery' option, in the case that the sample that should be active has not been received.
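
As an illustration only, the two behaviours can be sketched in Python as follows. The function names, the use of whole seconds, and the half-open [begin, end) convention are assumptions made for this sketch and are not taken from the EBU documents.

```python
def visible_interval(begin, end, sample_begin, sample_end):
    """Clip a subtitle's [begin, end) to the sample extent; None if they do not overlap."""
    start = max(begin, sample_begin)
    stop = min(end, sample_end)
    return (start, stop) if start < stop else None


def recovery_interval(begin, end, sample_end, next_sample_received):
    """Part of the subtitle beyond the sample end that may be shown as graceful recovery."""
    if next_sample_received or end <= sample_end:
        return None  # normal case: nothing outside the sample extent is displayed
    return (max(begin, sample_end), end)


# A subtitle timed 0s to 10s carried in a sample whose extent is 0s to 1s.
print(visible_interval(0, 10, 0, 1))
print(recovery_interval(0, 10, 1, next_sample_received=True))
print(recovery_interval(0, 10, 1, next_sample_received=False))
```

In normal operation only the clipped interval is shown; the overhanging part of the timing is used solely when the following sample never arrives.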

 

Kind regards,

 

Nigel

 

 

 

----------------------------

http://www.bbc.co.uk
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

---------------------

 

This message may contain confidential and/or privileged information. If you are not the intended recipient you must not use, copy, disclose or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation. Screen Subtitling Systems Ltd. Registered in England No. 2596832. Registered Office: The Old Rectory, Claydon Church Lane, Claydon, Ipswich, Suffolk, IP6 0EQ

     

 

Received on Thursday, 19 December 2013 16:32:32 UTC