RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering complexity [Simple Delivery Profile for Closed Captions] from Michael A Dolan on 2012-10-03 (public-tt@w3.org from October 2012)

From: Michael A Dolan <mdolan@newtbt.com>
Date: Wed, 3 Oct 2012 09:45:29 -0700
To: <public-tt@w3.org>
Message-ID: <059a01cda186$8121fbb0$8365f310$@newtbt.com>
CFF bounds the total size of any one document to bound the decoder memory
and processor requirements. Depending on the total text, it may have to be
broken up.

Live streaming is trickier (and CFF doesn't support it).  Some adjustments
would need to be made.

	Mike

-----Original Message-----
From: John Birch [mailto:John.Birch@screensystems.tv] 
Sent: Wednesday, October 03, 2012 8:08 AM
To: Michael A Dolan; public-tt@w3.org
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

Hi Mike,

Yes, I guess I have... I was not really thinking that you might break a VOD
asset into fragments. That's similar to the reels concept in Digital Cinema.

I was also assuming that subtitle documents and movie fragments would relate
to the same temporal period in the presentation (thinking MXF operational
practices here), but what you are saying is that a subtitle fragment could
be of considerably different temporal length to any movie fragment?

So yes, agreed, it would be possible to find common document boundaries
where there was no subtitle on air for any language.

But this does raise some more questions for me...

Given the typical size of a subtitle document, why fragment it at all?

Getting back to live streaming, I would assume short movie fragments would
be used (perhaps to support adaptive bit rate). What would be the strategy
for subtitle document duration? You don't have a subtitle document for the
complete 'movie' because the subtitle content is being generated live (by a
steno typist or re-speaker). So I am assuming that you could send each
complete subtitle as its own document once the subtitler / captioner has
finished it, but what about cumulative captioning (roll-on / paint-on) in
the live environment?

Best regards,
John

John Birch | Screen Systems | Strategic Partnerships Manager Main Line : +44
1473 831700 | Ext : 270 | Direct Dial : +44 1473 834532 Mobile : +44 7919
558380 | Fax : +44 1473 830078 John.Birch@screensystems.tv |
www.screensystems.tv | http://twitter.com/ScreenSubtitles

Visit us at
SMPTE Annual Technical Conference & Exhibition,23-24 October, Stand 112
Loews Hollywood hotel, Hollywood

-----Original Message-----
From: Michael A Dolan [mailto:mdolan@newtbt.com]
Sent: 03 October 2012 15:38
To: public-tt@w3.org
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

John-

I think you have mixed in the live scenario here.  And you have assumed that
multiple subtitle tracks have to all have the same duration fragments.

As Sean and I have noted, CFF subtitle documents are typically 5-30 minutes
in duration (an authoring decision). And the tracks don't all have to be the
same duration.  It is therefore overwhelmingly likely, statistically, that
an appropriate authoring boundary can be found for all of them.
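
To illustrate the boundary argument, here is a minimal Python sketch that
searches for the intervals in which no track has an active subtitle. The cue
lists, the timings, and the common_gaps helper are hypothetical and are not
taken from CFF.

```python
from typing import List, Tuple

Cue = Tuple[float, float]  # (begin, end) in seconds, end exclusive


def common_gaps(tracks: List[List[Cue]], duration: float) -> List[Cue]:
    """Return the intervals in [0, duration) where no track has an active cue."""
    # Merge the cues of all tracks into a single list sorted by begin time.
    cues = sorted(c for track in tracks for c in track)
    gaps = []
    cursor = 0.0  # end of the latest active cue seen so far
    for begin, end in cues:
        if begin > cursor:
            gaps.append((cursor, begin))
        cursor = max(cursor, end)
    if cursor < duration:
        gaps.append((cursor, duration))
    return gaps


english = [(1.0, 4.0), (5.0, 8.0), (12.0, 15.0)]
italian = [(1.0, 4.5), (5.5, 9.0), (12.5, 16.0)]
print(common_gaps([english, italian], 20.0))
```

Any instant inside a returned interval is a candidate document boundary for
every track at once.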

        Mike

-----Original Message-----
From: John Birch [mailto:John.Birch@screensystems.tv]
Sent: Tuesday, October 02, 2012 9:09 AM
To: 'mdolan@newtbt.com'; 'public-tt@w3.org'
Subject: Re: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

Good.

It is very unlikely that authors would be able to align subtitles with
fragment boundaries.

Subtitles may often persist for longer than 4 seconds. A typical
presentation time for each subtitle in a sequence might be 3 seconds... So
statistically it is likely that there will be many boundary crossings.
Further, the alignment of different languages of subtitles for the same
content is uncommon, due primarily to the verbosity of different
languages... Extreme example, Italian cf Chinese...
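
As a rough check of the statistics above: if a subtitle of duration D starts
at a uniformly random offset within fragments of fixed length F, the chance
that it crosses at least one fragment boundary is min(1, D/F). The
uniform-start assumption and the function name are introduced here purely for
illustration.

```python
def crossing_probability(subtitle_s: float, fragment_s: float) -> float:
    """Probability that a subtitle spans a fragment boundary.

    Assumes the subtitle start time is uniformly distributed within a
    fragment and that all fragments have the same length.
    """
    return min(1.0, subtitle_s / fragment_s)


for fragment_s in (2.0, 3.0, 4.0):
    p = crossing_probability(3.0, fragment_s)
    print(f"{fragment_s:.0f} s fragments: {p:.0%} of 3 s subtitles cross a boundary")
```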

A comment in TTML would be useful, perhaps also in CFF, to this effect. (Yes,
I know it is implicit in the model :-) but it wouldn't hurt to make it more
explicit.)

Best regards,
John


----- Original Message -----
From: Michael A Dolan [mailto:mdolan@newtbt.com]
Sent: Tuesday, October 02, 2012 04:06 PM
To: 'Timed Text Working Group' <public-tt@w3.org>
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

CFF is a fragmented file construction (2-4 second samples for video and
audio), so it is "streaming friendly" and perhaps suitable for distribution.
To stream it directly would require a custom transport (i.e. not DASH). But
because of the fragmented construction, it would be easy to make DASH assets
from it.

Authors should strive to create document boundaries when nothing is active.
Decoder manufacturers should strive to detect identical active content
across document boundaries and not flash. As I mentioned, SMPTE has
suggested some "hint" additions to TTML 1.1 to better support this.
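
One way a decoder might avoid the flash is sketched below, assuming it can
compare the content active at the end of one document with the content active
at the start of the next. The ActiveContent type and the needs_redraw
function are hypothetical and do not correspond to any CFF, TTML, or SMPTE
API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ActiveContent:
    """What is on screen at one instant: a set of (region id, text) pairs."""
    items: FrozenSet[Tuple[str, str]]


def needs_redraw(previous: Optional[ActiveContent], current: ActiveContent) -> bool:
    """Return True only if the screen content actually changes at the boundary."""
    return previous is None or previous != current


last_of_doc1 = ActiveContent(frozenset({("r1", "Hello there")}))
first_of_doc2 = ActiveContent(frozenset({("r1", "Hello there")}))
first_of_doc3 = ActiveContent(frozenset({("r1", "Goodbye")}))

print(needs_redraw(last_of_doc1, first_of_doc2))  # identical content, keep the bitmap
print(needs_redraw(last_of_doc1, first_of_doc3))  # content changed, redraw
```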

        Mike

-----Original Message-----
From: John Birch [mailto:John.Birch@screensystems.tv]
Sent: Monday, October 01, 2012 1:59 AM
To: Michael A Dolan; 'Timed Text Working Group'
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

Hi Mike,

Thanks for your comments re clarifications.

With respect to 'progressive download' in CFF.

a) Would this be suitable for live streaming? I hope so... a robust video
container standard that includes well defined support for **subtitle**
tracks is definitely needed for live video on the Internet!

With respect to movie fragments, is the case where a subtitle persists
across the boundary between movie fragments described?
I note there are many paragraphs about synchronisation between tracks, but
clarification of the **persistence** of a subtitle across such a boundary
eluded me...

It might be a subtle point, but without the notion of persistence, an
implementation might result in an annoying subtitle redraw 'flash' at such a
boundary.

Best regards,

John


-----Original Message-----
From: Michael A Dolan [mailto:mdolan@newtbt.com]
Sent: 28 September 2012 17:59
To: John Birch; 'Timed Text Working Group'
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

CIL (comments inline, marked [MD> ]).

-----Original Message-----
From: John Birch [mailto:John.Birch@screensystems.tv]
Sent: Friday, September 28, 2012 9:15 AM
To: Michael A Dolan; 'Timed Text Working Group'
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

Hi Mike,

Sorry, I don't think I'm getting my point across - I'll try one more time.

The problem is with how I feel some readers may interpret the CFF model. On
cursory examination, the model behaviour is consistent with (and indicates /
implies) just a pop mode of subtitling or captioning.
There are no statements in the CFF document that indicate that multiple
(other) modes of presentation might be expected... and not all readers of
CFF will be aware of different captioning modalities.

It is my fear that many implementations will follow this rendering model at
a cursory level (for such is the nature of implementation under time and
fiscal pressure).
Consequently there is a risk that implementations will arise that do not
effectively support the more sophisticated begin, begin, begin, end, end,
end, kind of sequence of SubtitleEvents that would occur for a Paint-on or
Roll On sequence.

I am simply suggesting adding a couple of sentences to highlight the fact
that there is the potential for non-pop presentations, and the subsequent
implications that arise for the rendering model (i.e. the necessity to
reconstruct the intermediate model and redraw all the content).

[MD> ] Happy to add some informative clarifications.

On the issue of inefficiency, clearly deleting the entire buffer and then
redrawing all of it plus a bit more in Paint-on is inefficient.
For Roll-on, when a scroll is needed, changing a pointer to where to read
from in the buffer would be more efficient than deleting and rewriting.
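
The pointer approach could look something like the sketch below: a fixed-size
ring of rows in which a scroll only advances the read position. The
RollUpBuffer class is hypothetical and is not drawn from CFF or from any
decoder.

```python
from typing import List


class RollUpBuffer:
    """Fixed number of caption rows stored in a ring; scrolling moves a pointer."""

    def __init__(self, rows: int) -> None:
        self._rows: List[str] = [""] * rows
        self._top = 0  # index of the row currently displayed at the top

    def add_line(self, text: str) -> None:
        # Overwrite the oldest row, then advance the pointer so that the
        # new row becomes the bottom row. No other row is touched.
        self._rows[self._top] = text
        self._top = (self._top + 1) % len(self._rows)

    def visible(self) -> List[str]:
        # Read rows starting at the pointer, wrapping around the ring.
        n = len(self._rows)
        return [self._rows[(self._top + i) % n] for i in range(n)]


buf = RollUpBuffer(rows=3)
for line in ["one", "two", "three", "four"]:
    buf.add_line(line)
print(buf.visible())
```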

[MD> ] Of course.  But why on earth would you do that in a decoder
implementation?  As I've noted, the rendering model is a constraint on the
document complexity (the synchronic documents, actually).

Also it would appear that the CFF model implies that a tt document in CFF
must span the time frame that all the content is on screen for... since a
new document clears the buffers.
If there is a duration limit for a tt document (perhaps due to streaming
chunking) then a repetition of content would often be necessary.

Is there any chunking limitation on the length (temporal) of a document?

[MD> ] No.  But there is a total document size limit.

And if so, then chunking would dictate that the new document starting at
the chunk boundary must carry some / all of the information that was present
in the previous document.

[MD> ] There are no published semantics (anywhere in the world as far as I
know) for continuity or persistent content between TTML documents.  SMPTE
has proposed some be added in TTML 1.1, but I don't understand why a TTML
1.0 rendering model would need to concern itself with such a future
scenario.  SDP-US certainly doesn't define any.

Further if such a case arose, does the CFF model effectively state that a
back to back presentation of this same content across a chunk boundary does
NOT result in a visible flash on the screen?

[MD> ]  Although decoders could probably mitigate this, authors should
create boundaries at sensible points in time to minimize this.  But this is
not really about a rendering model.

**But you are correct that CFF does not force an implementation that cannot
do paint and roll on**; further, I don't believe I stated that it does.
Instead my issue is that CFF does not **obviously** (at a high level of
abstraction) represent a model that anticipates paint on and roll on
behaviour.
IMHO such a model would include scrolling and cumulative properties.

[MD> ] Sorry it is not obvious.  Nevertheless it does.

I agree that developing a different model in CFF is unlikely, and actually I
believe it unnecessary.
A few (abstract) sentences addressing the expected styles of input in the
SDP documents (or in CFF) would suffice to indicate these potential
implementation requirements.

[MD> ] OK. That's easily solved and is not a barrier to its application for
SDP-US.

Best regards,
John


-----Original Message-----
From: Michael A Dolan [mailto:mdolan@newtbt.com]
Sent: 28 September 2012 16:20
To: 'Timed Text Working Group'
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

John-

I'm not sure how to respond where your argument has no backup and your
reading is cursory, except to restate that the CFF rendering model in no way
forbids simple paint on and animation authoring and decoder behavior, or as
far as anyone knows, any feature at all defined in SDP-US.

Please cite a feature of SDP-US that you believe the CFF rendering model
forces either the document or the decoder to operate inefficiently, and
explain why you believe that it does.

Regards,

        Mike

-----Original Message-----
From: John Birch [mailto:John.Birch@screensystems.tv]
Sent: Friday, September 28, 2012 4:04 AM
To: Michael A Dolan; 'Timed Text Working Group'
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

Hi Mike,

My primary concern (and my reading of CFF is cursory) is that CFF appears
to imply a simplistic pop model for subtitles.

Clearly it is possible using overlapping timing to create a paint on effect
in SDP. And clearly the CFF model does not explicitly preclude such a
mechanism. However, CFF does not, in any obvious fashion, indicate that a
renderer may need to **redraw** parts of the bitmap that have just been
cleared as a result of a subtitle event (begin or end). If SDP did reference
CFF as a potential rendering model, it would seem wise (to me) to add a note
(probably in SDP) about how paint on and roll on modes might cause such
redraw possibilities, and how they might be handled more efficiently by the
renderer for SDP.

Regards,
John

-----Original Message-----
From: Michael A Dolan [mailto:mdolan@newtbt.com]
Sent: 27 September 2012 21:15
To: 'Timed Text Working Group'
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

John-

Although CFF-TT does not explicitly address incremental additions to the
region, that does not mean the model does not apply.  As drafted, it just
takes the time events as a full re-rendering.  A simplification, yes; but it
is incorrect to say that incremental flow ("paint-on") is not supported.
The model is therefore a constraint on the complexity of the Intermediate
Synchronic Documents, not the authored document. And it is definitely not a
constraint on the decoder - it can do whatever it wants for efficiency.
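
To make the "full re-rendering at each time event" reading concrete, the
sketch below derives one snapshot per begin/end event from a list of timed
cues, which is roughly what the Intermediate Synchronic Documents represent.
The cue tuples and the synchronic_snapshots function are hypothetical; this
is not the CFF-TT algorithm itself.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (begin, end, text), end exclusive


def synchronic_snapshots(cues: List[Cue]) -> List[Tuple[float, List[str]]]:
    """Return (time, active texts) for every time at which a cue begins or ends."""
    times = sorted({t for begin, end, _ in cues for t in (begin, end)})
    snapshots = []
    for t in times:
        active = [text for begin, end, text in cues if begin <= t < end]
        snapshots.append((t, active))
    return snapshots


# Paint-on: words begin one after another and all end together.
paint_on = [(0.0, 3.0, "Good"), (0.5, 3.0, "morning"), (1.0, 3.0, "everyone")]
for time, active in synchronic_snapshots(paint_on):
    print(time, active)
```

Each snapshot repeats the earlier words, so the cost of a paint-on sequence
shows up in the snapshots rather than in the authored document, while a
decoder remains free to draw only the difference.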

I've started a discussion in DECE about the interest in making the model
more complex to explicitly deal with incremental additions. My guess is that
it will not be worth the effort.  And, decoders can always implement
whatever efficiencies that they want.

There is a question of whether regions scroll at all.  If they do, the
behavior in TTML 1.0 needs a good deal of work, and the same rendering model
would apply as for paint-on described above.  If not, that would be
irrelevant to the CFF-TT (or any) rendering model.  Hence the new issue 189.

Regards,

        Mike

-----Original Message-----
From: John Birch [mailto:John.Birch@screensystems.tv]
Sent: Thursday, September 27, 2012 1:23 AM
To: Timed Text Working Group
Subject: RE: ISSUE-188 (render-complexity): Bounding SDP-US rendering
complexity [Simple Delivery Profile for Closed Captions]

On a quick inspection, the CFF-TT rendering model does not appear to support
Paint on or Roll on (cumulative) subtitles, as every Subtitle Event causes a
clear of the subtitle plane root container?
Certainly a Paint on / Roll On effect could be emulated by resending the
previous caption content already 'assumed' to be displayed (although note
what is currently 'on screen' does depend on when the caption stream was
acquired)... but such a repetitious approach would be markedly inefficient!
Is this not a fundamental limitation for using the CFF model in SDP-US?

Regards,
John Birch

-----Original Message-----
From: Timed Text Working Group Issue Tracker [mailto:sysbot+tracker@w3.org]
Sent: 26 September 2012 21:16
To: public-tt@w3.org
Subject: ISSUE-188 (render-complexity): Bounding SDP-US rendering complexity
[Simple Delivery Profile for Closed Captions]

ISSUE-188 (render-complexity): Bounding SDP-US rendering complexity [Simple
Delivery Profile for Closed Captions]

http://www.w3.org/AudioVideo/TT/tracker/issues/188

Raised by: Pierre-Anthony Lemieux
On product: Simple Delivery Profile for Closed Captions

Bounding SDP-US rendering complexity
====================================

What
----

SDP-US is a profile of TTML that specifies constraints such as supported
TTML features and number of regions active at any given time. It does not
however impose bounds on key aspects of rendering complexity, such as
character and background drawing rates. Without such bounds, a valid SDP-US
document might not successfully play on all implementations or,
equivalently, determining the processing requirements of an implementation
is not possible.

CFF-TT is a profile of TTML developed by the DECE consortium
(http://uvvu.com) for internet delivery of subtitles and captions. Consumer
devices implementing CFF-TT are expected to be widely deployed. The CFF-TT
specification is publicly available at
http://uvvu.com/docs/public/tspec/CFFMediaFormat-1.0.4.pdf.

As with SDP-US, CFF-TT specifies supported TTML features -- largely a
superset of the features supported by SDP-US. To further simplify
implementation and improve interoperability, CFF-TT also imposes bounds on
rendering complexity through the use of a hypothetical rendering model.

SDP-US should consider adopting, in part or in its entirety, the
rendering complexity bounds (and rendering model) defined by CFF-TT.

Why
---

Such adoption would further:
        - simplify implementations and improve interoperability by bounding
          rendering (and thus document) complexity
        - encourage adoption of SDP-US and TTML by ensuring that SDP-US
          content can be played on any CFF-compliant CE device

How
---

Adopting the CFF-TT hypothetical renderer and bounds on document complexity
could be achieved in a number of ways, including:

(a) mapping the CFF-TT rendering model to the existing (XSL-based) TTML
rendering model
(b) referencing the relevant sections of the CFF-TT specification defining
the CFF-TT rendering model
(c) importing the CFF-TT rendering model into the SDP-US specification




This message may contain confidential and/or privileged information. If you
are not the intended recipient you must not use, copy, disclose or take any
action based on this message or any information herein. If you have received
this message in error, please advise the sender immediately by reply e-mail
and delete this message. Thank you for your cooperation. Screen Subtitling
Systems Ltd. Registered in England No. 2596832. Registered Office: The Old
Rectory, Claydon Church Lane, Claydon, Ipswich, Suffolk, IP6 0EQ
Received on Wednesday, 3 October 2012 16:46:06 UTC