RE: chunked IMSC1 from Michael Dolan on 2022-03-18 (public-tt@w3.org from March 2022)

From: Michael Dolan <mike@dolan.tv>
Date: Fri, 18 Mar 2022 13:49:51 +0000
To: Glenn Adams <glenn@skynav.com>, "Nigel Megitt (nigel.megitt@bbc.co.uk)" <nigel.megitt@bbc.co.uk>
CC: "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <BY5PR10MB38764C8454333D4A8660C29DB4139@BY5PR10MB3876.namprd10.prod.outlook.com>

Hi Nigel and Glenn,

This use case is for the live, low-latency (LLL) scenario. There is no reason to do it in the VoD scenarios that TTML was designed for.

The problem is that, unlike video and audio, “normal TTML” cannot *begin* to be decoded until the entire segment/document has been encoded and received. This means that video and audio segments must be “held back” for the segment duration so that decoding and presentation remain in sync. Given that LLL applications expect on the order of 500 ms delay at worst, this just doesn’t work, especially when segments (e.g. CMAF segments) are constrained to >960 ms. Decoding and presentation must necessarily begin with a partial segment, as they do for video and audio.

Yes, a solution will likely require slightly special encoding and decoding processing and perhaps a constrained vocabulary (e.g. no <set>), although I would not postulate the complexity or issues at this time. That would be for further study.

The alternative is frankly not to use TTML.

              Mike

From: Glenn Adams <glenn@skynav.com>
Sent: Friday, March 18, 2022 6:33 AM
To: Michael Dolan <mike@dolan.tv>
Cc: public-tt@w3.org
Subject: Re: chunked IMSC1

Neither IMSC nor TTML reqs explicitly address this use case. Both operate on "document instances" as their input and require such instances to be well-formed (in an XML sense).

To do what you suggest, it would be necessary to progressively reparse a dynamically updated concretely encoded document instance, and only proceed with subsequent processing (as an XML infoset) for parses that proved well formed. This might require an underlying buffering layer to append a temporary postfix to the buffer prior to each parse attempt in order to supply missing close tags.
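
The buffering idea above can be sketched roughly as follows. This is only an illustration, not anything defined by TTML or IMSC: the helper names closing_postfix and try_parse are hypothetical, the tag scanner is a simple regular expression that assumes no comments, CDATA sections, or ">" characters inside attribute values, and Python's standard xml.etree.ElementTree stands in for whatever XML parser a real decoder would use.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Matches start, end, and self-closing tags. Declarations, comments, and
# processing instructions ("<?" / "<!") are not matched and so are ignored.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.\-]*)[^<>]*?(/?)>")


def closing_postfix(buffer: str) -> Tuple[str, str]:
    """Return (usable_prefix, postfix) for a partially received document.

    usable_prefix is the buffer with any half-received trailing tag dropped;
    postfix holds the close tags needed to balance the elements still open.
    """
    last_lt, last_gt = buffer.rfind("<"), buffer.rfind(">")
    # A "<" after the last ">" means a tag has only partly arrived.
    prefix = buffer[:last_lt] if last_lt > last_gt else buffer

    open_elements = []
    for match in TAG.finditer(prefix):
        is_end, name, is_empty = match.groups()
        if is_empty:
            continue  # self-closing element, nothing left open
        if is_end:
            if open_elements and open_elements[-1] == name:
                open_elements.pop()
        else:
            open_elements.append(name)

    postfix = "".join(f"</{name}>" for name in reversed(open_elements))
    return prefix, postfix


def try_parse(buffer: str) -> Optional[ET.Element]:
    """Parse the buffer plus its temporary postfix; None if not well formed."""
    prefix, postfix = closing_postfix(buffer)
    try:
        return ET.fromstring(prefix + postfix)
    except ET.ParseError:
        return None
```

A receiver would call try_parse on the accumulated buffer after each chunk arrives and hand the result to subsequent processing only when it is not None. Note that this reparses the whole buffer every time, which is one of the costs a real design would need to weigh.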

If I were designing such a system, I would first evaluate the potential use of EXI <https://www.w3.org/TR/exi/>.


On Thu, Mar 17, 2022 at 2:34 PM Michael Dolan <mike@dolan.tv> wrote:
All,

Has anyone thought about this lately? In the TTML1 days we pondered it a bit. By “chunked” I mean in the HTTP sense. That is, an IMSC1 document could be broken into pieces and delivered a few bytes at a time, say every 500 ms. There are low latency use cases that need this sort of delivery, where decode and presentation continue throughout a certain period. Today, when building ISO BMFF segments, one has to gather up all text over several seconds, create a well-formed document, and then deliver that well-formed document before presentation can begin. This inserts a delay relative to how 608 and Teletext work. It also inserts the same delay into the video and audio; that is, the decoder cannot start decoding video and audio until the text is ready to go.

Even if no one has pondered this lately, is there interest?

Regards,
              Mike

"keep calm and carry on"
-----------------------
Michael DOLAN
TBT Inc
Del Mar, CA USA
+1-858-882-7497 (mobile)

Received on Friday, 18 March 2022 13:50:07 UTC