RE: chunked IMSC1 from Michael Dolan on 2022-03-18 (public-tt@w3.org from March 2022)

From: Michael Dolan <mike@dolan.tv>
Date: Fri, 18 Mar 2022 14:32:06 +0000
To: Nigel Megitt <nigel.megitt@bbc.co.uk>, Glenn Adams <glenn@skynav.com>
CC: "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <BY5PR10MB387655A3AAA6787E85CA2792B4139@BY5PR10MB3876.namprd10.prod.outlook.com>
Hi Nigel,

Unlike TTML, video and audio do not need to wait for a full segment before packets are sent: the decoder can start to decode and present them long before the whole segment arrives.  Video and audio are “chunked” today and transmitted long before the entire segment has been encoded.

TTML cannot have more than one sample per segment (ISO/IEC 14496-30).

The minimum CMAF Segment duration (960ms) applies to all codecs, and changing it would increase the coding overhead. The goal is longer, more efficient segments that are chunked, not shorter, inefficient ones.

              Mike

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Sent: Friday, March 18, 2022 7:19 AM
To: Michael Dolan <mike@dolan.tv>; Glenn Adams <glenn@skynav.com>
Cc: public-tt@w3.org
Subject: Re: chunked IMSC1

Hi Mike,

In these low latency scenarios, what is the expected latency for encoding the video? Might it be possible to send the TTML for the whole segment before all the video has been encoded, and thus work around the “holding back” problem?

We demonstrated with EBU-TT Live that, given an appropriate carriage mechanism, it is possible to send real time updates that are whole TTML documents. In the case of EBU-TT Live we were typically sending whole (short) documents at arbitrary times, corresponding to changes of presentation, but it would work equally well to send documents at predetermined intervals. If the issue is CMAF minimum segment durations, another solution might be to construct each CMAF TTML Segment out of multiple samples, where each sample is a whole document. Or change the CMAF minimum segment duration, of course.
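As a rough illustration of sending whole documents at predetermined intervals, here is a minimal Python sketch. It is not the EBU-TT Live implementation: the function name `documents_at_intervals`, the cue tuple layout, and the 500ms interval are assumptions made for the example, and the output is a bare TTML skeleton rather than a conformant IMSC1 document.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def documents_at_intervals(cues, interval_ms, total_ms):
    """Return one whole, well-formed TTML document per fixed interval.

    cues is a list of (begin_ms, end_ms, text) tuples (hypothetical layout).
    A cue that spans several intervals is repeated, clipped to each one.
    """
    ET.register_namespace("", TTML_NS)
    docs = []
    for start in range(0, total_ms, interval_ms):
        stop = min(start + interval_ms, total_ms)
        tt = ET.Element(f"{{{TTML_NS}}}tt", {XML_LANG: "en"})
        body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
        div = ET.SubElement(body, f"{{{TTML_NS}}}div")
        for begin, end, text in cues:
            if begin < stop and end > start:  # cue overlaps this interval
                p = ET.SubElement(
                    div,
                    f"{{{TTML_NS}}}p",
                    begin=f"{max(begin, start)}ms",
                    end=f"{min(end, stop)}ms",
                )
                p.text = text
        docs.append(ET.tostring(tt, encoding="unicode"))
    return docs

cues = [(0, 700, "Hello"), (600, 1500, "World")]
for doc in documents_at_intervals(cues, interval_ms=500, total_ms=1500):
    print(doc)
```

Each document stands alone, so a receiver can present it on arrival without waiting for later ones; the cost is that long-running cues are repeated in every interval they overlap.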

Glenn, when you referenced EXI, were you talking particularly about EXI streaming, rather than EXI as a mechanism for compression?

Nigel



From: Michael Dolan <mike@dolan.tv>
Date: Friday, 18 March 2022 at 13:50
To: Glenn Adams <glenn@skynav.com>, Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: "public-tt@w3.org" <public-tt@w3.org>
Subject: RE: chunked IMSC1

Hi Nigel and Glenn,

This use case is for the live, low latency (LLL) scenario.  There is no reason to do it in the VoD scenarios that TTML was designed for.

The problem is that, unlike video and audio, “normal TTML” cannot *begin* to be decoded until after encoding and reception of the entire segment/document.  This means that video and audio segments must be “held back” for the segment duration so that decoding and presentation remain in sync. Given that LLL applications expect a delay on the order of 500ms at worst, this just doesn’t work, especially when segments (e.g. CMAF segments) are constrained to >960ms.  Decoding and presentation must necessarily begin with a partial segment, as they do for video and audio.
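The arithmetic can be sketched in a few lines of Python. This is a deliberately simplified model that ignores network and processing time; the function name `text_holdback_ms` and the 500ms chunk interval are assumptions for illustration, while the 960ms and 500ms figures are the ones quoted in this thread.

```python
CMAF_MIN_SEGMENT_MS = 960  # minimum CMAF segment duration cited in this thread
LLL_BUDGET_MS = 500        # worst-case delay cited for LLL applications

def text_holdback_ms(segment_ms, chunk_ms=None):
    """Time before text decoding can begin, in a simplified model.

    Whole-document delivery: the full segment must be encoded and received.
    Chunked delivery (hypothetical): decoding starts after the first chunk.
    """
    if chunk_ms is None:
        return segment_ms
    return min(chunk_ms, segment_ms)

whole = text_holdback_ms(CMAF_MIN_SEGMENT_MS)
chunked = text_holdback_ms(CMAF_MIN_SEGMENT_MS, chunk_ms=500)
print(whole, whole <= LLL_BUDGET_MS)      # 960 False
print(chunked, chunked <= LLL_BUDGET_MS)  # 500 True
```

Under these assumptions, whole-document delivery exceeds the budget even at the minimum segment duration, and only partial-segment decoding fits within it.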

Yes, a solution will likely require slightly special encoding and decoding processing and perhaps a constrained vocabulary (e.g. no <set>), although I would not postulate the complexity or issues at this time. That would be for further study.

The alternative is frankly not to use TTML.

              Mike

From: Glenn Adams <glenn@skynav.com>
Sent: Friday, March 18, 2022 6:33 AM
To: Michael Dolan <mike@dolan.tv>
Cc: public-tt@w3.org
Subject: Re: chunked IMSC1

Neither IMSC nor TTML reqs explicitly address this use case. Both operate on "document instances" as their input and require such instances to be well-formed (in an XML sense).

To do what you suggest, it would be necessary to progressively reparse a dynamically updated concretely encoded document instance, and only proceed with subsequent processing (as an XML infoset) for parses that proved well formed. This might require an underlying buffering layer to append a temporary postfix to the buffer prior to each parse attempt in order to supply missing close tags.
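A minimal Python sketch of that reparse-with-postfix idea follows. It is an illustration under stated assumptions, not a proposed design: the class name `ProgressiveParser` and helper `closing_postfix` are hypothetical, the tag scanner is a simple regular expression that ignores comments, CDATA sections and processing instructions, and the whole buffer is reparsed on every chunk.

```python
import re
import xml.etree.ElementTree as ET

# Matches start, end and self-closing tags. Comments, CDATA sections and
# processing instructions are deliberately out of scope for this sketch.
TAG = re.compile(
    r"<(/?)([A-Za-z_][\w:.-]*)((?:\"[^\"]*\"|'[^']*'|[^<>\"'])*?)(/?)>"
)

def closing_postfix(text):
    """Return the close tags needed for the elements still open in text."""
    stack = []
    for match in TAG.finditer(text):
        is_end, name, _attrs, self_closing = match.groups()
        if is_end:
            if stack and stack[-1] == name:
                stack.pop()
        elif not self_closing:
            stack.append(name)
    return "".join(f"</{name}>" for name in reversed(stack))

class ProgressiveParser:
    """Buffers chunks and reparses after each one (hypothetical helper)."""

    def __init__(self):
        self.buffer = ""

    def feed(self, chunk):
        """Append a chunk; return a parsed root element, or None if the
        buffer plus temporary postfix is not well formed."""
        self.buffer += chunk
        candidate = self.buffer
        last_lt = candidate.rfind("<")
        if last_lt > candidate.rfind(">"):
            candidate = candidate[:last_lt]  # drop a trailing partial tag
        if not candidate:
            return None
        try:
            return ET.fromstring(candidate + closing_postfix(candidate))
        except ET.ParseError:
            return None  # skip this attempt; wait for more data

parser = ProgressiveParser()
chunks = [
    '<tt xmlns="http://www.w3.org/ns/ttml"><body><div><p begin="0s" end="1s">Hel',
    "lo</p><p beg",
    'in="1s" end="2s">World</p></div></body></tt>',
]
for chunk in chunks:
    root = parser.feed(chunk)
    if root is not None:
        print([p.text for p in root.iter("{http://www.w3.org/ns/ttml}p")])
```

The postfix is never stored in the buffer, so each parse attempt starts from the bytes actually received. Reparsing from the start on every chunk is quadratic in document size, which is one reason a streaming-oriented encoding may be worth evaluating first.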

If I were designing such a system, I would first evaluate the potential use of EXI <https://www.w3.org/TR/exi/>.


On Thu, Mar 17, 2022 at 2:34 PM Michael Dolan <mike@dolan.tv> wrote:
All,

Has anyone thought about this lately?  In the TTML1 days we pondered it a bit.  By “chunked” I mean in the HTTP sense.  That is, an IMSC1 document could be broken into pieces and delivered a few bytes at a time, say every 500ms.  There are low latency use cases that need this sort of delivery, where decode and presentation continue throughout a certain period. Today, when building ISO BMFF segments, one has to gather up all text over several seconds, create a well-formed document, and then deliver that well-formed document before presentation can begin. This inserts a delay relative to how 608 and Teletext work. And it inserts the same delay into the video and audio as well: the decoder cannot start decoding video and audio until the text is ready to go.

Even if no one has pondered this lately, is there interest?

Regards,
              Mike

"keep calm and carry on"
-----------------------
Michael DOLAN
TBT Inc
Del Mar, CA USA
+1-858-882-7497 (mobile)
Received on Friday, 18 March 2022 14:32:22 UTC