Re: chunked IMSC1 from Nigel Megitt on 2022-03-18 (public-tt@w3.org from March 2022)

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Date: Fri, 18 Mar 2022 15:00:42 +0000
To: Michael Dolan <mike@dolan.tv>, Glenn Adams <glenn@skynav.com>
CC: "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <069B141F-9FF5-46C7-88C8-645CA3DB616E@bbc.co.uk>

Thanks Mike,

That still doesn’t explain what the expected encode time is for the video. When I am presented with a problem like this I have usually found that drawing a timeline of all the events shows that there is a path through that gets the subtitles there in time for the video to be decoded. Possibly that’s not the case here though, I don’t know.

I get that the current ISO specs set constraints on samples/segment and duration per segment – I’m suggesting that if there’s an application requirement not met by those current specs, changing those constraints might be the easiest path, where “easiest” is very much a relative term!

An alternative might be to deliver ISDs (or a set of ISDs) per chunk, but that would also require ISO spec changes, so I’m not sure how helpful it is as a suggestion.

Nigel

From: Michael Dolan <mike@dolan.tv>
Date: Friday, 18 March 2022 at 14:32
To: Nigel Megitt <nigel.megitt@bbc.co.uk>, Glenn Adams <glenn@skynav.com>
Cc: "public-tt@w3.org" <public-tt@w3.org>
Subject: RE: chunked IMSC1

Hi Nigel,

Unlike TTML, video and audio do not need a full segment before packets can be sent; the decoder can start to decode and present them long before the whole segment arrives.  Video and audio are “chunked” today and transmitted long before the entire segment has been encoded.

TTML cannot have more than one sample/segment (14496-30).

The minimum CMAF Segment duration (960ms) applies to all codecs.  And changing it would increase the coding overhead. The goal is longer, more efficient segments that are chunked, not shorter, inefficient ones.

              Mike

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
Sent: Friday, March 18, 2022 7:19 AM
To: Michael Dolan <mike@dolan.tv>; Glenn Adams <glenn@skynav.com>
Cc: public-tt@w3.org
Subject: Re: chunked IMSC1

Hi Mike,

In these low latency scenarios, what is the expected latency for encoding the video? Might it be possible to send the TTML for the whole segment before all the video has been encoded, and thus work around the “holding back” problem?

We demonstrated with EBU-TT Live that, given an appropriate carriage mechanism, it is possible to send real time updates that are whole TTML documents. In the case of EBU-TT Live we were typically sending whole (short) documents at arbitrary times, corresponding to changes of presentation, but it would work equally well to send documents at predetermined intervals. If the issue is CMAF minimum segment durations, another solution might be to construct each CMAF TTML Segment out of multiple samples, where each sample is a whole document. Or change the CMAF minimum segment duration, of course.
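
As a rough illustration of the "whole documents at predetermined intervals" idea, here is a minimal Python sketch. It is not taken from EBU-TT Live or from any CMAF tooling: the function name `documents_per_interval`, the millisecond cue tuples, and the choice to clip and repeat cues that cross an interval boundary are all assumptions made for the example, and it says nothing about how such documents would be packaged as ISO BMFF samples.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TTML_NS)  # serialize TTML as the default namespace


def documents_per_interval(cues, start_ms, end_ms, interval_ms):
    """Return one whole TTML document (as a string) per interval.

    cues: list of (begin_ms, end_ms, text) tuples (hypothetical input shape).
    Cues that cross an interval boundary are clipped and repeated, which is
    an assumption of this sketch rather than anything a spec mandates.
    """
    docs = []
    t = start_ms
    while t < end_ms:
        t_end = min(t + interval_ms, end_ms)
        tt = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": "en"})
        body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
        div = ET.SubElement(body, f"{{{TTML_NS}}}div")
        for begin, end, text in cues:
            b, e = max(begin, t), min(end, t_end)
            if b < e:  # the cue is active somewhere inside this interval
                p = ET.SubElement(
                    div,
                    f"{{{TTML_NS}}}p",
                    {"begin": f"{b / 1000:.3f}s", "end": f"{e / 1000:.3f}s"},
                )
                p.text = text
        docs.append(ET.tostring(tt, encoding="unicode"))
        t = t_end
    return docs
```

For example, `documents_per_interval([(200, 1400, "Hello")], 0, 2000, 500)` yields four short documents, each well-formed on its own, with the single cue split across the first three.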

Glenn, when you referenced EXI, were you talking particularly about EXI streaming, rather than EXI as a mechanism for compression?

Nigel



From: Michael Dolan <mike@dolan.tv>
Date: Friday, 18 March 2022 at 13:50
To: Glenn Adams <glenn@skynav.com>, Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: "public-tt@w3.org" <public-tt@w3.org>
Subject: RE: chunked IMSC1

Hi Nigel and Glenn,

This use case is for the live, low latency (LLL) scenario.  There is no reason to do it in VoD scenarios that TTML was designed for.

The problem is that, unlike video and audio, “normal TTML” cannot *begin* to be decoded until after encoding and reception of the entire segment/document.  This means that video and audio segments must be “held back” for the segment duration so that decoding and presentation remain in sync. Given that LLL applications expect on the order of 500ms delay at worst, this just doesn’t work, especially when segments (e.g. CMAF segments) are constrained to >960ms.  Decoding and presentation must necessarily begin with a partial segment, as they do for video and audio.
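
The arithmetic behind that paragraph can be sketched in a few lines of Python. This is a deliberately simplified model and an assumption of the example: it ignores encode, network and decode time and only compares how long text must be held back when a whole segment is required versus when partial delivery is possible. The names `text_holdback_ms`, `LLL_BUDGET_MS` and `CMAF_MIN_SEGMENT_MS` are hypothetical.

```python
from typing import Optional

LLL_BUDGET_MS = 500        # "on the order of 500ms" delay at worst
CMAF_MIN_SEGMENT_MS = 960  # CMAF segment duration constraint cited here


def text_holdback_ms(segment_ms: int, chunk_ms: Optional[int] = None) -> int:
    """Delay before the first text of a segment can be presented.

    Whole-document delivery (chunk_ms is None): wait for the full segment.
    Chunked delivery: wait for at most one chunk interval.
    """
    if chunk_ms is None:
        return segment_ms
    return min(chunk_ms, segment_ms)


whole = text_holdback_ms(CMAF_MIN_SEGMENT_MS)         # 960
chunked = text_holdback_ms(CMAF_MIN_SEGMENT_MS, 500)  # 500
print(whole > LLL_BUDGET_MS, chunked <= LLL_BUDGET_MS)
```

Under this model, even the shortest permitted segment exceeds the budget when the whole document is required, while a 500ms chunk interval stays within it.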

Yes, a solution will likely require slightly special encoding and decoding processing and perhaps a constrained vocabulary (e.g. no <set>), although I would not postulate the complexity or issues at this time. That would be for further study.

The alternative is frankly not to use TTML.

              Mike

From: Glenn Adams <glenn@skynav.com>
Sent: Friday, March 18, 2022 6:33 AM
To: Michael Dolan <mike@dolan.tv>
Cc: public-tt@w3.org
Subject: Re: chunked IMSC1

Neither IMSC nor TTML reqs explicitly address this use case. Both operate on "document instances" as their input and require such instances to be well-formed (in an XML sense).

To do what you suggest, it would be necessary to progressively reparse a dynamically updated concretely encoded document instance, and only proceed with subsequent processing (as an XML infoset) for parses that proved well formed. This might require an underlying buffering layer to append a temporary postfix to the buffer prior to each parse attempt in order to supply missing close tags.
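
A minimal Python sketch of that buffering layer might look like the following. It is only one possible reading of the idea, not a tested design: the function name `parse_partial` is hypothetical, and cutting the buffer at the last `>` is a naive assumption of this example (a cut that lands inside an attribute value, comment or CDATA section simply fails the final parse and is skipped). It uses the standard-library `xml.parsers.expat` to track which elements are still open and `xml.etree.ElementTree` for the well-formedness check.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def parse_partial(buffer: str):
    """Return an Element for the received prefix of buffer, or None.

    A temporary postfix of close tags is appended for every element still
    open, and the result is used only if that candidate is well formed.
    """
    cut = buffer.rfind(">")
    if cut < 0:
        return None  # no complete markup has arrived yet
    prefix = buffer[: cut + 1]  # drop any trailing, half-received token

    # Pass 1: scan the prefix and record the stack of unclosed elements.
    open_tags = []
    scanner = expat.ParserCreate()
    scanner.StartElementHandler = lambda name, attrs: open_tags.append(name)
    scanner.EndElementHandler = lambda name: open_tags.pop()
    try:
        scanner.Parse(prefix, False)  # False: more data may still follow
    except expat.ExpatError:
        return None

    # Pass 2: append the temporary postfix and attempt a full parse.
    postfix = "".join(f"</{name}>" for name in reversed(open_tags))
    try:
        return ET.fromstring(prefix + postfix)
    except ET.ParseError:
        return None  # not well formed yet; retry when more bytes arrive
```

Called on a buffer that ends partway through a second `<p ...` start tag, this would return a tree containing only the first, fully received paragraph. Each call reparses from the start of the buffer, so the cost grows with document length.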

If I were designing such a system, I would first evaluate the potential use of EXI <https://www.w3.org/TR/exi/>.


On Thu, Mar 17, 2022 at 2:34 PM Michael Dolan <mike@dolan.tv> wrote:
All,

Has anyone thought about this lately?  In the TTML1 days we pondered it a bit.  By “chunked” I mean in the HTTP sense.  That is, an IMSC1 document could be broken into pieces and delivered a few bytes at a time, say every 500ms.  There are low latency use cases that need this sort of delivery, where decode and presentation continue throughout a certain period. Today, when building ISO BMFF segments, one has to gather up all text over several seconds, create a well-formed document, and then deliver that well-formed document before presentation can begin. This inserts a delay relative to how 608 and Teletext work. And it inserts the same delay into the video and audio as well; that is, the decoder cannot start decoding video and audio until the text is ready to go.

Even if no one has pondered this lately, is there interest?

Regards,
              Mike

"keep calm and carry on"
-----------------------
Michael DOLAN
TBT Inc
Del Mar, CA USA
+1-858-882-7497 (mobile)




---------------------
Received on Friday, 18 March 2022 15:01:12 UTC