Re: chunked IMSC1 from Andreas Tai on 2022-03-18 (public-tt@w3.org from March 2022)

From: Andreas Tai <w3c@andreastai.com>
Date: Fri, 18 Mar 2022 17:58:15 +0100
To: Michael Dolan <mike@dolan.tv>, Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: Glenn Adams <glenn@skynav.com>, "public-tt@w3.org" <public-tt@w3.org>, Cyril Concolato <cconcolato@netflix.com>
Message-ID: <bf38e78a-6a94-5dc6-dc89-ec745e2166a6@andreastai.com>
The W3C XSLT 3.0 specification adds streaming for processing XML input 
[1]. The main use case has been for processing very large XML files 
(e.g., from genetic research), but it may be applicable in your use case 
as well. It allows template rules to be applied (e.g., for a tt:p 
element) before the entire document is read.


[1] https://www.w3.org/TR/xslt-30/#streaming-concepts

On 18.03.2022 at 16:32, Michael Dolan wrote:
> Hi Nigel,
> 
> I have a timeline picture and a few folks that feel this problem needs 
> to be solved.
> 
> Changing those two key ISO specs is not really an option – multiple 
> samples/segment, even if everyone agreed, would be non-backwards 
> compatible; and the minimum segment duration has been discussed ad 
> nauseam and, even if everyone could be swayed to reduce it, it induces 
> undesirable inefficiency.
> 
> If it helps you to stare at a possible solution, consider the following:
> 
> Chunk/Packet #1 start of segment/sample:
> 
> <tt><head/><body><div><p begin="0s" dur="0.5s">The</p>
> 
> Chunk/Packet #2:
> 
> <p begin="0.5s" dur="0.5s">The quick</p>
> 
> …
> 
> Chunk/Packet #N and end of segment/sample:
> 
> </div></body></tt>
> 
> The full segment/sample is a well formed normal TTML document.  But a 
> savvy decoder could incrementally decode/display the content of the 
> chunks. Non-savvy decoders will still work fine.
> 
> I’m not suggesting this is the best solution – just an example of 
> how to solve this in a non-disruptive manner.
> 
> I would prefer to have TTWG’s help with this.
> 
>                Mike
> 
> *From:* Cyril Concolato <cconcolato@netflix.com>
> *Sent:* Friday, March 18, 2022 8:16 AM
> *To:* Michael Dolan <mike@dolan.tv>
> *Cc:* Nigel Megitt <nigel.megitt@bbc.co.uk>; Glenn Adams 
> <glenn@skynav.com>; public-tt@w3.org
> *Subject:* Re: chunked IMSC1
> 
> On Fri, Mar 18, 2022 at 7:32 AM Michael Dolan <mike@dolan.tv> wrote:
> 
>     Hi Nigel,
> 
>     Unlike with TTML, there is no need to wait for a full segment of video
>     or audio before sending packets, and the decoder can start to decode
>     and present them long before the whole segment arrives. Video and
>     audio are “chunked” today and transmitted long before the encoding of
>     the entire segment is complete.
> 
>     TTML cannot have more than one sample per segment (ISO/IEC 14496-30).
> 
> ISO/IEC 14496-30 is about overall carriage of TTML in MP4. It does not 
> impose any application-specific constraints and, in particular, does not 
> constrain segmented media. CMAF talks about segmented media but I don't 
> recall any restriction. Maybe DASH-IF has such a restriction?
> 
> That said, the general solutions for progressively downloading XML are 
> well known and applicable to TTML. In particular, if your TTML parser is 
> a SAX-based, progressive parser and the document is authored with proper 
> constraints (i.e. document order matches time order), you could use HTTP 
> chunked transfer (e.g. mapping one HTTP chunk to a <p> or a <div>) of TTML.
> 
> HTH,
> 
> Cyril
> 
>     The minimum CMAF Segment duration (960ms) applies to all codecs. 
>     And changing it would increase the coding overhead. The goal is
>     longer, more efficient, segments that are chunked, not shorter
>     inefficient ones.
> 
>                    Mike
> 
>     *From:* Nigel Megitt <nigel.megitt@bbc.co.uk>
>     *Sent:* Friday, March 18, 2022 7:19 AM
>     *To:* Michael Dolan <mike@dolan.tv>; Glenn Adams <glenn@skynav.com>
>     *Cc:* public-tt@w3.org
>     *Subject:* Re: chunked IMSC1
> 
>     Hi Mike,
> 
>     In these low latency scenarios, what is the expected latency for
>     encoding the video? Might it be possible to send the TTML for the
>     whole segment before all the video has been encoded, and thus work
>     around the “holding back” problem?
> 
>     We demonstrated with EBU-TT Live that, given an appropriate carriage
>     mechanism, it is possible to send real time updates that are whole
>     TTML documents. In the case of EBU-TT Live we were typically sending
>     whole (short) documents at arbitrary times, corresponding to changes
>     of presentation, but it would work equally well to send documents at
>     predetermined intervals. If the issue is CMAF minimum segment
>     durations, another solution might be to construct each CMAF TTML
>     Segment out of multiple samples, where each sample is a whole
>     document. Or change the CMAF minimum segment duration, of course.
> 
>     Glenn, when you referenced EXI, were you talking particularly about
>     EXI /streaming/, rather than EXI as a mechanism for compression?
> 
>     Nigel
> 
>     *From: *Michael Dolan <mike@dolan.tv>
>     *Date: *Friday, 18 March 2022 at 13:50
>     *To: *Glenn Adams <glenn@skynav.com>, Nigel Megitt <nigel.megitt@bbc.co.uk>
>     *Cc: *"public-tt@w3.org" <public-tt@w3.org>
>     *Subject: *RE: chunked IMSC1
> 
>     Hi Nigel and Glenn,
> 
>     This use case is for the live, low latency (LLL) scenario.  There is
>     no reason to do it in VoD scenarios that TTML was designed for.
> 
>     The problem is that, unlike video and audio, “normal TTML” cannot
>     **begin** to be decoded until after encoding and reception of the
>     entire segment/document.  This means that video and audio segments
>     must be “held back” for the segment duration so that the decoding
>     and presentation remain in sync. Given that LLL applications expect
>     on the order of 500ms delay at worst, this just doesn’t work,
>     especially when segments (e.g. CMAF segments) are constrained to >960ms. 
>     Decoding and presentation must necessarily begin with a partial
>     segment, like video and audio.
> 
>     Yes, a solution will likely require slightly special encoding and
>     decoding processing and perhaps a constrained vocabulary (e.g. no
>     <set>), although I would not postulate the complexity or issues at
>     this time. That would be for further study.
> 
>     The alternative is frankly not to use TTML.
> 
>                    Mike
> 
>     *From:* Glenn Adams <glenn@skynav.com>
>     *Sent:* Friday, March 18, 2022 6:33 AM
>     *To:* Michael Dolan <mike@dolan.tv>
>     *Cc:* public-tt@w3.org
>     *Subject:* Re: chunked IMSC1
> 
>     Neither IMSC nor TTML reqs explicitly address this use case. Both
>     operate on "document instances" as their input and require such
>     instances to be well-formed (in an XML sense).
> 
>     To do what you suggest, it would be necessary to progressively
>     reparse a dynamically updated concretely encoded document instance,
>     and only proceed with subsequent processing (as an XML infoset) for
>     parses that proved well formed. This might require an underlying
>     buffering layer to append a temporary postfix to the buffer prior to
>     each parse attempt in order to supply missing close tags.
> 
>     If I were designing such a system, I would first evaluate the
>     potential use of EXI <https://www.w3.org/TR/exi/>.
> 
>     On Thu, Mar 17, 2022 at 2:34 PM Michael Dolan <mike@dolan.tv> wrote:
> 
>         All,
> 
>         Has anyone thought about this lately?  In the TTML1 days we
>         pondered it a bit.  By “chunked” I mean in the HTTP sense.  That
>         is, an IMSC1 document could be broken into pieces and delivered
>         a few bytes at a time say every 500ms.  There are low latency
>         use cases that need this sort of delivery where decode and
>         presentation continues throughout a certain period. Today, when
>         building ISO BMFF segments, one has to gather up all text over
>         several seconds, create a well-formed document, and then deliver
>         that well-formed document before presentation can begin. This
>         inserts a delay relative to how 608 and Teletext work. And, it
>         inserts the same delay into the video and audio as well – that
>         is, the decoder cannot start decoding video and audio until the
>         text is ready to go.
> 
>         Even if no one has pondered this lately, is there interest?
> 
>         Regards,
> 
>                        Mike
> 
>         "/keep calm and carry on"/
> 
>         -----------------------
> 
>         Michael DOLAN
> 
>         TBT Inc
> 
>         Del Mar, CA USA
> 
>         +1-858-882-7497 (mobile)
>
Received on Friday, 18 March 2022 17:00:37 UTC