RE: chunked IMSC1

There will necessarily be tradeoffs between low-latency (LL) live and VoD scenarios, and between the full vocabulary and some subset of it.

-----Original Message-----
From: Nigel Megitt <nigel.megitt@bbc.co.uk> 
Sent: Monday, March 21, 2022 4:02 AM
To: Pierre-Anthony Lemieux <pal@sandflow.com>
Cc: Cyril Concolato <cconcolato@netflix.com>; Michael Dolan <mike@dolan.tv>; Glenn Adams <glenn@skynav.com>; public-tt@w3.org
Subject: Re: chunked IMSC1

> I have yet to see practical scenarios that require more than a few
> regions defined upfront: vertical and horizontal alignment are usually
> sufficient to handle positioning.

Our live subtitle practice allows the subtitle author to move text vertically at arbitrary times to avoid overlapping important parts of the burned-in image. I see no reason why the same practice would not apply in low-latency scenarios. One consequence is that, in general, it is not possible to know at the beginning of a segment all of the positions that will be used during that segment.

It may be possible to predefine all the possible regions that will be used and repeat them in the chunk containing the layout element for every segment, but it does not seem like a desirable solution.
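The workaround described above, predefining every region that might be needed, can be sketched as a grid of full-width regions at fixed vertical steps. This is a hypothetical illustration: `predefined_layout`, `nearest_region`, the `r<top>` ids and the 10% step are all assumptions. It shows the cost of the approach: any position the author picks live is rounded to the nearest predefined region, and the whole grid would have to be repeated in the layout of every segment.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def predefined_layout(step_pct: int = 10, height_pct: int = 10) -> ET.Element:
    """Build a hypothetical <layout> with one full-width region per vertical step."""
    layout = ET.Element(f"{{{TT_NS}}}layout")
    for top in range(0, 100 - height_pct + 1, step_pct):
        ET.SubElement(layout, f"{{{TT_NS}}}region", {
            f"{{{XML_NS}}}id": f"r{top}",                # serializes as xml:id
            f"{{{TTS_NS}}}origin": f"0% {top}%",        # serializes as tts:origin
            f"{{{TTS_NS}}}extent": f"100% {height_pct}%",
        })
    return layout


def nearest_region(layout: ET.Element, wanted_top_pct: float) -> str:
    """Round an author-chosen vertical position to the closest predefined region id."""
    def top_of(region: ET.Element) -> float:
        # tts:origin is "<x>% <y>%"; take the y component
        return float(region.get(f"{{{TTS_NS}}}origin").split()[1].rstrip("%"))

    best = min(layout, key=lambda r: abs(top_of(r) - wanted_top_pct))
    return best.get(f"{{{XML_NS}}}id")
```

For example, a request to place text with its top at 73% of the video height lands in region `r70`: the author's choice is quantized to the grid.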

Nigel


On 18/03/2022, 16:02, "Pierre-Anthony Lemieux" <pal@sandflow.com> wrote:

    >  If a new region needs to be defined part way through, it cannot be done if regions must be defined in the head.

    I have yet to see practical scenarios that require more than a few
    regions defined upfront: vertical and horizontal alignment are usually
    sufficient to handle positioning.

    -- Pierre

    On Fri, Mar 18, 2022 at 8:33 AM Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
    >
    > >In particular, if your TTML parser is a SAX-based, progressive parser and the document is authored with proper constraints (i.e. document order matches time order), you could deliver TTML using HTTP chunked transfer (e.g. mapping one HTTP chunk to a <p> or a <div>).
    >
    > I think there are some complexities in doing this in a live authoring scenario. If a new region needs to be defined part way through, it cannot be done if regions must be defined in the head. Perhaps this is a good argument for permitting anonymous inline regions and inline styles, which not all profiles allow.
    >
    > Nigel
    >
    >
    > ________________________________
    > From: Cyril Concolato <cconcolato@netflix.com>
    > Sent: Friday, March 18, 2022 3:16 pm
    > To: Michael Dolan <mike@dolan.tv>
    > Cc: Nigel Megitt <nigel.megitt@bbc.co.uk>; Glenn Adams <glenn@skynav.com>; public-tt@w3.org <public-tt@w3.org>
    > Subject: Re: chunked IMSC1
    >
    >
    >
    > On Fri, Mar 18, 2022 at 7:32 AM Michael Dolan <mike@dolan.tv> wrote:
    >>
    >> Hi Nigel,
    >>
    >>
    >>
    >> Unlike TTML, video and audio do not need to wait for a full segment before packets are sent: the decoder can start to decode and present them long before the whole segment arrives.  Video and audio are “chunked” today and transmitted long before the entire segment has been encoded.
    >>
    >>
    >>
    >> TTML cannot have more than one sample per segment (ISO/IEC 14496-30).
    >
    > ISO/IEC 14496-30 is about the overall carriage of TTML in MP4. It does not impose any application-specific constraints; in particular, it does not constrain segmented media. CMAF talks about segmented media, but I don't recall any such restriction. Maybe DASH-IF has one?
    >
    > That said, the general solutions for progressively downloading XML are well known and apply to TTML. In particular, if your TTML parser is a SAX-based, progressive parser and the document is authored with proper constraints (i.e. document order matches time order), you could deliver TTML using HTTP chunked transfer (e.g. mapping one HTTP chunk to a <p> or a <div>).
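The progressive-parsing idea can be sketched with Python's standard library. This uses `xml.etree.ElementTree.XMLPullParser`, a pull rather than a SAX API (the expat reader from `xml.sax` also offers an incremental `feed()`), but the principle is the same. The chunk contents and the `progressive_paragraphs` helper are hypothetical, and the sketch assumes one complete `<p>` per chunk and document order matching time order.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"

# Hypothetical HTTP chunks: the first carries the document start, each middle
# chunk carries exactly one complete <p>, and the last closes the document.
CHUNKS = [
    '<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en"><head/><body><div>',
    '<p begin="00:00:00.000" end="00:00:01.500">Hello</p>',
    '<p begin="00:00:01.500" end="00:00:03.000">world</p>',
    "</div></body></tt>",
]


def progressive_paragraphs(chunks):
    """Yield (begin, end, text) for each <p> as soon as its end tag has arrived.

    Relies on document order matching time order, so each yielded cue can be
    handed to the renderer without waiting for the rest of the document.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == f"{{{TT_NS}}}p":
                yield elem.get("begin"), elem.get("end"), "".join(elem.itertext())
    parser.close()  # signals end of input; an unfinished document is an error here
```

Because the generator feeds chunks lazily, the first cue is available after only the first two chunks have arrived, while the document as a whole is still not well-formed.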
    >
    > HTH,
    > Cyril
    >
    >>
    >>
    >>
    >> The minimum CMAF Segment duration (960 ms) applies to all codecs, and reducing it would increase the coding overhead. The goal is longer, more efficient segments that are chunked, not shorter, inefficient ones.
    >>
    >>
    >>
    >>               Mike
    >>
    >>
    >>
    >> From: Nigel Megitt <nigel.megitt@bbc.co.uk>
    >> Sent: Friday, March 18, 2022 7:19 AM
    >> To: Michael Dolan <mike@dolan.tv>; Glenn Adams <glenn@skynav.com>
    >> Cc: public-tt@w3.org
    >> Subject: Re: chunked IMSC1
    >>
    >>
    >>
    >> Hi Mike,
    >>
    >>
    >>
    >> In these low latency scenarios, what is the expected latency for encoding the video? Might it be possible to send the TTML for the whole segment before all the video has been encoded, and thus work around the “holding back” problem?
    >>
    >>
    >>
    >> We demonstrated with EBU-TT Live that, given an appropriate carriage mechanism, it is possible to send real time updates that are whole TTML documents. In the case of EBU-TT Live we were typically sending whole (short) documents at arbitrary times, corresponding to changes of presentation, but it would work equally well to send documents at predetermined intervals. If the issue is CMAF minimum segment durations, another solution might be to construct each CMAF TTML Segment out of multiple samples, where each sample is a whole document. Or change the CMAF minimum segment duration, of course.
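The suggestion of building one CMAF TTML Segment from several samples, each of which is a whole document, can be sketched as follows. This is a hypothetical illustration only: `sample_documents`, the `(begin_ms, end_ms, text)` cue tuples, and the choice to clip each cue to its sample interval are assumptions, and packaging the documents into ISO BMFF samples is not shown.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Serialize TTML elements with a default namespace instead of an "ns0:" prefix
# (note: this updates ElementTree's module-global prefix map).
ET.register_namespace("", TT_NS)


def sample_documents(cues, segment_start_ms, segment_ms, sample_ms):
    """Split one segment's cues into whole TTML documents, one per sample.

    cues: iterable of (begin_ms, end_ms, text) on the media timeline.
    Each returned string is a complete, well-formed document covering one
    sample interval, with every active cue clipped to that interval.
    """
    cues = list(cues)  # iterated once per sample
    docs = []
    segment_end = segment_start_ms + segment_ms
    t = segment_start_ms
    while t < segment_end:
        sample_end = min(t + sample_ms, segment_end)
        tt = ET.Element(f"{{{TT_NS}}}tt")
        body = ET.SubElement(tt, f"{{{TT_NS}}}body")
        div = ET.SubElement(body, f"{{{TT_NS}}}div")
        for begin, end, text in cues:
            b, e = max(begin, t), min(end, sample_end)
            if b < e:  # cue is visible during part of this sample
                p = ET.SubElement(div, f"{{{TT_NS}}}p",
                                  begin=f"{b / 1000:.3f}s", end=f"{e / 1000:.3f}s")
                p.text = text
        docs.append(ET.tostring(tt, encoding="unicode"))
        t = sample_end
    return docs
```

With a 2000 ms segment and 500 ms samples this yields four small documents, each of which can be decoded as soon as it arrives. The price is that a cue spanning several samples is repeated in each of them.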
    >>
    >>
    >>
    >> Glenn, when you referenced EXI, were you talking particularly about EXI streaming, rather than EXI as a mechanism for compression?
    >>
    >>
    >>
    >> Nigel
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >> From: Michael Dolan <mike@dolan.tv>
    >> Date: Friday, 18 March 2022 at 13:50
    >> To: Glenn Adams <glenn@skynav.com>, Nigel Megitt <nigel.megitt@bbc.co.uk>
    >> Cc: "public-tt@w3.org" <public-tt@w3.org>
    >> Subject: RE: chunked IMSC1
    >>
    >>
    >>
    >> Hi Nigel and Glenn,
    >>
    >>
    >>
    >> This use case is for the live, low-latency (LLL) scenario.  There is no reason to do this in the VoD scenarios that TTML was designed for.
    >>
    >>
    >>
    >> The problem is that, unlike video and audio, “normal TTML” cannot *begin* to be decoded until the entire segment/document has been encoded and received.  This means that video and audio segments must be “held back” for the segment duration so that decoding and presentation remain in sync. Given that LLL applications expect a delay on the order of 500 ms at worst, this just doesn’t work, especially when segments (e.g. CMAF segments) are constrained to at least 960 ms.  Decoding and presentation must necessarily begin with a partial segment, as they do for video and audio.
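The holdback argument reduces to simple arithmetic. This is a rough sketch under stated assumptions: the 500 ms budget and 960 ms minimum are the figures quoted in this thread, while the helper names and the simplification that encoding, transfer and decoding take no time are hypothetical.

```python
from typing import Optional

# Figures quoted in the thread; everything else is a simplifying assumption.
LLL_BUDGET_MS = 500        # "on the order of 500ms delay at worst"
MIN_CMAF_SEGMENT_MS = 960  # minimum CMAF Segment duration cited in the thread


def worst_case_text_wait_ms(segment_ms: float, chunk_ms: Optional[float] = None) -> float:
    """Worst-case time a cue waits before it can be decoded.

    Whole-document delivery (chunk_ms is None): a cue authored at the start of
    a segment cannot be decoded until the document for the whole segment exists.
    Chunked delivery: the cue waits at most until the end of its chunk.
    Encoding, transfer and decoding times are ignored.
    """
    if chunk_ms is None:
        return segment_ms
    return min(chunk_ms, segment_ms)


def av_holdback_fits_budget(wait_ms: float, budget_ms: float = LLL_BUDGET_MS) -> bool:
    """Video and audio must be held back by the text wait to stay in sync."""
    return wait_ms <= budget_ms
```

Whole-document text forces a holdback of at least 960 ms, which exceeds the 500 ms budget before any other delay is counted. Delivering the same segment in 250 ms chunks keeps the text wait within it.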
    >>
    >>
    >>
    >> Yes, a solution will likely require somewhat specialized encoding and decoding processing and perhaps a constrained vocabulary (e.g. no <set>), although I would not speculate on the complexity or the issues at this time. That would be for further study.
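A constrained vocabulary, combined with the precondition that document order match time order, could be checked mechanically before a document is chunked. This is a hypothetical checker: the function names, the rule set (no `<set>`, non-decreasing `begin` on `<p>` in document order) and the restriction to `begin` values written in seconds such as `1.5s` are all assumptions. Time containment, where `begin` is relative to a parent element, is ignored.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"


def seconds(offset: str) -> float:
    """Parse an offset time such as '1.5s'; other time expressions are out of scope."""
    if not offset.endswith("s") or offset.endswith("ms"):
        raise ValueError(f"unsupported time expression: {offset!r}")
    return float(offset[:-1])


def chunkable_violations(doc: str) -> list:
    """List the reasons a document breaks the hypothetical chunking constraints."""
    root = ET.fromstring(doc)
    problems = []
    if root.find(f".//{{{TT_NS}}}set") is not None:
        problems.append("contains <set>")
    last_begin = float("-inf")
    for p in root.iter(f"{{{TT_NS}}}p"):  # iter() walks in document order
        begin = seconds(p.get("begin", "0s"))
        if begin < last_begin:
            problems.append(f"<p> at {begin}s precedes earlier begin {last_begin}s")
        last_begin = max(last_begin, begin)
    return problems
```

An empty list means that, under these assumed rules, each `<p>` could be sent as soon as it is authored.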
    >>
    >>
    >>
    >> The alternative is frankly not to use TTML.
    >>
    >>
    >>
    >>               Mike
    >>
    >>
    >>
    >> From: Glenn Adams <glenn@skynav.com>
    >> Sent: Friday, March 18, 2022 6:33 AM
    >> To: Michael Dolan <mike@dolan.tv>
    >> Cc: public-tt@w3.org
    >> Subject: Re: chunked IMSC1
    >>
    >>
    >>
    >> Neither the IMSC nor the TTML requirements explicitly address this use case. Both operate on "document instances" as their input and require such instances to be well-formed (in the XML sense).
    >>
    >>
    >>
    >> To do what you suggest, it would be necessary to progressively reparse a dynamically updated concretely encoded document instance, and only proceed with subsequent processing (as an XML infoset) for parses that proved well formed. This might require an underlying buffering layer to append a temporary postfix to the buffer prior to each parse attempt in order to supply missing close tags.
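The reparse-with-postfix approach can be sketched with Python's standard library. This is a hypothetical illustration: the helper names are assumptions, expat (run in non-final mode) is used only to discover which elements are still open, and ElementTree then attempts a full parse of the buffer plus synthesized close tags, discarding any attempt that is not well-formed.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def open_element_stack(buffer: str) -> list:
    """Return the names of elements opened but not yet closed in buffer."""
    stack = []
    p = expat.ParserCreate()  # no namespace processing: names stay as written
    p.StartElementHandler = lambda name, attrs: stack.append(name)
    p.EndElementHandler = lambda name: stack.pop()
    p.Parse(buffer, False)  # not final: an unfinished document is tolerated
    return stack


def try_parse_prefix(buffer: str):
    """Parse buffer plus a temporary postfix of close tags, or return None."""
    try:
        stack = open_element_stack(buffer)
    except expat.ExpatError:
        return None
    postfix = "".join(f"</{name}>" for name in reversed(stack))
    try:
        return ET.fromstring(buffer + postfix)
    except ET.ParseError:  # e.g. the buffer ends inside a tag
        return None


def progressive_reparse(chunks):
    """Yield a parsed tree each time the growing buffer can be made well-formed."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        root = try_parse_prefix(buffer)
        if root is not None:
            yield root
```

A buffer that ends inside an unfinished `<p>` would parse with truncated text, so a real system would probably act only on completed `<p>` elements. Reparsing the whole buffer on every update is also quadratic in document length, which is one reason to consider EXI or a progressive parser instead.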
    >>
    >>
    >>
    >> If I were designing such a system, I would first evaluate the potential use of EXI.
    >>
    >>
    >>
    >>
    >>
    >> On Thu, Mar 17, 2022 at 2:34 PM Michael Dolan <mike@dolan.tv> wrote:
    >>
    >> All,
    >>
    >>
    >>
    >> Has anyone thought about this lately?  In the TTML1 days we pondered it a bit.  By “chunked” I mean in the HTTP sense: an IMSC1 document could be broken into pieces and delivered a few bytes at a time, say every 500 ms.  There are low-latency use cases that need this sort of delivery, where decoding and presentation continue throughout a certain period. Today, when building ISO BMFF segments, one has to gather up all text over several seconds, create a well-formed document, and then deliver that well-formed document before presentation can begin. This inserts a delay relative to how 608 and Teletext work. It also inserts the same delay into the video and audio; that is, the decoder cannot start decoding video and audio until the text is ready to go.
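For reference, HTTP/1.1 chunked transfer coding frames each piece as a hexadecimal length line, the payload and a CRLF, and ends the body with a zero-length chunk. The sketch below shows how pieces of an IMSC1 document would travel. The helper names are hypothetical, and chunk extensions and trailer fields are not handled. Chunk boundaries carry no XML meaning, so a receiver still needs a progressive parser to make use of each piece.

```python
def encode_chunked(pieces) -> bytes:
    """Frame text pieces as an HTTP/1.1 chunked message body."""
    out = bytearray()
    for piece in pieces:
        data = piece.encode("utf-8")
        if not data:
            continue  # a zero-length chunk would end the body early
        out += f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, no trailer fields
    return bytes(out)


def decode_chunked(body: bytes):
    """Yield chunk payloads from a chunked body (trailers not supported)."""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)  # ignore chunk extensions
        pos = line_end + 2
        if size == 0:
            return
        yield body[pos:pos + size]
        pos += size + 2  # skip the payload and its trailing CRLF
```

For example, `encode_chunked(["<p>", "Hi</p>"])` produces `b"3\r\n<p>\r\n6\r\nHi</p>\r\n0\r\n\r\n"`.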
    >>
    >>
    >>
    >> Even if no one has pondered this lately, is there interest?
    >>
    >>
    >>
    >> Regards,
    >>
    >>               Mike
    >>
    >>
    >>
    >> "keep calm and carry on"
    >>
    >> -----------------------
    >>
    >> Michael DOLAN
    >>
    >> TBT Inc
    >>
    >> Del Mar, CA USA
    >>
    >> +1-858-882-7497 (mobile)
    >>
    >>

Received on Monday, 21 March 2022 16:34:49 UTC