Fwd: Fwd: [TTML-WEBVTT] How to map multiple <p> that share same time range to WebVTT from Andreas Tai on 2021-11-15 (public-tt@w3.org from November 2021)

From: Andreas Tai <w3c@andreastai.com>
Date: Mon, 15 Nov 2021 11:47:37 +0100
To: boy@unified-streaming.com
Cc: public-tt@w3.org, Pierre-Anthony Lemieux <pal@sandflow.com>
Message-ID: <24c8b19e-e3a3-34e4-1d9c-d0209b9c19ca@andreastai.com>
Dear Boy,

Thank you so much for bringing this up. It is important to get feedback 
from operation. Your input is therefore very valuable.

It is important to note that the mapping document you use is not a
specification. As an editorial draft, it has a very low maturity level.
The Status section states:

"This is a draft document and may be updated, replaced or obsoleted by 
other documents at any time. It is inappropriate to cite this document 
as other than work in progress."

As Pierre said, the document has not been updated in many years. The 
reason for putting this document on hold was to wait until the WebVTT 
specification reached the status of a recommendation.

There are two lessons we can learn from your feedback:

1. there may be increased interest in guidance on mapping between TTML 
and WebVTT.
2. the status of the mapping document is not clear.

Both of these topics could be discussed at one of our next TTWG 
meetings. Regarding the status of the document: we should perhaps add a 
note or warning.

Regarding the issue you raised: as discussed, using ISDs currently seems
to be the best solution for your use case. It might also be a good idea
to revisit how your use case should be handled according to the current
WebVTT document.
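
As a rough illustration of the ISD-based approach, the following Python sketch splits the timeline at every begin and end time and emits one cue per interval, joining the text of all paragraphs active in that interval. It is a simplification, not the algorithm of the mapping document or of ttconv: the `Paragraph` type and the `isd_cues` and `fmt` helpers are hypothetical names, times are assumed to be in milliseconds, and styling and regions are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Paragraph:
    # Hypothetical stand-in for a timed TTML <p>; times in milliseconds.
    begin_ms: int
    end_ms: int
    text: str


def fmt(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def isd_cues(paragraphs: list[Paragraph]) -> list[str]:
    """Emit one cue per interval between consecutive begin/end times."""
    # Every begin or end time is a point where the displayed content may change.
    points = sorted({t for p in paragraphs for t in (p.begin_ms, p.end_ms)})
    cues = []
    for start, end in zip(points, points[1:]):
        # Paragraphs active during [start, end), kept in document order.
        active = [p.text for p in paragraphs
                  if p.begin_ms <= start and end <= p.end_ms]
        if active:
            cues.append(f"{fmt(start)} --> {fmt(end)}\n" + "\n".join(active))
    return cues


paragraphs = [
    Paragraph(46_320, 48_360, "You guys wanna story?"),
    Paragraph(46_320, 48_360, "(MEN CHEERING)"),
]
print("WEBVTT\n\n" + "\n\n".join(isd_cues(paragraphs)))
```

For the two paragraphs of the simplified example in the original message, this yields a single cue that contains both lines, so no two cues share a time range.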

Thanks again and best regards,
Andreas


---------- Forwarded message ---------
From: Pierre-Anthony Lemieux <pal@sandflow.com>
Date: Fri, Nov 12, 2021 at 5:21 PM
Subject: Re: [TTML-WEBVTT] How to map multiple <p> that share same time range to WebVTT
To: Boy van Dijk <boy@unified-streaming.com>
Cc: TTWG <public-tt@w3.org>


Hi Boy,

  > - Does that mean you're of the opinion that the specification should
  > be changed in some way?

Probably -- or the specification should be deprecated.

Fundamentally, the subset of TTML [1] supported by the specification
does not correspond to common subsets, e.g. IMSC, and supports only a
limited subset of TTML features, e.g. see #timing constraints.

The specification has not been touched in 6 years.

@Andreas Tai who contributed to both the WebVTT-TTML mapping
specification and ttconv might have an opinion.

  > there seems to be no reference to ISDs in the document, so I'm not
  > sure how a person reading this specification should know that a
  > conversion to ISDs needs to take place first, or that ISDs play some
  > other role in the conversion process.

I do not disagree.

Best,

-- Pierre

[1] https://w3c.github.io/ttml-webvtt-mapping/#the-ttml-to-webvtt-profile

On Fri, Nov 12, 2021 at 4:43 AM Boy van Dijk <boy@unified-streaming.com> wrote:
  >
  > Hi Pierre,
  >
  > Thanks for your response, and sorry for the slight delay in replying.
  > I anticipated there would perhaps be more opinions on this.
  >
  > Unfortunately, I believe my initial message might not have been
  > entirely clear because the original formatting was removed. For example:
  >
  > "Every <p> is mapped to a WebVTT cue."
  >
  > is not something I wrote myself, as it might have seemed, but a
  > direct quote from the TTML WebVTT mapping spec
  > (https://w3c.github.io/ttml-webvtt-mapping/).
  >
  > As you say, this quote might not represent the right strategy:
  >
  > - Does that mean you're of the opinion that the specification should
  > be changed in some way?
  > - Or are you saying I'm somehow missing something relevant in the
  > specification (which might very well be the case!)?
  >
  > As for the second option, and the strategy you propose, using ISDs:
  > there seems to be no reference to ISDs in the document, so I'm not sure
  > how a person reading this specification should know that a conversion to
  > ISDs needs to take place first, or that ISDs play some other role in the
  > conversion process. Working with ISDs might very well be the better
  > approach, of course, as you indicate.
  >
  > Please let me know your thoughts.
  >
  > Regards,
  > Boy
  >
  > On 4 Nov 2021 at 01:23:48, Pierre-Anthony Lemieux <pal@sandflow.com> wrote:
  >>
  >> Hi Boy,
  >>
  >> Every <p> is mapped to a WebVTT cue.
  >>
  >>
  >> I am not convinced that this is the right strategy.
  >>
  >> I would instead map each intermediate synchronic document (ISD) to a
  >> WebVTT cue, so that no two cues have overlapping temporal ranges.
  >>
  >> This is the approach taken by https://github.com/sandflow/ttconv.
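
One way to check the property described here, that no two cues have overlapping temporal ranges, is sketched below in Python. This is only an illustration and is not taken from ttconv; the `Cue` tuple and the `overlapping_pairs` function are hypothetical names, and times are assumed to be in milliseconds.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    # Hypothetical minimal cue: times in milliseconds plus the payload text.
    begin_ms: int
    end_ms: int
    text: str


def overlapping_pairs(cues):
    """Return index pairs (i, j), i < j, of cues whose time ranges intersect."""
    pairs = []
    for i, a in enumerate(cues):
        for j in range(i + 1, len(cues)):
            b = cues[j]
            # Half-open ranges [begin, end) intersect when each one starts
            # before the other one ends.
            if a.begin_ms < b.end_ms and b.begin_ms < a.end_ms:
                pairs.append((i, j))
    return pairs


# One cue per <p>: both cues cover the same range.
per_p = [
    Cue(46_320, 48_360, "You guys wanna story?"),
    Cue(46_320, 48_360, "(MEN CHEERING)"),
]
# One cue per ISD: the two lines share a single cue.
per_isd = [Cue(46_320, 48_360, "You guys wanna story?\n(MEN CHEERING)")]

print(overlapping_pairs(per_p))    # [(0, 1)]
print(overlapping_pairs(per_isd))  # []
```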
  >>
  >> Best,
  >>
  >> -- Pierre
  >>
  >>
  >>
  >> On Wed, Nov 3, 2021 at 4:16 PM Boy van Dijk <boy@unified-streaming.com> wrote:
  >>
  >>
  >> Hi,
  >>
  >>
  >> I represent Unified Streaming and I'm seeking your expertise about
  >> mapping the following bit of TTML to WebVTT:
  >>
  >>
  >> <div begin="00:00:46.320" end="00:00:48.360">
  >>     <p style="singleHeightStyle" tts:textAlign="center" region="region-20">
  >>         <span xml:space="preserve">&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;</span>
  >>         <span xml:space="preserve" tts:backgroundColor="black">&#xA0;You&#xA0;guys&#xA0;wanna&#xA0;story?&#xA0;</span>
  >>         <span xml:space="preserve">&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;</span>
  >>     </p>
  >>     <p style="singleHeightStyle" tts:display="inlineBlock" xml:space="preserve" region="region-20">&#xA0;</p>
  >>     <p style="singleHeightStyle" tts:textAlign="center" region="region-20">
  >>         <span xml:space="preserve">&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;</span>
  >>         <span xml:space="preserve" tts:backgroundColor="black">&#xA0;(MEN&#xA0;CHEERING)&#xA0;</span>
  >>         <span xml:space="preserve">&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;&#xA0;</span>
  >>     </p>
  >> </div>
  >>
  >>
  >>
  >> Or, my problem in simplified form:
  >>
  >>
  >> <p begin="00:00:46.320" end="00:00:48.360">
  >> You guys wanna story?
  >> </p>
  >> <p begin="00:00:46.320" end="00:00:48.360">
  >> (MEN CHEERING)
  >> </p>
  >>
  >>
  >>
  >> From what I understand of the spec, what needs to happen is pretty
  >> easy, because:
  >>
  >>
  >> Every <p> is mapped to a WebVTT cue.
  >>
  >>
  >>
  >> Which would result in:
  >>
  >>
  >> 00:00:46.320 --> 00:00:48.360
  >> You guys wanna story?
  >>
  >> 00:00:46.320 --> 00:00:48.360
  >> (MEN CHEERING)
  >>
  >>
  >>
  >> Considering that these two cues have the exact same start and end
  >> time, does their sequence carry meaning? I'm not sure, but I believe
  >> this result is far from ideal, especially after applying the very last
  >> step listed in the steps to convert TVTT to WebVTT:
  >>
  >>
  >> The last step is to sort these cues from earliest to latest time,
  >> based on each cue's beginning timestamp.
  >>
  >>
  >>
  >> The result will be that either of the two cues may be listed first,
  >> with the other listed second. Randomness, it seems.
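
Whether the order really ends up arbitrary depends on the sort an implementation uses. The small Python sketch below uses hypothetical `(begin_ms, end_ms, text)` tuples, not anything defined in the mapping document. It shows that a stable sort keyed only on the begin timestamp keeps cues with equal begin times in document order, whereas a sort that also compares the remaining fields may reorder them.

```python
# Cues as (begin_ms, end_ms, text) tuples, listed in document order.
cues = [
    (50_000, 52_000, "Later cue"),
    (46_320, 48_360, "You guys wanna story?"),
    (46_320, 48_360, "(MEN CHEERING)"),
]

# sorted() is stable: ties on the key keep their original relative order.
by_begin = sorted(cues, key=lambda cue: cue[0])

# Sorting whole tuples breaks ties on end time and then on the text instead.
by_tuple = sorted(cues)

print([text for _, _, text in by_begin])
print([text for _, _, text in by_tuple])
```

With the stable, key-only sort the two simultaneous cues stay in document order. Sorting the whole tuples puts "(MEN CHEERING)" first, because "(" sorts before "Y".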
  >>
  >>
  >> Okay, long email, you might think, but does this actually have any
  >> practical implications? Yes! If you play this back in a recent native
  >> HLS player on an Apple device you get completely unusable results (not
  >> all cues are presented, and the ones that are aren't necessarily
  >> presented at the right time either).
  >>
  >>
  >> So, I believe my questions are the following:
  >>
  >>
  >> Is my understanding of the mapping spec as I presented it above
  >> correct?
  >>
  >> If my understanding is correct, does my example simply represent an
  >> edge case that isn't properly covered by the spec, or is only Apple's
  >> WebVTT parser to blame here?
  >>
  >> If this is an edge case not covered by the spec, what would be the
  >> way forward?
  >>
  >>
  >>
  >> Happy to hear your input, and thank you for your thoughts.
  >>
  >>
  >> For those interested, I created a simple Tears of Steel-based test
  >> stream without audio:
  >> https://origin.unified-streaming.com/public/tkt32756/main.m3u8
  >>
  >>
  >> It contains the following WebVTT:
  >>
  >>
  >> WEBVTT
  >>
  >> 00:00:05.000 --> 00:00:05.000
  >> You guys wanna story?
  >>
  >> 00:00:05.000 --> 00:00:05.000
  >> (MEN CHEERING)
  >>
  >> 00:00:05.500 --> 00:00:10.500
  >> You guys wanna story?
  >>
  >> 00:00:05.500 --> 00:00:10.500
  >> (MEN CHEERING)
  >>
  >> 00:00:11.500 --> 00:00:15.500
  >> You guys wanna story?
  >>
  >> 00:00:11.500 --> 00:00:15.500
  >> (MEN CHEERING)
  >>
  >>
  >>
  >> Regards,
  >>
  >> Boy
Received on Monday, 15 November 2021 10:48:23 UTC