Re: Removing cues with duplicate ids as a way to allow cue rewriting from Brendan Long on 2014-05-27 (public-html@w3.org from May 2014)

From: Brendan Long <self@brendanlong.com>
Date: Tue, 27 May 2014 10:01:51 -0500
To: Philip Jägenstedt <philipj@opera.com>
CC: public-html <public-html@w3.org>
Message-ID: <5384A8DF.3090705@brendanlong.com>

On 05/26/2014 04:18 AM, Philip Jägenstedt wrote:
> TextTrackCue.id is mutable, so that setter would have to throw (or 
> something) to guarantee that there are no duplicate ids. Elements have 
> non-unique ids and getElementById() can still be implemented 
> efficiently, so I disagree that a non-unique TextTrackCue.id is a 
> problem worth solving in its own right.

That makes sense. It probably isn't worth handling at the HTML level. 
Doing this in the WebVTT parser would work, though.
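
To make the parser-level idea concrete, here is a rough sketch in Python 
rather than real user agent code. Cue, Track and add_parsed_cue are 
hypothetical names I made up for illustration; they are not part of 
WebVTT or the HTML TextTrack API. I'm also assuming that cues with an 
empty id are never treated as duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Cue:
    id: str
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Track:
    cues: list = field(default_factory=list)

    def add_parsed_cue(self, cue: Cue) -> None:
        """Hypothetical hook the parser calls for each cue it finishes."""
        if cue.id:
            # Assumption: only non-empty ids take part in replacement.
            # Drop every earlier cue that carries the same id.
            self.cues = [c for c in self.cues if c.id != cue.id]
        self.cues.append(cue)


track = Track()
track.add_parsed_cue(Cue("some-id", 0.0, 30.0, "This is an xeample"))
track.add_parsed_cue(Cue("some-id", 0.0, 10.0, "This is an example"))
```

Only cues coming out of the parser go through this path, so cues added 
or renamed by script are untouched and TextTrackCue.id can stay mutable 
and non-unique.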

> That being said, corrections in live broadcasts are a real thing in 
> CEA-608 and the teletext system used in (at least) Sweden. As you say, 
> the problem can be solved by delaying the stream by some arbitrary 
> amount, I guess 5 seconds would work in practice. Isn't there already 
> that amount of delay due to buffering, and if there isn't, would it be 
> a problem to introduce 5 seconds of buffer for those users who would 
> otherwise see captions being corrected in real time?

Delaying the stream would make CEA-608 and CEA-708 to WebVTT 
translations mostly work, but I don't see how it would help with 
corrections.

I'm not really a fan of using a delay. Most cues only last a short time, 
but it's possible to have arbitrary-length cues, so no fixed delay will 
always be long enough. It also seems like a time delay (even a short 
one) is much more significant than a tiny amount of extra bandwidth and 
processing to send out a replacement cue.
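
As a rough illustration of how little a live transcoder would have to 
send, here is a Python sketch that simulates the cue blocks for one 
caption. The function names and the 2 second guess are arbitrary 
choices of mine, not anything from CEA-608, CEA-708 or WebVTT, and a 
real transcoder would not know clear_time up front; it is passed in 
here only to simulate the "stop displaying" command arriving.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT hh:mm:ss.ttt timestamp."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_cue(cue_id: str, start: float, end: float, text: str) -> str:
    """Build one cue block: id line, timing line, then the cue text."""
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{cue_id}\n{timing}\n{text}\n"


def rewrite_sequence(cue_id, start, text, clear_time, guess=2.0):
    """Simulate the cue blocks sent for one caption (hypothetical helper).

    The first block uses a guessed end time. Each later block reuses the
    same id, so under the proposed rule it replaces the previous one.
    """
    end = start + guess
    blocks = [format_cue(cue_id, start, end, text)]
    while end < clear_time:
        # Caption is still showing: extend the end time, capped at the
        # moment the caption is actually cleared.
        end = min(end + guess, clear_time)
        blocks.append(format_cue(cue_id, start, end, text))
    if end > clear_time:
        # The first guess overshot: one last rewrite shortens the cue.
        blocks.append(format_cue(cue_id, start, clear_time, text))
    return blocks


blocks = rewrite_sequence("some-id", 0.0, "This is an example", 5.0)
print("\n".join(blocks))
```

Every block is only a few dozen bytes, and the viewer sees the caption 
as soon as the first one arrives instead of after a fixed delay.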

With buffering, don't video players usually let the video play until the 
buffer completely runs out? If we need data in the buffer to display 
captions at the current position, then we'd have to stop playback 
whenever the buffer is too small. For example, if we need a 5-second 
buffer for smooth playback, and we have a 5-second delay in our 
captions, then we'll need a 10-second buffer for smooth playback with 
captions. This is probably a bit simplistic, but I don't think the 
buffer helps as much as you're suggesting.

>
> Philip
>
>
> On Fri, May 23, 2014 at 1:18 AM, Brendan Long <self@brendanlong.com>
> wrote:
>
>     I've proposed this in the past, but I think I've narrowed it down
>     to specific changes, and I'm wondering if other people think this
>     would be a useful way to allow rewriting of cues.
>
>     *Why*
>
>     In streaming text tracks, we need a way to fix incorrect cues.
>     Some examples:
>
>       * In live TV, people type the captions in by hand
>         <http://en.wikipedia.org/wiki/Closed_captioning#Television_and_video>
>         shortly before you see them. If they make a mistake, we need a
>         way to fix it.
>       * CEA-608 and CEA-708 captions don't start with a convenient
>         startTime --> endTime block like WebVTT does. A caption ends
>         when we get a command that makes it stop displaying. If we
>         want to transcode to WebVTT in real-time, we have to either
>         wait until the caption is over to translate it (delaying the
>         stream by some arbitrary time in the hope that it will be long
>         enough), or we need to start a caption immediately with a
>         guess of the end time and then rewrite it once we know the
>         correct end time (or rewrite it to extend the end time until
>         we find the correct one).
>
>     *How*
>
>     The solution I'm proposing is that if we see two cues with the
>     same id, the earlier cue will be removed.
>
>         some-id
>         00:00:00.000 --> 00:00:30.000
>         This is an xeample
>
>         some-id
>         00:00:00.000 --> 00:00:10.000
>         This is an example
>
>     In this example, the text "This is an example" will be displayed
>     for 10 seconds starting at time 0.
>
>     *Why This Solution*
>
>     This solution is nice because the syntax is simple and easy to
>     understand, and it's powerful enough to rewrite any cue in any way
>     you could possibly want, because the new cue completely replaces
>     the old one.
>
>     *Arguments against*
>
>     This isn't particularly efficient. If you just want to change the
>     time, you need to send the entire updated cue, instead of just the
>     change.
>
>     I don't think this is a big deal, because even the most heavily
>     edited subtitle file will be orders of magnitude smaller than the
>     accompanying video.
>
>     *Specifically..*
>
>     There are a couple ways of doing this in HTML:
>
>      1. Do this at the WebVTT-layer: If the WebVTT parser sees a cue
>         with the same id as an older cue, it explicitly removes the
>         older cue from the track and then adds the new cue.
>      2. Do this at the HTML layer: If any TextTrack gets a cue with
>         the same id as a cue it already has, it removes the old cue
>         before adding the new one.
>
>     I think doing this at the HTML layer makes sense, because:
>
>       * ids should be unique anyway. That's why they're ids.
>       * TextTrack.getCueById() doesn't make sense if ids aren't unique.
>       * The implementation is much easier if we can throw a hash table
>         in TextTrack and use it for detecting duplicate ids, and for
>         making getCueById() fast.
>       * Handling weird edge cases is simpler:
>           o If JavaScript adds a cue with the same id as an existing
>             cue, the existing cue is removed.
>           o If the UA adds a cue with the same id as a cue added by
>             JavaScript, the cue added by JavaScript is removed (in
>             this case, presumably whatever trigger caused JavaScript
>             to add that cue will be triggered again).
>
>     I think if we do this, it should *also* be added to the WebVTT
>     spec, so files that do this will render properly in non-HTML media
>     players.
>
>     *More specifically...*
>
>     https://github.com/w3c/html/pull/20
>
>     *Conclusion*
>
>     Does this seem like a reasonable change to you?
>
>
>


Received on Tuesday, 27 May 2014 15:02:29 UTC