Re: Unifying the rendering approach from Victor Carbune on 2014-03-07 (public-texttracks@w3.org from March 2014)

From: Victor Carbune <victor.carbune@gmail.com>
Date: Fri, 7 Mar 2014 15:40:22 +0100
To: Loretta Guarino Reid <lorettaguarino@google.com>
Cc: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, "public-texttracks@w3.org" <public-texttracks@w3.org>, Philip Jägenstedt <philipj@opera.com>
Message-ID: <CA+nQPrmUrjsQNJ=XgKCFxt=kN1SiVxEYc1T=SLAqPh-0waOHpA@mail.gmail.com>
On Fri, Mar 7, 2014 at 3:36 PM, Loretta Guarino Reid
<lorettaguarino@google.com> wrote:
> YouTube-generated WebVTT always uses non-snap-to-lines cues. And we already
> struggle to deal with the different levels of WebVTT support in the
> browsers we need to work with. Removing non-snap-to-line cues outside of
> regions will make this situation much more difficult for us.

I was hoping that the final version of regions would cover all the
use-cases you are currently struggling with because of the
non-snap-to-lines positioning algorithm.
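
As a rough illustration of what wrapping a percentage-positioned cue in
an anonymous region could look like, here is a minimal Python sketch. It
is not taken from the spec or from any browser: the Cue and Region
classes, the anonymous_region_for helper, the clamping to the viewport
and the "count explicit newlines only" rule are all assumptions made for
the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Cue:
    """Hypothetical model of a non-snap-to-lines (percentage-positioned) cue."""
    text: str
    line: float              # percent of video height
    position: float = 50.0   # percent of video width
    size: float = 100.0      # percent of video width
    align: str = "middle"    # "start", "middle" or "end"
    snap_to_lines: bool = False


@dataclass
class Region:
    """Hypothetical model of the anonymous region that wraps one cue."""
    width: float
    lines: int
    region_anchor: Tuple[float, float]
    viewport_anchor: Tuple[float, float]


def anonymous_region_for(cue: Cue) -> Region:
    """Build a region that reproduces the cue's percentage positioning."""
    if cue.snap_to_lines:
        raise ValueError("this sketch only covers non-snap-to-lines cues")
    # Assumption: the left edge is derived from position and alignment.
    if cue.align == "start":
        left = cue.position
    elif cue.align == "end":
        left = cue.position - cue.size
    else:
        left = cue.position - cue.size / 2
    # Assumption: keep the box inside the viewport.
    left = max(0.0, min(left, 100.0 - cue.size))
    return Region(
        width=cue.size,
        lines=cue.text.count("\n") + 1,   # ignores automatic line wrapping
        region_anchor=(0.0, 0.0),         # top-left corner of the region
        viewport_anchor=(left, cue.line),
    )
```

Within such a region the cue text itself would simply be laid out
line by line, which is the part that stays "snapped to lines".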

> On Fri, Mar 7, 2014 at 6:16 AM, Victor Carbune <victor.carbune@gmail.com>
> wrote:
>>
>> On Wed, Mar 5, 2014 at 12:03 PM, Silvia Pfeiffer
>> <silviapfeiffer1@gmail.com> wrote:
>> >
>> > On 5 Mar 2014 18:29, "Victor Carbune" <victor.carbune@gmail.com> wrote:
>> >>
>> >> On Tue, Mar 4, 2014 at 10:46 PM, Silvia Pfeiffer
>> >> <silviapfeiffer1@gmail.com> wrote:
>> >> >
>> >> > On 4 Mar 2014 20:58, "Victor Carbune" <victor.carbune@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Mon, Mar 3, 2014 at 11:55 PM, Silvia Pfeiffer
>> >> >> <silviapfeiffer1@gmail.com> wrote:
>> >> >> >
>> >> >> > On Tue, Mar 4, 2014 at 12:34 AM, Victor Carbune
>> >> >> > <victor.carbune@gmail.com> wrote:
>> >> >> > > On Mon, Mar 3, 2014 at 12:53 PM, Silvia Pfeiffer
>> >> >> > > <silviapfeiffer1@gmail.com> wrote:
>> >> >> > >> On Mon, Mar 3, 2014 at 10:41 PM, Victor Carbune
>> >> >> > >> <victor.carbune@gmail.com> wrote:
>> >> >> > >>> On Mon, Mar 3, 2014 at 12:11 PM, Silvia Pfeiffer
>> >> >> > >>> <silviapfeiffer1@gmail.com> wrote:
>> >> >> > >>>>
>> >> >> > >>>> Aha! I see. The first case is so as to keep the line counting
>> >> >> > >>>> correct for snap-to-lines cues, I assume? Couldn't we make these
>> >> >> > >>>> two cases into a single case if the line positioning both for
>> >> >> > >>>> snap-to-lines and for non-snap-to-lines is done on the anonymous
>> >> >> > >>>> region that wraps each cue? What's the advantage of splitting
>> >> >> > >>>> these two cases?
>> >> >> > >>>
>> >> >> > >>> If we throw non-snap-to-lines cues within regions it means that
>> >> >> > >>> we need to support a rendering case for these cues within
>> >> >> > >>> regions, and also support named regions on them.
>> >> >> > >>
>> >> >> > >> I don't think so, since it will be the region that is placed, not
>> >> >> > >> the cue. So, the cue inside the region is still placed
>> >> >> > >> "snap-to-line" even if the line is basically just a single line
>> >> >> > >> (minus line wrapping and newlines).
>> >> >> > >
>> >> >> > > Well, it's one thing to deal with snap-to-lines, where you only
>> >> >> > > move one line on top of the other until they don't overlap, and
>> >> >> > > another to deal with overlap between percentage-positioned cues
>> >> >> > > and line-positioned cues; moving lines is simple and
>> >> >> > > straightforward.
>> >> >> >
>> >> >> > Correct. I don't see how this is relevant though. If we give all
>> >> >> > non-region cues their own anonymous region box, then we never have
>> >> >> > to worry about cue overlap inside regions. All we have to worry
>> >> >> > about is region overlap.
>> >> >> >
>> >> >> > Was your intent to separate overlap avoidance for the percentage
>> >> >> > positioned non-region cues from overlap avoidance of the regions?
>> >> >> > That would potentially cause overlap between non-region
>> >> >> > non-snap-to-line cues and snap-to-line cues (in regions), right?
>> >> >> > Are you suggesting not to deal with that? Would we even do overlap
>> >> >> > avoidance for regions?
>> >> >>
>> >> >> I want to avoid solving overlap avoidance between non-snap-to-lines
>> >> >> and snap-to-lines cues by:
>> >> >> *) ensuring they never end up in the same region (thus, I don't see
>> >> >> a need to support non-snap-to-lines cues with author-specified
>> >> >> regions, there's no use-case for this situation)
>> >> >> *) deferring the overlap avoidance mechanism to regions.
>> >> >
>> >> > Agree. That's why I wouldn't want all non-snap-to-lines cues to end
>> >> > up in a single full-viewport-sized region.
>> >> >
>> >> >> > >>> *) No need to think what happens if some percentage-positioned
>> >> >> > >>> cue overlaps a line-positioned cue (see "underspecced overlapping
>> >> >> > >>> positioning" bug)
>> >> >> > >>
>> >> >> > >> We still have to deal with overlapping cues, no matter whether
>> >> >> > >> they are in snap-to-lines regions or in non-snap-to-lines regions.
>> >> >> > >
>> >> >> > > This would move to dealing with overlapping regions - which we
>> >> >> > > decided we don't want to support, right? Or at least defer it to a
>> >> >> > > higher-level mechanism that would deal with all the caption boxes
>> >> >> > > from any format ending up on the screen.
>> >> >> >
>> >> >> > Hmm... I thought we didn't want to deal with overlap for
>> >> >> > region-cues. But you're now also saying we don't want to deal with
>> >> >> > overlap for non-region snap-to-line cues. I don't think that was
>> >> >> > the intention.
>> >> >>
>> >> >> We need unification: imagine, exaggerating here, having
>> >> >> {snap-to-lines, non-snap-to-lines} x {region, non-region} type of
>> >> >> cues.
>> >> >>
>> >> >> One solution is for all cues to end up in regions, anonymous or
>> >> >> author-specified, for rendering purposes.
>> >> >
>> >> > Yes, that's the best approach IMO.
>> >> >
>> >> >> > I can imagine a single overlap avoidance algorithm that works on
>> >> >> > lines only, where for percentage-positioned cues a line is deemed
>> >> >> > occupied if a part of a cue is in it.
>> >> >> >
>> >> >> >
>> >> >> > >>> *) Better abstraction: author can already obtain exactly the same
>> >> >> > >>> positioning using regions that they can with
>> >> >> > >>> percentage-positioned cues. Why integrate two different elements
>> >> >> > >>> solving the same problem together, if we can keep only one?
>> >> >> > >>
>> >> >> > >> Because it avoids another big case statement in the rendering
>> >> >> > >> algorithm. This way we have all three cases in one branch rather
>> >> >> > >> than 2 different branches. Also, this is just about the
>> >> >> > >> rendering, since we're still keeping the two different ways of
>> >> >> > >> specifying positioning (cues with line cue setting and cues
>> >> >> > >> inside regions).
>> >> >> > >
>> >> >> > > Wouldn't this simply be something like: if
>> >> >> > > non-snap-to-lines=true, create an anonymous region on the fly,
>> >> >> > > render the text in it according to the rules in "paragraph where
>> >> >> > > layout in a region is done", and then resize the anonymous region
>> >> >> > > to perfectly match the cue and set the region positioning
>> >> >> > > parameters accordingly?
>> >> >> >
>> >> >> > Hold on. Earlier you said that all non-snap-to-lines cues will be
>> >> >> > rendered in a single anonymous region that covers the full
>> >> >> > viewport. What you are instead describing here is the rendering
>> >> >> > approach for snap-to-lines cues.
>> >> >>
>> >> >> This was for snap-to-lines cues with no author-specified region.
>> >> >
>> >> > Oh! But then you can't do overlap avoidance with these cues either.
>> >>
>> >> Well, if all the snap-to-lines cues without an author-specified region
>> >> go into the same anonymous region of the size of the video, then you
>> >> are just using the cue snap-to-lines positioning algorithm to do
>> >> overlap avoidance.
>> >
>> > I didn't mean for cues to share a region unless they were authored with
>> > a region.
>> >
>> >> > I'd rather they go into individual regions, too, and are all dealt
>> >> > with by a single overlap avoidance approach that works on regions.
>> >>
>> >> Then how do you honor line positioning for a cue that has no region,
>> >> and has a line:3 attribute? You will have to position the region in
>> >> line 3 of the video viewport, rather than the text lines of the cue in
>> >> a region.
>> >
>> > Correct. That's what I thought we are doing with all cues now.
>>
>> The cleanest way to me looks like having regions always absolutely
>> positioned within the video viewport and cues always snapped to line
>> within a region.
>>
>> On Fri, Mar 7, 2014 at 11:27 AM, Philip Jägenstedt <philipj@opera.com>
>> wrote:
>> > On Mon, Mar 3, 2014 at 6:41 PM, Victor Carbune
>> > <victor.carbune@gmail.com> wrote:
>> >
>> >> A more personal comment: I feel that non-snap-to-lines cues are hard
>> >> to use for authors that want to actually position things precisely on
>> >> top of the video, and I'm not aware of other use-cases for it, so I
>> >> would even go as far as removing them as soon as we support regions.
>> >> But since they are already here, we can easily keep them for
>> >> backwards-compatible purposes with wrapped anonymous regions
>> >> fulfilling the same positioning behavior.
>> >
>> > If we go down this route I think we should try just removing the old
>> > way. I could add use counters to Blink to see if it's still possible.
>> > Would counting VTTCues where snapToLines is false be enough, or would
>> > something about position/size/align also change?
>>
>> I'm certainly in favor of this, but I'm sure on this list there might
>> be other vtt-users that are able to tell how important
>> percentage-positioned cues in their current form are.
>>
>> Victor
>>
>
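
The line-based overlap check floated earlier in the thread (a line
counts as occupied if any part of a percentage-positioned box falls in
it) could be sketched roughly as follows. This is only an illustration
under stated assumptions: the function names are hypothetical, the video
is assumed to be divided into equal-height lines, and the "try the
wanted line, then move up, then down" search order is a simplification
chosen for the example, not the order any spec prescribes.

```python
import math
from typing import Optional, Set


def occupied_lines(top_pct: float, height_pct: float, total_lines: int) -> Set[int]:
    """Return the 0-based lines (counted from the top) touched by a box.

    The box is given in percent of the video height. A line is deemed
    occupied as soon as any part of the box falls inside it.
    """
    line_height = 100.0 / total_lines
    first = int(top_pct // line_height)
    last = math.ceil((top_pct + height_pct) / line_height) - 1
    first = max(first, 0)
    last = min(last, total_lines - 1)
    return set(range(first, last + 1))


def first_free_line(wanted: int, taken: Set[int], total_lines: int) -> Optional[int]:
    """Pick a free line for a line-positioned box.

    Assumption: try the wanted line first, then move up one line at a
    time, then try the lines below. Returns None if every line is taken.
    """
    candidates = list(range(wanted, -1, -1)) + list(range(wanted + 1, total_lines))
    for line in candidates:
        if line not in taken:
            return line
    return None
```

For example, with 10 lines, a box starting at 75% of the video height
and 10% tall touches lines 7 and 8, so a cue that asks for line 8 would
be moved to line 6 under these assumptions.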
Received on Friday, 7 March 2014 14:41:10 UTC