RE: WebVTT from Sean Hayes on 2013-06-16 (public-tt@w3.org from June 2013)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Sun, 16 Jun 2013 12:03:18 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: John Birch <John.Birch@screensystems.tv>, "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <E9A92BD0A4FC934EB7935470A46D15241F6AA94D@DB3EX14MBXC323.europe.corp.microsoft.c>
>Oh right, you misunderstand how :past and :future are applied. They only work if you have provided time stamps in your markup: <00:00:00.000>. And time marches on according to the video, so video.currentTime provides the JS dev the current time. And the timeupdate event provides regular hooks into that playback loop.

Actually I think I understand pretty well how these are applied. I think perhaps you misunderstand what I'm trying to do here. I'm not really interested in WebVTT in the way a user might be. My purpose here is to determine exactly what WebVTT is capable of, which involves digging into all the corner cases, and to what extent a full-fidelity translation between the two formats is possible: where they share a common model and where they diverge.

This is a point of divergence. TTML cues have no internal time structure: everything in one cue happens "in the now", any observable change in the on-screen display happens from one cue to the next, and there is an event for that. In contrast, WebVTT effectively has a wavefront that passes through each cue, dividing it over time into three regions: past, present and future. Observable changes can therefore happen within a cue, and there are no events for those changes.
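To make the "wavefront" concrete, here is a minimal Python sketch that splits one cue's text at its internal timestamp tags and labels each segment past, present or future for a given playback time. It assumes one reading of the draft's :past/:future definitions (a node is past once some timestamp after it has been passed, and future while some timestamp before it is still ahead); the function names are hypothetical, and other tags such as <c.a> are simply left inside the text.

```python
import re

# Internal WebVTT timestamp tag, e.g. <00:00:01.000> or <00:01.000>.
# The hours field is optional, as in WebVTT timestamps.
TIMESTAMP_TAG = re.compile(r"<(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})>")


def parse_segments(cue_text):
    """Split cue text at its internal timestamp tags.

    Returns (segments, stamps) where stamps[i] (in seconds) lies between
    segments[i] and segments[i + 1]. Other tags such as <c.a> stay in the text.
    """
    segments, stamps, pos = [], [], 0
    for m in TIMESTAMP_TAG.finditer(cue_text):
        segments.append(cue_text[pos:m.start()])
        hours, minutes, seconds, millis = m.groups()
        stamps.append(int(hours or 0) * 3600 + int(minutes) * 60
                      + int(seconds) + int(millis) / 1000)
        pos = m.end()
    segments.append(cue_text[pos:])
    return segments, stamps


def classify(cue_text, now):
    """Label each segment 'past', 'present' or 'future' at playback time `now`."""
    segments, stamps = parse_segments(cue_text)
    labels = []
    for i, text in enumerate(segments):
        before, after = stamps[:i], stamps[i:]
        if any(t < now for t in after):       # a later timestamp has been passed
            label = "past"
        elif any(t > now for t in before):    # an earlier timestamp is still ahead
            label = "future"
        else:                                 # the wavefront is inside this segment
            label = "present"
        labels.append((label, text))
    return labels
```

For example, `classify("a<00:00:01.000>b<00:00:02.000>c", 1.5)` returns `[('past', 'a'), ('present', 'b'), ('future', 'c')]`; the change from b to c at 2 s happens inside a single cue, with no cue event to mark it.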

In order to convert from one to the other, we have to ascertain whether this is significant, and it may not be. But in my opinion a caption format expresses its semantics through visual presentation over time (at least today it does; we have discussed a higher-level semantic encoding in the past, but that never really got anywhere). Thus time-based semantic information is being conveyed within WebVTT cues in a way that appears to be hidden; this could be a problem if, for example, I am using AT to follow the captions. Anyway, the question becomes how we translate this.

>Also, these CSS classes don't necessarily make text invisible - they may just change its color or font weight (think of karaoke).

I can see what they can do, and among the things they can do is make text visible or invisible. Part of the exercise here is to predict, as far as possible, what a user might do and come to rely on. One thing I have learned over the years is never to underestimate the creativity of users in blindsiding you with what they do with the tools you create.

For example, it would be perfectly legal, given the current text, for a user to have just a single cue and rely only on CSS to present a slideshow of images:
00:00:00.000 --> 00:10:00.000
<c.a>&nbsp;</c><00:00:01.000><c.b>&nbsp;</c><00:00:02.000><c.c>&nbsp;</c><00:00:03.000><c.d>&nbsp;</c>...

Where

::cue(c.a) {
            background-size:contain;
            background-image: url(A.jpg);
            transform:scale(60,60);
            transform-origin:0% 0%;
            background-repeat:no-repeat;
}
::cue(c.b) {
            background-size:contain;
            background-image: url(B.jpg);
            transform:scale(60,60);
            transform-origin:0% 0%;
            background-repeat:no-repeat;
}
::cue(c.c) {
            background-size:contain;
            background-image: url(C.jpg);
            transform:scale(60,60);
            transform-origin:0% 0%;
            background-repeat:no-repeat;
}
/* Hide every slide except the current one. :past and :future are
   pseudo-classes inside ::cue(); these selectors have the same specificity
   as ::cue(c.a) etc., so this rule must come last to win. */
::cue(c:past),
::cue(c:future) {
            transform:scale(0,0);
}
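The single-cue slideshow above can be generated mechanically. A minimal Python sketch, assuming one `<c.CLASS>&nbsp;</c>` span per slide at a fixed interval; the function names and parameters are hypothetical:

```python
def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    millis = round(seconds * 1000)            # work in integer milliseconds
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def slideshow_vtt(classes, interval, duration):
    """Build a one-cue WebVTT file: one <c.CLASS>&nbsp;</c> span per slide,
    separated by internal timestamps `interval` seconds apart."""
    parts = []
    for i, cls in enumerate(classes):
        if i:  # every slide after the first starts at an internal timestamp
            parts.append(f"<{vtt_timestamp(i * interval)}>")
        parts.append(f"<c.{cls}>&nbsp;</c>")
    return (f"WEBVTT\n\n{vtt_timestamp(0)} --> {vtt_timestamp(duration)}\n"
            + "".join(parts) + "\n")
```

For instance, `slideshow_vtt(["a", "b", "c"], 1.0, 600)` yields the WEBVTT header followed by the 00:00:00.000 --> 00:10:00.000 cue from the example, limited to three slides.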

>Why would a JS dev need to be alerted of CSS changes? What's the use case?

Well, I'm not sure I've ever been sold on the use case for JS having access to the captions at all, actually, except maybe for metadata tracks; but if it does, one would assume you would want to provide full access to the semantics of the format. Of course you don't have to; that's really up to you and the community that sees this as a valuable format. One specific use case might be putting the visible text into an ARIA live region at the right time.

From my perspective, though, it means that if I am translating the above example, I cannot use tools exposed by a browser; I'm going to have to grub around in the internal structure of the displayed HTML, try to find the timestamps, set up my own handler on the video timeline events, and then reverse out the CSS on the HTML fragments. It makes life just a bit harder, that's all.
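A minimal sketch of such a home-made timeline handler, in Python for illustration: given one cue's internal timestamps and successive playback times (in a browser these would be video.currentTime values read in a timeupdate listener), it reports which internal timestamps were crossed since the last sample. The helper names are hypothetical, and because timeupdate fires only periodically, each synthesized event can lag its timestamp.

```python
from bisect import bisect_right


def crossed_timestamps(stamps, prev_time, now):
    """Internal cue timestamps passed between two successive samples.

    `stamps` must be sorted ascending (seconds). Returns every t with
    prev_time < t <= now. A backward seek (now < prev_time) returns [];
    a real handler would re-derive the cue state instead.
    """
    if now < prev_time:
        return []
    first = bisect_right(stamps, prev_time)  # index of first timestamp > prev_time
    last = bisect_right(stamps, now)         # index of first timestamp > now
    return stamps[first:last]


class InternalTimestampWatcher:
    """Calls on_cross(t) for each internal timestamp passed (hypothetical helper)."""

    def __init__(self, stamps, on_cross):
        self.stamps = sorted(stamps)
        self.on_cross = on_cross
        self.prev_time = 0.0  # a timestamp at exactly 0.0 is never reported

    def on_timeupdate(self, current_time):
        for t in crossed_timestamps(self.stamps, self.prev_time, current_time):
            self.on_cross(t)
        self.prev_time = current_time
```

The on_cross callback is where the visible text could be pushed into an ARIA live region, as in the use case mentioned above.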


From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
Sent: 15 June 2013 22:18
To: Sean Hayes
Cc: John Birch; public-tt@w3.org
Subject: RE: WebVTT


On 16 Jun 2013 00:31, "Sean Hayes" <Sean.Hayes@microsoft.com> wrote:
>
>
> >> but all right, I guess I could specify that position:30% width:60% and top 80%. If you remove the repositioning logic, that might work; it's at least predictable.
>
> > Right.
>
> OK. Your first port of call, then, should be to remove the text:
>
> 1. Set up x and y as follows:
> If the text track cue writing direction is horizontal, and direction is 'ltr'
> Let x be a percentage given by the text track cue text position, and let y be a percentage given by the text track cue computed line position.
> If the text track cue writing direction is horizontal, and direction is 'rtl'
> Let x be a percentage given by the text track cue text position subtracted from 100, and let y be a percentage given by the text track cue computed line position.
> If the text track cue writing direction is vertical growing left
> Let x be a percentage given by the text track cue computed line position subtracted from 100, and let y be a percentage given by the text track cue text position.
> If the text track cue writing direction is vertical growing right
> Let x be a percentage given by the text track cue computed line position, and let y be a percentage given by the text track cue text position.
>
> 2. Position the boxes in boxes such that the point x% along the width of the bounding box of the boxes in boxes is x% of the way across the width of the video's rendering area, and the point y% along the height of the bounding box of the boxes in boxes is y% of the way across the height of the video's rendering area, while maintaining the relative positions of the boxes in boxes to each other.
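The two quoted steps amount to a little arithmetic. A minimal Python sketch, assuming percentages as plain numbers, sizes in pixels, and hypothetical string labels for the writing directions:

```python
def anchor_percentages(writing, direction, text_position, line_position):
    """Step 1: choose x and y (percentages) from the cue settings."""
    if writing == "horizontal":
        x = text_position if direction == "ltr" else 100 - text_position
        return x, line_position
    if writing == "vertical-growing-left":
        return 100 - line_position, text_position
    if writing == "vertical-growing-right":
        return line_position, text_position
    raise ValueError(f"unknown writing direction: {writing!r}")


def box_origin(x, y, box_width, box_height, video_width, video_height):
    """Step 2: top-left corner of the boxes' bounding box, in pixels.

    The point x% across / y% down the bounding box must land x% across /
    y% down the video's rendering area:
        left + x/100 * box_width  == x/100 * video_width
        top  + y/100 * box_height == y/100 * video_height
    """
    left = x / 100 * (video_width - box_width)
    top = y / 100 * (video_height - box_height)
    return left, top
```

With x = 30 and y = 80, a 384 × 72 px box in a 1280 × 720 px video gets its top-left corner at about (268.8, 518.4). Because top depends on box_height whenever y is not 0, doubling the height to 144 px moves top to about 460.8 px: under these rules a taller box does move.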
>
>
>
> >>  I don't see any mechanism for vertical centering of the text, so top will always be line, yes?
>
> >Yes.
>
> OK
>
>
> >>Yes, external CSS is applied after the WebVTT cue rendering algorithm has executed and the basic CSS parameters have been set. However, during the rendering algorithm, the width of the video is taken into account and lines are broken and create new CSS boxes that become part of what is being rendered. That's what I was referring to.
> >
> > Sure. But those boxes fit within the containing block right?
>
> The initial containing block is viewport:
> http://www.w3.org/TR/CSS21/visudet.html#containing-block-details .
> Everything has to fit into that.
>
> Indeed, but also:
> 5.2.2 On the (root) list of WebVTT Node Objects, the 'position' property must be set to 'absolute',
> This is what actually constrains and positions the generated inline boxes.
>
> >> so the outer box is created appropriately for a 5vh font of unspecified advance. If CSS changes the font much from that, it's going to overflow or underflow, but it's not going to move the box.
>
> >No, when the CSS changes, the cue is re-rendered and thus the box may end up somewhere else.
>
> It could if there were CSS properties that could be applied to affect the top, left or width, but there aren't. Can you explain to me how you see this happening?

The first line box will indeed be in the same place, but for a multi-line cue the next n lines will end up further down because of the greater line height.
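A rough Python illustration of this point, assuming the 5vh default font size is measured against the video height and taking the line height as a multiplier of the font size (the UA's 'normal' value is font-dependent, so the 1.2 default here is only a placeholder); the function name is hypothetical:

```python
def line_tops(video_height, n_lines, box_top=0.0, font_vh=5.0, line_height=1.2):
    """Top edge (px) of each line box in a multi-line cue.

    font_vh is the font size as a percentage of the video height;
    line_height multiplies the font size to give the distance between lines.
    """
    font_px = video_height * font_vh / 100
    step = font_px * line_height
    return [box_top + i * step for i in range(n_lines)]
```

For a 720 px video, `line_tops(720, 3, line_height=1.5)` gives `[0.0, 54.0, 108.0]` and `line_height=2.0` gives `[0.0, 72.0, 144.0]`: the first line stays where it was while the later lines move down, and the total height of three lines grows from 162 px to 216 px.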

> >>No, the rendering algorithm is executed again:
> >>"User agents that support the pseudo-element described below must dynamically update renderings accordingly. When either 'white-space'
> >>or one of the properties corresponding to the 'font' shorthand (including 'line-height') changes value, then the text track cue's text track cue display state must be emptied and the text track's rules for updating the text track rendering must be immediately rerun."
> >>
> >>The re-run is then using the new CSS settings for those properties and thus ends up creating different boxes.
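The quoted requirement can be read as a simple trigger rule. A minimal Python sketch, assuming computed styles are available as dicts mapping property names to values (a hypothetical representation); the property list follows the CSS 2.1 'font' shorthand:

```python
# Longhands of the CSS 2.1 'font' shorthand; later CSS levels add more
# (for example 'font-stretch'), which this sketch does not list.
FONT_LONGHANDS = frozenset({
    "font-style", "font-variant", "font-weight",
    "font-size", "line-height", "font-family",
})
RERENDER_TRIGGERS = FONT_LONGHANDS | {"white-space"}


def needs_rerender(old_style, new_style):
    """True if a style change should empty the cue's display state and
    rerun the text track's rendering rules, per the quoted requirement."""
    return any(old_style.get(p) != new_style.get(p) for p in RERENDER_TRIGGERS)
```

Changing only 'color' does not trigger a re-run, while changing 'line-height' or 'white-space' does.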
> >
> > Yes, OK, but the rules state that no style sheets are used, and there's no other text to indicate that the cascade properties are projected onto the WebVTT nodes, so re-running the algorithm does not pick up these values.
>
> > We have the ::cue pseudo selector through which CSS properties of individual cues can be changed.
> > When they are changed, they are the new properties of the cues and are applied in the re-run of the rendering algorithm.
>
> Yes, but no property that can be set by cue can change the fact that the root box is absolutely positioned, or influence its top, left, or width. You can change the font and line-height, but these would only affect the height, assuming they were applied before the height trait is calculated.
>
>
> > If it was intended that the new values be used, then the spec text needs to change.
>
> You don't think:
> "User agents that support the pseudo-element described below must dynamically update renderings accordingly."
> is sufficient?
>
> No, because of this text: "No style sheets are associated with nodes. (The nodes are subsequently restyled using style sheets after their boxes are generated, as described below.)."
>
> So for the purposes of 5.2.1 no style sheets are in use, whether the algorithm is run the first time or the 100th time; they are used *subsequently*, which means "occurring or coming after", according to my dictionary. Therefore CSS style is not applied until after the boxes are created. Delete this text, or better still, put 'apply the cascade' somewhere at the start of 5.2.1.
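A toy Python model of why the ordering matters, under Sean's reading that the root box is sized once at box-generation time and the author's font is only applied afterwards. This is a hypothetical simplification for illustration, not spec text:

```python
def box_vs_content_height(n_lines, default_font_px, author_font_px,
                          cascade_first, line_height=1.5):
    """Toy model (hypothetical) of when the cascade is applied.

    The root box is sized once, at box-generation time, using whichever font
    is in effect then; the glyphs are always drawn with the author's font.
    Returns (box_height, content_height) in px.
    """
    sizing_font = author_font_px if cascade_first else default_font_px
    box_height = n_lines * sizing_font * line_height
    content_height = n_lines * author_font_px * line_height
    return box_height, content_height
```

With a 36 px default font, a 54 px author font and two lines, applying the cascade afterwards gives a 108 px box holding 162 px of text (overflow); applying it first gives 162 px for both.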
>
> But even if you remove this text, as I say, the only thing that will change is the height, since it is set to 'auto'.
>
>
> >> Moreover changing the font will only really affect the height of the box. And once the repositioning rules that cause the problems above are removed, changing the height will not cause the box to move.
>
> >If the line height changes because of a font size, that influences the height of the line box. So on a re-rendering, the calculations will be different.
>
> Yes, but the position will not be different.
>
>
> >> I will of course be looking at the region spec in much more detail; I have already noted a few issues with it. However, my understanding is that it is an optional extension. Is it intended to be a mandatory feature?
>
> >Since we want all browsers to support it, it would be a mandatory feature for browsers. Of course, authors can choose if they want to make use of all features or not.
>
> OK
>
> >> If the latter then I think having two different positioning schemes could cause even more problems, so I would like to see these unified.
>
> >They don't conflict and the one without regions is easier to understand, while the one with regions is more like CEA708. I think WebVTT can deal with two different caption positioning approaches: one fixed and one scrolling.
>
> They don't conflict as such, since it's clear which takes precedence, however they are different, so this increases the cognitive load for authors. The tools without regions will be simpler as and when you take out the offending text above.

Indeed. But for professional captioners who want more control over cue rendering boxes or need roll-up, regions are required. Anyone else can just ignore that feature.
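For readers unfamiliar with roll-up, a generic Python sketch of the behaviour (a fixed number of rows, with each new row scrolling the oldest one out) may help. It illustrates roll-up captioning in general, not the WebVTT region spec's actual algorithm; the class name is hypothetical:

```python
from collections import deque


class RollUpRegion:
    """Generic roll-up caption buffer (illustration only, not the region spec).

    Holds at most `lines` rows; adding a row beyond that scrolls the oldest
    row out, as in CEA-608/708-style roll-up captioning.
    """

    def __init__(self, lines):
        self.rows = deque(maxlen=lines)  # deque drops the oldest item when full

    def add(self, text):
        self.rows.append(text)

    def visible(self):
        return list(self.rows)
```

Adding "one", "two", "three" and "four" to a three-row region leaves ["two", "three", "four"] on screen.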

> >> OK, just so I'm clear do you intend the region spec to replace the current positioning and be a mandatory component of the spec?
>
> >No, it adds to the current positioning.
>
> OK. If you wish. Seems poor design to me, but at least it's not ambiguous.
>
> >>> Anyway, it seems we may need to table this discussion until the text is somewhat more mature.
> >
> >>I think it's one particular section that you have the most trouble with. But feel free to wait until that bug is fixed.
> >
> > Yes, currently I'm concentrating on understanding section 5.2 so that's what I'm reporting on right now. I'll get to the rest of the spec in due course.
> >
> > Boiling it down, I think my issues so far are:
> >   Step 10.5 & 10.6 occurring first (or indeed at all)
> >   Step 10.7 before step 10.10 (as per your bug), although not sure 10.10 should happen at all.
> >   Step 10.8 x-position and y-position not just being position and line respectively (units here could be % or em)
> >   Step 10.10 unspecified UA specific value for margin repositioning content. Needs to be predictable.
> >   Step 10.12 Not clear that it is intended for CSS properties from the cascade to work here (as per your comment above), and if it is it would be better for top, left, width and height to work directly as the cue settings are not amenable to document style
> >   Step 10.12 Unclear algorithm for 'balanced line breaking', which should definitely be switchable if kept.
> >   Step 10.14 should not happen at all
>
>
> OK, I'm going to have to work through these.
>
>
> >   5.2.2 Specifying a sans-serif font of 5vh is not adequate if this is the only mechanism to control the box height. The font advance needs to be more predictable. Or, preferably, the size setting controls both the width and height directly, and the user gets to specify a font in terms of the box height/width.
>
> >Which box height are you asking for more control over? The box that the font is rendered into is the line box and its height is specified from the font. What other box height are you after?
>
> The box created for the (root) list of WebVTT Node Objects, where the 'position' property must be set to 'absolute'.

Right, that's what the region spec offers you.

> > 5.3.3 :past and :future: it's not clear to me how these are communicated to external script. If I get the cue as HTML, can I get events for when these change the styles?
>
> >I don't think I understand this question. :past and :future are CSS pseudo-classes that are provided on WebVTT Node Objects. They are, e.g. specified in a style sheet that is included into the HTML page that also has a <track> element with the WebVTT file in it. The browser will take care to change the CSS styles of the WebVTT Node Objects accordingly as time marches on - no extra work is necessary by the developer for this styling - there are no events to indicate that the CSS style change happened. Why would they be required?
>
> Because they affect what text is visible to the user.
> So if the JS developer gets the cue text, they always receive all the text in the cue. They cannot know where the time is internal to that cue, and therefore what part of that text is currently visible?

Oh right, you misunderstand how :past and :future are applied. They only work if you have provided time stamps in your markup: <00:00:00.000>. And time marches on according to the video, so video.currentTime provides the JS dev the current time. And the timeupdate event provides regular hooks into that playback loop.

Also, these CSS classes don't necessarily make text invisible - they may just change its color or font weight (think of karaoke).

> Basically no onenter event is raised for the internal times.

Why would a JS dev need to be alerted of CSS changes? What's the use case?

> BTW, it seems that in the 5.1 nightly nothing ever raises onenter, is that right?

Really? If so then that's a recently introduced bug.

Silvia.
Received on Sunday, 16 June 2013 12:04:27 UTC