RE: WebVTT from Sean Hayes on 2013-06-12 (public-tt@w3.org from June 2013)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Wed, 12 Jun 2013 16:58:59 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: John Birch <John.Birch@screensystems.tv>, "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <E9A92BD0A4FC934EB7935470A46D15241F695B00@DB3EX14MBXC324.europe.corp.microsoft.c>


> That's all there is - the mailing list and the bugs. Just add your name to all the CG bugs (all you need to do is click the "Add me to CC list" toggle button at the top right and hit "Save Changes" when you are on the page of a bug). There's not that many.

Yes, I know how the bugs list works. I guess my point is that some of those discussions haven't been updated in months, and I see a couple of emails a week at most on the CG. This spec was claimed as 'substantially done'; however, I think that is evidently something of an overstatement, and at this rate of progress you are going to make TTML's 10-year gestation look positively lightning fast.

>> My concern is that one cannot pick an arbitrary rectangle on the screen for example 0vw 80vh to 60vw 94vh with centered text using the controls in the current spec text.

>The theory is that because you have a centered text (i.e. align:middle), you need to specify the middle of your cue as the cue position (i.e. line:60%, position:47%). Then you specify the cue's width appropriately (i.e. size:94%). That should do it.

Well, I want the middle of my cue to be at 30vw, so I'm not sure where 47% comes in. But all right, I guess I could specify position:30%, width:60% and top 80%. If you remove the repositioning logic, that might work; it's at least predictable. I don't see any mechanism for vertical centering of the text, so top will always be line, yes?
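
The mapping discussed here, from an authored rectangle to cue settings for centered text, can be sketched as follows. This is only an illustration of the reading Silvia describes (position is the horizontal midpoint of the box, size is its width, line is its top edge), not the behaviour of the spec-as-written. The function name is hypothetical, and "width" and "top" above are taken to mean the size and line settings.

```python
def rect_to_cue_settings(left_vw, top_vh, right_vw, bottom_vh):
    """Map a rectangle in viewport percentages to cue settings for centered text.

    Assumption: with align:middle, position names the horizontal midpoint of
    the box, size names its width, and line names its top edge.
    bottom_vh is accepted but unused, because none of these settings controls
    the box height directly.
    """
    width = right_vw - left_vw
    center = left_vw + width / 2
    return f"position:{center:g}% size:{width:g}% line:{top_vh:g}% align:middle"


# The rectangle from the thread: 0vw 80vh to 60vw 94vh.
print(rect_to_cue_settings(0, 80, 60, 94))
```

Under these assumptions the rectangle from the thread yields position:30%, size:60% and line:80%, which matches the values proposed above.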

>> Essentially In order to be able to translate TTML into this you need be able to directly set the CSS values for top, left, width and height directly on a cue, and not have it subsequently altered (it may of course be clipped by the video viewport)

>Subsequent altering will only happen if the text doesn't fit into this box or overlaps other cues.

That's not how it works today. But maybe in your fixed version.


>> Yes, as I explain immediately below, the text is too long because it's 50 characters of 3vh, which adds up to 150vh.
>> If the width of my text is 150% of the height of the video and the width of my video is 133% of its height (in a 4x3 aspect ratio), then as 150 > 133 clearly it needs to be broken into more than one line.
>> Now your implementation may be using some very narrow characters, or wider video, in which case you might not have seen this; but 3vh for a 5vh font seems to be typical.
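
The arithmetic in the quoted paragraph can be checked with a small sketch. It assumes every character advances by the same amount (3vh in the example), which is a simplification for proportional fonts; the function name is hypothetical.

```python
import math


def min_lines(chars, advance_vh, aspect_w, aspect_h):
    """Minimum number of lines needed to fit the text across the full video width.

    Assumption: every character has the same advance, given in vh.
    """
    text_width_vh = chars * advance_vh           # 50 * 3 = 150vh in the example
    video_width_vh = 100 * aspect_w / aspect_h   # about 133vh for 4:3
    return math.ceil(text_width_vh / video_width_vh)


print(min_lines(50, 3, 4, 3))   # 4:3 video: 150vh of text does not fit in ~133vh
print(min_lines(50, 3, 16, 9))  # 16:9 video: ~178vh wide, so one line suffices
```

This also illustrates why a wider video, or narrower characters, could hide the problem.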

>OK, I still don't follow. But if you have discovered a bug, please register it.

It seems like it's a bug in whatever you are using to render the spec, rather than in the spec itself, although I have noted my distilled list of issues below.


>Yes, external CSS is applied after the WebVTT cue rendering algorithm has executed and the basic CSS parameters been set. However, during the rendering algorithm, the width of the video is taken into account and lines are broken and create new CSS boxes that become part of what is being rendered. That's what I was referring to.

Sure. But those boxes fit within the containing block, right? So the outer box is created appropriately for a 5vh font of unspecified advance. If CSS changes the font much from that it's going to over- or underflow, but it's not going to move the box.


>No, the rendering algorithm is executed again:
>"User agents that support the pseudo-element described below must dynamically update renderings accordingly. When either 'white-space'
>or one of the properties corresponding to the 'font' shorthand (including 'line-height') changes value, then the text track cue's text track cue display state must be emptied and the text track's rules for updating the text track rendering must be immediately rerun."
>
>The re-run is then using the new CSS settings for those properties and thus ends up creating different boxes.

Yes, OK, but the rules state that no style sheets are used, and there's no other text to indicate that the cascade properties are projected onto the WebVTT nodes, so re-running the algorithm does not pick up these values. If it was intended for the new values to be utilised, then the spec text needs to change.
Moreover changing the font will only really affect the height of the box. And once the repositioning rules that cause the problems above are removed, changing the height will not cause the box to move.

>> Well precisely. It shouldn't. It shouldn't resize it either. In my opinion the whole notion of the browser "fixing up" a layout to suit itself is misguided. However that is what the spec-as-written requires it to do.

>Only when it has no other choice.

OK, but it currently does so all the time, and really it should not be choosing at all.


>>>> The size constraint appears to happen at the wrong time, and is IMO actually unnecessary. Just define the video viewport to clip all cues, and let the author  be responsible for keeping their content visible. That is the CSS way of things.

>>The "size" defines the width of the cue. It's an important part of defining the box as you outlined above.
>Right, but it needs to be done after top and left are known, and not going to change. As it is now it is done before top and left are computed.
>
>BTW: you might be interested in
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=20146


Yes I had spotted that. Very much related I agree.


>>>As I said: WebVTT tries really hard to keep all cues within the viewport, including avoiding clipping text. We may change that constraint, though, as a consequence of the bugs.
>>
>> OK well as I said I think the principle should be trust the author, and rely on clipping to the viewport where the author abuses that trust.

>Yes, that might be how we may fix it. I've got to look into that bug in more detail.

>> "Fixup" behavior may be fine for a user whipping up some captions for a 5m YouTube video, but it's not really appropriate if VTT is intended as the delivery vehicle for the world's caption corpus, which unless I am very much mistaken Ian has explicitly stated it was never intended to be.

>I don't remember such a statement. Anyway - WebVTT should be good for both use cases. With the new region spec, you may find you can achieve a bit more control, because cues that are painted into regions will not try to be adjusted, but simply overlap when not authored carefully: https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html


<p role="quote" actor="Hixie">WebVTT is definitely _not_ a format "in which all existing captioning from TV can be represented on the Web"</p> http://lists.w3.org/Archives/Public/public-texttracks/2011Nov/0034.html 

Of course he may have changed his mind since then, but seems like he stays pretty steadfast.

I will of course be looking at the region spec in much more detail; I have already noted a few issues with it. However, my understanding is that it is an optional extension. Is it intended to be a mandatory feature?
If the latter then I think having two different positioning schemes could cause even more problems, so I would like to see these unified.

>> If you want to preserve word structure then you need more nuanced wording here (or better still remove it altogether and rely on CSS).
>
>"even if doing so requires.." is careful wording. It says that only if necessary words will be split in the middle.

It will generally be necessary in order to produce an absolutely minimal delta, as delta is currently defined. Of course you clearly don't mean the absolute minimum, which is why, as I say, you need more nuanced wording.

> I don't assume anything, it's the logical conclusion of what the spec says. It explicitly says a) 'even if doing so requires splitting a word' and b) minimize delta.

>It clearly says "Text runs must be wrapped according to the CSS line-wrapping rules", but in addition the wrapping should minimize the delta and if necessary split words where there is no line breaking opportunity. I don't see how that could be interpreted differently.

It claims precedence over CSS, and CSS has no concept of delta. I think you need to unroll this a little more before it makes any sense.

>The experience from YouTube has shown that many captions/subtitles are provided in a single long line. These lines usually end up having to be broken (because they are wider than the video or wider than the available caption width). And they usually end up being broken with a massive imbalance of words (only one or two words ending up in the second line, the rest in the first). That's the only problem that this approach is trying to solve. A professional captioner will naturally provide balanced captions that the browser does not have to reflow.

Right, they aren't being done by a professional, and so are unlikely to look good. Your algorithm may, once it's properly specified, do a better job than nothing at all, or it may not. I'm not really concerned with that use case. What I am concerned about is where the author did know what they were doing and took some care over it. Maybe all we need is a "trust me I'm a professional" flag somewhere to switch off all this hoopla.

> If and when CSS comes up with a solution, then let's by all means look at it, but I can 100% guarantee it won't be the solution you have here.

>The CSS work is being influenced by the need of the WebVTT reflow algorithm, so I expect it to satisfy the needs.

That's great. My comment still stands though. I wouldn't call it an algorithm BTW until you break it down into steps I can convert into code; which I don't think we have yet.
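
To make the point concrete, here is one way "balanced line breaking" could be broken down into steps. This is purely a sketch and not the spec's procedure: it assumes exactly two lines, measures "delta" as the difference in character counts rather than rendered widths, and only ever breaks at spaces. All names are hypothetical.

```python
def balanced_two_line_break(text):
    """Split text into two lines at the space that minimizes the length difference.

    Assumptions: delta is the absolute difference in character counts, and
    words are never split. A literal minimum delta would need mid-word breaks.
    """
    words = text.split()
    if len(words) < 2:
        return [text]  # nothing to balance

    best_delta = None
    best_lines = None
    # Try every break opportunity between adjacent words.
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        delta = abs(len(first) - len(second))
        if best_delta is None or delta < best_delta:
            best_delta = delta
            best_lines = [first, second]
    return best_lines


print(balanced_two_line_break("one two three four"))
```

Restricting the search to word boundaries is a design choice made here; whether the minimum is taken over word boundaries only, or over every character position, is exactly the kind of detail the wording would need to pin down.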


>>>>Try adding an "align:start" and you will be fine.
>>> Yes, but I don’t want a left aligned caption, I want a centered aligned caption which stretches from 0vw to 50vw, i.e. centered around  25vw.
>>>
>>>  Let's say because I am modeling a two speaker dialog and I want each speaker to have their own half of the screen. There is plenty of room for that.  Can you tell me what values I should use to achieve it?
>>If you want it centered around 25vw, then you have to write "position:25%"  and not "position:0%".
>
> That doesn't work in the spec-as-written.
> It appears you intend to remove the text in section 15.2.1.10.14.else.2 is that correct?

>There is no section 15 .. looking ... - I assume you mean section 5.
>Yes, that part needs to be rewritten. 

Right, I meant 5.2.1 sorry not sure where the leading 1 came from.
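
For the two-speaker case above (a centered box from 0vw to 50vw), the intended reading of position, size and align can be sketched as below. It assumes position names the left edge, midpoint or right edge of the box for align:start, align:middle and align:end respectively, and it deliberately leaves out the repositioning step under discussion. The function name is hypothetical.

```python
def cue_box_extent(position, size, align):
    """Return (left, right) of the cue box as percentages of the video width.

    Assumption: position is the anchor point selected by align, and no
    repositioning or clamping to the viewport is applied afterwards.
    """
    if align == "start":
        left = position
    elif align == "middle":
        left = position - size / 2
    elif align == "end":
        left = position - size
    else:
        raise ValueError(f"unknown align value: {align}")
    return (left, left + size)


print(cue_box_extent(25, 50, "middle"))  # the left half of the video
print(cue_box_extent(0, 50, "middle"))   # starts off-screen at -25
print(cue_box_extent(0, 50, "start"))    # same box as the first, but with start alignment
```

Under this reading, position:0% with centered alignment puts half of the box outside the viewport, which is where any fix-up behaviour would come into play.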

>> But even after removing the reposition text I still cannot have a centered box that is left or right aligned in the video and is wider than 50% of the video width.

>You will be able to, once it's changed. And also, right now, the region spec allows you to do this.

OK, just so I'm clear do you intend the region spec to replace the current positioning and be a mandatory component of the spec?

>> No the parts of the spec I'm having issues with happen well before considering any overlap, and need to work properly before this document can go to rec.
>> What I am implementing and where that implementation might end up isn't relevant to this discussion.

>Yes, there are bugs to fix.
No disagreement there :)

>> I think it may only be partially captured in the bugs. But let's see when you have finished correcting them.
>Feel free to register any further bugs that you come across.

OK, I'll see what I can do. But since I think it basically needs a ground-up rewrite of section 5.2, I'm not sure it's going to go down too well.

>>>Agreed, that's indeed the intention of WebVTT. I'll go now and make 
>>>some spec changes. ;-)
>>
>> Interesting. I would have thought you'd need to resolve the issues with the community group first?

>What do you mean? Of course I have to fix bugs that have been registered. I can't just go and change the spec randomly.

It sounded as if you were just going to unilaterally make some changes to the spec without getting input and consensus from the community. Maybe that's not what you meant; it just sounded strange.

>> Anyway it seems we may need to table this discussion until the text is somewhat more mature.

>I think it's one particular section that you have the most trouble with. But feel free to wait until that bug is fixed.

Yes, currently I'm concentrating on understanding section 5.2 so that's what I'm reporting on right now. I'll get to the rest of the spec in due course.

Boiling it down, I think my issues so far are:
  Step 10.5 & 10.6 occurring first (or indeed at all)
  Step 10.7 before step 10.10 (as per your bug), although not sure 10.10 should happen at all.
  Step 10.8 x-position and y-position not just being position and line respectively (units here could be % or em)
  Step 10.10 unspecified UA specific value for margin repositioning content. Needs to be predictable.
  Step 10.12 Not clear that it is intended for CSS properties from the cascade to work here (as per your comment above), and if it is it would be better for top, left, width and height to work directly as the cue settings are not amenable to document style
  Step 10.12 Unclear algorithm for 'balanced line breaking', which should definitely be switchable if kept.
  Step 10.14 should not happen at all

  5.2.2 Specifying a sans-serif font of 5vh is not adequate if this is the only mechanism to control the box height. The font advance needs to be more predictable. Or, more preferably the size setting controls both the width and height directly, and the user gets to specify a font in terms of the box height/width.

  5.3.3 :past and :future - it's not clear to me how these are communicated to external script. If I get the cue as HTML, can I get events for when these change the styles?

Cheers,
Sean
Received on Wednesday, 12 June 2013 16:59:55 UTC