- From: Nigel Megitt <nigel.megitt@bbc.co.uk>
- Date: Mon, 9 Dec 2013 14:54:49 +0000
- To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Victor Cărbune <vcarbune@chromium.org>
- CC: Glenn Adams <glenn@chromium.org>, Silvia Pfeiffer <silviapf@chromium.org>, "public-texttracks@w3.org" <public-texttracks@w3.org>, "Timed Text Working Group" <public-tt@w3.org>
All,

This could be a good start for populating mapping tables in the TTWG wiki as part of the process for creating the mapping deliverable in the new draft charter - format suggestions and editing effort welcome.

The feature set will need to be thought through carefully though - do we use elements and attributes in TTML, TTML feature designators, or some other set for a starting point? Not a question we have to answer immediately, but from a linguistic perspective it makes sense not to restrict the set to 'only those things that can be expressed in one language', to allow for concepts that are initially unique in the other language.

Copying in TTWG as the outputs of this discussion need to end up there. Additional comments inline.

Kind regards,

Nigel

On 09/12/2013 03:16, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

>Hi Glenn, Victor, all,
>
>This took a bit of time to prepare a reply for, but I think it will
>also give us a start at mapping TTML to WebVTT. So, see my analysis
>inline below.
>
>
>On Wed, Nov 27, 2013 at 7:37 AM, Victor Cărbune <vcarbune@chromium.org>
>wrote:
>> (bcc: blink-dev, cc: public-texttracks)
>>
>> Hi Glenn,
>>
>> I'm moving the discussion to public-texttracks@ because I think these are
>> good points that should generally be debated and eventually extend WebVTT to
>> support some of them, if needed by caption authors.
>>
>> Victor
>>
>>
>> On Tue, Nov 26, 2013 at 3:39 PM, Glenn Adams <glenn@chromium.org> wrote:
>>>
>>> On Tue, Nov 26, 2013 at 3:25 PM, Silvia Pfeiffer <silviapf@chromium.org>
>>> wrote:
>>>>
>>>> Have they tried to convert from TTML to WebVTT for presentation in
>>>> browsers? Since all major browsers now support WebVTT, it would be the
>>>> path of least pain. It would also help to find out which TTML features
>>>> cannot be presented in WebVTT. You might find that to be a very small
>>>> set.
>>>
>>> I expect that greater than 50% of TTML features aren't translatable into
>>> WebVTT.
>
>Are those features actually used anywhere in the real world? Are they
>features that CEA608 or CEA708 supports?

CEA608 and 708 are too narrow a set of subtitle/caption format standards to consider for global applicability. WSTeletext also needs to be in scope, and possibly others too. Features that support workflow and document lifecycle should also be considered - merely targeting the final deliverable document is too narrow: it has to be authored too.

>Also, rather than redefining styling properties in WebVTT, we're
>simply relying on CSS properties in WebVTT.
>There are many CSS properties that are allowed to be applied on a
>::cue or ::cue-region, see
>http://dev.w3.org/html5/webvtt/#applying-css-properties-to-webvtt-node-objects
>and
>http://dev.w3.org/html5/webvtt/#css-extensions
>
>Note also that we have an open bug to introduce inline CSS
>functionality into WebVTT:
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=15023
>It's an extension because you can already apply CSS properties via the
>Web page, but it's a feature that's on the roadmap.

That's good - the ability for a processor to receive a single document with everything required is a good minimal constraint with real world applicability.

>>>For example, TTML1 makes use of 24 style properties [1], all based
>>> on CSS or SVG properties (in most cases identically defined). Of these 24,
>>> the following 10 cannot be expressed in whole or part by WebVTT content:
>
>To interpret these accurately, I can't just look at CSS or SVG, but
>have to reference their special meaning in the TTML spec. To start
>creating a mapping, I'll include a reference for each of the
>properties and then explain how to do them in WebVTT.
>
>
>>> backgroundColor
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-backgroundColor
>values: CSS color names

Color values may also be specified.
>In WebVTT: http://dev.w3.org/html5/webvtt/#css-extensions
>allows setting of all CSS 'background' properties, including
>background-color on:
>
>* cue content:
> ::cue(selector) [selector addresses a tag in the cue text]
>
>* cues:
> ::cue [for all cues] or via ::cue(#cue-id) [for individual cues]
>
>* cue groups:
> ::cue-region [for all regions] or via ::cue-region(#region-id) [for
>individual regions]
>
>
>>> display
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-display
>values: auto/none
>
>In WebVTT:
>If an author is trying to author a cue that doesn't get displayed,
>they're best off breaking the cue parsing, e.g. by replacing "-->"
>with "->".

I understand that one of the reasons to avoid XML is to be more forgiving of potential syntax errors. Might some processors 'forgive' this as an accidental error and assume that where -> is in the document --> was intended?

>If they want to do this dynamically during display in the browser,
>they can always turn the TextTrack.mode to hidden or disabled
>(http://www.w3.org/html/wg/drafts/html/master/single-page.html#texttrackmode)
>during the duration of the cue.
>
>
>>> displayAlign
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-displayAlign
>values: before/center/after
>
>I'm having a hard time understanding what that does. Is this about
>vertical alignment of the text in the region? I'll assume that for
>now.

Yes, for left to right and right to left text. TTML supports top to bottom text too, via tts:writingMode. Regions containing top to bottom text have block progression horizontally, so the effect of displayAlign in that case is to change horizontal alignment.

>In WebVTT:
>This isn't currently possible. However, we discussed at FOMS to allow
>authors to specify line and position settings within regions the same
>as within the video viewport. So that would allow exact specification
>of where the content within a region needs to be painted.
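
To make the displayAlign point above concrete, here is a rough sketch of how a mapping tool might resolve displayAlign to a physical edge of the region once tts:writingMode is taken into account. It assumes the TTML1 writingMode values, with 'lr', 'rl' and 'tb' treated as shorthands for 'lrtb', 'rltb' and 'tbrl'. The table and function names are made up for illustration and are not taken from either spec.

```python
# Block progression direction for each tts:writingMode value.
# Assumption: 'lr', 'rl' and 'tb' are shorthands for 'lrtb', 'rltb' and 'tbrl'.
BLOCK_PROGRESSION = {
    "lrtb": "top-to-bottom",
    "rltb": "top-to-bottom",
    "lr": "top-to-bottom",
    "rl": "top-to-bottom",
    "tbrl": "right-to-left",
    "tb": "right-to-left",
    "tblr": "left-to-right",
}

# Physical (before, after) edges of the region for each block progression.
EDGES = {
    "top-to-bottom": ("top", "bottom"),
    "right-to-left": ("right", "left"),
    "left-to-right": ("left", "right"),
}


def resolve_display_align(display_align, writing_mode="lrtb"):
    """Return the physical edge (or 'center') that displayAlign refers to."""
    before, after = EDGES[BLOCK_PROGRESSION[writing_mode]]
    if display_align == "before":
        return before
    if display_align == "after":
        return after
    if display_align == "center":
        return "center"
    raise ValueError(f"unknown displayAlign value: {display_align!r}")


if __name__ == "__main__":
    for mode in ("lrtb", "tbrl", "tblr"):
        print(mode, resolve_display_align("after", mode))
```

For horizontal text this gives the familiar vertical alignment (after = bottom); for top to bottom text the same value selects a left or right edge instead, which is the case a WebVTT mapping would need to handle separately.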
>
>
>>> extent
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-extent
>values: audio or <length> <length>

s/audio/auto

>In WebVTT:
>We specify the width and number of lines of a region in the region
>definition, see
>https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html#webvtt-region-metadata-header-syntax
>Lengths are not in pixels because they make little sense when the
>video viewport or characters are your reference of authoring.

In environments with known pixel dimensions it may make most sense to use pixels directly and avoid possible rounding errors from percentages.

>Right now we have number of lines and percentages.
>At FOMS we discussed to also use em (i.e. largest character width and
>height) as a metric, which maps better to 708.

Viewport percentages are in scope for TTML2 as well. The c metric permits arbitrary cell-based grids to be used too, which may be related to font size and thus have a defined relationship with lines.

>>>origin
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-origin
>values: audio or <length> <length>

s/audio/auto

>In WebVTT:
>We specify the placement of regions via anchor settings, see
>https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html#webvtt-region-viewport-anchor
>and
>https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html#webvtt-region-anchor
>.
>
>This allows placing a region anywhere on screen which explicitly
>specifies how it grows around the anchor point with increasing font
>size, which maps exactly onto how 708 does its placement. I don't
>think that capability is available in TTML - could you clarify?

TTML regions are boxes with an origin and extent, i.e. top left corner and size. Alignment of text within any region is based on settings i.e. displayAlign and textAlign. Regions don't grow in size based on content alone. The placement and dimensions of text inside the region box defines e.g.
any background color that's applied, depending on which element (region, body, div, p or span) the styling applies to. To make child content elements of a region 'anchor' and grow from a specific place the author would manipulate the set of origin, extent, displayAlign, textAlign and writingMode to achieve the desired effect.

>>>overflow
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-overflow
>values: visible / hidden
>
>In WebVTT:
>We specifically try to never obscure any content when rendering cues.
>The only exception is rendering of cues inside regions, which have a
>limited number of lines that they render. There, when a cue runs
>outside the region, it becomes hidden. This is the intended result for
>scrolling captions.

Since font metrics and font presentation are not identical across all implementations (e.g. text may be rendered with different width even for an identical font selected to match the required height) this allows authors to define required implementation behaviour in case the region is not large enough. New lines may be considered by some authors to be a worse solution than a small amount of overflow. See also wrapOption.

>>>padding
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-padding
>values: <length> <length?> <length?> <length?> (1-4 values)
>
>In WebVTT:
>We specify a safe rendering area according to 708 which provides
>padding on the video viewport, defaulting to 1.5% of width and height
>on all sides. Otherwise, "padding" is indeed not yet listed as one of
>the properties that can be set for cues or regions in
>http://dev.w3.org/html5/webvtt/#css-extensions . This could be
>something to consider adding. It would be simple to add, too, since we
>just refer to CSS for such properties.

Padding in TTML is not AFAIK intended to replace safe viewport rendering areas - the author would instead simply create a region somewhat smaller than the viewport (root container extent).
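
As a small worked example of the "region somewhat smaller than the viewport" approach, the sketch below computes tts:origin and tts:extent values for a region inset by the same margin on every side. It assumes percentage lengths that resolve against the root container extent, and the function name is hypothetical.

```python
def inset_region(margin_percent):
    """Return (tts:origin, tts:extent) strings for a region inset on all sides.

    Assumption: both values are percentages of the root container extent,
    so a margin of m percent leaves 100 - 2*m percent in each dimension.
    """
    if not 0 <= margin_percent < 50:
        raise ValueError("margin must be at least 0 and less than 50 percent")
    size = 100 - 2 * margin_percent
    origin = f"{margin_percent:g}% {margin_percent:g}%"
    extent = f"{size:g}% {size:g}%"
    return origin, extent


if __name__ == "__main__":
    origin, extent = inset_region(1.5)
    # Print a region element carrying the computed attributes.
    print(f'<region xml:id="safeArea" tts:origin="{origin}" tts:extent="{extent}"/>')
```

With a 1.5% margin this gives an origin of "1.5% 1.5%" and an extent of "97% 97%", which covers the same area as the safe rendering default mentioned in the quoted text, expressed as region geometry rather than padding.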
>>>showBackground
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-showBackground
>values: always | whenActive
>IIUC this is intended to have cues show up even when no content is
>rendered.
>
>In WebVTT:
>We don't usually render cues that have no content.
>We can, however, define a region with a given dimension and render an
>n-line cue into it just with &nbsp; characters - that would have the
>same effect.
>What is the use case for rendering regions/cues without content?

Other use cases may have been discussed in the past. One is to provide a constant background area on the screen on to which subtitles will appear, to prevent the background apparently flashing on and off. Another that I've been told of is to deliberately obscure a known part of the video with a block color for content-related reasons.

>>>wrapOption
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-wrapOption
>values are: wrap | noWrap
>IIUC this specifies automatic word wrapping.
>
>In WebVTT:
>Newline characters in cues as authored are rendered as new lines in
>WebVTT, since WebVTT is a line-based format and not XML-based.

It wouldn't be ideal for the file format to impose content restrictions. Clearly it remains possible to use long lines in the WebVTT document to define text without pre-determined line breaks. TTML offers pre-defined line breaks via <p> or <br>.

>If text lines get too long within their containing blocks, cues wrap
>according to CSS rules at the edge of their containing blocks. When
>word-wrap occurs, WebVTT will try to balance multiple lines so as to
>provide the best possible user experience.

Automatic balancing of text broken across multiple lines is not a feature of TTML.

>WebVTT tries hard not to hide any text, so no-wrapping and hiding the
>overflow is avoided. What are the use cases for non-wrapping?

See overflow above.
Two use cases off the top of my head:

1) If you know that overflow is okay and you've got no space for new lines you'd set wrapOption to noWrap.
2) If you deliberately want to change the extent of a region without causing differently flowed text (using the set element) you'd use overflow="hidden" and wrapOption="noWrap" and for example reveal or hide the text in steps.

>>>zIndex
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-zIndex
>values are auto | <integer>
>This resolves what is rendered on top of what when there is overlap.
>
>In WebVTT:
>WebVTT tries very hard to avoid overlap. There is even an algorithm to
>move cues into spare screen real estate when (due to font increase or
>cue clash over multiple simultaneously active tracks or poor authoring
>for smaller video viewports) two cues overlap.
>
>The repositioning of cues only happens to simple cues. Regions are
>explicitly placed and it's possible for them to overlap. There was
>originally a proposal to introduce a "layer" setting on regions (see
>http://www.w3.org/community/texttracks/wiki/MultiCueBox#Layering_of_cues)
>and this may still eventuate.

In general I prefer deterministic and predictable behaviour so it would be good to use a zIndex-like parameter to achieve that.

>>>The following can be expressed, but not in a WebVTT file, only in a CSS
>>> stylesheet associated with the page in which the WebVTT HTML/CSS
>>> presentation will be rendered:
>>>
>>> color
>>> fontFamily
>>> fontSize
>>> fontStyle
>>> fontWeight
>>> lineHeight
>>> opacity
>>> textDecoration
>>> textOutline
>>> visibility
>
>Right. As discussed above, that was a design decision for WebVTT. We
>have a proposal for in-line styling, too.
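
For the list of properties above that need a page stylesheet, a conversion tool could emit a ::cue rule along the lines of this sketch. It is only a starting point under stated assumptions: the set of "direct" equivalents is my reading of the list (textOutline is left out because there is no CSS property of the same name), attribute values are copied verbatim even though some TTML values (e.g. noUnderline, or generic family names such as proportionalSansSerif) would need translating, and the function names are invented for this example.

```python
import re

# TTML style attributes (without the tts: prefix) assumed to have a CSS
# property of the same name once camelCase is converted to hyphenated form.
# textOutline is deliberately excluded: it has no same-named CSS property.
DIRECT_CSS_EQUIVALENTS = {
    "color", "fontFamily", "fontSize", "fontStyle", "fontWeight",
    "lineHeight", "opacity", "textDecoration", "visibility",
}


def camel_to_css(name):
    """Convert a camelCase TTML attribute name to a hyphenated CSS name."""
    return re.sub(r"([A-Z])", lambda m: "-" + m.group(1).lower(), name)


def cue_rule(styles, selector="::cue"):
    """Build a CSS rule from a dict of TTML style attributes.

    Returns (css_text, skipped), where skipped lists attribute names that
    were not mapped. Values are copied verbatim and are NOT validated.
    """
    declarations = []
    skipped = []
    for name, value in styles.items():
        if name in DIRECT_CSS_EQUIVALENTS:
            declarations.append(f"  {camel_to_css(name)}: {value};")
        else:
            skipped.append(name)
    css = selector + " {\n" + "\n".join(declarations) + "\n}"
    return css, skipped


if __name__ == "__main__":
    css, skipped = cue_rule(
        {"color": "yellow", "fontWeight": "bold", "textOutline": "black 1px"}
    )
    print(css)
    print("not mapped:", skipped)
```

The returned list of skipped attributes is there so that anything without a direct equivalent is reported rather than silently dropped, which seems the safer default when populating a mapping table.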
>
>
>>> Support for the following TTML (CSS) properties requires mutating the text
>>> to insert or modify explicit bidi control codes:
>>>
>>> direction
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-direction
>values: ltr | rtl
>
>In WebVTT:
>We rely on the text being authored UTF-8 compliant for its language.
>rtl text starts with a rtl mark when authored as rtl. This allows us
>to fully support bi-directional text containing mixed left-to-right
>scripts.
>
>
>>> unicodeBidi
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-unicodeBidi
>values: normal | embed | bidiOverride
>
>In WebVTT:
>Rather than having to author this as a style attribute on a span or p
>element, we simply rely on the bidi algorithm and the correct use of
>bidi UTF-8 characters in the text. If you have to deal with a text
>that is not appropriately authored, you have to insert something to
>fix this. In the case of TTML it's <p> or <span> elements with styling
>attributes, in the case of WebVTT it's UTF-8 characters. Both "mutate
>the text".
>
>
>>> So nearly half (ten) of the style properties do not translate at all or
>>> only in part, and ten other style properties require use of separate style
>>> sheets that have to be delivered independently from the related WebVTT file.
>
>You might like to adjust your counting after the information provided
>above.
>
>
>>> Overall, TTML1 defines 114 features [2], 69 of which are related to the
>>> above 24 style properties.
>
>How many of these 114 features are actually in use in the wild? WebVTT
>has a strong drive to only support features that are motivated by a
>use case. If any of these features are necessary, WebVTT can be
>extended to support them. Some of them (like the 'padding' above)
>would be fixed simply by adding the feature to the list of supported
>CSS properties, which takes less than 5min to fix. It would be best
>for us to find this out before we freeze the spec.
>Any input on use cases would be welcome.

Considering the amount of scrutiny and effort that's been put into developing TTML over many years it's safest to assume that all of the features have a use case/requirement within the overall scope of requirements rather than starting from the opposite perspective. Different subsets of these features may be needed for different parts of the entire author->audience workflow.

>>>I fully expect that more than half of these
>>> features are not encodable or translatable to WebVTT, or if they are, then
>>> have the added disadvantage of having to maintain a separate CSS style sheet
>>> containing rules that apply to specific VTT files.
>
>Why is that a disadvantage? Separating the styling from the content
>has been a driving design principle of the Web and has been part of
>the cause for the success of HTML. I don't see how that would be a
>disadvantage.

NB the style and content are separated in both formats. However in WebVTT the styling must (currently) be provided externally whereas in TTML it must (currently) be provided internally to the document. There is a use case for a single resource containing all the information needed to render, e.g. when the document is not provided over a two-way connection or a processor does not support resource pointer dereferencing.

In the general case it may be a good idea to allow user customisation, however in practice many users of subtitles/captions displayed over 'television' expect it simply to work, and to honour the author's careful effort in positioning, styling and colouring to create an acceptable compromise of readable text and visible video - user customisation is very likely to break that, and should be considered at the user's risk.

I noted that the WebVTT equivalents you've described above that could be considered to mix styling with content are:

* Breaking cue parsing to remove cues from visibility.
* Using &nbsp; characters to render cues with 'no content'.
>
>Best Regards,
>Silvia.
>
>
>>> [1] http://www.w3.org/TR/2013/REC-ttml1-20130924/#styling-attribute-vocabulary
>>> [2] http://www.w3.org/TR/2013/REC-ttml1-20130924/#feature-designations
Received on Monday, 9 December 2013 14:55:22 UTC