Re: [blink-dev] WebVTT vs TTML Features

All,

This could be a good start for populating mapping tables in the TTWG wiki
as part of the process for creating the mapping deliverable in the new
draft charter - format suggestions and editing effort welcome. The feature
set will need to be thought through carefully though - do we use elements
and attributes in TTML, TTML feature designators, or some other set for a
starting point? Not a question we have to answer immediately but from a
linguistic perspective it makes sense not to restrict the set to 'only
those things that can be expressed in one language' to allow for concepts
that are initially unique in the other language.

Copying in TTWG as the outputs of this discussion need to end up there.

Additional CIL.

Kind regards,

Nigel

On 09/12/2013 03:16, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

>Hi Glenn, Victor, all,
>
>This took a bit of time to prepare a reply for, but I think it will
>also give us a start at mapping TTML to WebVTT. So, see my analysis
>inline below.
>
>
>On Wed, Nov 27, 2013 at 7:37 AM, Victor Cărbune <vcarbune@chromium.org>
>wrote:
>> (bcc: blink-dev, cc: public-texttracks)
>>
>> Hi Glenn,
>>
>> I'm moving the discussion to public-texttracks@ because I think these
>> are good points that should generally be debated and eventually extend
>> WebVTT to support some of them, if needed by caption authors.
>>
>> Victor
>>
>>
>> On Tue, Nov 26, 2013 at 3:39 PM, Glenn Adams <glenn@chromium.org> wrote:
>>>
>>>
>>>
>>>
>>> On Tue, Nov 26, 2013 at 3:25 PM, Silvia Pfeiffer <silviapf@chromium.org>
>>> wrote:
>>>>
>>>>
>>>> Have they tried to convert from TTML to WebVTT for presentation in
>>>> browsers? Since all major browsers now support WebVTT, it would be the
>>>> path of least pain. It would also help to find out which TTML features
>>>> cannot be presented in WebVTT. You might find that to be a very small
>>>> set.
>>>
>>>
>>> I expect that greater than 50% of TTML features aren't translatable
>>> into WebVTT.
>
>Are those features actually used anywhere in the real world? Are they
>features that CEA608 or CEA708 supports?

CEA608 and 708 are too narrow a set of subtitle/caption format standards
to consider for global applicability. WSTeletext also needs to be in
scope, and possibly others too. Features that support workflow and
document lifecycle should also be considered - merely targeting the final
deliverable document is too narrow: it has to be authored too.

>Also, rather than redefining styling properties in WebVTT, we're
>simply relying on CSS properties in WebVTT.
>There are many CSS properties that are allowed to be applied on a
>::cue or ::cue-region, see
>http://dev.w3.org/html5/webvtt/#applying-css-properties-to-webvtt-node-objects
>and
>http://dev.w3.org/html5/webvtt/#css-extensions
>
>Note also that we have an open bug to introduce inline CSS
>functionality into WebVTT:
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=15023
>It's an extension because you can already apply CSS properties via the
>Web page, but it's a feature that's on the roadmap.

That's good - the ability for a processor to receive a single document
with everything required is a good minimal constraint with real world
applicability.

>>> For example, TTML1 makes use of 24 style properties [1], all based
>>> on CSS or SVG properties (in most cases identically defined). Of these
>>> 24, the following 10 cannot be expressed in whole or part by WebVTT
>>> content:
>
>To interpret these accurately, I can't just look at CSS or SVG, but
>have to reference their special meaning in the TTML spec. To start
>creating a mapping, I'll include a reference for each of the
>properties and then explain how to do them in WebVTT.
>
>
>>> backgroundColor
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-backgroundColor
>values: CSS color names

Color values may also be specified.

>In WebVTT: http://dev.w3.org/html5/webvtt/#css-extensions
>allows setting of all CSS 'background' properties, including
>background-color on:
>
>* cue content:
>  ::cue(selector) [selector addresses a tag in the cue text]
>
>* cues:
>  ::cue [for all cues] or via ::cue(#cue-id) [for individual cues]
>
>* cue groups:
>  ::cue-region [for all regions] or via ::cue-region(#region-id) [for
>individual regions]
>
>
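To make the backgroundColor mapping concrete, here is a minimal Python sketch of how a converter might turn a TTML color value into a ::cue rule. The function names are hypothetical and not taken from either spec. It relies on the fact that TTML's alpha component (in #rrggbbaa and rgba()) is an integer from 0 to 255 whereas CSS rgba() expects 0 to 1; other forms, including named colors, are passed through on the assumption that they are already valid CSS.

```python
import re

def ttml_color_to_css(value: str) -> str:
    """Convert a TTML color expression to a CSS color string (sketch only)."""
    value = value.strip()
    # #rrggbbaa: split off the alpha byte and rescale it to 0..1 for CSS.
    m = re.fullmatch(r"#([0-9a-fA-F]{6})([0-9a-fA-F]{2})", value)
    if m:
        rgb, alpha = m.group(1), int(m.group(2), 16)
        r, g, b = (int(rgb[i:i + 2], 16) for i in (0, 2, 4))
        return f"rgba({r}, {g}, {b}, {round(alpha / 255, 3)})"
    # rgba(r,g,b,a): the TTML alpha is 0..255, so rescale it as well.
    m = re.fullmatch(
        r"rgba\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)", value
    )
    if m:
        r, g, b, a = (int(x) for x in m.groups())
        return f"rgba({r}, {g}, {b}, {round(a / 255, 3)})"
    # #rrggbb, rgb(...) and named colors: assumed to be valid CSS already.
    return value

def cue_background_rule(ttml_color: str, selector: str = "") -> str:
    """Build a ::cue or ::cue(selector) rule setting background-color."""
    target = f"::cue({selector})" if selector else "::cue"
    return f"{target} {{ background-color: {ttml_color_to_css(ttml_color)}; }}"
```

For example, cue_background_rule("#ff000080") produces a rule for all cues with a half-transparent red background, and passing a selector such as "#cue-id" targets an individual cue.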
>>> display
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-display
>values: auto/none
>
>In WebVTT:
>If an author is trying to author a cue that doesn't get displayed,
>they're best off breaking the cue parsing, e.g. by replacing "-->"
>with "->".

I understand that one of the reasons to avoid XML is to be more forgiving
of potential syntax errors. Might some processors 'forgive' this as an
accidental error and assume that where -> is in the document --> was
intended?

>If they want to do this dynamically during display in the browser,
>they can always turn the TextTrack.mode to hidden or disabled
>(http://www.w3.org/html/wg/drafts/html/master/single-page.html#texttrackmode)
>for the duration of the cue.
>
>
>>> displayAlign
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-displayAlign
>values: before/center/after
>
>I'm having a hard time understanding what that does. Is this about
>vertical alignment of the text in the region? I'll assume that for
>now.

Yes, for left to right and right to left text. TTML supports top to bottom
text too, via tts:writingMode. Regions containing top to bottom text have
block progression horizontally, so the effect of displayAlign in that case
is to change horizontal alignment.
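
As a rough illustration of that dependency, the Python sketch below resolves displayAlign to a physical position for each TTML1 writingMode value. The table and function names and the physical labels ("top", "right" and so on) are chosen here for illustration only; the grouping of writing modes by block progression direction follows the description above.

```python
# Block progression direction for each TTML1 tts:writingMode value:
# "tb" = top to bottom, "rl" = right to left, "lr" = left to right.
BLOCK_PROGRESSION = {
    "lrtb": "tb", "rltb": "tb", "lr": "tb", "rl": "tb",
    "tbrl": "rl", "tb": "rl",
    "tblr": "lr",
}

# Physical position selected by each displayAlign value, per direction.
PHYSICAL_ALIGN = {
    "tb": {"before": "top", "center": "middle", "after": "bottom"},
    "rl": {"before": "right", "center": "center", "after": "left"},
    "lr": {"before": "left", "center": "center", "after": "right"},
}

def physical_display_align(display_align: str, writing_mode: str = "lrtb") -> str:
    """Resolve displayAlign (before/center/after) to a physical position."""
    return PHYSICAL_ALIGN[BLOCK_PROGRESSION[writing_mode]][display_align]
```

So displayAlign="after" means "bottom" for ordinary horizontal text, but "left" in a tbrl region.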

>In WebVTT:
>This isn't currently possible. However, at FOMS we discussed allowing
>authors to specify line and position settings within regions the same
>as within the video viewport. That would allow exact specification
>of where the content within a region needs to be painted.
>
>
>>> extent
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-extent
>values: audio or <length> <length>

s/audio/auto

>In WebVTT:
>We specify the width and number of lines of a region in the region
>definition, see
>https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html#webvtt-region-metadata-header-syntax
>Lengths are not in pixels because they make little sense when the
>video viewport or characters are your reference of authoring.

In environments with known pixel dimensions it may make most sense to use
pixels directly and avoid possible rounding errors from percentages.
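
A small Python sketch of the rounding concern: a pixel length is written as a percentage with limited precision and then resolved back to whole pixels. The function name and the idea of fixing the number of decimal places are assumptions made for illustration; real authoring tools and renderers may round differently.

```python
def pixel_round_trip(pixels: int, viewport_pixels: int, decimals: int) -> int:
    """Write a pixel length as a percentage rounded to `decimals` places,
    then resolve that percentage back to a whole number of pixels."""
    percent = round(100 * pixels / viewport_pixels, decimals)
    return round(percent / 100 * viewport_pixels)
```

For instance, 137 pixels of a 1920 pixel wide viewport comes back as 136 pixels when the percentage is written with one decimal place, and as 137 only once two decimal places are used.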

>Right now we have number of lines and percentages.
>At FOMS we also discussed using em (i.e. largest character width and
>height) as a metric, which maps better to 708.

Viewport percentages are in scope for TTML2 as well.
The c metric permits arbitrary cell-based grids to be used too, which may
be related to font size and thus have a defined relationship with lines.

>>>origin
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-origin
>values: audio or <length> <length>

s/audio/auto


>In WebVTT:
>We specify the placement of regions via anchor settings, see
>https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html#webvtt-region-viewport-anchor
>and
>https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/region.html#webvtt-region-anchor
>.
>
>This allows placing a region anywhere on screen which explicitly
>specifies how it grows around the anchor point with increasing font
>size, which maps exactly onto how 708 does its placement. I don't
>think that capability is available in TTML - could you clarify?

TTML regions are boxes with an origin and extent, i.e. a top left corner
and a size. Alignment of text within a region is controlled by settings,
i.e. displayAlign and textAlign. Regions don't grow in size based on content
alone. The placement and dimensions of text inside the region box determine
e.g. the area covered by any background color that's applied, depending on
which element (region, body, div, p or span) the styling applies to. To make
child content elements of a region 'anchor' and grow from a specific place
the author would manipulate the set of origin, extent, displayAlign,
textAlign and writingMode to achieve the desired effect.
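
As a starting point for the mapping tables, the Python sketch below shows one way a TTML origin and extent given in percentages might be approximated by WebVTT region settings. It pins the region's top left corner to the TTML origin by setting the region anchor to 0%,0%. The function name is hypothetical, and the line height used to turn the TTML height into a line count is an assumed value that a real converter would need to take from the rendering rules it targets. How the settings are serialised depends on the draft of the region syntax in use, so only a name/value mapping is produced.

```python
import math

def ttml_region_to_webvtt(region_id, origin, extent, line_height_percent=5.33):
    """Approximate a TTML region as WebVTT region settings (sketch only).

    origin is (x, y) and extent is (width, height), all in percent of the
    root container. line_height_percent is an ASSUMED height of one line
    as a percentage of the viewport height.
    """
    x, y = origin
    width, height = extent
    # WebVTT regions are sized in lines rather than by height, so the TTML
    # height can only be approximated; round down to avoid overflowing it.
    lines = max(1, math.floor(height / line_height_percent))
    return {
        "id": region_id,
        "width": f"{width:g}%",
        "lines": str(lines),
        # Anchor the region's top left corner at the TTML origin.
        "regionanchor": "0%,0%",
        "viewportanchor": f"{x:g}%,{y:g}%",
    }
```

Note that this only reproduces the initial box. A TTML region keeps its extent regardless of content, so the two models can still diverge when font size changes.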

>>>overflow
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-overflow
>values: visible / hidden
>
>In WebVTT:
>We specifically try to never obscure any content when rendering cues.
>The only exception is rendering of cues inside regions, which have a
>limited number of lines that they render. There, when a cue runs
>outside the region, it becomes hidden. This is the intended result for
>scrolling captions.

Since font metrics and font presentation are not identical across all
implementations (e.g. text may be rendered with different width even for
an identical font selected to match the required height) this allows
authors to define required implementation behaviour in case the region is
not large enough. New lines may be considered by some authors to be a
worse solution than a small amount of overflow. See also wrapOption.

>>>padding
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-padding
>values: <length> <length?> <length?> <length?> (1-4 values)
>
>In WebVTT:
>We specify a safe rendering area according to 708 which provides
>padding on the video viewport, defaulting to 1.5% of width and height
>on all sides. Otherwise, "padding" is indeed not yet listed as one of
>the properties that can be set for cues or regions in
>http://dev.w3.org/html5/webvtt/#css-extensions . This could be
>something to consider adding. It would be simple to add, too, since we
>just refer to CSS for such properties.

Padding in TTML is not AFAIK intended to replace safe viewport rendering
areas - the author would instead simply create a region somewhat smaller
than the viewport (root container extent).

>>>showBackground
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-showBackground
>values: always | whenActive
>IIUC this is intended to have cues show up even when no content is
>rendered.
>
>In WebVTT:
>We don't usually render cues that have no content.
>We can, however, define a region with a given dimension and render an
>n-line cue into it just with &nbsp; characters - that would have the
>same effect.
>What is the use case for rendering regions/cues without content?

Other use cases may have been discussed in the past. One is to provide a
constant background area on the screen on to which subtitles will appear,
to prevent the background apparently flashing on and off. Another that
I've been told of is to deliberately obscure a known part of the video
with a block color for content-related reasons.

>>>wrapOption
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-wrapOption
>values are: wrap | noWrap
>IIUC this specifies automatic word wrapping.
>
>In WebVTT:
>Newline characters in cues as authored are rendered as new lines in
>WebVTT, since WebVTT is a line-based format and not XML-based.

It wouldn't be ideal for the file format to impose content restrictions.
Clearly it remains possible to use long lines in the WebVTT document to
define text without pre-determined line breaks. TTML offers pre-defined
line breaks via <p> or <br>.

>If text lines get too long within their containing blocks, cues wrap
>according to CSS rules at the edge of their containing blocks. When
>word-wrap occurs, WebVTT will try to balance multiple lines so as to
>provide the best possible user experience.

Automatic balancing of text broken across multiple lines is not a feature
of TTML.

>WebVTT tries hard not to hide any text, so no-wrapping and hiding the
>overflow is avoided. What are the use cases for non-wrapping?

See overflow above. Two use cases off the top of my head: 1) If you know
that overflow is okay and you've got no space for new lines you'd set
wrapOption to noWrap. 2) If you deliberately want to change the extent of
a region without causing differently flowed text (using the set element)
you'd use overflow="hidden" and wrapOption="noWrap" and for example reveal
or hide the text in steps.

>>>zIndex
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-zIndex
>values are auto | <integer>
>This resolves what is rendered on top of what when there is overlap.
>
>In WebVTT:
>WebVTT tries very hard to avoid overlap. There is even an algorithm to
>move cues into spare screen real estate when (due to font increase or
>cue clash over multiple simultaneously active tracks or poor authoring
>for smaller video viewports) two cues overlap.
>
>The repositioning of cues only happens to simple cues. Regions are
>explicitly placed and it's possible for them to overlap. There was
>originally a proposal to introduce a "layer" setting on regions (see
>http://www.w3.org/community/texttracks/wiki/MultiCueBox#Layering_of_cues)
>and this may still eventuate.

In general I prefer deterministic and predictable behaviour so it would be
good to use a zIndex-like parameter to achieve that.

>>>The following can be expressed, but not in a WebVTT file, only in a CSS
>>> stylesheet associated with the page in which the WebVTT HTML/CSS
>>> presentation will be rendered:
>>>
>>> color
>>> fontFamily
>>> fontSize
>>> fontStyle
>>> fontWeight
>>> lineHeight
>>> opacity
>>> textDecoration
>>> textOutline
>>> visibility
>
>Right. As discussed above, that was a design decision for WebVTT. We
>have a proposal for in-line styling, too.
>
>
>>> Support for the following TTML (CSS) properties requires mutating the
>>> text to insert or modify explicit bidi control codes:
>>>
>>> direction
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-direction
>values: ltr | rtl
>
>In WebVTT:
>We rely on the text being authored UTF-8 compliant for its language.
>rtl text starts with a rtl mark when authored as rtl. This allows us
>to fully support bi-directional text containing mixed left-to-right
>and right-to-left scripts.
>
>
>>> unicodeBidi
>In TTML: http://www.w3.org/TR/ttaf1-dfxp/#style-attribute-unicodeBidi
>values: normal | embed | bidiOverride
>
>In WebVTT:
>Rather than having to author this as a style attribute on a span or p
>element, we simply rely on the bidi algorithm and the correct use of
>bidi UTF-8 characters in the text. If you have to deal with a text
>that is not appropriately authored, you have to insert something to
>fix this. In the case of TTML it's <p> or <span> elements with styling
>attributes, in the case of WebVTT it's UTF-8 characters. Both "mutate
>the text".
>
>
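For the mapping tables, the direction and unicodeBidi pair can be sketched as a text transformation. The Python below wraps a run of text in the Unicode bidi formatting characters that correspond to the CSS embed and override behaviours. The function name is hypothetical, and it assumes the properties apply to an inline span of text; paragraph-level direction would need separate handling.

```python
# Unicode bidi formatting characters.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # embeddings and their terminator
LRO, RLO = "\u202d", "\u202e"                 # overrides

def apply_bidi(text: str, direction: str = "ltr", unicode_bidi: str = "normal") -> str:
    """Wrap `text` in control characters for a TTML direction/unicodeBidi pair."""
    if unicode_bidi == "normal":
        # No additional embedding level is opened, so nothing is inserted.
        return text
    if unicode_bidi == "embed":
        opener = RLE if direction == "rtl" else LRE
    elif unicode_bidi == "bidiOverride":
        opener = RLO if direction == "rtl" else LRO
    else:
        raise ValueError(f"unknown unicodeBidi value: {unicode_bidi!r}")
    return f"{opener}{text}{PDF}"
```

A converter going from TTML to WebVTT would apply this to the text of each affected span and then drop the style attributes.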
>>> So nearly half (ten) of the style properties do not translate at all or
>>> only in part, and ten other style properties require use of separate
>>> style sheets that have to be delivered independently from the related
>>> WebVTT file.
>
>You might like to adjust your counting after the information provided
>above.
>
>
>>> Overall, TTML1 defines 114 features [2], 69 of which are related to
>>> the above 24 style properties.
>
>How many of these 114 features are actually in use in the wild? WebVTT
>has a strong drive to only support features that are motivated by a
>use case. If any of these features are necessary, WebVTT can be
>extended to support them. Some of them (like the 'padding' above)
>would be fixed simply by adding the feature to the list of supported
>CSS properties, which takes less than 5min to fix. It would be best
>for us to find this out before we freeze the spec. Any input on use
>cases would be welcome.

Considering the amount of scrutiny and effort that's been put into
developing TTML over many years it's safest to assume that all of the
features have a use case/requirement within the overall scope of
requirements rather than starting from the opposite perspective. Different
subsets of these features may be needed for different parts of the entire
author->audience workflow.

>>> I fully expect that more than half of these
>>> features are not encodable or translatable to WebVTT, or if they are,
>>> then have the added disadvantage of having to maintain a separate CSS
>>> style sheet containing rules that apply to specific VTT files.
>
>Why is that a disadvantage? Separating the styling from the content
>has been a driving design principle of the Web and has been part of
>the cause for the success of HTML. I don't see how that would be a
>disadvantage.

NB the style and content are separated in both formats. However in WebVTT
the styling must (currently) be provided externally whereas in TTML it
must (currently) be provided internally to the document. There is a use
case for a single resource containing all the information needed to
render, e.g. when the document is not provided over a two-way connection
or a processor does not support resource pointer dereferencing.

In the general case it may be a good idea to allow user customisation,
however in practice many users of subtitles/captions displayed over
'television' expect it simply to work, and to honour the author's careful
effort in positioning, styling and colouring to create an acceptable
compromise of readable text and visible video - user customisation is very
likely to break that, and should be considered at the user's risk.

I noted that the WebVTT equivalents you've described above that could be
considered to mix styling with content are:
. Breaking cue parsing to remove cues from visibility.
. Using &nbsp; characters to render cues with 'no content'.

>
>Best Regards,
>Silvia.
>
>
>>> [1] http://www.w3.org/TR/2013/REC-ttml1-20130924/#styling-attribute-vocabulary
>>> [2] http://www.w3.org/TR/2013/REC-ttml1-20130924/#feature-designations
>>>
>>



Received on Monday, 9 December 2013 14:55:23 UTC