This document compares two timed text formats, TTML and WebVTT, and describes how to map content between them.
This is an editor's draft.
In today's media landscape, content is available to viewers in a variety of different ways, such as traditional avenues including cinema and television, and modern internet-enabled alternatives. Content often needs to be transformed into different formats in order to be available across this breadth of delivery options.
When authoring media with captions, content creators can choose a timed text format from the set of available formats. While there are many different formats available to carry captions in media, support for different formats is fragmented, with different content delivery channels supporting different formats.
TTML and WebVTT are two popular formats for captions. The two formats have different histories, and as a result they differ in both supported features and approach.
This document focuses on how to translate captions data between the TTML and WebVTT formats. It is divided into three main sections. The first section provides an overview of TTML and WebVTT formats, and describes a high level strategy for performing a mapping between them. The second section provides a detailed discussion of how to map content from TTML to WebVTT. Finally, the third section provides a similar discussion of how to map content from WebVTT to TTML.
Before beginning any mapping between TTML and WebVTT, it is necessary to understand the basic constructs used in these formats for the conveyance of timed text information. This section provides an overview of the fundamentals of text carriage in both TTML and WebVTT, and highlights those features most relevant to a mapping discussion. For the complete and authoritative definitions of these formats, please refer to their respective specifications.
TTML defines the following XML elements as syntactic structures that are used to group text content:
<tt>
<body>
<div>
<p>
<span>
An XML document forms a tree through the nesting of elements. An element may have a subtree "beneath" itself. Any of the elements listed above may have text content in their subtree, either as direct children or as descendants. Because of this relationship, they may carry information that applies to all text content in their subtree. This information could be used, for example, for:
Positioning of text content in a rectangular area, such as the specification of left or right horizontal alignment for Latin text.
Styling of text content, such as the specification of text color, with all descendants in the sub-tree inheriting this information.
Timing of text content, such as the specification of the begin and end time attributes.
Because the <tt> element is the root of a TTML document, the information specified on the <tt> element is significant to all text content in a TTML document. For example, the language specified using the xml:lang attribute of the <tt> element applies to all elements within it. This is also true for the <body> element, because all text content has the <body> element as an ancestor.
Ignoring recursive structures for a moment, the nesting hierarchy of content elements is: <tt> -> <body> -> <div> -> <p> -> <span>.
Among these elements, only the <p> and <span> elements may have text content as direct children. In XML, text content is represented by text nodes. Three variations are possible:
text content is a direct child of a <p> element
text content is a direct child of a <span> element
<p> and <span> elements have both text content and <span> elements as children
There are two content structures that can be nested to build up recursive structures: <div> and <span>. A <div> can have other <div> elements as children, and a <span> can have other <span> elements as children.
The TTML specification mandates that TTML documents contain language information, which is provided by the @xml:lang attribute on the <tt> element. If the language is not known, the attribute can be set to the empty string.
Example: TTML English Language Identification:
<tt xml:lang="en" ...> ...</tt>
Example: TTML Unknown Language:
<tt xml:lang="" ...> ...</tt>
The @xml:lang attribute can also be specified on <body>, <div>, <p> and <span>.
An explicit line break is introduced in TTML through the <br> element. By default, the character codes in the TTML document that represent a line break, such as line feed, carriage return or a pair of both, are not interpreted as a line break for the presentation of the content. This changes only when the @xml:space attribute with the value "preserve" is applied to some content.
Automatic line breaking can occur during presentation due to limited space in the area where the content is rendered. This behavior can be switched on and off through the tts:wrapOption style attribute, whose values are "wrap" and "noWrap". Its default value is "wrap".
Although UTF-8 is recommended by the TTML spec, a TTML document may use any other character encoding permitted by XML.
In TTML, a region is defined simply as a rectangular area that text can be flowed into. In TTML documents, regions are defined in a <layout> element. TTML regions can have a variety of properties defined, including the following:
Property | Definition |
---|---|
id, as an xml:id value | An identifier that can be used by other TTML elements to reference the region. |
origin, as a tts:origin value | The x and y coordinates denoting the ( top, left ) corner of the region, with respect to the Root Container. |
extent, as a tts:extent value | The width and height of a region area. |
writing mode, as a tts:writingMode value | Defines the block and inline progression directions. |
padding, as a tts:padding value | Padding or inset space on all sides of the region area. |
inline alignment, as a tts:textAlign value | How inline areas are aligned in the inline progression direction within a block. |
block alignment, as a tts:displayAlign value | How block areas are aligned in the block progression direction. |
TTML defines a style attribute for each type of styling information. Style attributes (e.g. @tts:color) can be grouped on a <style> element. The <region>, <body>, <div>, <p> and <span> elements can reference these style sets by using the id that is defined for a <style> element. A <style> element can reference other <style> elements to combine two or more style sets.
Style attributes can also be specified directly on <region>, <body>, <div>, <p> and <span> elements. This is called inline styling.
To calculate the style information for content and boxes, use the style resolution process defined by TTML. This merges a chain of referenced <style> elements with inline style attributes to produce a single style set. It also takes into account initial values and inheritance of values.
Although all style attributes can be specified on <region>, <body>, <div>, <p> and <span> elements, only a subset of these attributes can be applied to the presentational unit that each of these elements represents.
The table below shows which style attributes apply to which element:
Style Attribute | <region> | <body> | <div> | <p> | <span> |
---|---|---|---|---|---|
backgroundColor | x | x | x | x | x |
color (i) | | | | | x |
direction (i) | | | | x | x |
display | x | x | x | x | x |
displayAlign | x | | | | |
extent | x | | | | |
fontFamily (i) | | | | | x |
fontSize (i) | | | | | x |
fontStyle (i) | | | | | x |
fontWeight (i) | | | | | x |
lineHeight (i) | | | | x | |
opacity | x | | | | |
origin | x | | | | |
overflow | x | | | | |
padding | x | | | | |
showBackground | x | | | | |
textAlign (i) | | | | x | |
textDecoration (i) | | | | | x |
textOutline (i) | | | | | x |
unicodeBidi | | | | x | x |
visibility (i) | x | x | x | x | x |
wrapOption (i) | | | | | x |
writingMode | x | | | | |
zIndex | x | | | | |
If a style attribute is marked with "(i)", it is inheritable.
At first glance, specifying a style attribute on an element where it does not apply doesn't make much sense. But through the concept of inheritance, the values of style attributes could be inherited down the syntax tree to the element where they do apply. It is not a syntax error to specify a style attribute when it has no effect. The style attribute is simply ignored in such cases.
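Inheritance can be sketched in Python as follows. This is a simplified illustration, not the full TTML style resolution process: it only looks at inline tts:* attributes, walks up the ancestors of an element for inheritable attributes, and falls back to a caller-supplied initial value. <style> references and region styling are ignored, and the name computed_style is hypothetical.

```python
import xml.etree.ElementTree as ET

TTS = "{http://www.w3.org/ns/ttml#styling}"
# TTML style attributes whose values are inherited down the tree.
INHERITABLE = {
    "color", "direction", "fontFamily", "fontSize", "fontStyle",
    "fontWeight", "lineHeight", "textAlign", "textDecoration",
    "textOutline", "visibility", "wrapOption",
}


def computed_style(root: ET.Element, element: ET.Element, name, initial):
    """Return the value of tts:<name> that applies to element.

    Only inline attributes are considered. Non-inheritable attributes
    are read from the element itself and never from its ancestors.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        value = node.get(TTS + name)
        if value is not None:
            return value
        if name not in INHERITABLE:
            break
        node = parents.get(node)
    return initial
```

A tts:color set on <body> therefore reaches a nested <span>, while a tts:backgroundColor on <body> does not.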
TTML defines several types in the ttp parameter namespace that carry timing information. Some of the TTML timing parameters may be required to accurately convert TTML time expressions as part of a mapping process. The necessary parameters can be provided in the TTML file using the defined types.
This section describes the relevant TTML timing parameters and provides a brief discussion of each type.
TTML Parameter | Possible Values |
---|---|
ttp:timeBase | media, smpte, clock |
Time Base parameter defines the temporal coordinate system to use in interpreting time expressions.
TTML Parameter | Possible Values |
---|---|
ttp:dropMode | dropNTSC, dropPAL, nonDrop |
When the Time Base is “smpte”, Drop Mode parameter defines the drop mode to use in interpreting time expressions.
TTML Parameter | Possible Values |
---|---|
ttp:clockMode | local, gps, utc |
When the Time Base is “clock”, Clock Mode parameter defines the clock mode to use in interpreting time expressions.
TTML Parameter | Possible Values |
---|---|
ttp:frameRate | non-zero, positive integer |
When the frame rate associated with a TTML document is integral, Frame Rate represents the frame rate to use in interpreting time expressions.
TTML Parameter | Possible Values |
---|---|
ttp:frameRateMultiplier | two non-zero, positive integers: numerator and denominator |
When the frame rate associated with a TTML document is not integral, Frame Rate Multiplier provides a numerator and denominator to multiply the frame rate value by in order to calculate the effective frame rate for use in interpreting time expressions.
TTML Parameter | Possible Values |
---|---|
ttp:subFrameRate | non-zero, positive integer |
Sub Frame Rate specifies the number of sub-frames into which each frame is divided, for use in interpreting time expressions that contain sub-frames.
TTML Parameter | Possible Values |
---|---|
ttp:tickRate | non-zero, positive integer |
Tick Rate specifies the number of ticks per second, for use in interpreting tick-based time expressions in a TTML document.
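The following Python sketch shows how these parameters combine when converting a TTML time expression to seconds. It assumes ttp:timeBase="media", ignores ttp:dropMode and ttp:clockMode, and uses 30 fps, a 1:1 multiplier, one sub-frame per frame and one tick per second as defaults; check these defaults against the TTML specification, since the default tick rate in particular depends on whether a frame rate is declared. The function name ttml_time_to_seconds is hypothetical. Results are exact Fraction values, so that non-integral rates such as 30 * 1000/1001 do not accumulate rounding error.

```python
import re
from fractions import Fraction

# Clock time: hh:mm:ss, hh:mm:ss.fraction, hh:mm:ss:ff or hh:mm:ss:ff.sub
CLOCK_TIME = re.compile(
    r"^(\d{2,}):(\d{2}):(\d{2})(?:\.(\d+)|:(\d{2,})(?:\.(\d+))?)?$"
)
# Offset time: a number followed by a metric (h, m, s, ms, f or t)
OFFSET_TIME = re.compile(r"^(\d+(?:\.\d+)?)(h|m|s|ms|f|t)$")


def ttml_time_to_seconds(expr, frame_rate=30, multiplier=(1, 1),
                         sub_frame_rate=1, tick_rate=1):
    """Convert a TTML time expression to seconds (media time base only)."""
    # Effective frame rate = ttp:frameRate * numerator / denominator
    effective_fps = Fraction(frame_rate) * Fraction(*multiplier)

    m = CLOCK_TIME.match(expr)
    if m:
        hours, minutes, seconds, fraction, frames, sub_frames = m.groups()
        total = Fraction(int(hours) * 3600 + int(minutes) * 60 + int(seconds))
        if fraction is not None:
            total += Fraction(int(fraction), 10 ** len(fraction))
        if frames is not None:
            total += Fraction(int(frames)) / effective_fps
        if sub_frames is not None:
            total += Fraction(int(sub_frames)) / (effective_fps * sub_frame_rate)
        return total

    m = OFFSET_TIME.match(expr)
    if m:
        value, metric = Fraction(m.group(1)), m.group(2)
        if metric == "h":
            return value * 3600
        if metric == "m":
            return value * 60
        if metric == "s":
            return value
        if metric == "ms":
            return value / 1000
        if metric == "f":
            return value / effective_fps
        return value / tick_rate  # metric == "t"

    raise ValueError(f"unsupported time expression: {expr!r}")
```

For example, with frame_rate=30 and multiplier=(1000, 1001), the expression "30f" yields 1001/1000 seconds.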
The main syntactic structure of WebVTT is the WebVTT cue. All content that is defined for presentation belongs to exactly one WebVTT cue. Content inside a WebVTT cue can be further grouped by using specific spans to apply information that is significant for the rendering of the enclosed content. As with TTML, a WebVTT cue can be translated into a tree structure. Ignoring the specific names of the spans, the tree structure could look like the following:
cue->span->"text content"
cue-> "text content"
cue-> mix of span and "text content"
Like TTML, WebVTT has a concept of regions. WebVTT regions are optional, and when no region is specified, the entire video frame is used as a de facto region. WebVTT cues can be affiliated to a region. Cues, with their text content, can signal this affiliation by specifying the identifier of the WebVTT region. As in TTML, this is a one-to-many relationship: a region may have zero to many cue affiliations, but a cue can only have one region affiliation.
If we assume a region with the id "foo", the hierarchical structure would look like the following:
cue(region-id ="foo")->content
When processing WebVTT, the region can be de-referenced and the intermediate tree could look like:
region(id="foo")->cue->content
WebVTT allows the language of text to be specified through the use of a language span.
Example: WebVTT English Language Specification:
<lang en>...</lang>
In WebVTT an explicit line break is introduced through the Unicode code values for carriage return, line feed or a pair of both. Automatic line breaks can occur during presentation due to limited space in the area where the content is rendered. There is no syntactic construct to influence this behavior.
Text content in a WebVTT file is always encoded in UTF-8.
In WebVTT, a region is also defined as an area that text can be flowed into. However, WebVTT uses different properties to specify a region than TTML. Specifically, WebVTT defines the following properties for a region:
Property | Definition |
---|---|
identifier | An arbitrary string that can be used in cues to reference the region. |
width | The width of the region carried as a percentage of the video width. Defaults to 100. |
lines value | The number of lines of text within the region. Defaults to 3. |
region anchor point | The x and y coordinates, as percentages of the region area, for the point within the region that is anchored to the viewport. Defaults to ( 0,100 ), or the ( bottom, left ) corner of the region. |
region viewport anchor point | The x and y coordinates, as percentages of the viewport, for the point within the viewport to which the region anchor point is affixed. |
scroll value | The scroll value can have one of two values: None or Up. If it is set to None, then text remains on the line it was originally drawn upon. If it is set to Up, then new cues are added to the bottom of the region, and push up any text that is already drawn in the region until the new cue is fully displayed. |
Notes:
In WebVTT, cues can be drawn directly into the video viewport, without the use of regions. When this mode is employed, WebVTT defines automatic behavior for renderers to adjust cue positions in order to avoid any overlap. When WebVTT does use regions, overlap can occur if regions are defined to overlap and contain text at the same time.
When WebVTT cues are drawn directly into the video viewport, and no regions are used, properties on cues are used for specifying position and size.
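WebVTT cues can be positioned in one of two ways: within a WebVTT region to which they are affiliated, or directly in the video viewport using cue settings.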
When authoring WebVTT without regions, the position of a cue is determined by its "line" and "position" cue settings. The interpretation of these cue settings will be affected by the value of the "vertical" cue setting, the writing direction, and possibly the "size" cue setting.
The "size" cue setting controls one dimension of the block and is a percentage of the video viewport. For horizontal cues, the size cue setting will be the width of the cue box. For vertical cues, the size cue setting will be the height. The other dimension of the block is determined by the content. The cue box will expand as needed to accommodate the cue text. For horizontal cues, the cue expands down. For vertical cues, the cue expands either left or right, depending on the value of the "vertical" cue setting.
Style information can be applied through span tags (e.g. the tag "b" for bold) or through reference to CSS style information. CSS style information that is defined outside of the WebVTT document can be matched by id-strings defined for cue boxes, regions or span tags.
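Defined span tags for styling are the class span (<c>), the italics span (<i>), the bold span (<b>) and the underline span (<u>).
CSS properties that apply are those allowed on the ::cue pseudo-element: color, opacity, visibility, text-decoration, text-shadow, the background properties, the outline properties, the font properties (including line-height), white-space, text-combine-upright and ruby-position.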
In WebVTT, timing information is applied to cues and spans.
In contrast to TTML's support for a large set of timing expressions, WebVTT supports only a single timing expression: hours: minutes: seconds.fractional-seconds. In WebVTT, the hours portion of the timing expression is optional.
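Because WebVTT accepts only this one form, every TTML time value has to be converted to it. A minimal Python sketch, assuming the time has already been converted to seconds (for example as a Fraction); the helper name to_webvtt_timestamp is hypothetical:

```python
from fractions import Fraction


def to_webvtt_timestamp(seconds):
    """Format a non-negative time in seconds as a WebVTT timestamp.

    WebVTT timestamps carry exactly three fractional digits, so the
    value is rounded to the nearest millisecond.
    """
    total_ms = round(Fraction(seconds) * 1000)
    if total_ms < 0:
        raise ValueError("WebVTT timestamps cannot be negative")
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    # The hours component is optional; emitting it always keeps the
    # output uniform. It must have at least two digits when present.
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
```

Since millisecond precision is the limit, frame-accurate TTML timings may shift by up to half a millisecond during conversion.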
While the two formats are both used to carry captions information, there are some important differences between them that should be noted when mapping from one to the other.
<div> Structures
WebVTT does not have a component that corresponds to <div>.
TTML allows language identification at different positions in the content hierarchy (e.g. on <tt>, <p> and <span>). WebVTT only permits the specification of a language at the "inline" level.
Mapping positioning information between the TTML and WebVTT formats may be the most difficult part of any conversion process, due to some fundamental differences in the ways the two formats express spatial information. This section discusses the differences in the spatial controls offered in TTML and WebVTT.
When mapping spatial information between the two formats, it is important to be aware of these differences in their spatial models, and then apply this awareness when making mapping decisions. The two formats differ in positioning support in four main ways:
This section will examine these differences in detail.
TTML provides support for hierarchical elements, and spatial information, including associations with regions, can be applied to elements at different levels of the hierarchy. In WebVTT, spatial information and region associations are provided at the cue level.
In TTML, the following elements may reference regions:
<body>
<div>
<p>
<span>
When converting from TTML to WebVTT, all of the spatial information and region references in the hierarchy must be preserved by applying them to elements within, as each hierarchical item is flattened.
The TTML and WebVTT specifications use different units to express spatial coordinates or distances. The following table compares support for several units between the two formats:
Spatial Units | TTML Supports? | WebVTT Supports? |
---|---|---|
pixel | Yes | No |
em | Yes | No |
cell | Yes | No |
percent | Yes | Yes |
line number | No | Yes |
While both TTML and WebVTT define a construct known as a region, the definition of a region differs significantly from one format to another.
In addition, in WebVTT only a cue can reference a region. In TTML, several structures can be associated with a region (e.g. <body>, <div> and <p>).
Both TTML and WebVTT define properties that denote the block and inline progression directions. TTML uses the tts:writingMode attribute to convey this information. WebVTT uses the "vertical" cue setting to define the writing direction; when the writing direction is vertical, the value of that setting ("rl" or "lr") denotes whether the block progresses from right to left or from left to right. Below is a table showing how to express various inline and block progression directions in both TTML and WebVTT.
Inline Progression Direction | Block Progression Direction | TTML | WebVTT |
---|---|---|---|
Left->Right | Top->Bottom | lrtb | auto or horizontal |
Right->Left | Top->Bottom | rltb | auto or horizontal |
Top->Bottom | Right->Left | tbrl | vertical:rl |
Top->Bottom | Left->Right | tblr | vertical:lr |
Left->Right | Top->Bottom | lr | auto or horizontal |
Right->Left | Top->Bottom | rl | auto or horizontal |
Top->Bottom | Right->Left | tb | vertical:rl |
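The mapping can be expressed as a lookup from tts:writingMode to the WebVTT "vertical" cue setting. In this Python sketch, horizontal modes map to no setting at all (WebVTT derives the inline direction of horizontal cues from the text itself), and "tb" is treated like "tbrl", which is how TTML defines it. The names are hypothetical.

```python
# tts:writingMode -> value of the WebVTT "vertical" cue setting.
# None means the setting is omitted, which yields a horizontal cue.
WRITING_MODE_TO_VERTICAL = {
    "lrtb": None,
    "rltb": None,
    "lr": None,    # TTML shorthand for lrtb
    "rl": None,    # TTML shorthand for rltb
    "tbrl": "rl",
    "tblr": "lr",
    "tb": "rl",    # TTML shorthand for tbrl
}


def vertical_cue_setting(writing_mode="lrtb"):
    """Return the WebVTT cue-setting text for a TTML writing mode."""
    vertical = WRITING_MODE_TO_VERTICAL[writing_mode]
    return "" if vertical is None else f"vertical:{vertical}"
```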
The two formats differ in the amount of control available over spatial placement of timed text. In general, the TTML format provides a greater degree of control of spatial positioning, while the WebVTT format provides some control, and combines it with some automatic behaviors. The implementation of specified automatic behaviors may vary from one renderer to the next.
In TTML, <p> elements are flowed into regions. If two regions are defined to overlap spatially, and both display text at the same time, the text may overlap.
In WebVTT, cues can be drawn directly into the video viewport, without the use of regions. When this mode is employed, WebVTT defines automatic behavior for renderers to adjust cue positions in order to avoid any overlap. When WebVTT does use regions, overlap can occur if regions are defined to overlap and contain text at the same time.
When WebVTT cues are drawn directly into the video viewport, and no regions are used, properties on cues are used for specifying position and size.
Some of the style features in TTML and WebVTT are not supported by the other format. The following TTML style attributes have no corresponding CSS property:
The following CSS properties allowed by WebVTT have no corresponding TTML style attributes:
Although there may be strategies for mapping these unsupported style features, an evaluation of these strategies is out of scope of this document.
With TTML, all style information is present in the document itself. In contrast, for WebVTT, all CSS selectors and properties are defined in a context external to the WebVTT document. One common case is the specification of the CSS styles in an HTML context where the WebVTT documents are embedded.
In contrast to TTML, WebVTT does not allow inline styling. Inline styling is the direct specification of a style attribute on a syntax structure that "wraps" the text content (e.g. a <p> or <span> in TTML, or a class span tag in WebVTT).
<div> Elements
Since WebVTT defines no structure that corresponds to the TTML <div> element, any style information on <div> cannot be mapped directly to a WebVTT document.
<region> Elements
In TTML, each defined <region> can hold style information. Although regions exist in WebVTT, CSS properties can only be defined for all WebVTT regions in a file, and cannot be tied to a specific region individually.
In TTML, <style> elements can reference other <style> elements to merge the style sets. In WebVTT and CSS, it is not possible to establish a similar relationship between ::cue pseudo-elements.
<body>, <div> and <p>
In TTML, multiple <style> elements can be referenced by <body>, <div> and <p> elements. In WebVTT, only one style set can be applied to the complete document or to a cue.
Some of the style features have a slightly different value scope. These differences are described in greater detail in the following sections of this document.
The two formats provide different models for expressing timing information, and support different timing capabilities. For the most part, the timing support in TTML is a superset of the timing support in WebVTT, with the exception of some intra-cue timing constructs in WebVTT that do not exist in the same form in TTML. This section discusses the differences between timing support in the two formats.
TTML and WebVTT differ in timing support in three ways: their hierarchical and intra-cue timing models, the ordering of cues, and the set of supported timing expressions.
This section will examine the different functionality offered by the two formats in detail.
TTML provides support for hierarchical elements, and timing information can be applied at most levels of the hierarchy. In contrast, WebVTT has a flat structure, with no ability to nest caption cues within other elements. In addition, WebVTT defines some intra-cue timing concepts which are not present in TTML.
In TTML, the following elements may contain timing information:
<body>
<region>
<div>
<p>
<span>
When converting from TTML to WebVTT, all of the timing information in this hierarchy must be preserved by applying it to elements within, as each hierarchical item is flattened. In addition, whether a parent element is parallel or sequential must be taken into account when adjusting the timing for child elements during the flattening process.
While WebVTT does not include the support for hierarchical elements found in TTML, it introduces some additional concepts for intra-cue timing. These can be employed when expressing display modes for text such as roll-up, and when styling text with the :past and :future pseudo-classes.
WebVTT requires that cues be represented in sequential order, with the earliest cue preceding later cues. TTML does not have this requirement. When converting from TTML to WebVTT, TTML <p> elements must be sorted into sequential order.
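Once each <p> has been flattened to absolute begin and end times, the ordering requirement reduces to a sort on the begin time. A minimal Python sketch; the Cue class is a hypothetical intermediate representation, not part of either format:

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Cue:
    """Hypothetical intermediate cue: absolute times in seconds plus text."""
    begin: Fraction
    end: Fraction
    text: str


def sort_cues(cues):
    """Return cues ordered by begin time, as WebVTT requires.

    sorted() is stable, so cues with equal begin times keep their
    original (document) order.
    """
    return sorted(cues, key=lambda cue: cue.begin)
```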
Simply put, timing expressions are the ways in which timing information may be specified in a timed text document. TTML supports a greater set of timing expressions than WebVTT. The following table shows the set of timing expressions available and the support for each expression in TTML and WebVTT.
Timing Expression | TTML Supports? | WebVTT Supports? |
---|---|---|
hours: minutes: seconds | Yes | No |
hours: minutes: seconds.fractional-seconds | Yes | Yes |
hours: minutes: seconds: frames | Yes | No |
hours: minutes: seconds: frames.sub-frames | Yes | No |
hours.fractional-hours | Yes | No |
minutes.fractional-minutes | Yes | No |
seconds.fractional-seconds | Yes | No |
milliseconds.fractional-milliseconds | Yes | No |
frames.fractional-frames | Yes | No |
ticks.fractional-ticks | Yes | No |
When transforming captions from TTML to WebVTT, it is necessary to take into account the differences in the feature sets of the two formats, and to develop some strategies for handling them. The TTML format provides a broader set of options than WebVTT for authoring captions. In addition, the TTML format allows for more complex hierarchical relationships between elements than can be achieved in WebVTT.
Based on these differences, the following strategy for mapping emerges:
The TTML To WebVTT ( TVTT ) mapping profile constrains a TTML document structure to make the mapping between TTML and WebVTT simple and transparent. Many TTML documents will not conform to this profile.
Feature | Provisions |
---|---|
Relative to the TT Feature namespace | |
To ease conversion, a source TTML document can be transformed to a TTML document that conforms to the TVTT (TTML To WebVTT Document Profile). Below are some steps for how to pre-process a source TTML document so that it is valid against the TVTT.
WebVTT does not define hierarchical elements such as the <body> or <div> elements found in TTML. Similarly, the TVTT profile constrains the use of hierarchical elements in documents that conform to it. As a result, when converting TTML documents to TVTT documents, all the information provided in the hierarchical elements must be applied to the captions within those elements. Through this process, individual captions may have their timing, styling, layout and other information adjusted to take into account values inherited from the elements that contain them. Within this document, the term “flattening” is used to designate this process.
In order to flatten a TTML document, work through any hierarchy of <body> and <div> elements, starting from the <body> element, and apply the values from each element to each of its child elements. This process can be applied iteratively to create a set of timed text elements with adjusted values that reflect the values inherited from <body> or <div> elements. For simple TTML documents without much hierarchy, this step may not be necessary.
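For timing, flattening can be sketched recursively in Python. The sketch assumes that every <body> and <div> is a parallel ("par") time container, so begin and end values are offsets from the parent's begin and each child is clipped to its parent's interval. It ignores dur, sequential ("seq") containers and <span> timing, and its parse_time helper accepts only "Ns" offsets and hh:mm:ss clock times; all names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from fractions import Fraction

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def parse_time(expr):
    """Simplified: offset seconds ("2.5s") or clock time (hh:mm:ss[.fff])."""
    if expr.endswith("s") and not expr.endswith("ms"):
        return Fraction(expr[:-1])
    hours, minutes, seconds = expr.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + Fraction(seconds)


def flatten_timing(element: ET.Element, parent_begin=Fraction(0),
                   parent_end=math.inf, result=None):
    """Return (begin, end, xml:id) in absolute seconds for every <p>."""
    if result is None:
        result = []
    # In a "par" container, begin and end are offsets from the parent's begin.
    begin = parent_begin
    if element.get("begin") is not None:
        begin = parent_begin + parse_time(element.get("begin"))
    end = parent_end
    if element.get("end") is not None:
        # A child can never outlive its parent's active interval.
        end = min(parent_end, parent_begin + parse_time(element.get("end")))
    if element.tag == TT + "p":
        result.append((begin, end, element.get(XML_ID)))
        return result
    for child in element:
        flatten_timing(child, begin, end, result)
    return result
```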
Note that once this process has been completed, some information from the original TTML file has been lost, such as the grouping of timed text elements. In addition, some information that was expressed succinctly in TTML is now repeated.
Merge Multiple <div> Elements
TTML documents that contain more than one <div> should be mapped to a document with just one <div>. All <p> elements have to be copied to the outermost <div> element, in document order. All other <div> elements should be pruned.
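A Python sketch of this step using xml.etree.ElementTree. It assumes the first <div> child of <body> is the outermost <div>, and that any timing, styling, region or language information on the <div> elements to be pruned has already been pushed down to the <p> elements; merge_divs is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"


def merge_divs(body: ET.Element):
    """Move every <p> into the outermost <div>, in document order."""
    outer = body.find(TT + "div")
    paragraphs = list(body.iter(TT + "p"))  # iter() yields document order
    # Empty the outer <div> (this also drops any nested <div> elements) ...
    for child in list(outer):
        outer.remove(child)
    # ... then refill it with all paragraphs from the whole <body>.
    outer.extend(paragraphs)
    # Prune every other top-level <div>.
    for div in body.findall(TT + "div"):
        if div is not outer:
            body.remove(div)
    return body
```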
Add @xml:id to <p>
If a <p> element does not yet have an @xml:id, one should be added whose value is a unique identifier.
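Generating identifiers can be sketched in Python as follows. The "p" prefix and numbering scheme are assumptions chosen to match the example, existing ids are never reused, and add_paragraph_ids is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def add_paragraph_ids(root: ET.Element, prefix="p"):
    """Give every <p> without an xml:id a unique one such as "p1", "p2"."""
    used = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    counter = 0
    for p in root.iter(TT + "p"):
        if p.get(XML_ID):
            continue
        counter += 1
        while f"{prefix}{counter}" in used:  # skip ids that are already taken
            counter += 1
        p.set(XML_ID, f"{prefix}{counter}")
        used.add(f"{prefix}{counter}")
    return root
```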
Example: Add @xml:id to <p>
Before:
<p ...>...</p>
<p ...>...</p>
<p ...>...</p>
After:
<p xml:id="p1">...</p>
<p xml:id="p2">...</p>
<p xml:id="p3">...</p>
Push Down @xml:lang to <p>
For every <p>, the value of @xml:lang needs to be resolved, taking into account the value of @xml:lang on its ancestors. If the resolved value is not the empty string, then a <span> child should be added to the <p>. This <span> should enclose the complete content of the <p>, and its @xml:lang shall be set to the resolved language.
Example: Push Down xml:lang to <p>
Before:
<tt xml:lang="en">
<div>
<p>
<span>...</span>
</p>
<p xml:lang="de">
<span>....</span><span xml:lang="fr">....</span>
</p>
</div>
</tt>
After:
<tt xml:lang="">
<div>
<p>
<span xml:lang="en"><span>...</span></span>
</p>
<p>
<span xml:lang="de"><span>....</span><span xml:lang="fr">....</span></span>
</p>
</div>
</tt>
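The push-down step can be sketched in Python with xml.etree.ElementTree. The sketch resolves each <p>'s language from the nearest element that declares xml:lang, wraps all of the <p> content in a new <span>, and then sets the root language to the empty string as in the example. xml:lang attributes on <body> and <div> are left in place, and push_down_lang is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def push_down_lang(root: ET.Element):
    """Wrap each <p>'s content in a <span> carrying its resolved xml:lang."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for p in list(root.iter(TT + "p")):
        # Resolve the language from the nearest element that declares one.
        node, lang = p, ""
        while node is not None:
            if node.get(XML_LANG) is not None:
                lang = node.get(XML_LANG)
                break
            node = parents.get(node)
        if not lang:
            continue
        # Move the leading text and all children into the new wrapper.
        wrapper = ET.Element(TT + "span", {XML_LANG: lang})
        wrapper.text, p.text = p.text, None
        wrapper.extend(list(p))
        for child in list(p):
            p.remove(child)
        p.append(wrapper)
        p.attrib.pop(XML_LANG, None)
    root.set(XML_LANG, "")  # the language now lives on the <span> elements
    return root
```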
<region> for Default Region
TTML defines a "default region" that applies if no <region> could be resolved. This default region should be explicitly defined in a TVTT document as follows:
<region xml:id="defaultRegion" tts:extent="100% 100%" tts:origin="0% 0%" />
Region Resolution for <p>
For every <p>, a @region should be specified. If no @region is specified on the <p>, then @region should be set to the id of the <region> that is referenced on the nearest ancestor of the <p>. If @region is not specified on the <p> or on any of its ancestors, the @region of the <p> should be set to the id of the default region.
Example: Region Resolution for <p>
Before:
<div region="r1">
<p ...>...</p>
<p region="r2" ...>...</p>
<p ...>...</p>
</div>
After:
<div>
<p region="r1" ...>...</p>
<p region="r2" ...>...</p>
<p region="r1" ...>...</p>
</div>
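Region resolution can be sketched in Python as below. The default region id "defaultRegion" matches the <region> definition for the default region, removing the region attribute from <body> and <div> mirrors the After example, and resolve_regions is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"


def resolve_regions(root: ET.Element, default_region="defaultRegion"):
    """Set @region on every <p> from itself, its nearest ancestor, or the default."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for p in root.iter(TT + "p"):
        node, region = p, None
        while node is not None and region is None:
            region = node.get("region")
            node = parents.get(node)
        p.set("region", region or default_region)
    # The references on <body> and <div> are now redundant.
    for el in root.iter():
        if el.tag in (TT + "body", TT + "div"):
            el.attrib.pop("region", None)
    return root
```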
Note: TTML also allows the @region attribute to be set on <span> elements. The discussion of how to map a TTML document containing such elements is out of the scope of this document.
Translate @xml:space Preserve
If the resolved @xml:space value of a <p> or <span> is "preserve", then every line feed character in the <p> or <span> should be replaced by a <br> element, and every space should be replaced by the character reference for a non-breaking space (&#160;).
Example: Translate @xml:space Preserve
Before:
<p xml:space="preserve" ...>
Good morning!
- Good morning!
</p>
After:
<p ...><br/>Good&#160;morning!<br/>-&#160;Good&#160;morning!<br/></p>
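A Python sketch of this translation for a <p> or <span> whose content is text only (child elements are not handled). Line feeds, including carriage-return variants, become <br/> elements and spaces become U+00A0; ElementTree writes that character either directly or as &#160;, depending on the output encoding. The name translate_preserve is hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
NBSP = "\u00a0"


def translate_preserve(el: ET.Element):
    """Rewrite a text-only element that has xml:space="preserve"."""
    text = (el.text or "").replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    el.text = lines[0].replace(" ", NBSP)
    # Each line feed becomes a <br/>; the following line is its tail text.
    for line in lines[1:]:
        br = ET.SubElement(el, TT + "br")
        br.tail = line.replace(" ", NBSP)
    el.attrib.pop(XML_SPACE, None)
    return el
```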
All text content of <p> and <span> elements should be whitespace normalized: leading and trailing whitespace characters are deleted, and any sequence of whitespace characters is replaced by a single space character.
Example: Whitespace Normalization
Before:
<p xml:space="default" ...>
Good evening!<br/>
- Good evening!
</p>
After:
<p xml:space="default" ...>Good evening!<br/>- Good evening!</p>
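A Python sketch of whitespace normalization for a <p>. It collapses every run of XML whitespace to one space, then trims whitespace at the start and end of the <p> and around <br/> elements that are direct children. Spaces between adjacent <span> elements are kept so that words do not run together; whitespace inside nested spans next to a boundary is not trimmed. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")  # the four XML whitespace characters


def collapse(text):
    """Replace each run of XML whitespace with a single space."""
    return None if text is None else WHITESPACE_RUN.sub(" ", text)


def normalize_paragraph(p: ET.Element):
    """Whitespace-normalize the content of a <p> in place."""
    for el in p.iter():
        el.text = collapse(el.text)
        if el is not p:
            el.tail = collapse(el.tail)
    children = list(p)
    # Trim at the start of the paragraph.
    if p.text:
        p.text = p.text.lstrip(" ")
    for i, child in enumerate(children):
        if child.tag != TT + "br":
            continue
        # Trim after the <br/> ...
        if child.tail:
            child.tail = child.tail.lstrip(" ")
        # ... and in the text node directly before it.
        if i == 0:
            if p.text:
                p.text = p.text.rstrip(" ")
        elif children[i - 1].tail:
            children[i - 1].tail = children[i - 1].tail.rstrip(" ")
    # Trim at the end of the paragraph.
    if children:
        if children[-1].tail:
            children[-1].tail = children[-1].tail.rstrip(" ")
    elif p.text:
        p.text = p.text.rstrip(" ")
    return p
```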
<p> Elements
In order to conform to the TVTT profile, all references to regions must be applied to <p> elements. If a <p> element is nested in other elements, any region references that exist on its ancestor elements should be moved to the <p> element.
<style> Elements from <region> Elements
In order to conform to the TVTT profile, all <style> elements nested in <region> element definitions must be removed, with their attributes applied directly to the <region> itself. The sections below describe the details of this process.
Inline styles shall not be used in a TVTT document, apart from @tts:textAlign specified on <p> elements and @tts:extent, @tts:origin, @tts:displayAlign and @tts:writingMode specified on <region> elements. All other style attributes specified inline on <body>, <div>, <p> or <span> must be mapped to a <style> element that is then referenced by that content element.
Example: Conversion of Inline Styling
Before:
<tt ...>
...
<body>
<div>
<p xml:id="p1" tts:fontFamily="monospace" tts:color="white" >
<span tts:backgroundColor="black">Whose house?</span><br/>
<span tts:color="lime" tts:backgroundColor="black">- My master´s</span>
</p>
</div>
</body>
</tt>
After:
<tt ...>
<head>
<styling>
<style xml:id="p1_style" tts:color="white" tts:fontFamily="monospace" />
<style xml:id="background_black" tts:backgroundColor="black"/>
<style xml:id="color_lime" tts:color="lime"/>
</styling>
</head>
<body>
<div>
<p xml:id="p1" style="p1_style">
<span style="background_black">Whose house?</span><br/>
<span style="color_lime background_black">- My master´s</span>
</p>
</div>
</body>
</tt>
<style> Elements that Reference Other <style> Elements
If a <style> element references another <style> element, the style values that result from this reference, or from a continuing chain of style references, have to be resolved and merged into the set of style attributes of the referencing <style> element. If the same style attribute is defined in both a referenced <style> element and the referencing <style> element, the value in the referencing <style> element is used.
Example: Conversion of Referenced Style Elements
Before:
<styling>
<style xml:id="s3" tts:color="blue" tts:backgroundColor="white" tts:fontFamily="monospace" />
<style xml:id="s2" tts:color="white" tts:backgroundColor="black" style="s3"/>
<style xml:id="s1" tts:color="lime" style="s2"/>
</styling>
After:
<styling>
<style xml:id="s3" tts:color="blue" tts:backgroundColor="white" tts:fontFamily="monospace" />
<style xml:id="s2" tts:color="white" tts:backgroundColor="black" tts:fontFamily="monospace" />
<style xml:id="s1" tts:color="lime" tts:backgroundColor="black" tts:fontFamily="monospace"/>
</styling>
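A sketch of the resolution step in Python. The StyleDef class is a hypothetical in-memory model of a <style> element: its own tts:* attributes plus the ids listed in its @style. It is not part of any TTML library. When a <style> references several styles, this sketch lets later references override earlier ones. That ordering is an assumption, since the text above only fixes the precedence of the referencing style over its references.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class StyleDef:
    attrs: Dict[str, str]                          # tts:* attributes set directly on the <style>
    refs: List[str] = field(default_factory=list)  # ids listed in the <style>'s own @style


def resolve_style(style_id: str, styles: Dict[str, StyleDef],
                  seen: FrozenSet[str] = frozenset()) -> Dict[str, str]:
    """Flatten a <style> and its chain of referenced styles into one attribute set."""
    if style_id in seen:
        raise ValueError(f"circular style reference involving {style_id!r}")
    style = styles[style_id]
    resolved: Dict[str, str] = {}
    # Merge referenced styles first (recursively), in the order they are listed.
    for ref in style.refs:
        resolved.update(resolve_style(ref, styles, seen | {style_id}))
    # Attributes on the referencing style take precedence over referenced ones.
    resolved.update(style.attrs)
    return resolved
```

Applied to s1 in the example above, this yields tts:color="lime", tts:backgroundColor="black" and tts:fontFamily="monospace".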
<body>
If more than one <style> is referenced by <body>, a new <style> needs to be created in which all style attributes of the referenced styles are merged. If no style is referenced by <body>, an empty <style> is created and referenced by the <body>.
Example: Conversion of Multiple Styles
Before:
<head>
<styling>
<style xml:id="fontStyles" tts:fontFamily="monospace" tts:fontSize="200%" tts:lineHeight="normal"/>
<style xml:id="colorStyles" tts:color="white" tts:backgroundColor="black"/>
</styling>
</head>
<body style="colorStyles fontStyles">
</body>
After:
<head>
<styling>
<style xml:id="defaultStyle" tts:color="white" tts:backgroundColor="black" tts:fontFamily="monospace" tts:fontSize="200%" tts:lineHeight="normal"/>
<style xml:id="fontStyles" tts:fontFamily="monospace" tts:fontSize="200%" tts:lineHeight="normal"/>
<style xml:id="colorStyles" tts:color="white" tts:backgroundColor="black"/>
</styling>
</head>
<body style="defaultStyle">
</body>
<div> Elements
The resolved style set of the first <div> in the document is merged into the <style> element referenced by the <body>. If a style attribute is already set in the <style> referenced by <body>, its value is overwritten by the value applied to the <div>.
Example: Conversion of Styles Applied to <div> Elements
Before:
<head>
<styling>
<style xml:id="defaultStyle" tts:color="white" tts:fontFamily="monospace" tts:fontSize="200%"/>
<style xml:id="newFont" tts:fontFamily="Verdana" tts:fontSize="160%"/>
</styling>
</head>
<body style="defaultStyle">
<div style="newFont"> .... </div>
</body>
After:
<head>
<styling>
<style xml:id="defaultStyle" tts:color="white" tts:fontFamily="Verdana" tts:fontSize="160%"/>
<style xml:id="newFont" tts:fontFamily="Verdana" tts:fontSize="160%"/>
</styling>
</head>
<body style="defaultStyle">
<div> .... </div>
</body>
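Both <body> rules reduce to ordered dictionary merges. The first rule merges several referenced styles. The second lets the first <div>'s resolved styles win over them. A minimal Python sketch, with two assumptions. First, each input is an already-resolved attribute set, for example one produced by flattening style chains as described earlier. Second, among several styles referenced by <body>, a later one wins on conflict. The function name is hypothetical.

```python
from typing import Dict, List, Optional


def body_style(referenced: List[Dict[str, str]],
               first_div: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Attribute set of the single <style> that <body> references after pre-processing.

    referenced: resolved attribute sets of the styles listed in <body>'s @style, in order.
    first_div:  resolved style set of the first <div>, if any."""
    merged: Dict[str, str] = {}
    for attrs in referenced:      # no references -> an empty <style>
        merged.update(attrs)      # assumption: later references win on conflicts
    if first_div:
        merged.update(first_div)  # <div> values overwrite clashing <body> values
    return merged
```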
<region> Elements
In a TVTT document, only the style attributes @tts:extent, @tts:origin, @tts:displayAlign and @tts:writingMode shall be specified on a <region>. The <region> shall not contain any style references or <style> elements as children. If a source TTML document does not comply with this constraint, then all style references have to be resolved and merged, taking into account the style values of the <style> children of the <region>. A new <style> is created for the resolved set of style values, and every <p> that references that <region> should reference this <style>.
All <style>
elements that are not referenced should be pruned from the document.
As mentioned above, there are a few incompatibilities between the sets of styling attributes available in TTML and in WebVTT. These are expressed in the TVTT profile. Below are some recommendations on how to handle them when translating a general TTML document into the TVTT profile.
tts:color
tts:backgroundColor
font-family names
length metric "c":
font-size:
The cell resolution should be set to "1 1" (ttp:cellResolution="1 1"). If tts:fontSize is not specified on the region element and no font size was specified on the parent element, then a percentage value of tts:fontSize is relative to the computed size of 1c. Because 1c then extends over the full height of the video viewport, percentage values of tts:fontSize are relative to the viewport height. They therefore map directly to the CSS font-size property in WebVTT, expressed in the 'vh' unit.
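The arithmetic behind this recommendation, as a small Python sketch. It assumes that the root container coincides with the video viewport and that the percentage is not inherited from a parent or region font size. The function name and the optional cell_rows parameter (the second, rows value of ttp:cellResolution) are illustrative. With cell_rows=1 the number carries over unchanged, which is the point of the recommendation.

```python
from fractions import Fraction


def font_size_to_vh(percent: str, cell_rows: int = 1) -> str:
    """CSS vh length for a percentage tts:fontSize that is relative to 1c.

    One cell is 100/cell_rows percent of the viewport height, so p% of 1c is
    p/cell_rows vh; with ttp:cellResolution="1 1" the number is unchanged."""
    if not percent.endswith("%"):
        raise ValueError(f"expected a percentage such as '5%', got {percent!r}")
    vh = Fraction(percent[:-1]) / cell_rows
    return f"{float(vh):g}vh"
```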
As default values for corresponding style structures (e.g. font-size) may differ between TTML and WebVTT, the document structure should be pre-processed to apply explicitly defined values, rather than relying on default values.
Because TTML supports many more timing expressions than those included in the TVTT profile, it may be necessary to perform a pre-processing step to convert a TTML document into a document that conforms to the TVTT profile. In many cases, this type of processing will require timing parameters to convert timing expressions into the format supported by TVTT.
As a first step in mapping timing information from TTML to TVTT, convert all times in the TTML document into the time format supported by TVTT, hours:minutes:seconds.fractional-seconds, rounding the fractional seconds to three decimal places. This conversion simplifies the mapping to WebVTT, as all timing information is then expressed in units supported by WebVTT. This section steps through the supported TTML time expressions and describes how to convert each of them into the format included in the TVTT profile.
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
hours:minutes:seconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode |
For example:
TTML Time Expression | TVTT Time Expression |
---|---|
00:00:40 | 00:00:40.000 or 00:40.000 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
hours:minutes:seconds.fractional-seconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode |
This second case requires no transformation, except to round the fractional-seconds portion of the time expression to three decimal places.
For example:
TTML Time Expression | TVTT Time Expression |
---|---|
01:02:43.0345555 | 01:02:43.035 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
hours:minutes:seconds:frames | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Drop Mode, Marker Mode, Frame Rate, Frame Rate Multiplier |
When converting from time expressions that contain frames, it is necessary to know the frame rate that the TTML document uses. This information may be provided as parameters within the TTML document. TTML specifies two parameter types for carrying frame rate: ttp:frameRate and ttp:frameRateMultiplier.
In addition, in the case where the ttp:timeBase is equal to smpte and the ttp:markerMode is either not set or set to discontinuous, it will be necessary to account for any discontinuities in timing expressions when converting.
For example, in a TTML file with a frame rate of 30:
TTML Time Expression | TVTT Time Expression |
---|---|
01:02:43:07 | 01:02:43.233 |
As another example, in a TTML file with a frame rate of 30 and a frame rate multiplier of 1000:1001:
TTML Time Expression | TVTT Time Expression |
---|---|
01:02:43:07 | 01:02:43.234 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
hours:minutes:seconds:frames.sub-frames | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Drop Mode, Frame Rate, Sub Frame Rate |
When converting from time expressions that contain frames and sub-frames, it is necessary to know the frame rate and the sub-frame rate that the TTML document uses. This information may be provided as parameters within the TTML document, or as external data that is input to the mapping process. TTML specifies two parameter types for carrying this information: ttp:frameRate and ttp:subFrameRate.
For example, in a TTML file with a frame rate of 30 and a sub-frame rate of 2:
TTML Time Expression | TVTT Time Expression |
---|---|
01:02:43:07.1 | 01:02:43.250 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
hours.fractional-hours | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode |
To convert a duration in hours to seconds, multiply by the number of seconds in an hour: 3600.
For example, for a duration of 3 hours:
TTML Time Expression | TVTT Time Expression |
---|---|
3h | 03:00:00.000 |
Similarly, for a duration of 3.45 hours:
TTML Time Expression | TVTT Time Expression |
---|---|
3.45h | 03:27:00.000 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
minutes.fractional-minutes | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode |
To convert a duration in minutes to seconds, multiply by the number of seconds in a minute: 60.
For example, for a duration of 3 minutes:
TTML Time Expression | TVTT Time Expression |
---|---|
3m | 00:03:00.000 |
Similarly, for a duration of 3.45 minutes:
TTML Time Expression | TVTT Time Expression |
---|---|
3.45m | 00:03:27.000 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
seconds.fractional-seconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode |
This case requires very little transformation, as WebVTT supports fractional seconds. It may be necessary to append zero-valued fractional seconds to the timestamp, and to express values of 60 seconds or more in hours and minutes.
For example, for a duration of 3 seconds:
TTML Time Expression | TVTT Time Expression |
---|---|
3s | 00:00:03.000 |
Similarly, for a duration of 3.45 seconds:
TTML Time Expression | TVTT Time Expression |
---|---|
3.45s | 00:00:03.450 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
milliseconds.fractional-milliseconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode |
To convert a duration in milliseconds to seconds, divide by the number of milliseconds in a second: 1000.
For example, for a duration of 3 milliseconds:
TTML Time Expression | TVTT Time Expression |
---|---|
3ms | 00:00:00.003 |
Similarly, for a duration of 3.45 milliseconds:
TTML Time Expression | TVTT Time Expression |
---|---|
3.45ms | 00:00:00.003 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
frames.fractional-frames | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Drop Mode, Frame Rate, Frame Rate Multiplier |
When converting from time expressions that contain frames, it is necessary to know the frame rate that the TTML document uses. This information may be provided as parameters within the TTML document, or as external data that is input to the mapping process. TTML specifies two parameter types for carrying frame rate: ttp:frameRate and ttp:frameRateMultiplier.
For example, in a TTML file with a frame rate of 30:
TTML Time Expression | TVTT Time Expression |
---|---|
75f | 00:00:02.500 |
As another example, in a TTML file with a frame rate of 30 and a frame rate multiplier of 1000:1001:
TTML Time Expression | TVTT Time Expression |
---|---|
75f | 00:00:02.502 |
TTML Time Expression | TVTT Time Expression | Relevant Parameters |
---|---|---|
ticks.fractional-ticks | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Tick Rate |
When converting from time expressions that contain ticks, it is necessary to know the tick rate that the TTML document uses. This information may be provided as parameters within the TTML document, or as external data that is input to the mapping process. TTML specifies the following parameter type for carrying tick rate: ttp:tickRate.
For example, given a Tick Rate of 15:
TTML Time Expression | TVTT Time Expression |
---|---|
50t | 00:00:03.333 |
Similarly, for a duration of 50.45 ticks:
TTML Time Expression | TVTT Time Expression |
---|---|
50.45t | 00:00:03.363 |
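The conversions in the tables above can be combined into a single routine. The Python sketch below is illustrative, not normative. It computes exact values with fractions.Fraction and rounds to whole milliseconds using Python's round-half-to-even. That rounding reproduces every example in the tables, including 75f at 30 × 1000/1001, which is exactly 2.5025 s. The sketch assumes the media time base: SMPTE discontinuities, drop-frame counting and the clock time base are not handled. The frame, sub-frame and tick rates must be supplied from the document's ttp:frameRate, ttp:frameRateMultiplier, ttp:subFrameRate and ttp:tickRate values, or from external data. The defaults below are placeholders.

```python
import re
from fractions import Fraction

# hh:mm:ss, hh:mm:ss.fraction, hh:mm:ss:ff or hh:mm:ss:ff.sub-frames
CLOCK_TIME = re.compile(r"(\d{2,}):(\d{2}):(\d{2})(?:(\.\d+)|:(\d{2,})(?:\.(\d+))?)?")
# <count><metric>, e.g. 3.45h, 3m, 250ms, 75f, 50.45t ("ms" is tried before "m")
OFFSET_TIME = re.compile(r"(\d+(?:\.\d+)?)(h|ms|m|s|f|t)")


def ttml_time_to_seconds(expr: str, *, frame_rate: int = 30,
                         frame_rate_multiplier: Fraction = Fraction(1),
                         sub_frame_rate: int = 1, tick_rate: int = 1) -> Fraction:
    """Exact value, in seconds, of a TTML time expression (media time base only)."""
    effective_frame_rate = frame_rate * Fraction(frame_rate_multiplier)
    m = CLOCK_TIME.fullmatch(expr)
    if m:
        hours, minutes, seconds, fraction, frames, sub_frames = m.groups()
        total = Fraction(int(hours) * 3600 + int(minutes) * 60 + int(seconds))
        if fraction:
            total += Fraction("0" + fraction)
        if frames:
            frame_count = Fraction(int(frames))
            if sub_frames:
                frame_count += Fraction(int(sub_frames), sub_frame_rate)
            total += frame_count / effective_frame_rate
        return total
    m = OFFSET_TIME.fullmatch(expr)
    if m:
        count, metric = Fraction(m.group(1)), m.group(2)
        seconds_per_unit = {
            "h": Fraction(3600), "m": Fraction(60), "s": Fraction(1),
            "ms": Fraction(1, 1000), "f": 1 / effective_frame_rate,
            "t": Fraction(1, tick_rate),
        }[metric]
        return count * seconds_per_unit
    raise ValueError(f"unsupported TTML time expression: {expr!r}")


def to_tvtt(seconds: Fraction) -> str:
    """Format seconds as hh:mm:ss.ttt, rounding half to even to whole milliseconds."""
    ms = round(seconds * 1000)  # round() on a Fraction returns an int
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"
```

For example, `to_tvtt(ttml_time_to_seconds("01:02:43:07", frame_rate=30, frame_rate_multiplier=Fraction(1000, 1001)))` gives the 01:02:43.234 shown in the frame-rate-multiplier table.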
The TVTT profile does not support durations, but rather requires that cue timings be expressed as begin and end times. Therefore, as part of transforming general TTML documents to conform with the TVTT profile, any timing expressed as duration must be transformed into an end time.
Example: Duration to End Time Conversion
This example starts with an excerpt from a TTML file that uses "dur" to express the amount of time that <p>
elements should be displayed.
<body begin="00:00:20.000" end="00:00:50.000">
<div begin="00:00:01.000" dur="10.000s">
<p begin="00:00:00.000" dur="5s">Appears at 21 secs<br/>
and remains visible to 26 secs</p>
<p begin="00:00:05.000" dur="5s">Appears at 26 secs<br/>
and remains visible to 31 secs</p>
</div>
</body>
After transforming the timing information from "dur" to "end":
<body begin="00:00:20.000" end="00:00:50.000">
<div begin="00:00:01.000" end="00:00:11.000">
<p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
and remains visible to 26 secs</p>
<p begin="00:00:05.000" end="00:00:10.000">Appears at 26 secs<br/>
and remains visible to 31 secs</p>
</div>
</body>
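The conversion itself is one addition per element. A minimal Python sketch over a hypothetical dictionary of an element's timing attributes, already converted to seconds. It assumes that @begin and @end are measured from the same sync base. When an element carries both @dur and @end, the sketch keeps the earlier end, following the SMIL timing rules that TTML builds on.

```python
from fractions import Fraction
from typing import Dict


def dur_to_end(timing: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Return a copy of an element's timing with @dur replaced by an equivalent @end.

    Values are seconds; @begin and @end are assumed to share the same sync base,
    and a missing @begin counts as 0."""
    result = dict(timing)
    if "dur" in result:
        dur = result.pop("dur")
        end = result.get("begin", Fraction(0)) + dur
        # If @end was also given, the earlier of the two ends the interval.
        result["end"] = min(end, result["end"]) if "end" in result else end
    return result
```

Applied to the <div> above ({"begin": 1, "dur": 10}), this gives an end of 11 seconds.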
Once all timing expressions have been converted to be valid against the TVTT profile, the next step is to preserve and apply timing information from parent elements in order to calculate the correct timing to use for the WebVTT cue.
Below are several examples of TTML excerpts containing <body>, <div> and <p> elements, with timing information included in each element.
Example: Parallel Timing
This example starts with an excerpt from a TTML file that does not specify a timeContainer attribute on any element. When not specified, the timeContainer defaults to parallel timing of child elements.
<body begin="00:00:20.000" end="00:00:50.000">
<div begin="00:00:01.000" end="00:00:11.000">
<p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
and remains visible to 26 secs</p>
<p begin="00:00:05.000" end="00:00:10.000">Appears at 26 secs<br/>
and remains visible to 31 secs</p>
</div>
</body>
Step 1: Apply the timing information from the <body> to the <div>.
The <div> begin and end times must be adjusted to account for the <body> element's begin and end times, so that:
<div begin="00:00:21.000" end="00:00:31.000">
<p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
and remains visible to 26 seconds</p>
<p begin="00:00:05.000" end="00:00:10.000">Appears at 26 secs<br/>
and remains visible to 31 secs</p>
</div>
Step 2: Apply the timing information from the <div> to the <p> elements.
<p begin="00:00:21.000" end="00:00:26.000">Appears at 21 secs<br/>
and remains visible to 26 seconds</p>
<p begin="00:00:26.000" end="00:00:31.000">Appears at 26 secs<br/>
and remains visible to 31 secs</p>
Note: In this example, the end times of the <body> and <div> elements did not affect the timestamps of the <p> elements during the flattening process. If one of the <p> elements had had an end time beyond the end of either the <body> or the <div>, it would have been truncated to the end of the enclosing element.
Example: Sequential Timing
This example starts with an excerpt from a TTML file that specifies a timeContainer attribute with a sequential value on the <div> element.
<body begin="00:00:20.000" end="00:00:50.000">
<div timeContainer="seq" begin="00:00:01.000" end="00:00:21.000">
<p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
and remains visible to 26 secs</p>
<p begin="00:00:05.000" end="00:00:10.000">Appears at 31 secs<br/>
and remains visible to 36 secs</p>
</div>
</body>
Step 1: Apply the timing information from the <body> to the <div>.
The <div> begin and end times must be adjusted to account for the <body> element's begin and end times, so that:
<div timeContainer="seq" begin="00:00:21.000" end="00:00:41.000">
<p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
and remains visible to 26 seconds</p>
<p begin="00:00:05.000" end="00:00:10.000">Appears at 31 secs<br/>
and remains visible to 36 secs</p>
</div>
Step 2: Apply the timing information from the <div> to the <p> elements.
<p begin="00:00:21.000" end="00:00:26.000">Appears at 21 secs<br/>
and remains visible to 26 seconds</p>
<p begin="00:00:31.000" end="00:00:36.000">Appears at 31 secs<br/>
and remains visible to 36 secs</p>
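Both examples can be reproduced with a short recursive walk, sketched here in Python. The Timed class is a hypothetical model of <body>, <div> and <p> elements, with times already converted to seconds and durations already converted to end times. The sketch rests on three assumptions. In a "par" container every child is timed from the parent's begin. In a "seq" container each child is timed from the clipped end of its previous sibling. An @end offset is measured from the same sync base as @begin. An omitted @end falls back to the parent's end, and every interval is truncated to its parent's.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List, Optional, Tuple, Union

Time = Union[Fraction, float]  # float is only used for an unbounded end (inf)


@dataclass
class Timed:
    begin: Fraction = Fraction(0)        # offset from the element's sync base
    end: Optional[Fraction] = None       # offset from the same sync base; None = unbounded
    container: str = "par"               # value of timeContainer ("par" or "seq")
    children: List["Timed"] = field(default_factory=list)
    text: Optional[str] = None           # set on <p> leaves only


def flatten(node: Timed, sync_base: Time, parent_end: Time,
            out: List[Tuple[Time, Time, str]]) -> Time:
    """Append (absolute begin, absolute end, text) for every <p> under node.

    Returns the node's absolute end, clipped to its parent's end."""
    start = sync_base + node.begin
    end = parent_end if node.end is None else min(sync_base + node.end, parent_end)
    if node.text is not None:
        if start < end:                  # drop <p>s truncated to nothing
            out.append((start, end, node.text))
        return end
    child_base = start
    for child in node.children:
        child_end = flatten(child, child_base, end, out)
        if node.container == "seq":
            child_base = child_end       # next child is timed from this one's end
    return end


def flatten_body(body: Timed) -> List[Tuple[Time, Time, str]]:
    """Absolute timing of every <p> in a <body>, in document order."""
    cues: List[Tuple[Time, Time, str]] = []
    flatten(body, Fraction(0), float("inf"), cues)
    return cues
```

The output is still in document order. Sorting by begin time is a separate step.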
<head> Elements to WebVTT
The TTML <head> element contains metadata as well as Styling and Layout elements. Other sections of this document provide detailed descriptions for mapping information from the <head> element to WebVTT.
<body> Elements to WebVTT
TTML documents contain a <body> element. This element holds the captions, and makes reference to the styling, timing, layout and other information defined in the TTML <head> element. The <body> element can also use <div> elements to organize captions into groups. Captions inherit timing, layout or styling information from the elements that contain them.
In order to map the contents of a TTML <body> to WebVTT, several processes must be applied. These processes attempt to preserve information while transforming the captions data into a form that can be represented in WebVTT.
TTML documents can list captions in arbitrary order with respect to time, while WebVTT documents must list cues in order of their start times, from earliest to latest. Therefore, captions from a TTML document must be put into display-time order before they are mapped to WebVTT. Once the flattening step is finished, the next step is to re-order the captions based on the begin time of each <p>.
For many attributes, including spatial and timing values, TTML supports a larger set of representations and units than WebVTT does. As a result, many TTML documents will require unit conversions to be transformed into valid WebVTT documents.
Later sections of this document describe in detail how to map positioning, styling and timing information between the two formats.
<p>
Every <p> is mapped to a WebVTT cue. The value of the @xml:id of the <p> is mapped to the identifier of the corresponding cue, and the text content of the <p> is mapped to the cue text. See the styling section for how to map the @style of a <p>.
Example: Mapping the <p> Element
Before:
<p begin="00:00:00.000" end="00:00:02.000" xml:id="p1" ...>
Good morning!
</p>
After:
p1
00:00:00.000 --> 00:00:02.000
Good morning!
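Once a <p> has been flattened and its times are in TVTT form, serializing it as a cue block is mechanical. A minimal Python sketch; the function name is hypothetical. WebVTT does not allow a cue identifier to contain "-->" or a line break, so such an @xml:id would have to be changed or dropped.

```python
from typing import Optional


def p_to_cue(xml_id: Optional[str], begin: str, end: str, cue_text: str) -> str:
    """Serialize one flattened TVTT <p> as a WebVTT cue block."""
    lines = []
    if xml_id:
        if "-->" in xml_id or "\n" in xml_id or "\r" in xml_id:
            raise ValueError(f"{xml_id!r} cannot be used as a WebVTT cue identifier")
        lines.append(xml_id)                # optional cue identifier line
    lines.append(f"{begin} --> {end}")      # cue timings line
    lines.append(cue_text)                  # cue payload (may span several lines)
    return "\n".join(lines)
```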
<span>
Every <span> that has a @style attribute is mapped to a class span tag (<c>) in WebVTT, where the values of the @style are mapped to the classes of the class span tag. Every <span> with an @xml:lang is mapped to a language span tag (<lang>) with the corresponding value.
Example: Mapping the <span> Element
Before:
<span xml:lang="en"><span style="s1 s2">Good morning</span></span>
After:
<lang en><c.s1.s2>Good morning</c></lang>
<br>
A <br>
is mapped to a WebVTT line terminator.
Example: Mapping the <br> Element
Before:
<p ...>What a day!<br/>- Yes!</p>
After:
What a day!
- Yes!
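The <span> and <br> rules can be applied recursively to build the cue text of a <p>. The Python sketch below uses the standard library's xml.etree.ElementTree; the function names are hypothetical. It makes several assumptions:

- style ids have already been reduced to class names that are valid in WebVTT;
- when one <span> carries both attributes, the <lang> tag goes outside the <c> tag;
- any other element, together with its content, is ignored, but the text that follows it is kept;
- namespaced TTML element names are matched by local name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]  # strip any "{namespace}" prefix


def escape(text: str) -> str:
    # & and < must be escaped in WebVTT cue text; escaping > also rules out "-->".
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def cue_text(elem: ET.Element) -> str:
    """Build WebVTT cue text from a TTML <p> or <span> element."""
    parts = [escape(elem.text or "")]
    for child in elem:
        name = local_name(child.tag)
        if name == "br":
            parts.append("\n")                 # <br> -> line terminator
        elif name == "span":
            inner = cue_text(child)
            style = child.get("style")
            if style:                          # @style -> <c.class1.class2>
                classes = "".join("." + s for s in style.split())
                inner = f"<c{classes}>{inner}</c>"
            lang = child.get(XML_LANG)
            if lang:                           # @xml:lang -> <lang xx>
                inner = f"<lang {lang}>{inner}</lang>"
            parts.append(inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)
```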
The position and dimension of the TTML Root Container region may differ from the dimensions of the video. In other words, there may be some padding around the Root Container region. In this case, the padding must be taken into account when computing WebVTT percentages. Ultimately, the WebVTT values must be expressed relative to the video viewport dimensions.
To convert tts:extent, when applied to a TTML region, to WebVTT cue settings:
To convert tts:extent, when applied to a TTML region, to a WebVTT region:
To convert tts:origin to WebVTT cue settings:
To convert tts:origin to WebVTT region settings:
Example: Converting from a TTML Region to a WebVTT Cue
In this example, the TTML is converted to WebVTT without using WebVTT regions. It begins with two fragments of TTML, the first containing a region definition, and the second containing a <p> element:
Before:
<layout>
<region xml:id="reg3" tts:origin="25% 80%" tts:extent="50% 16%" >
</region>
</layout>
<p region="reg3" begin="00:00:00.000" end="00:00:10.000">A simple caption example.</p>
After: Converted to WebVTT syntax
00:00:00.000 --> 00:00:10.000 position:25% line:80% size:50% align:start
A simple caption example.
Notes:
In WebVTT, text alignment defaults to the middle of the cue box. In order to have cue text aligned to the left, it is necessary to add the align:start, or align:left value to the cue.
In WebVTT, there is no way to directly specify the size of the dimension in the block progression direction, in this case, the vertical direction. This dimension is determined by the amount of text in the cue box.
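The mapping used in this example can be sketched in Python as follows. It assumes percentage values, horizontal left-to-right text, and a root container that coincides with the video viewport (no padding). Like the example, it uses align:start so that the position marks the left edge of the cue box. The function name is hypothetical.

```python
def region_to_cue_settings(origin: str, extent: str) -> str:
    """WebVTT cue settings for a cue placed in a TTML region (horizontal text).

    origin and extent are tts:origin / tts:extent values in percentages, e.g. "25% 80%"."""
    x, y = (float(v.rstrip("%")) for v in origin.split())
    width, _height = (float(v.rstrip("%")) for v in extent.split())
    # The extent height is dropped: WebVTT sizes the cue box vertically from its text.
    return f"position:{x:g}% line:{y:g}% size:{width:g}% align:start"
```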
Example: Convert a Region from TTML to WebVTT, Positioned at the (top, left) Corner of the Viewport
This example begins with a TTML region definition:
<layout>
<region xml:id="regionExample1" tts:origin="0% 0%" tts:extent="50% 16%" >
</region>
</layout>
Step 1: Convert the vertical extent value from a percentage to a number of lines
WebVTT specifies the vertical size of a region as an integer number of lines of text. Assuming a default line height of 5.33vh:
number of lines = 16% / 5.33vh = 3
Step 2: Convert from TTML to WebVTT syntax
Region: id=regionExample1 width=50% lines=3 regionanchor=0%,0% viewportanchor=0%,0% scroll=up
Example: Converting a Region from TTML to WebVTT, Positioned in the Middle of the Viewport
This example begins with a TTML region definition:
<layout>
<region xml:id="regionExample2" tts:origin="25% 80%" tts:extent="50% 32%">
</region>
</layout>
Step 1: Convert the vertical extent value from a percentage to a number of lines
WebVTT specifies the vertical size of a region as an integer number of lines of text. Assuming a default line height of 5.33vh:
number of lines = 32% / 5.33vh = 6
Step 2: Convert from TTML to WebVTT syntax
Region: id=regionExample2 width=50% lines=6 regionanchor=0%,0% viewportanchor=25%,80% scroll=up
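Both region examples follow the same recipe, sketched below in Python. The 5.33vh line height is the assumption used in the examples. Rounding to the nearest whole line, with a minimum of one, is a design choice of this sketch. A converter that must never exceed the TTML extent could round down instead. Like the examples, the sketch assumes percentage values, a root container that matches the viewport, and a region anchored at its top-left corner. The function name is hypothetical.

```python
def ttml_region_to_webvtt(region_id: str, origin: str, extent: str,
                          line_height_vh: float = 5.33) -> str:
    """WebVTT region definition (in the syntax used above) for a TTML region."""
    x, y = (float(v.rstrip("%")) for v in origin.split())
    width, height = (float(v.rstrip("%")) for v in extent.split())
    lines = max(1, round(height / line_height_vh))  # e.g. 16% / 5.33vh -> 3
    return (f"Region: id={region_id} width={width:g}% lines={lines} "
            f"regionanchor=0%,0% viewportanchor={x:g}%,{y:g}% scroll=up")
```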
When starting from a TTML document that conforms to the TVTT profile, the first step is to create a CSS file in which every <style> element is mapped to a ::cue pseudo-element rule, using the @xml:id of the <style> element as the class name.
Example: Creating CSS Style Class from TTML Style Element
Starting with the following TTML snippet:
<style xml:id="fontStyles" tts:fontFamily="monospace" tts:fontSize="200%"/>
<style xml:id="colorStyles" tts:color="white" tts:backgroundColor="black"/>
The corresponding CSS classes look like this:
::cue(.fontStyles) {
font-family: monospace;
font-size: 200%;
}
::cue(.colorStyles) {
color: white;
background-color: black;
}
The table below shows how to map TTML style attributes to CSS properties.
TTML Style attribute | CSS property |
---|---|
tts:backgroundColor | background-color |
tts:color | color |
tts:fontFamily | font-family |
tts:fontSize | font-size |
tts:fontStyle | font-style |
tts:fontWeight | font-weight |
tts:lineHeight | line-height |
tts:textDecoration | text-decoration |
tts:textOutline | outline-color |
tts:visibility | visibility |
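Generating the ::cue rules from this table can be sketched in Python as follows. The dictionary simply transcribes the table above. Values are copied verbatim, so any value that needs conversion must be converted beforehand, for example rgba alpha values, or font sizes that should become vh units. Attributes that are not in the table are skipped. The function name is hypothetical.

```python
from typing import Dict

# Transcription of the mapping table above.
TTML_TO_CSS = {
    "tts:backgroundColor": "background-color",
    "tts:color": "color",
    "tts:fontFamily": "font-family",
    "tts:fontSize": "font-size",
    "tts:fontStyle": "font-style",
    "tts:fontWeight": "font-weight",
    "tts:lineHeight": "line-height",
    "tts:textDecoration": "text-decoration",
    "tts:textOutline": "outline-color",
    "tts:visibility": "visibility",
}


def cue_rule(style_id: str, attrs: Dict[str, str]) -> str:
    """Build a ::cue(.style_id) rule from a <style> element's tts:* attributes."""
    declarations = [f"  {TTML_TO_CSS[name]}: {value};"
                    for name, value in attrs.items() if name in TTML_TO_CSS]
    return "::cue(." + style_id + ") {\n" + "\n".join(declarations) + "\n}"
```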
As part of the mapping process, rgba colors must be converted from the TTML notation, in which the alpha component is an integer from 0 to 255, to the CSS notation, in which it is a number from 0 to 1. This is done by dividing the last (alpha) value by 255 and rounding the result to one decimal place.
Example: Translating TTML Background Color to CSS
TTML: tts:backgroundColor="rgba(0,0,0,178)"
CSS: background-color: rgba(0,0,0,0.7);
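The alpha conversion as a Python sketch. It only handles the rgba(r,g,b,a) functional notation shown above. TTML's #rrggbbaa form is not covered here. It rounds to one decimal place as described. Keeping two decimals would track the original opacity more closely, at the cost of differing from the example.

```python
import re

TTML_RGBA = re.compile(r"rgba\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)")


def ttml_rgba_to_css(value: str) -> str:
    """Convert a TTML rgba() color (alpha 0-255) to CSS rgba() (alpha 0-1)."""
    m = TTML_RGBA.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a TTML rgba() color: {value!r}")
    r, g, b, a = (int(c) for c in m.groups())
    alpha = round(a / 255, 1)  # 178 / 255 = 0.698... -> 0.7
    return f"rgba({r},{g},{b},{alpha:g})"
```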
@style on <body>
Any <style> elements referenced by the <body> of a TTML document should be mapped to a ::cue pseudo-element rule containing the corresponding CSS properties and values.
Example: Mapping Style Elements Applied to <body>
TTML:
<style xml:id="defaultStyle" tts:fontWeight="normal" tts:fontSize="100%" .../>
CSS:
::cue {
font-weight: normal;
font-size: 100%;
}
@style on <p>
If a <p> contains references to one or more <style> elements, the text of the corresponding cue should be wrapped in a class span tag (<c>) whose classes are the referenced style ids.
Example: Mapping Style Elements Applied to <p>
TTML:
<p style="s1 s2 s3" ...>Good morning!</p>
WebVTT:
<c.s1.s2.s3>Good morning!</c>
@style on <span>
A <span> that references a <style> can be mapped to a class span tag (<c>) in which the style references are mapped to the corresponding CSS class names.
Example: Mapping Style Elements Applied to <span>
TTML:
<span style="speaker1">What a day!</span><br/>
<span style="speaker2">Yes!</span>
WebVTT:
<c.speaker1>What a day!</c>
<c.speaker2>Yes!</c>
Most of the transformation of timing information occurs during pre-processing. Once a document conforms to the TVTT profile, only a few remaining transformations must be handled during the mapping to WebVTT.
<span> Elements
Documents that employ <span> elements with timing information will require additional processing when mapping from TVTT to WebVTT.
Example: <span> Elements
This example starts with an excerpt from a TTML file that includes some <span> elements within <p> elements.
<body timeContainer="par">
<div timeContainer="par">
<p begin="00:00:10.000" end="00:00:40.000">
<span end="00:00:24.400">Appears at 10 seconds and
disappears at 24.4 seconds</span>
<br/>
<span begin="00:00:25.000" end="00:00:35.000">Appears at 25 seconds and
disappears at 35 seconds</span>
</p>
</div>
</body>
Step 1: Define a CSS class for hidden text
::cue(.invisible_text) { color: rgba(0, 0, 0, 0);}
Step 2: Transform the spans into separate <p> elements
<p begin="00:00:10.000" end="00:00:24.400">Appears at 10 seconds and
disappears at 24.4 seconds</p>
<p begin="00:00:25.000" end="00:00:35.000">Appears at 25 seconds and
disappears at 35 seconds</p>
Step 3: Convert from TTML to WebVTT syntax
00:00:10.000 --> 00:00:24.400
Appears at 10 seconds and disappears at 24.4 seconds
<c.invisible_text>Appears at 25 seconds and disappears at 35 seconds</c>
00:00:25.000 --> 00:00:35.000
<c.invisible_text>Appears at 10 seconds and disappears at 24.4 seconds</c>
Appears at 25 seconds and disappears at 35 seconds
An alternative approach is to use intra-cue timings.
00:00:10.000 --> 00:00:40.000
Appears at 10 seconds and disappears at 24.4 seconds
<00:00:24.400><00:00:25.000>Appears at 25 seconds and disappears at 35 seconds<00:00:35.000>
TTML:
<p begin="00:00:21.000" end="00:00:26.000">Appears at 21 secs<br/>
and remains visible to 26 seconds</p>
<p begin="00:00:31.000" end="00:00:36.000">Appears at 31 secs<br/>
and remains visible to 36 secs</p>
WebVTT:
00:00:21.000 --> 00:00:26.000
Appears at 21 secs
and remains visible to 26 seconds
00:00:31.000 --> 00:00:36.000
Appears at 31 secs
and remains visible to 36 secs
Completing these steps results in a document with a list of WebVTT cues. The last step is to sort these cues from earliest to latest time, based on each cue's beginning timestamp.
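The final sort and the writing of the WebVTT file, sketched in Python. The Cue tuple is a hypothetical representation of a converted cue whose timings are TVTT timestamps. Python's sorted() is stable, so cues with equal start times keep their document order. This sketch does not impose any further tie-breaking rule.

```python
from typing import Iterable, NamedTuple, Optional


class Cue(NamedTuple):
    begin: str                     # TVTT timestamp, e.g. "00:00:21.000"
    end: str
    text: str
    cue_id: Optional[str] = None


def timestamp_ms(ts: str) -> int:
    """Milliseconds for an hh:mm:ss.ttt timestamp."""
    hours, minutes, rest = ts.split(":")
    seconds, millis = rest.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def write_webvtt(cues: Iterable[Cue]) -> str:
    """Serialize cues as a WebVTT file, ordered by begin time."""
    blocks = ["WEBVTT"]
    for cue in sorted(cues, key=lambda c: timestamp_ms(c.begin)):
        header = [cue.cue_id] if cue.cue_id else []
        blocks.append("\n".join(header + [f"{cue.begin} --> {cue.end}", cue.text]))
    return "\n\n".join(blocks) + "\n"  # blank line between blocks
```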
<p> Elements
Before mapping the syntax of WebVTT cues to the syntax of TTML <p> elements, it can be useful to assemble WebVTT cues into groups that can be transformed into TTML <div> elements. The hierarchical elements of TTML's syntactic structure can provide opportunities for consolidating expressions of style and layout.
When mapping positioning information from WebVTT to TTML, start by generating a TTML region definition for each WebVTT region or cue that has a different block size or location. After developing these independent regions, it may be possible to optimize the TTML by sharing or merging region definitions.
WebVTT has the automatic behavior that cue positions are subject to adjustment if cues overlap as positioned by the cue settings. TTML does not have analogous automatic behavior. To avoid overlap in the TTML version of a document, adjust the positioning of WebVTT cues and regions as part of the mapping process.
In WebVTT, the position of a cue is determined by its "position" and "line" cue settings. When interpreting these cue settings, it is necessary to take into account other WebVTT cue settings, including:
The vertical cue setting
The writing direction cue setting
The size cue setting
The line cue setting may either be a percentage or a line number. Positive line numbers are counted from the top, negative line numbers are counted from the bottom.
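The line setting is applied in the block-progression direction, perpendicular to the direction in which text is written, while the position setting is applied in the inline direction. For example, for horizontal cues, line offsets the cue box vertically and position offsets it horizontally.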
The optional alignment value determines whether the position refers to the start, middle, or end of the cue box. If the alignment value is not "start", the location of the cue box also depends on its size. For example, with an alignment value of "end", the position marks the end edge of the cue box, so the start edge lies at the position minus the size of the cue box in that direction.
In order to calculate the coordinates of the upper left vertex of a WebVTT cue box as percentages, use the following steps.
The "position" cue setting will determine one coordinate, referred to as the position_coordinate.
The "line" cue setting will determine the other coordinate, referred to as the line_coordinate.
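The following Python sketch computes the two coordinates for the common case of horizontal, left-to-right cues whose position, line and size are all percentages. It assumes that the line setting uses its default start alignment, so the line coordinate is the line value itself. The offsets for the alignment values follow the description of alignment above. The default alignment is center, which older WebVTT drafts and the text above call middle. The function name is hypothetical.

```python
from typing import Tuple

# Fraction of the cue box size that lies before the position point.
ALIGN_OFFSET = {"start": 0.0, "left": 0.0, "middle": 0.5, "center": 0.5,
                "end": 1.0, "right": 1.0}


def cue_box_top_left(position: float, line: float, size: float,
                     align: str = "center") -> Tuple[float, float]:
    """(x%, y%) of the upper-left vertex of a horizontal, left-to-right cue box."""
    if align not in ALIGN_OFFSET:
        raise ValueError(f"unknown alignment: {align!r}")
    position_coordinate = position - ALIGN_OFFSET[align] * size
    line_coordinate = line  # percentage line value, default (start) line alignment
    return position_coordinate, line_coordinate
```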
When converting from WebVTT cue settings to a tts:extent value, the goal is to arrive at extent values expressed as percentages of the viewport's width and height. In cases where WebVTT cues express spatial dimensions solely using percentages of the viewport, no unit conversion is needed, as TTML supports percentages. In cases where WebVTT uses line numbers for vertical dimensions, the line numbers must be converted into percentages of the viewport's height. As discussed above, this conversion depends upon determining the correct line height to use. Assuming WebVTT dimensions have been converted into percentages, the TTML extent can be calculated in the following way:
When the cue's vertical setting is 'auto' or 'horizontal' the first value of the tts:extent pair will be equal to the cue's size property. The second value of the tts:extent pair must be synthesized, as WebVTT does not specify the size of the cue in the block progression direction. This second value can be computed by looking at the number of lines of text in the cue, and multiplying it by the computed line height, to achieve a value as a percentage of the viewport height.
tts:extent = ( size, computed line height * number of lines )
When the cue's vertical setting is 'vertical', the first value in the tts:extent pair must be synthesized, while the second value will be equal to the cue's size property. The first value should be computed based on the number of lines in the cue and some font metrics.
tts:extent = ( computed line width * number of lines, size )
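The two extent formulas, sketched in Python. The caller must choose two inputs, since nothing in WebVTT fixes them. The first is the line height, in percent of the viewport height; 5.33 is assumed elsewhere in this document. The second, for vertical cues only, is the line width in percent of the viewport width. The function name is hypothetical, and WebVTT's vertical values rl and lr are both treated as vertical.

```python
from typing import Optional


def cue_extent(size: float, vertical: Optional[str], num_lines: int,
               line_height: float = 5.33,
               line_width: Optional[float] = None) -> str:
    """tts:extent value (percentages) for a WebVTT cue.

    vertical is None (or "auto"/"horizontal") for horizontal cues,
    or a vertical value such as "rl" or "lr"."""
    if vertical in (None, "auto", "horizontal"):
        # Width comes from the size setting; height is synthesized from the line count.
        width, height = size, line_height * num_lines
    else:
        if line_width is None:
            raise ValueError("vertical cues need a line width (percent of viewport width)")
        # Width is synthesized from the line count; height comes from the size setting.
        width, height = line_width * num_lines, size
    return f"{width:g}% {height:g}%"
```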
To convert from WebVTT cue position settings to TTML, it is necessary to set both the tts:origin and tts:writingMode attributes on a TTML region.
As a first step, determine the TTML writing mode, based on the WebVTT vertical text cue setting and writing direction. If the setting is not present, the default value is horizontal. If the setting is vertical, it will have a writing direction of either left to right or right to left associated with it. Refer to the table above to map WebVTT values to TTML values.
Next, calculate the tts:origin value. The tts:origin will correspond to the value of the top, left corner of the WebVTT cue box.
In the case of a horizontal or auto writing mode:
tts:writingMode = "lrtb"
tts:origin = ( position, line )
In the case of a vertical writing mode, with a left to right writing direction:
tts:writingMode = "tblr"
tts:origin = ( line, position )
Example: Converting from a WebVTT Cue to TTML
This example begins with a WebVTT fragment containing a cue that does not use a region:
00:00:00.000 --> 00:00:10.000 position:50% line:0% size:50%
A cue with no region.
Step 1: Define a TTML Region
Start by defining a TTML Region for this caption.
<layout>
<region xml:id="reg4" tts:origin="50% 0%" tts:extent="50% 16%" tts:writingMode="lrtb" >
</region>
</layout>
Step 2: Convert from WebVTT to TTML Syntax
Reference the region created above using its id.
<p region="reg4" begin="00:00:00.000" end="00:00:10.000" tts:textAlign="center">A cue with no region.</p>
Notes:
WebVTT does not specify a vertical dimension, so in the process of mapping from WebVTT to TTML, it is necessary to synthesize a value for this dimension. Choosing 16%, assuming a default line height of 5.33 vh, gives us a vertical dimension roughly equal to three lines of text. This choice may be appropriate for some WebVTT content where there are no cues with more than three lines.
In WebVTT, text alignment defaults to the middle of the cue box. To replicate this behavior in TTML, it is necessary to set the tts:textAlign attribute to "center"; in this example it is set on the <p> element.
In the case of a vertical writing mode, with a right to left writing direction:
tts:writingMode = "tbrl"
tts:origin = ( line, position )
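The writing-mode and origin rules for cues without a region can be collected into one small function. This is a hedged sketch: `cue_region_layout` and `WRITING_MODES` are hypothetical names, and the function follows this section's simplified rule that the cue's position and line settings (already in percent) give the top-left corner of the cue box; it does not apply position alignment or other adjustments a full WebVTT renderer would make.

```python
from typing import Optional

# WebVTT "vertical" cue setting -> TTML tts:writingMode.
# None means the setting is absent (horizontal text).
WRITING_MODES = {None: "lrtb", "lr": "tblr", "rl": "tbrl"}


def cue_region_layout(position: float, line: float,
                      vertical: Optional[str] = None) -> tuple[str, tuple[float, float]]:
    """Return (tts:writingMode, tts:origin) for a cue without a region."""
    if vertical not in WRITING_MODES:
        raise ValueError(f"unsupported vertical setting: {vertical!r}")
    mode = WRITING_MODES[vertical]
    if vertical is None:
        origin = (position, line)   # horizontal: x from position, y from line
    else:
        origin = (line, position)   # vertical: x from line, y from position
    return mode, origin


mode, origin = cue_region_layout(50, 0)
print(mode, f"{origin[0]}% {origin[1]}%")  # lrtb 50% 0%
```

The printed values match the tts:writingMode and tts:origin of region "reg4" in the example above.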
When determining the TTML origin based on a WebVTT region, it is necessary to take into account the two different types of anchors defined for WebVTT regions. The TTML origin will always correspond to the top, left corner of the region, while the WebVTT anchor point may correspond to some other point within the region.
To convert from WebVTT region settings to a tts:origin value:
Example: Converting from a WebVTT Cue with Region to TTML
This example begins with a WebVTT fragment containing a cue that does use a region:
Region: id=reg5 width=30% lines=3 regionanchor=50%,50% viewportanchor=25%,40% scroll=up
00:00:00.000 --> 00:00:10.000 region:reg5
A cue that uses a region.
Step 1: Convert Height from Number of Lines to Percent
Once again, the default line height of 5.33 vh will be used for this conversion.
region height = 3 lines * 5.33 vh = 15.99 vh ≈ 16%
Step 2: Calculate Coordinates of Top, Left Corner from Anchors
Given that the region anchor is in the middle of the region, the viewport anchor provides the coordinates of the center of the region. From that information, together with the region width and the calculated region height, the coordinates of the top-left corner can be calculated.
The first coordinate of the tts:origin pair measures the horizontal position of the origin. To calculate it, the horizontal values of both the viewport anchor and the region anchor are needed, along with the width of the region.
origin horizontal = horizontal viewport anchor - (horizontal region anchor * region width)
origin horizontal = 25 - (0.50 * 30)
origin horizontal = 10
The second coordinate of the tts:origin pair measures the vertical position of the origin. To calculate it, the vertical values of both the viewport anchor and the region anchor are needed, along with the height of the region, converted into a percentage of the viewport.
origin vertical = vertical viewport anchor - (vertical region anchor * region height)
origin vertical = 40 - (0.50 * 16)
origin vertical = 32
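The height and anchor arithmetic of Steps 1 and 2 can be sketched as follows. The function names are hypothetical, all values are percentages, and the 5.33 vh default line height is the one assumed in this section; anchors are (horizontal, vertical) pairs as in the WebVTT region settings.

```python
def region_height(lines: int, line_height: float = 5.33) -> float:
    """Convert a WebVTT region's lines setting to a percentage of the viewport height."""
    return lines * line_height


def region_origin(viewport_anchor: tuple[float, float],
                  region_anchor: tuple[float, float],
                  width: float, height: float) -> tuple[float, float]:
    """Return the top-left corner (tts:origin) of a WebVTT region, in percent.

    viewport_anchor: anchor point in the viewport, in % of the viewport.
    region_anchor: the same point inside the region, in % of the region.
    width, height: region size, in % of the viewport.
    """
    x = viewport_anchor[0] - region_anchor[0] / 100 * width
    y = viewport_anchor[1] - region_anchor[1] / 100 * height
    return (x, y)


# Values from the reg5 example, with the height rounded to 16% as in the text.
print(region_origin((25, 40), (50, 50), 30, 16))  # (10.0, 32.0)
```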
Step 3: Convert Region Definition to TTML
Start by defining a TTML Region for this caption.
<layout>
<region xml:id="reg5" tts:origin="10% 32%" tts:extent="30% 16%" tts:textAlign="center">
</region>
</layout>
Step 4: Convert WebVTT Cue to TTML <p>
Reference the region created above using its id.
<p region="reg5" begin="00:00:00.000" end="00:00:10.000">A cue that uses a region.</p>
WebVTT uses CSS to carry styling information. Information held in CSS can be translated into TTML <style> elements, contained in the <styling> element of the TTML <head>.
Translation of style information from WebVTT to TTML has to be done in two steps:
Translation of the CSS properties associated with the ::cue pseudo-element into <style> elements with corresponding TTML style attributes and values.
References to these <style> elements from one of the TTML content elements <body>, <p> or <span>.
The following table contains the mapping of CSS properties and WebVTT span tags to TTML style attributes.
VTT CSS Property or Span Tag | TTML style attribute mapping |
---|---|
CSS property: background-attachment | - |
CSS property: background-color | - |
CSS property: background-image | - |
CSS property: background-position | - |
CSS property: background-repeat | - |
CSS property: color | color |
CSS property: font-family | fontFamily |
CSS property: font-size | fontSize |
CSS property: font-style | fontStyle |
CSS property: font-variant | - |
CSS property: font-weight | fontWeight |
CSS property: line-height | lineHeight |
CSS property: opacity | opacity |
CSS property: outline-color | textOutline |
CSS property: outline-style | textOutline |
CSS property: outline-width | textOutline |
CSS property: text-decoration | textDecoration |
CSS property: text-shadow | - |
CSS property: visibility | visibility |
Span tag: b | fontWeight |
Span tag: i | fontStyle |
Span tag: u | textDecoration |
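The one-to-one rows of the table can be expressed as a lookup, as in the hedged sketch below. `CSS_TO_TTML` and `translate_declarations` are hypothetical names; the outline-* rows are left out because they have to be combined into a single tts:textOutline value. Values are copied unchanged, which is only correct where the CSS value is also a valid TTML value; for example, the CSS generic family "sans-serif" corresponds to the TTML value "sansSerif" and would need separate conversion.

```python
# CSS properties from the table that map one-to-one onto a TTML style attribute.
CSS_TO_TTML = {
    "color": "tts:color",
    "font-family": "tts:fontFamily",
    "font-size": "tts:fontSize",
    "font-style": "tts:fontStyle",
    "font-weight": "tts:fontWeight",
    "line-height": "tts:lineHeight",
    "opacity": "tts:opacity",
    "text-decoration": "tts:textDecoration",
    "visibility": "tts:visibility",
}


def translate_declarations(decls: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split CSS declarations into TTML style attributes and unmapped properties."""
    attrs: dict[str, str] = {}
    unmapped: list[str] = []
    for prop, value in decls.items():
        attr = CSS_TO_TTML.get(prop.strip().lower())
        if attr is None:
            unmapped.append(prop)          # no mapping in the table (or outline-*)
        else:
            attrs[attr] = value.strip()    # value copied verbatim
    return attrs, unmapped


print(translate_declarations({"font-family": "Verdana", "text-shadow": "1px 1px black"}))
# ({'tts:fontFamily': 'Verdana'}, ['text-shadow'])
```

Returning the unmapped properties lets a converter report what it had to drop instead of discarding it silently.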
If a WebVTT document references CSS using a ::cue pseudo-element without arguments, a <style> element should be created in the TTML <head> section to hold the styling information. This <style> element should then be referenced in the <body> element. If there is no use of ::cue pseudo-elements in the WebVTT document, the TTML <style> element should be set according to the initial values that apply by default in WebVTT.
<style xml:id="bodyStyle" tts:color="rgba(255,255,255,255)" tts:fontFamily="sansSerif" ..../>
If a ::cue pseudo-element without arguments is applied to a WebVTT file, then all CSS properties in it that override the default values specified in WebVTT should also be set in the "bodyStyle" <style> element referenced by <body>.
Example: Translating CSS associated with VTT to TTML
This example begins with a CSS fragment that defines a ::cue rule without arguments.
::cue {
font-family: Verdana;
}
This information can be translated into a TTML <style> element, with a synthesized id, which is then referenced from the <body> element of the TTML document.
<style xml:id="bodyStyle" tts:color="rgba(255,255,255,255)" tts:fontFamily="Verdana" ..../>
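The "bodyStyle" rule amounts to starting from the WebVTT initial values and letting a ::cue rule without arguments override them. In this hedged sketch, `WEBVTT_DEFAULTS` lists only the two attributes visible in the examples above (the examples elide the rest with "...."), and `body_style` is a hypothetical helper that takes attributes already translated from CSS.

```python
from typing import Optional

# Partial: only the initial values shown in the bodyStyle examples.
WEBVTT_DEFAULTS = {
    "tts:color": "rgba(255,255,255,255)",
    "tts:fontFamily": "sansSerif",
}


def body_style(cue_rule: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Attributes for the "bodyStyle" <style> element.

    cue_rule: TTML attributes translated from a ::cue rule without
    arguments, or None if the WebVTT document has no such rule.
    """
    style = dict(WEBVTT_DEFAULTS)      # copy, so the defaults are not mutated
    if cue_rule:
        style.update(cue_rule)         # ::cue overrides take precedence
    return style


print(body_style({"tts:fontFamily": "Verdana"}))
# {'tts:color': 'rgba(255,255,255,255)', 'tts:fontFamily': 'Verdana'}
```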
For every ::cue pseudo-element whose argument is a cue identifier, a corresponding <style> element should be created in TTML and referenced by the <p> element that corresponds to that cue.
Example CSS associated with VTT
::cue(#id1) {
font-family: Verdana;
}
Example VTT
id1
00:00:00.000 --> 00:00:02.000
Some text
Example TTML
....
<style xml:id="pStyleId1" tts:fontFamily="Verdana" ..../>
....
<p style="pStyleId1" begin="00:00:00.000" end="00:00:02.000" ..../>
For every ::cue pseudo-element whose argument is a class name, a corresponding <style> element should be created in TTML and referenced by the <span> element that corresponds to the WebVTT tag using this class name.
Example
::cue(.cyanColor) {
color: cyan;
}
Example VTT
00:00:00.000 --> 00:00:02.000
<c.cyanColor>Some text
Example TTML
....
<style xml:id="cyanColor" tts:color="cyan" />
....
<span style="cyanColor">Some text</span>
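The three selector rules (no argument, cue identifier, class name) can be summarized as a small dispatch on the ::cue argument. This is a sketch with assumptions: `style_target` is a hypothetical name, the id naming scheme ("id1" becomes "pStyleId1") simply mirrors the examples above, and only simple names are accepted, although WebVTT cue identifiers may contain characters that would need CSS escaping and are not valid in an xml:id.

```python
import re
from typing import Optional

# ::cue(#name) or ::cue(.name) with a simple identifier-like name.
_CUE_ARG = re.compile(r"::cue\(\s*([#.])([A-Za-z_][A-Za-z0-9_-]*)\s*\)")


def style_target(selector: str) -> Optional[tuple[str, str]]:
    """Return (TTML element, style xml:id) for a ::cue selector, or None."""
    selector = selector.strip()
    if selector == "::cue":
        return ("body", "bodyStyle")          # no argument: applies to all cues
    m = _CUE_ARG.fullmatch(selector)
    if m is None:
        return None                           # other selectors are not handled
    kind, name = m.groups()
    if kind == "#":
        # Cue identifier: referenced from that cue's <p>; id1 -> pStyleId1.
        return ("p", "pStyle" + name[0].upper() + name[1:])
    # Class name: referenced from the <span> for the tag using the class.
    return ("span", name)


print(style_target("::cue(#id1)"))  # ('p', 'pStyleId1')
```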
Three <style> elements should be created to map the WebVTT tags for bold, italic and underline:
Example TTML
<style xml:id="bold" tts:fontWeight="bold" />
<style xml:id="italic" tts:fontStyle="italic" />
<style xml:id="underline" tts:textDecoration="underline" />
The <style> elements should be referenced by the <span> elements that correspond to the span tags <b>, <i> and <u>.
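Putting the span rules together, WebVTT cue text can be rewritten into TTML <span> markup that references the "bold", "italic" and "underline" styles and any class-based styles. This is a hedged sketch: `cue_text_to_ttml` is a hypothetical name, it handles only the <b>, <i>, <u> and <c> tags (other tags such as <v> are passed through as escaped text), it ignores end tags that do not match the innermost open tag, and it assumes class names are used directly as style ids, as in the cyanColor example.

```python
import html
import re
from xml.sax.saxutils import escape

# Style ids for the WebVTT <b>, <i> and <u> tags; <c> adds none by itself.
TAG_STYLES = {"b": ["bold"], "i": ["italic"], "u": ["underline"], "c": []}
# Matches <b>, </b>, <c.cls1.cls2> and similar.
TAG_RE = re.compile(r"<(/?)([biuc])((?:\.[A-Za-z0-9_-]+)*)>")


def cue_text_to_ttml(text: str) -> str:
    """Convert WebVTT cue text with b/i/u/c tags into TTML <span> markup."""
    out: list[str] = []
    open_tags: list[str] = []
    pos = 0
    for m in TAG_RE.finditer(text):
        # Text before the tag: decode WebVTT character references, re-escape for XML.
        out.append(escape(html.unescape(text[pos:m.start()])))
        closing, name, classes = m.groups()
        if closing:
            if open_tags and open_tags[-1] == name:
                open_tags.pop()
                out.append("</span>")
        else:
            styles = TAG_STYLES[name] + [c for c in classes.split(".") if c]
            open_tags.append(name)
            out.append(f'<span style="{" ".join(styles)}">' if styles else "<span>")
        pos = m.end()
    out.append(escape(html.unescape(text[pos:])))
    out.extend("</span>" for _ in open_tags)   # close tags left open at the end
    return "".join(out)


print(cue_text_to_ttml("<c.cyanColor>Some text"))
# <span style="cyanColor">Some text</span>
```

Because the TTML style attribute accepts a space-separated list of ids, a tag such as <b.loud> becomes a single span referencing both "bold" and "loud".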
WebVTT offers fewer options for Timing Expressions than TTML and does not provide a means to hierarchically group cues. These restrictions simplify the process of mapping from WebVTT to TTML. In fact, it is possible to use WebVTT timing information without any transformation, provided the correct values are specified for timing parameters and attributes in the destination TTML file.
This section describes the recommended values to use for TTML timing parameters and attributes and then provides some conversion examples.
Setting TTML timing parameters to the following values allows WebVTT cues to be transformed into TTML <p> elements with no conversion of timing information required.
Parameter: Time Base
TTML Parameter | Value |
---|---|
ttp:timeBase | media |
Note:
TTML Attribute | Value |
---|---|
timeContainer | par |
Notes:
Example: WebVTT to TTML Conversion
This example begins with a short WebVTT file:
WEBVTT
00:00:00.000 --> 00:00:10.000
This caption starts at 0s and remains for 10s.
00:00:15.000 --> 00:00:20.000
This caption starts at 15s and remains for 5s.
Step 1: Convert from WebVTT to TTML syntax
Transform the cues into TTML <p> elements.
<p begin="00:00:00.000" end="00:00:10.000">
This caption starts at 0s and remains for 10s.</p>
<p begin="00:00:15.000" end="00:00:20.000">
This caption starts at 15s and remains for 5s.</p>
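One detail to watch when copying timestamps: WebVTT allows the hours field to be omitted (for example 00:15.000), whereas a TTML clock-time always includes hours. The hedged sketch below normalizes timestamps and extracts begin/end values from a cue timing line; `vtt_to_ttml_clock` and `parse_timing_line` are hypothetical names, and cue settings after the end timestamp are ignored.

```python
import re

# WebVTT timestamp: optional hours (2+ digits), minutes, seconds, milliseconds.
_VTT_TS = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def vtt_to_ttml_clock(ts: str) -> str:
    """Convert a WebVTT timestamp to a TTML clock-time with an explicit hours field."""
    m = _VTT_TS.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"not a WebVTT timestamp: {ts!r}")
    hours, minutes, seconds, millis = m.groups()
    return f"{hours or '00'}:{minutes}:{seconds}.{millis}"


def parse_timing_line(line: str) -> tuple[str, str]:
    """Return TTML (begin, end) values from a WebVTT cue timing line."""
    start, arrow, rest = line.partition("-->")
    fields = rest.split()
    if not arrow or not fields:
        raise ValueError(f"not a WebVTT cue timing line: {line!r}")
    return vtt_to_ttml_clock(start), vtt_to_ttml_clock(fields[0])


begin, end = parse_timing_line("00:15.000 --> 00:20.000")
print(f'<p begin="{begin}" end="{end}">')  # <p begin="00:00:15.000" end="00:00:20.000">
```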
Step 2: Add the TTML Hierarchical Elements
<body>
<div>
<p begin="00:00:00.000" end="00:00:10.000" >
This caption starts at 0s and remains for 10s.</p>
<p begin="00:00:15.000" end="00:00:20.000" >
This caption starts at 15s and remains for 5s.</p>
</div>
</body>
Or, to be explicit, specify the timeContainer attribute on the containing elements:
<body timeContainer="par">
<div timeContainer="par">
<p begin="00:00:00.000" end="00:00:10.000" >
This caption starts at 0s and remains for 10s.</p>
<p begin="00:00:15.000" end="00:00:20.000" >
This caption starts at 15s and remains for 5s.</p>
</div>
</body>
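The complete structure, including the ttp:timeBase parameter from the table above, can be generated with Python's standard ElementTree module. This is a sketch under assumptions: `build_tt` is a hypothetical helper, cue text is treated as plain text, and the default language "en" is a placeholder (TTML requires xml:lang on the <tt> element). The namespace URIs are the TTML ones; register_namespace changes ElementTree's global prefix table so that TTML elements are written without a prefix.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTP_NS = "http://www.w3.org/ns/ttml#parameter"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TTML elements unprefixed and parameters with "ttp:".
ET.register_namespace("", TT_NS)
ET.register_namespace("ttp", TTP_NS)


def build_tt(cues: list[tuple[str, str, str]], lang: str = "en") -> str:
    """Wrap (begin, end, text) cues in <tt>/<body>/<div> with a media time base."""
    tt = ET.Element(f"{{{TT_NS}}}tt", {
        f"{{{TTP_NS}}}timeBase": "media",
        f"{{{XML_NS}}}lang": lang,     # placeholder language
    })
    body = ET.SubElement(tt, f"{{{TT_NS}}}body", timeContainer="par")
    div = ET.SubElement(body, f"{{{TT_NS}}}div", timeContainer="par")
    for begin, end, text in cues:
        p = ET.SubElement(div, f"{{{TT_NS}}}p", begin=begin, end=end)
        p.text = text                  # plain text; ElementTree escapes it
    return ET.tostring(tt, encoding="unicode")


print(build_tt([
    ("00:00:00.000", "00:00:10.000", "This caption starts at 0s and remains for 10s."),
    ("00:00:15.000", "00:00:20.000", "This caption starts at 15s and remains for 5s."),
]))
```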