The TTML format (Timed Text Markup Language) is a W3C format intended for marking up external timed track resources.
A TTML file referenced by an HTML 5 track element must consist of a TTML file body and must be labeled with the MIME type application/ttml+xml.
Presenting TTML in HTML 5 consists of the following steps:
TTML offers three mechanisms for defining the equivalent of HTML 5 inline style. The nested and referential styles of TTML are used to avoid having large numbers of repeated attributes on each element, and allow groups of styles to be applied all at once; however, this is merely a shorthand mechanism and is entirely equivalent to HTML 5 inline styles. TTML does not itself define an applicative mode of style application, but does not preclude the additional use of a mechanism such as CSS for this purpose, in which case all TTML style application would have the same specificity as inline style.
The Specified Style Set of properties is computed for each element as follows:
- Styles referenced by a style attribute (referential styling) are processed in the following manner: for each style element referenced by a style attribute on the affected element, and in the order specified in that style attribute, if the referenced style element is a descendant of a styling element, merge the specified style set of the referenced element into the specified style set of the affected element.
- Styles given by style element children (nested styling) are processed in the following manner: for each style element child of the affected element, and in the specified order of child elements, merge the specified style set of the child element into the specified style set of the affected element.
- Styles given by styling attributes (inline styling) are processed in the following manner: for each style property expressed as a specified styling attribute on the affected element, merge that property into the specified style set of the affected element.
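The three styling steps can be sketched in Python as a minimal, non-normative illustration. It assumes that "merge" means a later value overrides an earlier one, that the caller supplies `styles_by_id` (a hypothetical lookup built only from style elements found under styling), and that style references contain no cycles. The function name is illustrative and not part of TTML or HTML.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def specified_style_set(element, styles_by_id):
    """Return a dict of style properties for `element` (hypothetical helper).

    `styles_by_id` maps xml:id -> <style> element under <styling>.
    Assumption: later merges override earlier ones.
    """
    result = {}
    # 1. Referential styling: referenced <style> elements, in attribute order.
    for ref in element.get("style", "").split():
        referenced = styles_by_id.get(ref)
        if referenced is not None:
            result.update(specified_style_set(referenced, styles_by_id))
    # 2. Nested styling: <style> children, in document order.
    for child in element.findall(f"{{{TT}}}style"):
        result.update(specified_style_set(child, styles_by_id))
    # 3. Inline styling: tts:* attributes on the element itself.
    prefix = f"{{{TTS}}}"
    for name, value in element.attrib.items():
        if name.startswith(prefix):
            result[name[len(prefix):]] = value
    return result
```

Applied to a p element that references a style which in turn references a base style, the inline tts:fontSize of the referencing style overrides the one inherited from the base style.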
A set of TTML cue objects is constructed from the referenced TTML file by evaluating the TTML document instance at the TTML cue event times, that is, the set of time coordinates where some element becomes temporally active or inactive. The TTML document instance is mapped once for each time coordinate in the TTML cue event times to a list of TTML cue objects as defined below; each TTML cue object is then converted into an HTML 5 cue object.
Each region active at the TTML cue event time in the source TTML will map to one TTML cue object in the list. If there is no region specified in the TTML document instance, then the default region is used, and there will be at most one TTML cue object in the list.
Map the TTML source document to a set of event times by recursively walking the DOM tree, annotating each node with its absolute begin and end times based on the begin, end and dur attributes, and then doing the same for each of the node's children. The initial time containment context is seq, and the initial reference start and end times are those of the media to which the timed track applies.
Compute time intervals for an element based on the time containment context, a reference start time and a reference end time in the following manner:
    set computed start time and computed end time to the zero time

    Compute the beginning of the current element interval:
        set begin to the value of the "begin" attribute if present, or to the zero time otherwise
        set computed start time to the reference start time + begin

    Compute the simple duration of the interval:
        (Note that par children have indefinite default duration, while seq children have zero default duration; indefinite is truncated to the reference end time.)
        if the "dur" attribute is not set and the "end" attribute is not set and the time container context is seq
            set referenceDur to the zero time
        else if computed start time is less than the reference end time
            set referenceDur to the reference end time - computed start time
        else
            set referenceDur to the zero time
        if the "dur" attribute is set
            set dur to the "dur" attribute value
            if dur is greater than referenceDur
                set dur to referenceDur
        else
            set dur to referenceDur
        set computed end time to computed start time + dur

    Apply the end attribute (note that end can truncate the simple duration):
        if attribute "end" is set
            set end to reference start time + value of the "end" attribute
        else
            set end to reference end time
        set computed end time to the min of end and computed end time

    Compute the child nodes:
        if the time container context is par
            for each child element of the node:
                Compute Time Intervals for the child with par context, start time as computed start time and end time as computed end time
        else
            set s to computed start time
            for each child element of the node:
                Compute Time Intervals for the child with seq context, start time as s and end time as computed end time
                set s to the computed end time of the child
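The interval computation can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: `Node` is a hypothetical stand-in for a timed TTML element with times already parsed to seconds, and the zero default duration for seq children is assumed to apply only when neither dur nor end is given, as the note in the procedure suggests.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a timed TTML element (times in seconds)."""
    name: str
    begin: Optional[float] = None
    dur: Optional[float] = None
    end: Optional[float] = None
    children: List["Node"] = field(default_factory=list)
    start_time: float = 0.0  # computed start time
    end_time: float = 0.0    # computed end time


def compute_intervals(node, context, ref_start, ref_end):
    """Annotate `node` and its descendants with computed start/end times."""
    # Beginning of the interval.
    begin = node.begin if node.begin is not None else 0.0
    node.start_time = ref_start + begin

    # Simple duration, truncated to the reference end time.
    if node.dur is None and node.end is None and context == "seq":
        reference_dur = 0.0  # assumed: zero default duration for seq children
    elif node.start_time < ref_end:
        reference_dur = ref_end - node.start_time
    else:
        reference_dur = 0.0
    dur = min(node.dur, reference_dur) if node.dur is not None else reference_dur
    node.end_time = node.start_time + dur

    # An explicit end can truncate the simple duration.
    end = ref_start + node.end if node.end is not None else ref_end
    node.end_time = min(end, node.end_time)

    # Children: par children share the parent's interval, seq children chain.
    if context == "par":
        for child in node.children:
            compute_intervals(child, "par", node.start_time, node.end_time)
    else:
        s = node.start_time
        for child in node.children:
            compute_intervals(child, "seq", s, node.end_time)
            s = child.end_time
```

The time container context is passed in by the caller rather than read from the element, which mirrors the procedure as written; how a context is selected for a given element is outside this sketch.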
An element is temporally active at time t if the computed start time of the element is less than or equal to t, and t is less than the computed end time of the element. The TTML cue event times are those times where some element changes state from temporally inactive to temporally active or vice versa; that is, the set of computed begin and end times in the annotated tree, placed in order.
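The definition above amounts to a half-open interval test plus a sorted set of boundaries. The following Python sketch illustrates this; the function names are hypothetical, and skipping zero-length intervals is an assumption made here because such elements never change state.

```python
def is_active(start, end, t):
    """True if an element with computed times [start, end) is active at t."""
    return start <= t < end


def cue_event_times(intervals):
    """Return the ordered set of times at which some element changes state.

    `intervals` is an iterable of (computed start, computed end) pairs.
    Assumption: zero-length intervals are skipped, since an element that is
    never active never changes state.
    """
    times = set()
    for start, end in intervals:
        if start < end:
            times.add(start)
            times.add(end)
    return sorted(times)
```

For elements with computed intervals (0, 2) and (1, 3), the event times are 0, 1, 2 and 3.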
Map the TTML source document to a set of active regions at each TTML cue event time as follows:
for each region element, replicate the sub-tree of the source document headed by the body element;
A content element is associated with a region according to the following ordered rules, where the first rule satisfied is used and remaining rules are skipped:
1. if the element specifies a region attribute, then the element is associated with the region referenced by that attribute;
2. if some ancestor of that element specifies a region attribute, then the element is associated with the region referenced by the most immediate ancestor that specifies this attribute;
3. if the element contains a descendant element that specifies a region attribute, then the element is associated with the region referenced by that attribute;
4. if a default region was implied (due to the absence of any region element), then the element is associated with the default region;
5. otherwise, the element is not associated with any region.
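The ordered rules can be sketched in Python as a chain of early returns. This is an illustration only: `Content` is a hypothetical tree node, `DEFAULT_REGION` is a hypothetical marker for the implied default region, and when several descendants specify a region the first one in document order is used, which is an assumption the rules do not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_REGION = ""  # hypothetical marker for the implied default region


@dataclass(eq=False)
class Content:
    """Hypothetical content element with an optional region attribute."""
    name: str
    region: Optional[str] = None
    children: List["Content"] = field(default_factory=list)
    parent: Optional["Content"] = field(default=None, repr=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def descendants(node):
    """Yield descendants in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def associated_region(node, default_region_implied):
    # Rule 1: the element's own region attribute.
    if node.region is not None:
        return node.region
    # Rule 2: the most immediate ancestor that specifies a region.
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.region is not None:
            return ancestor.region
        ancestor = ancestor.parent
    # Rule 3: a descendant that specifies a region (assumed: first found).
    for candidate in descendants(node):
        if candidate.region is not None:
            return candidate.region
    # Rule 4: the implied default region, if no region element exists.
    if default_region_implied:
        return DEFAULT_REGION
    # Rule 5: no association.
    return None
```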
An example of the processing steps described above is elaborated below, starting with a sample source document.
    <tt tts:extent="640px 480px" xml:lang="en"
        xmlns="http://www.w3.org/ns/ttml"
        xmlns:tts="http://www.w3.org/ns/ttml#styling">
      <head>
        <layout>
          <region xml:id="r1">
            <style tts:origin="10px 100px"/>
            <style tts:extent="300px 96px"/>
          </region>
          <region xml:id="r2">
            <style tts:origin="10px 300px"/>
            <style tts:extent="300px 96px"/>
          </region>
        </layout>
      </head>
      <body xml:id="b1">
        <div xml:id="d1" begin="0s" dur="2s">
          <p xml:id="p1" region="r1">Text 1</p>
          <p xml:id="p2" region="r2">Text 2</p>
        </div>
        <div xml:id="d2" begin="1s" dur="2s">
          <p xml:id="p3" region="r2">Text 3</p>
          <p xml:id="p4" region="r1">Text 4</p>
        </div>
      </body>
    </tt>
The event times for this document are 0s, 1s, 2s and 3s. The result of performing the processing described above for each of these times will be an intermediate document containing a sequence of region elements; for example, at a media time of 0s the following intermediate document would be produced:
    <region xml:id="r1" tts:origin="10px 100px" tts:extent="300px 96px">
      <body xml:id="b1">
        <div xml:id="d1">
          <p xml:id="p1">Text 1</p>
        </div>
      </body>
    </region>
    <region xml:id="r2" tts:origin="10px 300px" tts:extent="300px 96px">
      <body xml:id="b1">
        <div xml:id="d1">
          <p xml:id="p2">Text 2</p>
        </div>
      </body>
    </region>
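The per-region replication shown in this example can be approximated with the Python sketch below. It works on plain dictionaries rather than a real DOM, and its pruning rule is an assumption inferred from the example rather than stated in the text: an element is kept if it is temporally active and either carries text associated with the region or has at least one kept child. All names are hypothetical.

```python
def prune(node, region, t, inherited=None):
    """Return a copy of `node` restricted to `region` at time `t`, or None.

    `node` is a dict with "id", "interval" (computed start, computed end),
    and optional "region", "text" and "children" keys (hypothetical layout).
    """
    start, end = node["interval"]
    if not (start <= t < end):
        return None  # temporally inactive
    own = node.get("region", inherited)  # own attribute, else nearest ancestor's
    kept = []
    for child in node.get("children", []):
        copy = prune(child, region, t, own)
        if copy is not None:
            kept.append(copy)
    has_text = bool(node.get("text")) and own == region
    if not kept and not has_text:
        return None  # nothing left for this region
    result = {"id": node["id"], "children": kept}
    if has_text:
        result["text"] = node["text"]
    return result


def intermediate_document(body, regions, t):
    """Map each region id to its pruned copy of the body sub-tree at time t."""
    out = {}
    for region in regions:
        tree = prune(body, region, t)
        if tree is not None:
            out[region] = tree
    return out
```

With the sample document and t = 0s, the result contains one tree per region, each holding only the paragraph assigned to that region.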
To support the timed track model of HTML, each region element in the intermediate document is converted to a timed track cue with the following assignments:
The timed track cue identifier
Is set to the value of xml:id of the region used to construct the cue, or "" if the default region is used.
The timed track cue pause-on-exit flag
Is set to false unless the html:pauseOnExit attribute is set anywhere in the region markup.
The timed track cue writing direction
Is set to the dominant writing direction used in the region markup if that is defined, "" otherwise.
The timed track cue snap-to-lines flag
Is set to false
The timed track cue line position
Is made equivalent to the y part of the origin of the region if set, 0 otherwise.
The timed track cue text position
Is made equivalent to the x part of the origin of the active region if set, 0 otherwise.
Is made equivalent to the x part of the extent of the active region if set, 0 otherwise. (height should be set similarly)
Set to zero.
The timed track cue voice identifier
Calculate the set of ttm:role attribute values used in the region markup. If the set is a singleton set consisting of one of the following values, then the value of voice identifier is mapped as follows:
- narration: set voice to narrator.
- music: set voice to music.
- lyrics: set voice to lyric.
- sound: set voice to sound.
- x-comment: set voice to comment.
- x-credit: set voice to credit.
If the value is not mapped above, or the set is multivalued, the value of voice is set to an integer; if the same set of roles is used in subsequent cues, then the same number shall be re-used, otherwise the number shall be unique over all cues in the track.
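The voice identifier rule can be sketched in Python as a lookup table plus a per-track allocator for the integer case. The class name and the choice to number from 1 are assumptions made for illustration; the text only requires that equal role sets share a number and that distinct sets get distinct numbers within a track.

```python
ROLE_TO_VOICE = {
    "narration": "narrator",
    "music": "music",
    "lyrics": "lyric",
    "sound": "sound",
    "x-comment": "comment",
    "x-credit": "credit",
}


class VoiceAllocator:
    """Hypothetical per-track helper that assigns voice identifiers."""

    def __init__(self):
        self._numbers = {}  # frozenset of roles -> integer already handed out

    def voice_for(self, roles):
        role_set = frozenset(roles)
        # Singleton set with a known role: use the named voice.
        if len(role_set) == 1:
            (role,) = role_set
            if role in ROLE_TO_VOICE:
                return ROLE_TO_VOICE[role]
        # Otherwise reuse or allocate an integer for this exact set of roles.
        if role_set not in self._numbers:
            self._numbers[role_set] = len(self._numbers) + 1  # assumed: start at 1
        return self._numbers[role_set]
```

One allocator instance would be created per track so that the integers are unique over all cues in that track.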
The body of the HTML 5 cue is constructed from the markup of the region by converting the TTML Intermediate Document Object Tree into a DOM tree for the Document owner.
User agents must create a DocumentFragment node for each HTML 5 cue, and populate it with a tree of DOM nodes that is isomorphic to the TTML Intermediate Document Object Tree, using the following mapping of TTML Intermediate Document Objects to DOM nodes:
TTML Intermediate Document Object | DOM node |
---|---|
ttml:region element | HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace. |
ttml:body element | HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace. |
ttml:div element | HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace. |
ttml:p element | HTMLElement element node with localName "p" and the namespaceURI set to the HTML namespace. |
ttml:span element | HTMLElement element node with localName "span" and the namespaceURI set to the HTML namespace. |
ttml:set element | The Specified Style Set of properties of the element is merged into the Specified Style Set of properties of its parent. |
ttml:br element | HTMLElement element node with localName "br" and the namespaceURI set to the HTML namespace. |
ttml:metadata node | If the TTML source domain is not the same as the referencing HTML domain, then ignore. Otherwise, if the metadata contains only text elements, append a "data-metadata" attribute to the HTMLElement element associated with the containing TTML node, whose character data is the text of the metadata node; otherwise process child nodes of the metadata element and add them to the HTMLElement element associated with the containing TTML node in an XML Island. |
Anonymous span text | Text node whose character data is the text of the anonymous span. |
Elements in non ttml namespace | If the TTML source domain is the same as the referencing HTML domain, then copy the nodes in their existing namespace; otherwise ignore. (TBD) |
The ownerDocument attribute of all nodes in the DOM tree must be set to the given document owner.
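The element rows of the mapping table can be sketched in Python with the standard library's ElementTree standing in for a browser DOM. This is an illustration only: it covers the region, body, div, p, span and br rows and anonymous span text, returns None for set, metadata and foreign-namespace elements (whose rows need separate handling), and does not model ownerDocument. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
HTML = "http://www.w3.org/1999/xhtml"
LOCAL_NAMES = {"region": "div", "body": "div", "div": "div",
               "p": "p", "span": "span", "br": "br"}


def append_text(parent, text):
    """Append character data after the last node already inside `parent`."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def to_html(node):
    """Convert a TTML element to an HTML-namespace element, or None."""
    ns, _, local = node.tag[1:].partition("}")
    if ns != TT or local not in LOCAL_NAMES:
        return None  # set, metadata and foreign elements are not handled here
    out = ET.Element(f"{{{HTML}}}{LOCAL_NAMES[local]}")
    append_text(out, node.text)  # anonymous span text becomes character data
    for child in node:
        mapped = to_html(child)
        if mapped is not None:
            out.append(mapped)
        append_text(out, child.tail)  # text following the child is preserved
    return out
```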
For each HTMLElement in the document fragment constructed above, if the specified style set computed for the corresponding TTML element is not empty, create a CSSStyleDeclaration, add to it the styles as defined by the ordered rules below, and finally add the CSSStyleDeclaration to the style attribute on the HTMLElement.
Map the following elements in the #metadata namespace to attributes on the parent HTMLElement as follows:
Map attributes in the #metadata namespace on the TTML DOM element to attributes on the HTMLElement as follows:
Copy the xml:id and xml:lang attributes, if present on the TTML DOM element, to the HTMLElement as the id and lang attributes.
If the xml:space attribute on an element has the value 'preserve', then the content of the element should be contained within an HTMLElement element node with localName "pre" and the namespaceURI set to the HTML namespace.
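The xml:id, xml:lang and xml:space handling can be sketched in Python, again using ElementTree in place of a browser DOM. Both function names are hypothetical, and placing the pre element as the single child of the mapped element is one reading of "contained within" that is assumed here.

```python
import xml.etree.ElementTree as ET

XML = "http://www.w3.org/XML/1998/namespace"
HTML = "http://www.w3.org/1999/xhtml"


def copy_xml_attributes(ttml_el, html_el):
    """Copy xml:id and xml:lang to the id and lang attributes, if present."""
    for name in ("id", "lang"):
        value = ttml_el.get(f"{{{XML}}}{name}")
        if value is not None:
            html_el.set(name, value)


def wrap_if_preserved(ttml_el, html_el):
    """Move the content of `html_el` into a pre child if xml:space is preserve."""
    if ttml_el.get(f"{{{XML}}}space") != "preserve":
        return html_el
    pre = ET.Element(f"{{{HTML}}}pre")
    pre.text, html_el.text = html_el.text, None
    for child in list(html_el):
        html_el.remove(child)
        pre.append(child)
    html_el.append(pre)
    return html_el
```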
All characteristics of the DOM nodes that are not described above or dependent on characteristics defined above must be left at their initial values.
Continuing the above example, the HTML fragment equivalents for the two HTML cue objects will be as follows:
    <div id="r1" style="position:absolute; left:10px; top:100px; width:300px; height:96px">
      <div id="b1">
        <div id="d1">
          <p id="p1">Text 1</p>
        </div>
      </div>
    </div>
    <div id="r2" style="position:absolute; left:10px; top:300px; width:300px; height:96px">
      <div id="b1">
        <div id="d1">
          <p id="p2">Text 2</p>
        </div>
      </div>
    </div>
The mapping from TTML style values into HTML 5 is as follows:
color value
- Convert the TTML color to its RGBA equivalent, and set value to CSS rgba(R,G,B,A). [CSSCOLOR]
direction value
- map to like named CSS values. [CSS]
display value
- map none to the like named CSS value; map auto to CSS block if the HTML element is not a span or text node, and to CSS inline otherwise. [CSS]
display align value
- map . [CSS3]
font family value
- copy value. [CSS]
font size value
- cell based values must be calculated from the content rectangle of the video containing CSS box and denoted as CSS % metrics. Only square font sizes are converted (any second size is ignored), and to like named metrics from CSS. [CSS]
font style value
- map to like named values from CSS. [CSS]
font weight value
- map to like named values from CSS. [CSS]
height value
- Use the second value in the extent pair; cell based values must be calculated from the content rectangle of the video containing CSS box and denoted as CSS % metrics.
left value
- Use the first value in the origin pair; cell based values must be calculated from the content rectangle of the video containing CSS box and denoted as CSS % metrics.
line height value
- map to like named values from CSS. [CSS]
opacity value
- map to like named values from CSS. [CSS3]
overflow value
- map to like named values from CSS. [CSS]
padding value
- map to like named values from CSS. [CSS]
text align value
- map left, center and right to like named values from CSS. If direction is ltr, map start and end to left and right respectively; if direction is rtl, map start and end to right and left respectively. [CSS]
text decoration value
- do not map noUnderline, noLineThrough or noOverline; map lineThrough to line-through, otherwise map to like named values from CSS. [CSS]
text outline value
- map to like named values from CSS3. [CSS3]
top value
- Use the second value in the origin pair; cell based values must be calculated from the content rectangle of the video containing CSS box and denoted as CSS % metrics.
bidi value
- map bidiOverride to bidi-override, otherwise map to like named values from CSS. [CSS]
visibility value
- map to like named values from CSS. [CSS]
width value
- Use the first value in the extent pair; cell based values must be calculated from the content rectangle of the video containing CSS box and denoted as CSS % metrics.
writing mode value
- map to like named values from CSS3. [CSS3]
z value
- The value is added to the z value of the media element that references the track in such a way that the media element rendering area (including any controls) will lie immediately behind the CSS boxes created for the cue elements, and the next immediately higher CSS box in the HTML page will lie in front of all CSS boxes created by cues.
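A few of the value mappings above can be sketched in Python. The helpers are hypothetical and deliberately partial: the color helper handles only the #rrggbb and #rrggbbaa forms and assumes the TTML alpha byte is scaled to the 0 to 1 range for CSS, the box helper handles only values that need no cell conversion, and position:absolute is taken from the converted example fragment rather than from the mapping list.

```python
def color_to_rgba(value):
    """Convert '#rrggbb' or '#rrggbbaa' to a CSS rgba() string."""
    digits = value.lstrip("#")
    if len(digits) not in (6, 8):
        raise ValueError(f"unsupported color form: {value!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    a = int(digits[6:8], 16) / 255 if len(digits) == 8 else 1.0
    return f"rgba({r},{g},{b},{a:g})"


def text_align(value, direction="ltr"):
    """Map left/center/right directly; resolve start/end using direction."""
    if value in ("left", "center", "right"):
        return value
    if value in ("start", "end"):
        start_side, end_side = ("left", "right") if direction == "ltr" else ("right", "left")
        return start_side if value == "start" else end_side
    raise ValueError(f"unsupported textAlign value: {value!r}")


def text_decoration(value):
    """Drop the 'no...' tokens, rename lineThrough, keep everything else."""
    mapped = []
    for token in value.split():
        if token in ("noUnderline", "noLineThrough", "noOverline"):
            continue
        mapped.append("line-through" if token == "lineThrough" else token)
    return " ".join(mapped)


def display(value, html_local_name):
    """Map none to none; map auto to inline for span, block otherwise."""
    if value == "none":
        return "none"
    return "inline" if html_local_name == "span" else "block"


def box_style(origin, extent):
    """Build left/top/width/height from the origin and extent pairs."""
    left, top = origin.split()
    width, height = extent.split()
    return (f"position:absolute; left:{left}; top:{top}; "
            f"width:{width}; height:{height}")
```

For instance, `box_style("10px 100px", "300px 96px")` yields the style string used for region r1 in the converted example.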
Create a set of CSS boxes in relation to the rendering area of the media element as follows:
This example places a transcript beside the video element and highlights sentences in the transcript as the video plays.
    <tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
      <body>
        <div>
          <p begin="00:00:21.99" dur="00:00:24.36">
            <metadata><![CDATA[{ slide: intro.png, title: "Really Achieving Your Childhood Dreams" by Randy Pausch, Carnegie Mellon University, Sept 18, 2007 }]]></metadata>
          </p>
        </div>
      </body>
    </tt>
Converted HTML fragment equivalent at time = 00:00:21.99
    <div>
      <div>
        <p data-metadata="{ slide: intro.png, title: &quot;Really Achieving Your Childhood Dreams&quot; by Randy Pausch, Carnegie Mellon University, Sept 18, 2007 }"></p>
      </div>
    </div>
    <tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
      <body>
        <div>
          <p begin="00:00:21.99" dur="00:00:24.36">
            This picture:
            <svg:svg xmlns:svg="http://www.w3.org/2000/svg">
              <svg:rect></svg:rect>
            </svg:svg>
          </p>
        </div>
      </body>
    </tt>
Converted HTML fragment equivalent at time = 00:00:21.99
    <div>
      <div>
        <p>
          This picture:
          <svg:svg xmlns:svg="http://www.w3.org/2000/svg">
            <svg:rect></svg:rect>
          </svg:svg>
        </p>
      </div>
    </div>
    <?xml version="1.0" encoding="utf-8" ?>
    <tt xml:lang="en"
        xmlns="http://www.w3.org/ns/ttml"
        xmlns:tts="http://www.w3.org/ns/ttml#styling"
        xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
      <head>
        <metadata>
          <ttm:title>Ruby</ttm:title>
          <ttm:desc>Example of how to apply ruby using HTML 5</ttm:desc>
          <ttm:copyright>Copyright (C) 2007 W3C (MIT, ERIM, Keio). All Rights Reserved.</ttm:copyright>
        </metadata>
        <styling>
          <style xml:id="base" tts:color="blue" tts:fontSize="14px" tts:fontFamily="MS Gothic" tts:textAlign="center" />
          <style xml:id="textStyle" style="base" tts:fontSize="32px" />
        </styling>
        <layout>
          <region xml:id="r1" tts:origin="0px 30px" tts:extent="440px 32px" />
        </layout>
      </head>
      <body>
        <div>
          <p region="r1" style="textStyle">
            頭を<ruby xmlns="http://www.w3.org/TR/ruby"> <rb>股</rb> <rt>また</rt> </ruby>に突つ込んで祈るわ
          </p>
        </div>
      </body>
    </tt>
Converted HTML fragment equivalent:
    <div id="r1" style="position: absolute; left: 0px; top: 30px; width: 440px; height: 32px;">
      <div>
        <div>
          <p style="font-size: 32px; color: blue; font-family: 'MS Gothic'; text-align: center">
            頭を<ruby> <rb>股</rb> <rt>また</rt> </ruby>に突つ込んで祈るわ
          </p>
        </div>
      </div>
    </div>
html:pauseOnExit
- this attribute in the HTML namespace, if specified on a TTML element, is mapped to the HTML 5 cue attribute of the same name. It causes the media progress to halt when the media playback position is most nearly equal to the event time at which the element containing the attribute becomes active. It takes any string value (or none?); the value is ignored. Any number of such attributes may be present in a TTML cue. If no such attribute is present then the value mapped on the cue is false, otherwise it is true.
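The presence test described above can be sketched in Python with ElementTree. The function name is hypothetical; the only facts relied on are that the attribute lives in the HTML namespace and that its value is ignored.

```python
import xml.etree.ElementTree as ET

# html:pauseOnExit, written in ElementTree's {namespace}local form.
PAUSE_ON_EXIT = "{http://www.w3.org/1999/xhtml}pauseOnExit"


def pause_on_exit(region_markup):
    """True if any element in the region markup carries html:pauseOnExit."""
    return any(PAUSE_ON_EXIT in el.attrib for el in region_markup.iter())
```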