1 Timed Text Markup Language (TTML) applied to HTML5

Timed Text Markup Language (TTML) is a W3C format for marking up external timed track resources.

1.1 Document format

A TTML file referenced by an HTML5 track element must consist of a TTML document instance and must be labeled with the MIME type application/ttml+xml.

1.2 Display of TTML in HTML 5

Presenting TTML in HTML 5 consists of the following steps:

1.2.1 TTML Style resolution

TTML offers three mechanisms for defining the equivalent of HTML 5 inline style. TTML's nested and referential styles exist to avoid large numbers of repeated attributes on each element and to allow groups of styles to be applied at once; however, they are merely a shorthand and are entirely equivalent to HTML 5 inline styles. TTML does not itself define an applicative mode of style application, but it does not preclude the additional use of a mechanism such as CSS for this purpose, in which case all TTML style application would have the same specificity as inline style.
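Because referential and nested styles are only shorthand for inline style, they can be flattened into a single set of properties per element before anything is mapped to HTML. The Python sketch below does this over a hypothetical in-memory table; `StyleTable`, `resolve_style` and `specified_style_set` are illustrative names, not part of any API. The merge order (referenced styles first, then nested `style` children, then inline attributes, with later sources overriding earlier ones) is our reading of TTML 1.0 and is an assumption, not something this section defines.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form: style id -> (its own tts:* attributes, ids in its style="..." attribute)
StyleTable = Dict[str, Tuple[Dict[str, str], List[str]]]


def resolve_style(style_id: str, styles: StyleTable, seen: Tuple[str, ...] = ()) -> Dict[str, str]:
    """Flatten one <style> element together with the styles it references (chained referential styling)."""
    if style_id in seen:
        raise ValueError(f"circular style reference through {style_id!r}")
    own, refs = styles[style_id]
    resolved: Dict[str, str] = {}
    for ref in refs:
        resolved.update(resolve_style(ref, styles, seen + (style_id,)))
    resolved.update(own)  # a style's own attributes override the ones it references
    return resolved


def specified_style_set(inline: Dict[str, str], refs: List[str],
                        nested: List[Dict[str, str]], styles: StyleTable) -> Dict[str, str]:
    """Referential styles, then nested <style> children, then inline attributes; later entries win."""
    result: Dict[str, str] = {}
    for ref in refs:
        result.update(resolve_style(ref, styles))
    for child_style in nested:
        result.update(child_style)
    result.update(inline)
    return result


# The two styles of the Ruby example later in this section, with the tts: prefixes dropped.
STYLES: StyleTable = {
    "base": ({"color": "blue", "fontSize": "14px", "fontFamily": "MS Gothic", "textAlign": "center"}, []),
    "textStyle": ({"fontSize": "32px"}, ["base"]),
}

print(specified_style_set({}, ["textStyle"], [], STYLES))
```

For the Ruby example's `style="textStyle"` reference this yields blue, 32px, MS Gothic, centered text, which matches the style attribute shown in that example's HTML output.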

The Specified Style Set of properties is computed for each element:

1.2.2 TTML cue object construction

A set of TTML cue objects is constructed from the referenced TTML file by evaluating the TTML document instance at the TTML cue event times, that is, the set of time coordinates at which some element becomes temporally active or inactive. For each time coordinate in the TTML cue event times, the TTML document instance is mapped to a list of TTML cue objects as defined below; each TTML cue object is then converted into an HTML 5 cue object.

Each region active at the TTML cue event time in the source TTML will map to one TTML cue object in the list. If there is no region specified in the TTML document instance, then the default region is used, and there will be at most one TTML cue object in the list.

1.2.3 Evaluating the TTML cue event times

Map the TTML source document to a set of event times by recursively walking the DOM tree, annotating each node, and then each of its children, with its absolute begin and end times based on the begin, end and dur attributes. The initial time containment context is par, and the initial reference start and end times are those of the media to which the timed track applies.

Compute time intervals for an element based on the time containment context, a reference start time and a reference end time in the following manner:

        set computed start time and computed end time to the zero time

        Compute the beginning of the current element interval:
            set begin to the value of the "begin" attribute if present, or the zero time otherwise
            set computed start time to reference start time + begin

        Compute the simple duration of the interval:
            (Note that par children have indefinite default duration, while seq children have
            zero default duration. An indefinite duration is truncated to the reference end time.)

            if neither the "dur" nor the "end" attribute is set and the time container context is seq
                set dur to the zero time
            else
                if computed start time is less than reference end time
                    set referenceDur to reference end time - computed start time
                else
                    set referenceDur to the zero time
                if the "dur" attribute is set
                    set dur to the "dur" attribute value
                    if dur is greater than referenceDur
                        set dur to referenceDur
                else
                    set dur to referenceDur

            set computed end time to computed start time + dur

            (Note that "end" can truncate the simple duration.)
            if the "end" attribute is set
                set end to reference start time + the value of the "end" attribute
            else
                set end to reference end time
            set computed end time to the minimum of end and computed end time

        Compute the child nodes:
            if the element's "timeContainer" attribute is absent or "par"
                for each child element of the node:
                    Compute Time Intervals for the child with par context, start time
                    as computed start time and end time as computed end time
            else
                set s to computed start time
                for each child element of the node:
                    Compute Time Intervals for the child with seq context, start time
                    as s and end time as computed end time
                    set s to the computed end time of the child

An element is temporally active at time t if the computed start time of the element is less than or equal to t and t is less than the computed end time of the element. The TTML cue event times are those times at which some element changes state from temporally inactive to temporally active or vice versa; that is, the set of computed start and end times in the annotated tree, placed in ascending order.
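The interval computation and the event-time rule above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a minimal sketch under stated assumptions, not a conforming implementation: `parse_time`, `compute_intervals` and `event_times` are hypothetical names, only offset times (`2s`, `500ms`) and simple clock times (`00:00:21.99`) are parsed, and the child time container is taken from the element's `timeContainer` attribute, defaulting to `par`. The sample reuses the body of the example source document in 1.2.4, with the media assumed to run from 0s to 3s.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (computed start time, computed end time) in seconds


def parse_time(value: str) -> float:
    """Seconds for an offset time ("2s", "500ms") or a clock time ("00:00:21.99").
    Only this subset of TTML time expressions is handled in the sketch."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def compute_intervals(elem: ET.Element, context: str, ref_start: float, ref_end: float,
                      out: Dict[ET.Element, Interval]) -> Dict[ET.Element, Interval]:
    """Annotate elem and its descendants with their computed intervals."""
    start = ref_start + parse_time(elem.get("begin", "0s"))
    dur_attr, end_attr = elem.get("dur"), elem.get("end")
    if dur_attr is None and end_attr is None and context == "seq":
        dur = 0.0  # seq children default to a zero duration
    else:
        reference_dur = max(ref_end - start, 0.0)  # "indefinite", truncated to the reference end
        dur = reference_dur if dur_attr is None else min(parse_time(dur_attr), reference_dur)
    limit = ref_start + parse_time(end_attr) if end_attr is not None else ref_end
    end = min(limit, start + dur)  # "end" can truncate the simple duration
    out[elem] = (start, end)
    if elem.get("timeContainer", "par") == "par":
        for child in elem:
            compute_intervals(child, "par", start, end, out)
    else:  # seq: each child is timed from the end of its previous sibling
        s = start
        for child in elem:
            compute_intervals(child, "seq", s, end, out)
            s = out[child][1]
    return out


def event_times(doc_text: str, media_start: float, media_end: float) -> List[float]:
    """Ordered times at which some element under <body> becomes active or inactive."""
    body = ET.fromstring(doc_text).find("{http://www.w3.org/ns/ttml}body")
    intervals = compute_intervals(body, "par", media_start, media_end, {})
    times = set()
    for start, end in intervals.values():
        if start < end:  # a zero-length interval never makes an element active
            times.update((start, end))
    return sorted(times)


# Body of the example source document; media assumed to run from 0s to 3s.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body xml:id="b1">
    <div xml:id="d1" begin="0s" dur="2s"><p region="r1">Text 1</p><p region="r2">Text 2</p></div>
    <div xml:id="d2" begin="1s" dur="2s"><p region="r2">Text 3</p><p region="r1">Text 4</p></div>
  </body>
</tt>"""

print(event_times(SAMPLE, 0.0, 3.0))  # [0.0, 1.0, 2.0, 3.0]
```

The body's computed end equals the media end, so with a longer media resource that time would also appear in the list.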

1.2.4 Evaluating the TTML document instance at event time

Map the TTML source document to a set of active regions at each TTML cue event time as follows:

  1. For each temporally active region element, replicate the sub-tree of the source document headed by the body element;
  2. Evaluating this sub-tree in a post-order traversal, prune each element that is not a content element, is temporally inactive, is empty, or is not associated with the current active region;
  3. If the pruned sub-tree is non-empty, reparent it to the current active region element and add the current active region to the output list.

A content element is associated with a region according to the following ordered rules, where the first rule satisfied is used and remaining rules are skipped:

  1. if the element specifies a region attribute, then the element is associated with the region referenced by that attribute;

  2. if some ancestor of that element specifies a region attribute, then the element is associated with the region referenced by the most immediate ancestor that specifies this attribute;

  3. if the element contains a descendant element that specifies a region attribute, then the element is associated with the region referenced by that attribute;

  4. if a default region was implied (due to the absence of any region element), then the element is associated with the default region;

  5. the element is not associated with any region.
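The region mapping and the association rules can be sketched in Python as follows. `regions_of`, `prune` and `active_regions` are hypothetical names. Instead of literally copying the body and then pruning it, the sketch builds a pruned copy directly, which should give the same result because pruning an element removes its whole sub-tree. Rule 3 is read as associating an element with every region named by any of its descendants, which is what keeps `d1` in both regions of the example. To keep the block self-contained, the temporal intervals of the example are hard-coded rather than computed as in 1.2.3.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set, Tuple

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
CONTENT = {TT + name for name in ("body", "div", "p", "span", "br")}
IsActive = Callable[[ET.Element], bool]  # hypothetical "temporally active at t" predicate


def regions_of(elem: ET.Element, inherited: Optional[str], default_region: Optional[str]) -> Set[str]:
    """Regions associated with elem; the first rule that matches wins."""
    if elem.get("region") is not None:  # rule 1: own region attribute
        return {elem.get("region")}
    if inherited is not None:  # rule 2: nearest ancestor with a region attribute
        return {inherited}
    below = {d.get("region") for d in elem.iter() if d.get("region") is not None}
    if below:  # rule 3: regions named by descendants
        return below
    if default_region is not None:  # rule 4: implied default region
        return {default_region}
    return set()  # rule 5: no region


def prune(elem: ET.Element, region_id: str, is_active: IsActive,
          inherited: Optional[str], default_region: Optional[str]) -> Optional[ET.Element]:
    """Pruned copy of elem's sub-tree for region_id, or None if elem itself is pruned."""
    # Tests that do not depend on the children run first; pruning elem drops its sub-tree anyway.
    if elem.tag not in CONTENT or not is_active(elem):
        return None
    if region_id not in regions_of(elem, inherited, default_region):
        return None
    node = ET.Element(elem.tag, dict(elem.attrib))
    node.text = elem.text
    here = elem.get("region", inherited)
    for child in elem:  # post-order: children are settled before the emptiness test
        kept = prune(child, region_id, is_active, here, default_region)
        if kept is not None:  # the tail text of a pruned child is dropped in this sketch
            kept.tail = child.tail
            node.append(kept)
    if len(node) == 0 and not (node.text or "").strip() and elem.tag != TT + "br":
        return None  # empty element
    return node


def active_regions(doc_text: str, is_active: IsActive) -> List[Tuple[str, ET.Element]]:
    """(region id, pruned body) for each active region whose pruned sub-tree is non-empty."""
    root = ET.fromstring(doc_text)
    regions = root.findall(f"{TT}head/{TT}layout/{TT}region")
    default_region = None if regions else ""  # "" stands for the implied default region
    ids = [r.get(XML_ID) for r in regions if is_active(r)] if regions else [""]
    body = root.find(TT + "body")
    result = []
    for rid in ids:
        pruned = prune(body, rid, is_active, None, default_region) if body is not None else None
        if pruned is not None:
            result.append((rid, pruned))
    return result


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml">
  <head><layout><region xml:id="r1"/><region xml:id="r2"/></layout></head>
  <body xml:id="b1">
    <div xml:id="d1"><p xml:id="p1" region="r1">Text 1</p><p xml:id="p2" region="r2">Text 2</p></div>
    <div xml:id="d2"><p xml:id="p3" region="r2">Text 3</p><p xml:id="p4" region="r1">Text 4</p></div>
  </body>
</tt>"""

# Intervals of the example, hard-coded; elements not listed (regions, body) span 0s to 3s.
INTERVALS = {"d1": (0, 2), "p1": (0, 2), "p2": (0, 2), "d2": (1, 3), "p3": (1, 3), "p4": (1, 3)}


def active_at(t: float) -> IsActive:
    def check(elem: ET.Element) -> bool:
        start, end = INTERVALS.get(elem.get(XML_ID), (0, 3))
        return start <= t < end
    return check


for rid, pruned_body in active_regions(SAMPLE, active_at(1.0)):
    print(rid, [p.get(XML_ID) for p in pruned_body.iter(TT + "p")])
```

At 1s this prints `r1 ['p1', 'p4']` and `r2 ['p2', 'p3']`: both divisions are active, and each keeps only the paragraphs that target the region being built.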

Example

An example of the processing steps described above is elaborated below, starting with Example Sample Source Document.

Example Sample Source Document
<tt tts:extent="640px 480px" xml:lang="en"
  xmlns="http://www.w3.org/ns/ttml"
  xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head>
    <layout>
      <region xml:id="r1">
        <style tts:origin="10px 100px"/>
        <style tts:extent="300px 96px"/>
      </region>
      <region xml:id="r2">
        <style tts:origin="10px 300px"/>
        <style tts:extent="620px 96px"/>
      </region>
    </layout>
  </head>
  <body xml:id="b1">
    <div xml:id="d1" begin="0s" dur="2s">
      <p xml:id="p1" region="r1">Text 1</p>
      <p xml:id="p2" region="r2">Text 2</p>
    </div>
    <div xml:id="d2" begin="1s" dur="2s">
      <p xml:id="p3" region="r2">Text 3</p>
      <p xml:id="p4" region="r1">Text 4</p>
    </div>
  </body>
</tt>

The event times for this document are 0s, 1s, 2s and 3s. Performing the processing described above at each of these times produces an intermediate document containing a sequence of region elements; for example, at media time 0s the following intermediate document would be produced:

Example Intermediate Document at 0s
<region xml:id="r1"
        tts:origin="10px 100px"
        tts:extent="300px 96px">
  <body xml:id="b1">
    <div xml:id="d1">
      <p xml:id="p1">Text 1</p>
    </div>
  </body>
</region>
<region xml:id="r2"
        tts:origin="10px 300px"
        tts:extent="620px 96px">
  <body xml:id="b1">
    <div xml:id="d1">
      <p xml:id="p2">Text 2</p>
    </div>
  </body>
</region>

1.2.5 TTML cue to HTML cue construction rules

To support the timed track model of HTML, each region element in the intermediate document is converted to a timed track cue with the following assignments:

The timed track cue identifier

Is set to the value of xml:id of the region used to construct the cue, or "" if the default region is used.

The timed track cue pause-on-exit flag

Is set to false unless the html:pauseOnExit attribute is set anywhere in the region markup.

The timed track cue writing direction

Is set to the dominant writing direction used in the region markup if that is defined, or "" otherwise.

The timed track cue snap-to-lines flag

Is set to false

The timed track cue line position

Is made equivalent to the y part of the origin of the region if set, 0 otherwise.

The timed track cue text position

Is made equivalent to the x part of the origin of the active region if set, 0 otherwise.

The timed track cue size

Is made equivalent to the x part of the extent of the active region if set, 0 otherwise. (height should be set similarly)

The timed track cue alignment

Set to zero.

The timed track cue voice identifier

Calculate the set of ttm:role attribute values used in the region markup. If the set is a singleton set consisting of one of the following values, then the value of the voice identifier is mapped as follows:

If the value is not mapped above, or the set has more than one value, the voice identifier is set to an integer: if the same set of roles is used in subsequent cues, then the same number shall be re-used; otherwise the number shall be unique over all cues in the track.
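A minimal sketch of these assignments for one region of the intermediate document follows (Python 3.9 or later, for `str.removesuffix`). `cue_fields` and `VoiceNumbering` are hypothetical names. Only pixel-valued `tts:origin` and `tts:extent` are handled, the writing direction is not derived, and because the singleton role table above is not reproduced here, any mapped roles must be supplied by the caller. One `VoiceNumbering` object is assumed to be shared by all cues of a track, so that a repeated role set receives the same number.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional, Tuple

TTS = "{http://www.w3.org/ns/ttml#styling}"
TTM = "{http://www.w3.org/ns/ttml#metadata}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
HTML_PAUSE = "{http://www.w3.org/1999/xhtml}pauseOnExit"  # html:pauseOnExit


def px_pair(value: str) -> Tuple[float, float]:
    """Split "10px 100px" into (10.0, 100.0); only pixel values are handled here."""
    x, y = value.split()
    return float(x.removesuffix("px")), float(y.removesuffix("px"))


class VoiceNumbering:
    """Voice identifiers for one track: a mapped singleton role gets its mapped value; any other
    role set gets an integer that is re-used for the same set and unique otherwise."""

    def __init__(self, mapped: Optional[Dict[str, str]] = None):
        self.mapped = mapped or {}  # singleton role -> identifier, supplied by the caller
        self.numbers: Dict[frozenset, int] = {}

    def voice_for(self, roles: Iterable[str]):
        key = frozenset(roles)
        if len(key) == 1:
            (role,) = key
            if role in self.mapped:
                return self.mapped[role]
        return self.numbers.setdefault(key, len(self.numbers))


def cue_fields(region: ET.Element, voices: VoiceNumbering) -> dict:
    """Timed track cue fields for one region element of the intermediate document."""
    origin, extent = region.get(TTS + "origin"), region.get(TTS + "extent")
    x, y = px_pair(origin) if origin else (0.0, 0.0)
    roles = set()
    for elem in region.iter():
        roles.update((elem.get(TTM + "role") or "").split())
    return {
        "id": region.get(XML_ID, ""),
        "pauseOnExit": any(elem.get(HTML_PAUSE) is not None for elem in region.iter()),
        "snapToLines": False,
        "linePosition": y,  # y part of tts:origin
        "textPosition": x,  # x part of tts:origin
        "size": px_pair(extent)[0] if extent else 0.0,  # x part of tts:extent
        "alignment": 0,
        "voice": voices.voice_for(roles),
    }


R1 = ET.fromstring('<region xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" '
                   'xml:id="r1" tts:origin="10px 100px" tts:extent="300px 96px"><p>Text 1</p></region>')
print(cue_fields(R1, VoiceNumbering()))
```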

1.2.6 TTML cue text DOM construction rules

The body of the HTML5 cue is constructed from the markup of the region by converting the TTML Intermediate Document Object Tree into a DOM tree for the document owner. User agents must create a DocumentFragment node for each HTML 5 cue and populate it with a tree of DOM nodes that is isomorphic to the TTML Intermediate Document Object Tree, using the following mapping of TTML Intermediate Document Objects to DOM nodes:

TTML Intermediate Document Object: DOM node

  ttml:region element: HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace.
  ttml:body element: HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace.
  ttml:div element: HTMLElement element node with localName "div" and the namespaceURI set to the HTML namespace.
  ttml:p element: HTMLElement element node with localName "p" and the namespaceURI set to the HTML namespace.
  ttml:span element: HTMLElement element node with localName "span" and the namespaceURI set to the HTML namespace.
  ttml:set element: The Specified Style Set of properties of the element is merged into the Specified Style Set of properties of its parent.
  ttml:br element: HTMLElement element node with localName "br" and the namespaceURI set to the HTML namespace.
  ttml:metadata node: If the TTML source domain is not the same as the referencing HTML domain, ignore it. Otherwise, if the metadata contains only text, add a "data-metadata" attribute, whose value is the text of the metadata node, to the HTMLElement element associated with the containing TTML node; otherwise, process the child nodes of the metadata element and add them to the HTMLElement element associated with the containing TTML node in an XML Island.
  Anonymous span text: Text node whose character data is the text of the anonymous span.
  Elements in a non-TTML namespace: If the TTML source domain is the same as the referencing HTML domain, copy the nodes in their existing namespace; otherwise ignore them. (TBD)

The ownerDocument attribute of all nodes in the DOM tree must be set to the given document owner.

For each HTMLElement in the document fragment constructed above, if the specified style set computed for the corresponding TTML element is not empty, create a CSSStyleDeclaration, add to it the styles defined by the ordered rules below, and finally add the CSSStyleDeclaration to the style attribute of the HTMLElement.

  1. if the specified set contains the property tts:backgroundColor, call setProperty with propertyName="background-color", value=<color value>, priority="".
  2. if the specified set contains the property tts:color, call setProperty with propertyName="color", value=<color value>, priority="".
  3. if the specified set contains the property tts:direction, call setProperty with propertyName="direction", value=<direction value>, priority="".
  4. if the specified set contains the property tts:display, call setProperty with propertyName="display", value="", priority="".
  5. if the specified set contains the property tts:displayAlign, call setProperty with propertyName="", value="", priority="".
  6. (CSS3)
  7. if the specified set contains the property tts:extent and the TTML element was region, call setProperty with: propertyName="width", value=<width value>, priority="" and propertyName="height", value=<height value>, priority="". If extent is not set and the TTML element was region (e.g. the region is the default region), height and width of the div will be auto.
  8. if the specified set contains the property tts:fontFamily, call setProperty with propertyName="font-family", value=<font family value>, priority="".
  9. if the specified set contains the property tts:fontSize, call setProperty with propertyName="font-size", value=<font size value>, priority="".
  10. if the specified set contains the property tts:fontStyle, call setProperty with propertyName="font-style", value=<font style value>, priority="".
  11. if the specified set contains the property tts:fontWeight, call setProperty with propertyName="font-weight", value=<font weight value>, priority="".
  12. if the specified set contains the property tts:lineHeight, call setProperty with propertyName="line-height", value=<line height value>, priority="".
  13. if the specified set contains the property tts:opacity and the TTML element was region, call setProperty with propertyName="", value="", priority="" (CSS3).
  14. if the specified set contains the property tts:origin and the TTML element was region, call setProperty with: propertyName="position", value="absolute", priority="", propertyName="left", value=<left value>, priority="" and propertyName="top", value=<top value>, priority="".
  15. if the specified set contains the property tts:overflow and the TTML element was region, call setProperty with propertyName="", value="", priority="".
  16. if the specified set contains the property tts:padding and the TTML element was region, call setProperty with propertyName="padding", value=<padding value>, priority="".
  17. if the specified set contains the property tts:showBackground and the TTML element was region, then if the div has no children call setProperty with propertyName="background-color", value="transparent", priority="".
  18. if the specified set contains the property tts:textAlign, call setProperty with propertyName="text-align", value=<text align value>, priority="".
  19. if the specified set contains the property tts:textDecoration, call setProperty with propertyName="text-decoration", value=<text decoration value>, priority="".
  20. if the specified set contains the property tts:textOutline, call setProperty with propertyName="", value="", priority="" (CSS3).
  21. if the specified set contains the property tts:unicodeBidi, call setProperty with propertyName="unicode-bidi", value=<bidi value>, priority="".
  22. if the specified set contains the property tts:visibility, call setProperty with propertyName="visibility", value=<visibility value>, priority="".
  23. if the specified set contains the property tts:wrapOption with value noWrap, call setProperty with propertyName="white-space", value="nowrap", priority="". (TBD)
  24. if the specified set contains the property tts:writingMode, call setProperty with propertyName="writing-mode", value=<writing mode value>, priority="" (CSS3).
  25. if the specified set contains the property tts:zIndex and the TTML element was region, call setProperty with propertyName="z-index", value=<z value>, priority="".
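The ordered rules above can be sketched as a pure function from a specified style set (with the `tts:` prefixes dropped) to CSS property/value pairs. In a browser each pair would be passed to `CSSStyleDeclaration.setProperty(name, value, "")`; this Python sketch only collects them. `css_for`, `style_attribute` and `DIRECT` are hypothetical names, values are passed through unchanged (no unit or keyword conversion, and `extent`/`origin` are assumed to use the two-value form), and rules whose CSS target is still open above are skipped.

```python
from typing import Dict

# TTML styling attribute -> CSS property, for the rules above that map one value to one property.
DIRECT = {
    "backgroundColor": "background-color", "color": "color", "direction": "direction",
    "fontFamily": "font-family", "fontSize": "font-size", "fontStyle": "font-style",
    "fontWeight": "font-weight", "lineHeight": "line-height", "textAlign": "text-align",
    "textDecoration": "text-decoration", "unicodeBidi": "unicode-bidi", "visibility": "visibility",
}


def css_for(specified: Dict[str, str], is_region: bool) -> Dict[str, str]:
    """CSS declarations for one element's specified style set (tts: prefixes dropped).
    display, displayAlign, opacity, overflow, showBackground, textOutline and writingMode
    are deliberately skipped because their CSS targets are still open."""
    css: Dict[str, str] = {}
    for name, value in specified.items():
        if name in DIRECT:
            css[DIRECT[name]] = value
    if is_region and "extent" in specified:  # rule 7: region only
        css["width"], css["height"] = specified["extent"].split()
    if is_region and "origin" in specified:  # rule 14: region only, absolute positioning
        left, top = specified["origin"].split()
        css.update({"position": "absolute", "left": left, "top": top})
    if is_region and "padding" in specified:  # rule 16
        css["padding"] = specified["padding"]
    if is_region and "zIndex" in specified:  # rule 25
        css["z-index"] = specified["zIndex"]
    if specified.get("wrapOption") == "noWrap":  # rule 23
        css["white-space"] = "nowrap"
    return css


def style_attribute(css: Dict[str, str]) -> str:
    """Serialize declarations in the form used by the example fragments."""
    return "; ".join(f"{prop}: {value}" for prop, value in css.items())


print(style_attribute(css_for({"origin": "10px 100px", "extent": "300px 96px"}, True)))
```

Gating the positioning rules on `is_region` mirrors the "and the TTML element was region" conditions, so an `origin` that somehow reaches a `p` produces no CSS.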

Map the following elements in the #metadata namespace to attributes on the parent HTMLElement as follows:

  1. ttm:title : copy text content to the title attribute

Map attributes in the #metadata namespace on the TTML DOM element to attributes on the HTMLElement as follows:

  1. ttm:agent : add the value of this attribute to the class attribute.
  2. ttm:role : add the value of this attribute to the class attribute.

Copy the xml:id and xml:lang attributes, if present on the TTML DOM element, to the HTMLElement as the id and lang attributes respectively.

If an element has an xml:space attribute with the value 'preserve', then the content of the element should be contained within an HTMLElement element node with localName "pre" and the namespaceURI set to the HTML namespace.

All characteristics of the DOM nodes that are not described above or dependent on characteristics defined above must be left at their initial values.

Continuing the above example, the HTML fragment equivalents for the two HTML cue objects will be as follows:

Example HTML Fragments Output
<div id="r1"
     style="position:absolute; left:10px; top:100px;
            width:300px; height:96px">
  <div id="b1">
    <div id="d1">
      <p id="p1">Text 1</p>
    </div>
  </div>
</div>

<div id="r2"
     style="position:absolute; left:10px; top:300px;
            width:620px; height:96px">
  <div id="b1">
    <div id="d1">
      <p id="p2">Text 2</p>
    </div>
  </div>
</div>
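The element and attribute mapping of 1.2.6 can be sketched with Python's `xml.etree.ElementTree` (Python 3.9 or later, for `str.removeprefix`), producing HTML-named elements from one region of the intermediate document. `to_html` is a hypothetical name. Styling, `ttml:metadata`, foreign-namespace content and `xml:space` handling are left out, and any text following a skipped child is dropped. Applied to region `r1` of the intermediate document at 0s, it yields the structure of the first fragment above without its style attribute.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
TTM = "{http://www.w3.org/ns/ttml#metadata}"
XML = "{http://www.w3.org/XML/1998/namespace}"
HTML_TAG = {"region": "div", "body": "div", "div": "div", "p": "p", "span": "span", "br": "br"}


def to_html(node: ET.Element) -> ET.Element:
    """Copy a TTML intermediate-document sub-tree into elements named as in the mapping table."""
    out = ET.Element(HTML_TAG[node.tag.removeprefix(TT)])
    for ttml_attr, html_attr in ((XML + "id", "id"), (XML + "lang", "lang")):
        if node.get(ttml_attr) is not None:
            out.set(html_attr, node.get(ttml_attr))
    classes = [node.get(TTM + name) for name in ("agent", "role") if node.get(TTM + name)]
    if classes:  # ttm:agent and ttm:role values are added to the class attribute
        out.set("class", " ".join(classes))
    out.text = node.text
    for child in node:
        if child.tag == TTM + "title":  # ttm:title becomes the title attribute
            out.set("title", "".join(child.itertext()))
        elif child.tag.removeprefix(TT) in HTML_TAG:
            converted = to_html(child)
            converted.tail = child.tail
            out.append(converted)
        # Anything else (ttml:set, ttml:metadata, other namespaces) is skipped, with its tail text.
    return out


REGION = ET.fromstring('<region xmlns="http://www.w3.org/ns/ttml" xml:id="r1"><body xml:id="b1">'
                       '<div xml:id="d1"><p xml:id="p1">Text 1</p></div></body></region>')
print(ET.tostring(to_html(REGION), encoding="unicode"))
```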
Style values

The mapping from TTML style values into HTML 5 is as follows:

1.3 Rendering Rules

Create a set of CSS boxes in relation to the rendering area of the media element as follows:

  1. If the media element is a playback mechanism with no rendering area, abort these steps. There is nothing to render.
  2. Let video be the media element or other playback mechanism.
  3. Let textArea be a CSS containing block whose containing block is the rendering area for video, and set the writing mode (CSS3) for textArea to lr-bt.
  4. Let tracks be the subset of video's list of timed tracks that have as their rules for updating the timed track rendering these rules, and whose timed track mode is showing.
  5. Let cues be an empty list of timed track cues.
  6. For each track in tracks, append to cues all the cues computed as above for each TTML cue event time.
  7. For each timed track cue in cues that is active, run the following substeps:
    1. Let nodes be the HTML fragment computed for the cue.
    2. Apply the terms of the CSS specifications to nodes to obtain a set of CSS boxes, boxes, positioned relative to the CSS box created for the root div element in the HTML fragment, which is in turn relative to textArea: [CSS].
    3. Add the CSS boxes in boxes to the display.

1.4 Examples

1.4.1 Illuminated Transcript Example

This example places a transcript beside the video element and highlights sentences in the transcript as the video plays.

1.4.2 Metadata Example

<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
  <body>
      <div>
        <p begin="00:00:21.99" dur="00:00:24.36">
          <metadata><![CDATA[
          {
            slide: intro.png,
            title: "Really Achieving Your Childhood Dreams" by Randy Pausch,
            Carnegie Mellon University, Sept 18, 2007
          }
          ]]></metadata>
        </p>
      </div>
  </body>
</tt>

Converted HTML fragment equivalent at time = 00:00:21.99

<div>
 <div>
  <p data-metadata='{
     slide: intro.png,
     title: "Really Achieving Your Childhood Dreams" by Randy Pausch,
     Carnegie Mellon University, Sept 18, 2007
   }'></p>
 </div>
</div>

1.4.3 SVG Object embedding

<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">
 <body>
  <div>
   <p begin="00:00:21.99" dur="00:00:24.36">
      This picture:
      <svg:svg xmlns:svg="http://www.w3.org/2000/svg">
        <svg:rect></svg:rect>
      </svg:svg>
   </p>
  </div>
 </body>
</tt>

Converted HTML fragment equivalent at time = 00:00:21.99

<div>
 <div>
  <p>
   This picture:
   <svg xmlns="http://www.w3.org/2000/svg">
     <rect></rect>
   </svg>
  </p>
 </div>
</div>

1.4.4 Ruby

<?xml version="1.0" encoding="utf-8" ?>
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
 <head>
  <metadata>
    <ttm:title>Ruby</ttm:title>
    <ttm:desc>Example of how to apply ruby using HTML 5</ttm:desc>
    <ttm:copyright>Copyright (C) 2007 W3C (MIT, ERCIM, Keio). All Rights Reserved.</ttm:copyright>
  </metadata>
  <styling>
    <style xml:id="base" tts:color="blue" tts:fontSize="14px" tts:fontFamily="MS Gothic" tts:textAlign="center" />
    <style xml:id="textStyle" style="base" tts:fontSize="32px" />
  </styling>
  <layout>
   <region xml:id="r1" tts:origin="0px 30px" tts:extent="440px 32px" />
  </layout>
 </head>
 <body >
  <div>
   <p region="r1" style="textStyle">
      頭を<ruby xmlns="http://www.w3.org/1999/xhtml">
      <rb>股</rb>
      <rt>また</rt>
    </ruby>に突つ込んで祈るわ
   </p>
  </div>
 </body>
</tt>

Converted HTML fragment equivalent:

HTML Fragment

<div id="r1"
     style="position: absolute; left: 0px; top: 30px; 
            width: 440px; height: 32px;">
 <div>
  <div>
   <p style="font-size: 32px; color: blue; 
             font-family: 'MS Gothic'; 
             text-align: center">
     頭を<ruby>
     <rb>股</rb>
     <rt>また</rt>
    </ruby>に突つ込んで祈るわ
   </p>
  </div>
 </div>
</div>
	  

1.5 HTML5 TTML Profile

1.5.1 Additional attributes

html:pauseOnExit - this attribute in the HTML namespace, if specified on a TTML element, is mapped to the HTML 5 cue attribute of the same name. It causes media progress to halt when the media playback position is most nearly equal to the event time at which the element containing the attribute becomes inactive. It may take any string value (or none?); the value is ignored, and any number of such attributes may be present in a TTML cue. If no such attribute is present, the value mapped on the cue is false; otherwise it is true.