Distributed Extensibility Submission from Microsoft - 30 September 2009

Please address feedback to the HTML Working Group mailing list (public-html@w3.org).

Why Distributed Extensibility Matters

HTML is a standard language used to mark up hypertext documents. While it is most commonly thought of as the language used to define the browseable web, the set of user agents that process HTML is broader than web browsers. Ensuring interoperability between different user agents is the key goal of an HTML standard.

It is a common practice for authors, tool vendors, and library authors to want to extend languages to represent additional information that can't be adequately described by the standard grammar. This might be used to preserve metadata used by one tool in a chain of operations. It might be actual data to be processed by a user agent as an extension to the standard processing. Here are a few examples that apply to HTML:

An HTML document editor adds information about tool settings so that a subsequent editing session can continue with the same settings.
A JavaScript library processes custom tags in a browser and turns them into custom controls dynamically on the page.
A browser wants to allow custom behaviors to be defined in one module and attached automatically to custom elements.
An author includes processing instructions in the document that will be processed by a server before delivering the document to a user agent.
An author runs a tool on a document to add numbering to headings and a table of contents. Running this tool leaves custom metadata tags intact.

In many cases, the language customizations are used for small niche applications and don't require the burden of centralized standardization. Instead, these extensions are defined in a distributed fashion among groups of interested developers or authors. Supporting distributed extensibility means providing a standard, repeatable mechanism for creating these extensions without the need for centralized agreement.

When processing HTML documents, it is common for a set of user agents to be combined into a tool chain, each performing different operations on the same document. A Distributed Extensibility model for standard HTML is desirable because it means that user agents from different vendors that adhere to the standard can be assured of correctly processing markup that contains extensions without destroying the integrity of the document.

Proposed Solution: Namespaces in HTML Markup

A solution already exists in XML and the DOM: namespaces. However, namespaces should not be blindly adopted into HTML. Given the state of content on the web today, the potential for serious compatibility issues exists with any addition to the language. This is especially true for features that are already part of another language, such as namespaces. Existing content could easily have been generated by simply copying from one document to another without regard for correctness.

Using research data gathered by Microsoft, we identified a number of these concerns and this proposal was altered to avoid serious issues. The proposal has also been broken into multiple components so parts of it can be adopted by themselves if necessary. Even so, continued scrutiny is encouraged to uncover other potential risks to compatibility with current web content.

1: Base Proposal

The base proposal is mostly a subset of the namespace support implemented in Internet Explorer. The DOM feature of inserting namespaced elements into HTML is also broadly supported across other browsers today. The proposal allows namespace prefix declarations and prefixed element names to be included. In contrast to current IE behavior, it allows these declarations on any element and not just the root element.

1.1: Description

Allow xmlns:prefix="namespace" to bind a prefix to a namespace
Scoping for bound prefixes follows the rules defined in Namespaces in XML 1.0 (http://www.w3.org/TR/REC-xml-names/)
Elements and attributes with names of the form prefix:localname are assigned the namespace bound to the specified prefix
1. For elements in the HTML, SVG, and MathML namespaces, the appropriate element type is created based upon the localname.
2. For other namespaces, a generic instance of Element is created.
If an element or attribute with a name of the form prefix:localname is used with an unbound prefix, no special handling is applied
1. (See Optional Component 2 for an alternative to this behavior)
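
The scoping rule above can be sketched in script. This is an illustration only and not part of the proposal: resolvePrefix, splitName, and the plain-object node shape (attributes, parent) are hypothetical stand-ins for whatever structure a parser keeps for its open elements.

```javascript
// Hypothetical node shape: { attributes: { name: value }, parent: node | null }.
// Walks from the element outward and returns the namespace bound to `prefix`
// by the nearest xmlns:prefix declaration, or null if the prefix is unbound.
function resolvePrefix(node, prefix) {
  const declaration = "xmlns:" + prefix;
  for (let current = node; current !== null; current = current.parent) {
    if (Object.prototype.hasOwnProperty.call(current.attributes, declaration)) {
      return current.attributes[declaration];
    }
  }
  return null; // Base proposal: unbound prefixes get no special handling.
}

// Splits a name of the form prefix:localname at the first colon.
function splitName(qualifiedName) {
  const colon = qualifiedName.indexOf(":");
  if (colon <= 0 || colon === qualifiedName.length - 1) {
    return { prefix: null, localName: qualifiedName };
  }
  return {
    prefix: qualifiedName.slice(0, colon),
    localName: qualifiedName.slice(colon + 1),
  };
}

// Models: <div xmlns:my="com.mycompany"><p><my:calendar></my:calendar></p></div>
const div = { attributes: { "xmlns:my": "com.mycompany" }, parent: null };
const p = { attributes: {}, parent: div };
const calendar = { attributes: {}, parent: p };

const name = splitName("my:calendar");
console.log(name.localName, resolvePrefix(calendar, name.prefix)); // calendar com.mycompany
console.log(resolvePrefix(calendar, "other"));                     // null
```

The declaration on the div is found from the nested element because the walk visits every ancestor, which is the behavior that lifting the root-element restriction is meant to allow.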

1.2: Examples

HTML Markup:

     <my:calendar xmlns:my="com.mycompany">

DOM:

     Element {
          localName    = "calendar",
          nodeName     = "my:calendar",
          prefix       = "my",
          namespaceURI = "com.mycompany"
     }

HTML Markup:

     <div xmlns:my="com.mycompany" my:data="foobar">

DOM for attribute "my:data":

     Attr {
          localName    = "data",
          nodeName     = "my:data",
          prefix       = "my",
          namespaceURI = "com.mycompany"
     }

1.3: Risks

Compatibility with Existing Web Content

The proposal as stated closely matches behavior that Internet Explorer has had for a number of releases, reducing compatibility concerns. While it is true that Internet Explorer does not currently allow prefixes to be defined anywhere other than the root element in the document, lifting this restriction is not believed to present any significant compatibility risk.

Using Element instead of HTMLUnknownElement

Representing prefixed elements as instances of Element instead of HTMLUnknownElement means that a few HTML specific members such as innerHTML will not be available. This could create compatibility issues for script expecting innerHTML and the like to be present on all elements.
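
Script that has to tolerate both representations could feature-detect the member instead of assuming it. The following is a minimal sketch, not a recommended pattern: setContent is a hypothetical helper, and the objects passed to it are plain stand-ins for DOM elements so that the example runs outside a browser.

```javascript
// Hypothetical helper: uses innerHTML when the element exposes it and
// falls back to textContent (defined on every DOM Node) otherwise.
// Returns the name of the member that was used.
function setContent(element, markup, plainText) {
  if ("innerHTML" in element) {
    element.innerHTML = markup;
    return "innerHTML";
  }
  element.textContent = plainText;
  return "textContent";
}

// Stand-ins for an HTML element and for a generic namespaced Element.
const htmlLike = { innerHTML: "", textContent: "" };
const genericLike = { textContent: "" };

console.log(setContent(htmlLike, "<b>June</b>", "June"));    // innerHTML
console.log(setContent(genericLike, "<b>June</b>", "June")); // textContent
```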

Interaction with HTML+RDFa

This proposal should not negatively impact HTML+RDFa because attributes of the form "xmlns:prefix" will remain in the DOM.

1.4: Benefits

Regardless of the syntax used, mapping distributed extensibility to namespaces in the DOM has some key advantages.

Simplify Migration of Existing XML Content to HTML

One of the most notable benefits of adopting this approach is an increased level of success for those who try to copy content from an XML file into an HTML file. Specifically, fewer differences exist between the DOM that results from parsing a snippet of code as HTML vs. XML. This is particularly relevant in the case of SVG files where pieces of metadata markup have been inserted by various tools and authors.

Scripting

Querying for elements specific to a particular extension becomes cleaner.

     var myCustomElements =
          document.getElementsByTagNameNS("com.mycompany", "*");

Without mapping to namespaces, some form of filtering must be performed by script.

     var elms = document.getElementsByTagName("*");
     for (var i = 0; i < elms.length; i++)
     {
           // tagName may be upper-cased for HTML elements, so compare in lower case
           if (elms[i].tagName.toLowerCase().indexOf("my:") === 0)
           {
                // Do some work
           }
     }

CSS

Applying styles to a collection of extended elements is also quite simple.

     @namespace my "com.mycompany";
     my|*
     {
           display: none;
     }

This scenario becomes even more complex than the script example without the ability to assign namespaces. One way developers may work around this is by setting a special attribute or class name on all desired target elements.
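
As an illustration of that workaround, the sketch below marks prefixed elements with a class so that an ordinary class selector can reach them. The function name markPrefixedElements and the class name my-extension are hypothetical, and plain objects stand in for DOM elements so that the example runs outside a browser.

```javascript
// Hypothetical workaround: add `className` to every element whose tag name
// carries the given prefix, so that a rule such as
// ".my-extension { display: none; }" can target them.
// Returns the number of matching elements.
function markPrefixedElements(elements, prefix, className) {
  const wanted = prefix.toLowerCase() + ":";
  let matched = 0;
  for (const element of elements) {
    // tagName may be upper-cased by an HTML parser, so compare in lower case.
    if (element.tagName.toLowerCase().startsWith(wanted)) {
      const classes = element.className ? element.className.split(/\s+/) : [];
      if (!classes.includes(className)) {
        classes.push(className);
        element.className = classes.join(" ");
      }
      matched++;
    }
  }
  return matched;
}

// Stand-ins for the result of document.getElementsByTagName("*").
const elements = [
  { tagName: "DIV", className: "" },
  { tagName: "MY:CALENDAR", className: "" },
  { tagName: "my:event", className: "highlight" },
];
console.log(markPrefixedElements(elements, "my", "my-extension")); // 2
```

The extra pass has to be repeated whenever matching elements are added to the page, which is part of the cost that the namespace selector avoids.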

2: Optional Component 1 - Default Namespaces

2.1: Description

Allow xmlns="namespace" to be used to declare a default namespace which will be applied to elements without a prefix
Scoping for default namespaces follows the rules defined in Namespaces in XML 1.0 (http://www.w3.org/TR/REC-xml-names/)
EXCEPTION: Default namespace declarations are ignored on the root <html> element. This is for compatibility - many documents declare the XHTML namespace on the root element, some incorrectly.
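
The rules above, including the exception for the root element, can be sketched as follows. The function resolveDefaultNamespace and the node shape (name, attributes, parent) are hypothetical. The sketch also assumes that the XHTML namespace applies when no declaration is in scope, and it ignores the separate handling of SVG and MathML content.

```javascript
const XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Hypothetical node shape: { name, attributes, parent }.
// Returns the namespace an unprefixed element would receive: the nearest
// xmlns declaration wins, except that a declaration on the root <html>
// element is ignored.
function resolveDefaultNamespace(node) {
  for (let current = node; current !== null; current = current.parent) {
    const isRootHtml = current.parent === null && current.name === "html";
    if (!isRootHtml && typeof current.attributes.xmlns === "string") {
      return current.attributes.xmlns;
    }
  }
  return XHTML_NAMESPACE; // Assumed fallback when nothing is declared.
}

// Models:
// <html xmlns="com.mycompany"><div><calendar xmlns="com.mycompany"><event>
const htmlNode = { name: "html", attributes: { xmlns: "com.mycompany" }, parent: null };
const divNode = { name: "div", attributes: {}, parent: htmlNode };
const calendarNode = { name: "calendar", attributes: { xmlns: "com.mycompany" }, parent: divNode };
const eventNode = { name: "event", attributes: {}, parent: calendarNode };

console.log(resolveDefaultNamespace(divNode));   // http://www.w3.org/1999/xhtml
console.log(resolveDefaultNamespace(eventNode)); // com.mycompany
```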

2.2: Examples

HTML Markup:

     <calendar xmlns="com.mycompany">

DOM:

     Element {
          localName    = "calendar",
          nodeName     = "calendar",
          prefix       = null,
          namespaceURI = "com.mycompany"
     }

HTML Markup:

     <div xmlns="com.mycompany">

DOM:

     Element {
          localName    = "div",
          nodeName     = "div",
          prefix       = null,
          namespaceURI = "com.mycompany"
     }

HTML Markup:

     <html xmlns="com.mycompany">
           <div>Test</div>
     </html>

DOM for <html>:

     HTMLHtmlElement {
          localName    = "html",
          nodeName     = "html",
          prefix       = null,
          namespaceURI = "http://www.w3.org/1999/xhtml"
     }

DOM for <div>:

     HTMLDivElement {
          localName    = "div",
          nodeName     = "div",
          prefix       = null,
          namespaceURI = "http://www.w3.org/1999/xhtml"
     }

2.3: Risks

Compatibility with Existing Web Content

This optional component is known to break some pages on the web. Affected pages will be those that assign a non-XHTML namespace to an element with a known HTML name, such as script, and expect it to continue to exhibit its HTML-specific behavior. The biggest risk is default namespace declarations on the <html> element, which Opera's MAMA database shows occur on roughly 20% of the web. While most of these declarations do specify the XHTML namespace, the risk is avoided altogether by disallowing default namespace declarations on the <html> element.

Opera's MAMA data shows fewer than 10,000 out of 3.5 million pages use xmlns="something" on known HTML elements.

We researched data gathered from Bing looking at sites using xmlns="something-other-than-the-XHTML-namespace". About 20% of the cases we saw were using the rdf: prefix. The majority of others were on elements like <span> that have no inherent behavior and where the compatibility risk would be lower. Again, further research is encouraged.

The compatibility risk generated here is small and comparable to the usage data used to support other breaking changes made for HTML5.

2.4: Benefits

The main benefit of including support for default namespaces is that it broadens the set of existing XML content that can be reused in HTML documents. It is much more common for existing XML documents to be written using a default namespace than to prefix every element.

3: Optional Component 2 - Unbound Prefixes Become Namespaces

3.1: Description

Allow elements and attributes with names of the form "prefix:localname" to be created in the namespace "prefix" if no namespace has been bound to "prefix".
In essence, all prefixes are pre-bound to the namespace with the same name.
In other words <foo:bar> is equivalent to <foo:bar xmlns:foo="foo">
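
The fallback can be expressed in a few lines. In this sketch, namespaceForPrefix is a hypothetical helper and the Map stands in for the set of prefixes that have been declared in scope with xmlns:prefix.

```javascript
// Hypothetical helper. `bindings` maps each prefix declared in scope to its
// namespace. Under this component an undeclared prefix behaves as if it had
// been bound to a namespace with the same name as the prefix.
function namespaceForPrefix(prefix, bindings) {
  return bindings.has(prefix) ? bindings.get(prefix) : prefix;
}

const inScope = new Map([["my", "com.mycompany"]]);
console.log(namespaceForPrefix("my", inScope));            // com.mycompany
console.log(namespaceForPrefix("foo", inScope));           // foo
console.log(namespaceForPrefix("com.mycompany", inScope)); // com.mycompany
```

An explicit declaration still takes precedence, so content that already binds its prefixes is unaffected by the fallback.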

3.2: Examples

HTML Markup:

     <foo:bar>

DOM:

     Element {
          localName    = "bar",
          nodeName     = "foo:bar",
          prefix       = "foo",
          namespaceURI = "foo"
     }

HTML Markup:

     <com.mycompany:calendar>

DOM:

     Element {
          localName    = "calendar",
          nodeName     = "com.mycompany:calendar",
          prefix       = "com.mycompany",
          namespaceURI = "com.mycompany"
     }

3.3: Risks

This is a completely new syntax. We do not currently have data indicating the level of compatibility risk it would pose, so further investigation is required. This syntax also increases the likelihood that differences between Element and HTMLUnknownElement (lack of innerHTML, etc.) will cause a compatibility issue.

3.4: Benefits

This is purely a simplification for the benefit of developers. This allows developers that are not as concerned with naming conflicts to avoid some of the complexities of traditional namespace support.

4: Optional Component 3 - Short Namespaces

4.1: Description

Define the following short namespaces for the namespaces used in the HTML5 specification:
1. html - Designates HTML Elements
2. math - Designates MathML Elements
3. svg - Designates SVG Elements
4. xlink - Designates XLink Attributes
5. xml - Designates XML Attributes
Others could be added as desired

4.2: Examples

4.2.1: Example Usage as Namespaces

HTML Markup:

     <select xmlns="html">

DOM:

     HTMLSelectElement {
           localName    = "select",
           nodeName     = "select",
           prefix       = null,
           namespaceURI = "html"
     }

HTML Markup:

     <mrow xmlns="math">

DOM:

     MathMLPresentationContainer {
           localName    = "mrow",
           nodeName     = "mrow",
           prefix       = null,
           namespaceURI = "math"
     }

HTML Markup:

     <circle xmlns="svg">

DOM:

     SVGCircleElement {
           localName    = "circle",
           nodeName     = "circle",
           prefix       = null,
           namespaceURI = "svg"
     }

4.2.2: Example Usage as Unbound Prefixes (Relies on Optional Component 2)

HTML Markup:

     <html:select>

DOM:

     HTMLSelectElement {
           localName    = "select",
           nodeName     = "html:select",
           prefix       = "html",
           namespaceURI = "html"
     }

HTML Markup:

     <math:mrow>

DOM:

     MathMLPresentationContainer {
           localName    = "mrow",
           nodeName     = "math:mrow",
           prefix       = "math",
           namespaceURI = "math"
     }

HTML Markup:

     <svg:circle>

DOM:

     SVGCircleElement {
           localName    = "circle",
           nodeName     = "svg:circle",
           prefix       = "svg",
           namespaceURI = "svg"
     }

HTML Markup:

     <svg:a xlink:href="test">

DOM for <svg:a>:

     SVGAElement {
           localName    = "a",
           nodeName     = "svg:a",
           prefix       = "svg",
           namespaceURI = "svg"
     }

DOM for attribute "xlink:href":

     Attr {
           localName    = "href",
           nodeName     = "xlink:href",
           prefix       = "xlink",
           namespaceURI = "xlink"
     }

4.3: Risks

The main risk here is that there are now two different strings that represent the same namespace. Authors will typically have control over the namespace string they choose to use. Library developers will likely have to check for both the short name and the URI of each namespace they are concerned about.
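
A library could reduce this burden by normalizing namespace strings before comparing them. The sketch below is one possible approach under stated assumptions: canonicalNamespace and sameNamespace are hypothetical helpers, and the table pairs the short names from section 4.1 with the namespace URIs in use today for those vocabularies.

```javascript
// Short names from section 4.1 paired with the existing namespace URIs.
const SHORT_NAMESPACES = new Map([
  ["html", "http://www.w3.org/1999/xhtml"],
  ["math", "http://www.w3.org/1998/Math/MathML"],
  ["svg", "http://www.w3.org/2000/svg"],
  ["xlink", "http://www.w3.org/1999/xlink"],
  ["xml", "http://www.w3.org/XML/1998/namespace"],
]);

// Hypothetical helper: maps a short name to its URI and returns any other
// namespace string unchanged.
function canonicalNamespace(namespace) {
  return SHORT_NAMESPACES.has(namespace)
    ? SHORT_NAMESPACES.get(namespace)
    : namespace;
}

// Hypothetical helper: true when both strings designate the same namespace.
function sameNamespace(a, b) {
  return canonicalNamespace(a) === canonicalNamespace(b);
}

console.log(sameNamespace("svg", "http://www.w3.org/2000/svg")); // true
console.log(sameNamespace("svg", "math"));                       // false
console.log(sameNamespace("com.mycompany", "com.mycompany"));    // true
```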

This has unknown compatibility risk but is presumed to be fairly low. A problem would occur if an author used one of these well-known prefixes to mean a namespace other than the one typically intended.

4.4: Benefits

The most commonly used aspects of HTML5 would have namespace values that are much easier to remember and deal with than the URIs used in XML.

5: Evaluation Against HTML Design Principles

We have evaluated this proposal against the HTML Design Principles (http://www.w3.org/TR/html-design-principles/).

5.1: Compatibility

Namespace declarations and prefixed elements have been supported by Internet Explorer since IE5 and are used by a number of applications.
In general, without namespace support, prefixed elements will be handled as unknown elements in HTML. This is simple to check for in script.
This proposal is an application of the existing Recommendation for Namespaces in XML but applied pragmatically to HTML.
Data has been used to research the impact on compatibility. Further research is encouraged.
The proposal is made in a modular way so that any components causing unacceptable compatibility concerns could be abandoned.

5.2: Utility

The introduction to this document outlines the need for distributed extensibility. The proposal provides a solution for this requirement.
By reusing existing DOM mechanisms, script and CSS support for extensions is simplified for authors.
This approach improves DOM Consistency between HTML and XHTML. This allows increased support for a round trip from script-created DOM elements (with namespaces) to HTML and back.

5.3: Interoperability

Support for DOM namespaces is implemented by many existing user agents. This would give a head start for interoperability between implementations.