Editor: Tony Ross (tross@microsoft.com)
Please address feedback to the HTML Working Group mailing list (public-html@w3.org).
HTML is a standard language used to mark up hypertext documents. While it is most commonly thought of as the language used to define the browseable web, the set of user agents that process HTML is broader than web browsers. Ensuring interoperability between different user agents is the key goal of an HTML standard.
It is a common practice for authors, tool vendors, and library authors to want to extend languages to represent additional information that can't be adequately described by the standard grammar. This might be used to preserve metadata used by one tool in a chain of operations. It might be actual data to be processed by a user agent as an extension to the standard processing. Here are a few examples that apply to HTML:
In many cases, the language customizations are used for small niche applications and don't require the burden of centralized standardization. Instead these extensions are defined in a distributed fashion among groups of interested developers or authors. Supporting distributed extensibility means providing a standard repeatable mechanism for creating these extensions without the need for centralized agreement.
When processing HTML documents, it is common for a set of user agents to be combined into a tool chain, each performing different operations on the same document. A Distributed Extensibility model for standard HTML is desirable because it means that user agents from different vendors that adhere to the standard can be assured of correctly processing mark-up that contains extensions without destroying the integrity of the document.
A solution already exists in XML and the DOM: namespaces. However, namespaces should not be blindly adopted into HTML. Given the state of content on the web today, the potential for serious compatibility issues exists with any addition to the language. This is especially true for features that are already part of another language, such as namespaces. Existing content could easily have been generated by simply copying from one document to another without regard for correctness.
Using research data gathered by Microsoft, we identified a number of these concerns and this proposal was altered to avoid serious issues. The proposal has also been broken into multiple components so parts of it can be adopted by themselves if necessary. Even so, continued scrutiny is encouraged to uncover other potential risks to compatibility with current web content.
The base proposal is mostly a subset of the namespace support implemented in Internet Explorer. The DOM feature of inserting namespaced elements into HTML is also broadly supported across other browsers today. The proposal allows namespace prefix declarations and prefixed element names to be included. In contrast to current IE behavior, it allows these declarations on any element and not just the root element.
Attributes of the form xmlns:prefix="namespace" are used to bind a prefix to a namespace.
Elements and attributes with names of the form prefix:localname are assigned the namespace bound to the specified prefix.
If prefix:localname is used with an unbound prefix, no special handling is applied.
HTML Markup:
<my:calendar xmlns:my="com.mycompany">
DOM:
Element { localName = "calendar", nodeName = "my:calendar", prefix = "my", namespaceURI = "com.mycompany" }
HTML Markup:
<div xmlns:my="com.mycompany" my:data="foobar">
DOM for attribute "my:data":
Attr { localName = "data", nodeName = "my:data", prefix = "my", namespaceURI = "com.mycompany" }
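The binding rules and the two examples above can be sketched in script. This is a minimal illustration, not the proposal's parsing algorithm: the element model ({ name, attrs, parent }) and the function names lookupPrefix and resolveName are hypothetical, and the sketch assumes that a declaration on an ancestor applies to its descendants, as it does in XML.

```javascript
// Hypothetical element model for illustration: { name, attrs, parent }.
// None of these names come from the proposal or from the DOM API.

// Find the namespace bound to a prefix, checking the element and then its ancestors.
// Assumption: declarations are inherited by descendants, as in XML.
function lookupPrefix(element, prefix) {
  const key = "xmlns:" + prefix;
  for (let el = element; el; el = el.parent) {
    if (el.attrs && Object.prototype.hasOwnProperty.call(el.attrs, key)) {
      return el.attrs[key];
    }
  }
  return null; // unbound prefix
}

// Resolve an element or attribute name in the context of an element.
// Returns null when no special handling applies (no prefix, or an unbound prefix).
function resolveName(element, qualifiedName) {
  const colon = qualifiedName.indexOf(":");
  if (colon <= 0 || colon === qualifiedName.length - 1) {
    return null;
  }
  const prefix = qualifiedName.slice(0, colon);
  const namespaceURI = lookupPrefix(element, prefix);
  if (namespaceURI === null) {
    return null;
  }
  return {
    localName: qualifiedName.slice(colon + 1),
    nodeName: qualifiedName,
    prefix: prefix,
    namespaceURI: namespaceURI,
  };
}

// Mirrors the <my:calendar xmlns:my="com.mycompany"> example.
const calendar = { name: "my:calendar", attrs: { "xmlns:my": "com.mycompany" }, parent: null };
console.log(resolveName(calendar, "my:calendar"));
```

Returning null for an unbound prefix keeps the sketch neutral about what the ordinary HTML handling of such a name would be.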
The proposal as stated closely matches behavior that Internet Explorer has had for a number of releases, reducing compatibility concerns. While it is true that Internet Explorer does not currently allow prefixes to be defined anywhere other than the root element in the document, lifting this restriction is not believed to present any significant compatibility risk.
Representing prefixed elements as instances of Element instead of HTMLUnknownElement means that a few HTML specific members such as innerHTML will not be available. This could create compatibility issues for script expecting innerHTML and the like to be present on all elements.
This proposal should not negatively impact HTML+RDFa because attributes of the form "xmlns:prefix" will remain in the DOM.
Regardless of the syntax used, mapping distributed extensibility to namespaces in the DOM has some key advantages.
One of the most notable benefits of adopting this approach is an increased level of success for those who try to copy content from an XML file into an HTML file. Specifically, fewer differences exist between the DOM that results from parsing a snippet of code as HTML vs. XML. This is particularly relevant in the case of SVG files where pieces of metadata markup have been inserted by various tools and authors.
Querying for elements specific to a particular extension becomes cleaner.
var myCustomElements = document.getElementsByTagNameNS("com.mycompany", "*");
Without mapping to namespaces, some form of filtering must be performed by script.
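var elms = document.getElementsByTagName("*");
for (var i = 0; i < elms.length; i++) {
  // tagName is uppercased for HTML elements in HTML documents, so normalize before comparing
  if (elms[i].tagName.toLowerCase().indexOf("my:") === 0) {
    // Do some work
  }
}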
Applying styles to a collection of extended elements is also quite simple.
@namespace my "com.mycompany";
my|* { display: none; }
This scenario becomes even more complex than the script example without the ability to assign namespaces. One way developers may work around this is by setting a special attribute or class name on all desired target elements.
HTML Markup:
<calendar xmlns="com.mycompany">
DOM:
Element { localName = "calendar", nodeName = "calendar", prefix = null, namespaceURI = "com.mycompany" }
HTML Markup:
<div xmlns="com.mycompany">
DOM:
Element { localName = "div", nodeName = "div", prefix = null, namespaceURI = "com.mycompany" }
HTML Markup:
<html xmlns="com.mycompany"> <div>Test</div> </html>
DOM for <html>:
HTMLHtmlElement { localName = "html", nodeName = "html", prefix = null, namespaceURI = "http://www.w3.org/1999/xhtml" }
DOM for <div>:
HTMLDivElement { localName = "div", nodeName = "div", prefix = null, namespaceURI = "http://www.w3.org/1999/xhtml" }
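The three examples above suggest a simple lookup for the default namespace. The following is a rough sketch under stated assumptions rather than a definition of the behavior: the element model ({ name, attrs, parent }) and the function name defaultNamespaceFor are hypothetical, and inheritance of a default namespace by descendants is assumed from XML rather than stated in this document.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical element model for illustration: { name, attrs, parent }.
// Returns the default namespace that would apply to an unprefixed element.
function defaultNamespaceFor(element) {
  for (let el = element; el; el = el.parent) {
    // A default namespace declaration on <html> is ignored,
    // as in the <html xmlns="com.mycompany"> example.
    if (el.name.toLowerCase() === "html") {
      continue;
    }
    if (el.attrs && Object.prototype.hasOwnProperty.call(el.attrs, "xmlns")) {
      return el.attrs.xmlns;
    }
  }
  // No applicable declaration: the element stays in the XHTML namespace.
  return XHTML_NS;
}

const html = { name: "html", attrs: { xmlns: "com.mycompany" }, parent: null };
const div = { name: "div", attrs: {}, parent: html };
console.log(defaultNamespaceFor(html), defaultNamespaceFor(div));
```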
This optional component is known to break some pages on the web. Affected pages will be those that assign a non-XHTML namespace to an element with a known HTML name, such as script, and expect it to continue to exhibit its HTML-specific behavior. The biggest risk is default namespace declarations on the <html> element, which Opera's MAMA database shows occur on roughly 20% of the web. While most of these declarations actually specify the XHTML namespace, the risk is avoided altogether by disallowing default namespace declarations on the <html> element.
Opera's MAMA data shows fewer than 10,000 out of 3.5 million pages use xmlns="something" on known HTML elements.
We researched data gathered from Bing looking at sites using xmlns="something-other-than-the-XHTML-namespace". About 20% of the cases we saw were using the rdf: prefix. The majority of others were on elements like <span> that have no inherent behavior and where the compatibility risk would be lower. Again, further research is encouraged.
The compatibility risk here is small and comparable to the usage data used to support other breaking changes made for HTML5.
The main benefit of including support for default namespaces is that it broadens the set of existing XML content that can be re-used in HTML documents. It is much more common for existing XML documents to be written using a default namespace rather than prefixing every element.
<foo:bar> is equivalent to <foo:bar xmlns:foo="foo">
HTML Markup:
<foo:bar>
DOM:
Element { localName = "bar", nodeName = "foo:bar", prefix = "foo", namespaceURI = "foo" }
HTML Markup:
<com.mycompany:calendar>
DOM:
Element { localName = "calendar", nodeName = "com.mycompany:calendar", prefix = "com.mycompany", namespaceURI = "com.mycompany" }
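The fallback shown in these examples, where an undeclared prefix doubles as its own namespace, can be sketched as follows. The function name resolvePrefixedName and the bindings parameter (a map of prefixes to namespaces declared in scope) are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper. `bindings` maps declared prefixes to their namespaces.
// When the prefix has no declaration, the prefix string itself is used as the namespace.
function resolvePrefixedName(qualifiedName, bindings = {}) {
  const colon = qualifiedName.indexOf(":");
  if (colon <= 0 || colon === qualifiedName.length - 1) {
    return null; // not a prefixed name
  }
  const prefix = qualifiedName.slice(0, colon);
  const declared = Object.prototype.hasOwnProperty.call(bindings, prefix);
  return {
    localName: qualifiedName.slice(colon + 1),
    nodeName: qualifiedName,
    prefix: prefix,
    namespaceURI: declared ? bindings[prefix] : prefix,
  };
}

// Mirrors the <foo:bar> and <com.mycompany:calendar> examples.
console.log(resolvePrefixedName("foo:bar"));
console.log(resolvePrefixedName("com.mycompany:calendar"));
```

An explicit declaration still takes precedence in this sketch, so resolvePrefixedName("my:calendar", { my: "com.mycompany" }) yields the declared namespace rather than "my".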
This is a completely new syntax. We don't currently have data that indicates what level of compatibility risk this would pose and investigation here would be required. This syntax increases the likelihood of differences between Element and HTMLUnknownElement (lack of innerHTML, etc.) causing a compatibility issue.
This is purely a simplification for the benefit of developers. This allows developers that are not as concerned with naming conflicts to avoid some of the complexities of traditional namespace support.
HTML Markup:
<select xmlns="html">
DOM:
HTMLSelectElement { localName = "select", nodeName = "select", prefix = null, namespaceURI = "html" }
HTML Markup:
<mrow xmlns="math">
DOM:
MathMLPresentationContainer { localName = "mrow", nodeName = "mrow", prefix = null, namespaceURI = "math" }
HTML Markup:
<circle xmlns="svg">
DOM:
SVGCircleElement { localName = "circle", nodeName = "circle", prefix = null, namespaceURI = "svg" }
HTML Markup:
<html:select>
DOM:
HTMLSelectElement { localName = "select", nodeName = "html:select", prefix = "html", namespaceURI = "html" }
HTML Markup:
<math:mrow>
DOM:
MathMLPresentationContainer { localName = "mrow", nodeName = "math:mrow", prefix = "math", namespaceURI = "math" }
HTML Markup:
<svg:circle>
DOM:
SVGCircleElement { localName = "circle", nodeName = "svg:circle", prefix = "svg", namespaceURI = "svg" }
HTML Markup:
<svg:a xlink:href="test">
DOM for <svg:a>:
SVGAElement { localName = "a", nodeName = "svg:a", prefix = "svg", namespaceURI = "svg" }
DOM for xlink:href:
Attr { localName = "href", nodeName = "xlink:href", prefix = "xlink", namespaceURI = "xlink" }
The main risk here is that there are now two different strings that represent the same namespace. Authors will typically have control over the namespace string they choose to use. Library developers will likely have to check for both the short name and the URI of each namespace they are concerned about.
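One way a library might cope with the two spellings is to normalize every namespace string before comparing. The sketch below is an assumption-laden illustration: the names WELL_KNOWN, canonicalNamespace, and isSameNamespace are hypothetical, and the pairing of each short name with the standard XHTML, MathML, SVG, and XLink namespace URIs is inferred from the examples above rather than spelled out by the proposal.

```javascript
// Assumed mapping from short names to the standard namespace URIs.
const WELL_KNOWN = {
  html: "http://www.w3.org/1999/xhtml",
  math: "http://www.w3.org/1998/Math/MathML",
  svg: "http://www.w3.org/2000/svg",
  xlink: "http://www.w3.org/1999/xlink",
};

// Map a short name to its URI; any other string is returned unchanged.
function canonicalNamespace(ns) {
  return Object.prototype.hasOwnProperty.call(WELL_KNOWN, ns) ? WELL_KNOWN[ns] : ns;
}

// Compare two namespace strings, treating a short name and its URI as equal.
function isSameNamespace(a, b) {
  return canonicalNamespace(a) === canonicalNamespace(b);
}

console.log(isSameNamespace("svg", "http://www.w3.org/2000/svg"));
console.log(isSameNamespace("svg", "com.mycompany"));
```

Normalizing toward the URI rather than the short name keeps the comparison compatible with content and script that already use the full URIs.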
This has unknown compatibility risk but is presumed to be fairly low. A problem would occur if an author used one of these well-known prefixes to mean a namespace other than the one typically intended.
The most commonly used aspects of HTML5 would have namespace values that are much easier to remember and deal with than the URIs used in XML.
We have evaluated this proposal against the HTML Design Principles (http://www.w3.org/TR/html-design-principles/).