Change Proposal for ISSUE-120

From WHATWG Wiki

Summary

Simplify the specification by removing features that are documented to be confusing to users.

Rationale

The premise of this rationale, which is argued in detail below, is that mechanisms that bind arbitrary strings ("prefixes") to other arbitrary strings ("bases"), which can then be used in conjunction with a third set of arbitrary strings ("values") to form identifiers ("terms") that are never explicitly stated in the source, are a language design anti-pattern in the context of technology intended for broad Web deployment (e.g. in text/html).

In the context of RDFa, there are a number of mechanisms to define the mappings of "prefixes" and "bases": the prefix="" attribute, the profile="" attribute (which has an additional layer of indirection), and for legacy reasons the xmlns="" attribute. This change proposal would remove all three of these features based on the same rationale. (At a high level, these features are essentially equivalent, being little more than syntactic sugar for each other.) As a result, the places where RDFa accepts a "value" that could have used one of these arbitrary "prefixes" to create a "term" can no longer do so, and are limited to either giving the "term" directly, or using a predefined syntax with known prefixes (specifically, the empty prefix ":foo" which is short for "http://www.w3.org/1999/xhtml/vocab#foo", and the bnode syntax "_:foo").
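As a rough illustration of what remains once arbitrary prefixes are gone, the following Python sketch expands a value using only the two predefined syntaxes named above. The function name expand_value and the tuple it returns are hypothetical, and treating every other value as the term itself is a simplifying assumption, not the RDFa processing model.

```python
# Base for the empty prefix, as stated in the paragraph above.
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

def expand_value(value):
    """Expand a value with no author-defined prefix mappings (hypothetical helper)."""
    if value.startswith("_:"):
        # bnode syntax: "_:foo" names a blank node called "foo"
        return ("bnode", value[2:])
    if value.startswith(":"):
        # empty prefix: ":foo" is short for the XHTML vocabulary term
        return ("term", XHTML_VOCAB + value[1:])
    # Anything else is taken as the term exactly as written; nothing is looked up.
    return ("term", value)

print(expand_value(":foo"))
print(expand_value("_:foo"))
print(expand_value("http://example.com/b"))
```

Note that the function needs no state beyond the value itself, which is the point of the proposal: what the source says is what the term is.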

Why arbitrary prefix mechanisms are bad

Copy and paste

Copy-and-paste of the source becomes very brittle when two separate parts of a document are needed to make sense of the content. Copy-and-paste is how the Web evolved, so I think it is important to keep it functional and easy.
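The brittleness can be sketched in a few lines of Python. The markup, the "name=base" syntax of the prefixes="" attribute, and the helper names expand_property and copy_out are all assumptions made for illustration; the lookup is deliberately simplistic (it ignores scoping) and is not an RDFa processor.

```python
import xml.etree.ElementTree as ET

PAGE = """
<div prefixes="a=http://example.com/">
 <p id="prefixed"><span property="a:title">Example</span></p>
 <p id="full"><span property="http://example.com/title">Example</span></p>
</div>
"""

def expand_property(root):
    """Expand the first property="" value under root, or return None when it
    uses a prefix that is not declared anywhere inside root."""
    declared = {}
    for element in root.iter():
        for pair in element.get("prefixes", "").split():
            name, _, base = pair.partition("=")
            declared[name] = base
    value = root.find(".//span").get("property")
    if value.startswith("http://"):
        return value  # already a full term, nothing to look up
    prefix, _, rest = value.partition(":")
    return declared[prefix] + rest if prefix in declared else None

def copy_out(page, paragraph_id):
    """Simulate copy-and-paste: serialise one <p> and re-parse it on its own."""
    paragraph = page.find(f".//p[@id='{paragraph_id}']")
    return ET.fromstring(ET.tostring(paragraph, encoding="unicode"))

page = ET.fromstring(PAGE.strip())
pasted_prefixed = expand_property(copy_out(page, "prefixed"))
pasted_full = expand_property(copy_out(page, "full"))

print(pasted_prefixed)  # None: the declaration stayed behind on the <div>
print(pasted_full)      # http://example.com/title
```

The paragraph that spells out the full term survives being pasted elsewhere; the one that relies on a prefix silently loses its meaning.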

Cognitive difficulty

Fundamentally, prefixes are an indirection model. Indirection models are very, very hard for people to understand. However, arbitrary prefixes have proved even harder to understand than most indirection mechanisms. The most widely known arbitrary prefix mechanism on the Web is the XML namespaces feature, which is very similar to the three prefix mechanisms in RDFa. It can thus be used as a case study for the problem:

As far back as 2004, Micah wrote "As the author of an O'Reilly book on XForms, I can report that 90% of the technical questions from readers involve confusion related to namespaces".

http://www.w3.org/2004/04/webapps-cdf-ws/papers/verity.html

Parand Darugar has said similar things: "Experience shows XML namespaces can be a common cause of confusion and a major complicating factor in XML adoption."

http://www.ibm.com/developerworks/library/x-abolns.html

Derek Denny-Brown, who had been the lead developer of MSXML and System.Xml: "If there is any one of the W3C's family of XML specifications, that has caused me the most grief, XML Namespaces is probably it."

http://nothing-more.blogspot.com/2004/10/loving-and-hating-xml-namespaces_21.html

Maciej has also said things to this effect: "Namespaces are an example of the Fundamental Software Engineering Error, which is that something too terrible to actually use can be fixed by adding a level of indirection. Sometimes that is true but software engineers try to do it even when it clearly is not."

http://krijnhoetmer.nl/irc-logs/whatwg/20080801#l-160

Henri too: "I've spent *a lot* of time writing code that is Namespace-wise excruciatingly correct. Yet, Namespaces have never actually solved a problem for me. My software developer friends complain to me about how Namespaces cause them grief. No one can remember Namespaces solving a real problem. It's like feeding a white elephant."

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/015905.html

Questions about namespaces come up again and again, over many years:

Prefixes are notoriously hard for implementors to get right:

(This covers bugs by such vendors as Sun, Google, Yahoo!, MySpace — and these aren't just bugs that don't affect end-users, like forgetting to quote attributes in HTML.)

Prefixes are notoriously hard for implementors to document:

Prefixes have been notoriously hard for even people in the standards community to understand (note that in these threads it doesn't matter who is right or wrong; the point is that people get confused by the prefix mechanism):

Importance of simplicity

HTML documents are frequently maintained by different people than the original authors. If the original author is more knowledgeable than the maintainer, and uses features that the maintainer does not understand, then the quality of the document will suffer dramatically. In the context of RDFa, for instance, the original author might use prefixes, when the maintainer doesn't know RDFa — the maintainer might then move nodes around and break the relationship between the declaration of the prefix and the use of the prefix, breaking the page's RDFa annotations. What's worse, with metadata annotation formats the maintainer likely won't notice that anything broke.

Other technologies

It is sometimes suggested that other technologies are evidence that arbitrary prefix mechanisms of the kind discussed here (rebindable prefixes that are combined with a second string to form a third string, where it is the third string's value that matters and that value could equally have been constructed in other arbitrary ways) are not fundamentally flawed. While this is a kind of straw man argument, here are some of the technologies sometimes listed in such a defence and a preemptive explanation of why they do not apply:

RDF/XML, N3: Not in wide use, which could be evidence of the problems described above or could be coincidental.
Office Open XML: No public testing about the namespace-awareness of OOXML implementations seems to have been done, so it's hard to comment on this. Interoperability between OOXML implementations is known to be somewhat lacking, and while that does not appear related to namespaces, it does make it hard to make any clear determinations about the issue above.
URLs: Do not use rebindable prefixes.
RSS: Doesn't use prefixes.
Atom: Bugs in Atom implementations around namespaces have been documented (e.g. in Google Reader), despite the fact that Atom barely uses namespaces at all and doesn't require the use of prefixes at all.
WebDAV: The most widely-deployed WebDAV implementation has namespace-related bugs.
Java, JavaScript, PHP, Perl: These languages do not have the kind of prefix mechanism being discussed here.
SVG: SVG implementations have had all kinds of namespace issues — this was in fact one of the big problems with getting SVG onto the Web in the early days of SVG.
CSS: The indirection mechanisms in CSS do cause confusion, but CSS as used in the wild doesn't use prefix binding mechanisms.

It is difficult to conclude from the above that there is no problem with the aforementioned prefix mechanisms and that XMLNS is somehow an anomaly.

Other than Atom, no hand-authored format uses namespaces and is anywhere near as widely deployed as HTML (and Atom doesn't really use namespace prefixes much). Indeed, no technology that uses the anti-pattern described above is as widely used as HTML. Plenty of technologies that don't use the anti-pattern are in wide use, even just on the Web: CSS, JS, HTTP, DOM, etc.

Dynamic changes

Arbitrary prefixes in dynamically changing content (like HTML) are even worse because they require that an observing software agent not only track the value that it is concerned about, but also all possible ways for the value's prefixes to change meaning. So for instance, here:

 <test prefixes="a=http://example.com/">
  <foo>
   <bar>
    <baz content="a:b"/>
   </bar>
  </foo>
 </test>

...if a software agent wants to see when <baz>'s content="" attribute changes to include the value http://example.net/b, it has to not only watch the content="" attribute, but also the prefixes="" attribute of all ancestor elements up the tree, just in case they redefine the prefix "a" to mean "http://example.net/".
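The following Python sketch makes the problem concrete using the markup from the example above. The space-separated "name=base" parsing of prefixes="" and the helper names parse_prefixes and resolve are assumptions for illustration only; this is not an RDFa implementation.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<test prefixes="a=http://example.com/">
 <foo>
  <bar>
   <baz content="a:b"/>
  </bar>
 </foo>
</test>
"""

def parse_prefixes(attr):
    """Parse the illustrative 'name=base' syntax into a dict."""
    mappings = {}
    for pair in (attr or "").split():
        name, _, base = pair.partition("=")
        mappings[name] = base
    return mappings

def resolve(root, target):
    """Expand target's content="" value using the nearest ancestor binding."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    prefix, _, value = target.get("content").partition(":")
    node = target
    while node is not None:
        mappings = parse_prefixes(node.get("prefixes"))
        if prefix in mappings:
            return mappings[prefix] + value
        node = parent_of.get(node)  # walk up; None once past the root
    return None

root = ET.fromstring(SOURCE.strip())
baz = root.find(".//baz")
before = resolve(root, baz)

# Rebind "a" on an ancestor; the content="" attribute itself is untouched.
root.find("foo").set("prefixes", "a=http://example.net/")
after = resolve(root, baz)

print(before)  # http://example.com/b
print(after)   # http://example.net/b
```

An agent that only watches content="" on <baz> sees no change at all between the two calls, yet the term it is interested in has changed.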

Intentional misimplementation

One argument frequently cited for keeping the xmlns="" feature (though not the prefixes="" and profile="" features, which are new) is that Google implements it. However, Google's implementation of xmlns="" in RDFa is intentionally crippled to work around authoring mistakes; it only recognises some well-known prefixes. Google's experience in fact has been that Web designers are frequently unable to deploy RDFa on major sites without mistakes, in part due to the prefix mechanisms (but also in part due to other complexities in the RDFa format). Google's implementation, and Google's experience with getting large sites to deploy RDFa, argue against arbitrary prefix mechanisms.

Quoting Othar Hansson, lead developer for Google's RDFa work: "we will also deviate from the standard [...] we expect that some webmasters will forget the xmlns attribute entirely".

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Sep/0126.html

Arbitrary prefix mechanisms are unnecessary

In a usability study for microdata, it was discovered that authors in fact have no difficulty dealing with straight URLs rather than shortening them with prefixes:

http://blog.whatwg.org/usability-testing-html5

Details

This change proposal describes changes to this draft, relative to the specification as it stood on January 11th 2011: http://dev.w3.org/html5/rdfa/

In "2.2 RDFa Processor Conformance", add to the first bullet point an exception for features overridden by the Extensions section.

In the "3. Extensions to RDFa Core 1.1" section, add a section requiring that when processing CURIEs in attributes on elements in the HTML namespace, the set of mappings from prefixes to URIs must always be treated as empty.

In the "3. Extensions to RDFa Core 1.1" section, add a section requiring that the "prefix" attribute not be used in conforming documents and add a section requiring that user agents ignore prefix="" attributes on elements in the HTML namespace.

In the "3. Extensions to RDFa Core 1.1" section, add a section requiring that the "profile" attribute not be used in conforming documents and add a section requiring that user agents ignore profile="" attributes on elements in the HTML namespace.

In the "3. Extensions to RDFa Core 1.1" section, remove all the sections that refer to the "xmlns:" attributes and replace them with a single section saying that HTML+RDFa does not use the "xmlns:" attributes and that user agents must not let their processing model be affected by "xmlns:" attributes in no namespace (such as those found in text/html).

Update examples and other text accordingly, at the editor's discretion.

Impact

Positive Effects

Simplifies RDFa, potentially letting more people use it.

Negative Effects

Removing xmlns="" support would introduce incompatibility with legacy RDFa content (this doesn't apply to profile="" and prefix="", which are new). (However, note that xmlns="" is deprecated in HTML+RDFa already.)

Conformance Class Changes

Documents, conformance checkers, and user agents are all affected.

Risk

See "negative effects".