- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 27 Feb 2009 14:57:31 +0200
- To: Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>
- Cc: HTMLWG WG <public-html@w3.org>, "www-tag@w3.org WG" <www-tag@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, public-xhtml2@w3.org
Mark Nottingham wrote:
> Creative Commons just released a new spec:
> http://wiki.creativecommons.org/Ccplus
> that has markup in this form:
> <a xmlns:cc="http://creativecommons.org/ns#"
> rel="cc:morePermissions" href="#agreement">below</a>
> (in HTML4, one assumes, since they don't specify XHTML, and this is
> what the vast majority of users will presume).

http://wiki.creativecommons.org/images/0/06/Ccplus-technical.pdf says
"html". The syntax is not valid in any of HTML 2.0, HTML 3.2, HTML 4.0,
HTML 4.01 or HTML5 as currently drafted.

> However, it appears that they adopted this practice from RDFa;
> http://www.w3.org/TR/rdfa-syntax/#relValues
> which, in turn, *does* rely upon XHTML.

Indeed, RDFa is not a REC over text/html.

> However, XHTML does *not*
> specify the @rel value as a QName (or CURIE, as RDFa assumes);
> http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/abstraction.html#dt_LinkTypes
>
> "Note that in a future version of this specification, the Working
> Group expects to evolve this type from a simple name to a Qualified
> Name (QName)."

In HTML5, as currently drafted, rel is a space character-separated list
of tokens that are compared ASCII-case-insensitively. It is noteworthy
that the tokens may look like URIs, although HTML5 processing itself
ascribes no URI semantics to tokens that look like URIs.

> So, that's an expectation, not a current specification.

It's not a current or drafted specification for text/html, either.

[...]

> A few observations and questions;
>
> 1) I'm more than happy to specify in the Link that in XHTML, a link
> rel value is indeed a QName, if XHTML chooses to take that position
> (although I believe a URI is a better fit than a QName here, as in
> most other places). Can we get a current reading from the XHTML world
> on this?

In XHTML5, as currently drafted, rel is a space character-separated
list of tokens that are compared ASCII-case-insensitively. This matches
current HTML 4.01 and XHTML 1.0 implementations.
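
The rel processing described above can be sketched in a few lines of
Java. This is only an illustration under stated assumptions: the class
and method names (RelTokens, tokens, hasToken) are hypothetical, and the
split uses the five HTML5 space characters (space, tab, LF, FF, CR). The
point is that a value such as "cc:morePermissions" is just an opaque
token; nothing here looks up a prefix.

```java
import java.util.ArrayList;
import java.util.List;

public class RelTokens {
    /** Lowercases only A-Z, leaving every other character untouched. */
    static String asciiLowerCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + 0x20) : c);
        }
        return sb.toString();
    }

    /** Splits a rel value on space characters and ASCII-lowercases each token. */
    static List<String> tokens(String rel) {
        List<String> out = new ArrayList<>();
        for (String t : rel.split("[ \\t\\n\\f\\r]+")) {
            if (!t.isEmpty()) {
                out.add(asciiLowerCase(t));
            }
        }
        return out;
    }

    /** True if the rel value contains the token, compared ASCII-case-insensitively. */
    static boolean hasToken(String rel, String token) {
        return tokens(rel).contains(asciiLowerCase(token));
    }

    public static void main(String[] args) {
        String rel = "cc:morePermissions LICENSE";
        System.out.println(tokens(rel));
        System.out.println(hasToken(rel, "CC:MorePermissions"));
    }
}
```

Note that the colon gets no special treatment: the comparison never
consults any xmlns:cc declaration.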

> 2) However, it seems like RDFa is jumping the gun by assuming @rel is
> a CURIE right now. This is not promoting interoperability or shared
> architecture, because no XHTML processor that isn't aware of RDFa can
> properly identify these link relations.

I agree.

> My preference would be an
> erratum to RDFa removing this syntax, replacing them with a
> self-contained identifier (i.e. a URI). Thoughts?

More generally, I think it would make sense to issue an erratum that
replaces all CURIEs in RDFa with the corresponding full URIs, since
this would both

1) Remove the reliance on attributes spelled "xmlns:foo", which are
   special in XML but not special in text/html (as text/html parsing is
   currently implemented out there and drafted in HTML5).

2) Avoid introducing a novel prefix-based indirection mechanism with
   many of the same problems that Namespaces in XML have been observed
   to have over the last decade.

Examples of problems:
http://lists.xml.org/archives/xml-dev/200502/msg00306.html
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6475032
http://dev.ctor.org/soap4r/ticket/179
http://sourceforge.net/tracker/?func=detail&atid=454391&aid=924041&group_id=48863

> 3) CC's adoption of *proposed* XHTML conventions from RDFa into HTML4
> via CURIEs further muddies the waters; xmlns has no meaning whatsoever
> in HTML4, so they're promoting bad practice there by circumventing the
> specified Profile mechanism. I find this aspect of this the most
> concerning, and it needs clarification (more colourful words come to
> mind, but I'll leave it there for now).

I also find the use of xmlns:foo the most concerning aspect, not just
because it has no special meaning in HTML 4.01 on the theoretical level
but also because of its practical repercussions for software
architecture.

I develop a text/html parser that implements the HTML5 parsing
algorithm and targets five APIs for the application layer: JDK DOM
Level 2, Java SAX2 in the namespace-aware mode, XOM, Web DOM (the one
browsers expose via JS; targeted via Google Web Toolkit) and the
internal content tree API of Gecko (nsINode/nsIContent; targeted via
automated translation of the Java code into C++). These are all
namespace-aware APIs. (Note that DOM Level 1 and the DOM Level 1-ish
Python minidom aren't namespace-aware and they are the APIs typically
used to demonstrate RDFa interop.)

Gecko, WebKit and Presto use a namespace-aware DOM for both text/html
and application/xhtml+xml. Thus, we can gain understanding of the
implemented mapping of text/html into a namespace-aware representation
from these implementations.

Since attributes of the form xmlns:foo are not special in any way in
HTML 4.01 (or 4.0, 3.2 or 2.0 for that matter), an attribute spelled
"xmlns:foo" in text/html parses into ["", "xmlns:foo"] as the
[namespace, local] pair. (Note that the local name is not an XML 1.0 +
Namespaces NCName.) For compatibility with the behavior of these
existing browsers, HTML5, as drafted, specifies that "xmlns:foo" in
text/html parses into ["", "xmlns:foo"].
Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.html

DOM Level 2 XML, on the other hand, represents an attribute spelled
"xmlns:foo" in application/xhtml+xml as
["http://www.w3.org/2000/xmlns/", "foo"].
Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml

Furthermore, SAX2 in the namespace-aware mode and XOM do not represent
what are spelled "xmlns:foo" in XML as attributes at all in the API.
Instead, there's dedicated API surface for exposing namespace mappings
to the application layer.
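
The XML side of this difference can be observed with the JDK's own
parsers. The following is a minimal sketch, assuming the default JAXP
implementation bundled with the JDK; the class name XmlnsInXmlApis, the
method names and the example namespace URI http://example.org/ns# are
made up for illustration. It parses one small XML document twice: once
into a namespace-aware DOM, reading back the [namespace, local] pair of
the xmlns:foo attribute, and once with namespace-aware SAX2, recording
prefix mappings and the attribute count reported for the element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlnsInXmlApis {
    // Hypothetical input; the foo namespace URI is an example value.
    static final String XML =
        "<html xmlns='http://www.w3.org/1999/xhtml'"
        + " xmlns:foo='http://example.org/ns#'/>";

    /** Returns {namespace URI, local name} of xmlns:foo in a namespace-aware DOM. */
    static String[] domView() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        Attr attr = doc.getDocumentElement().getAttributeNode("xmlns:foo");
        return new String[] { attr.getNamespaceURI(), attr.getLocalName() };
    }

    /** Records the prefix mappings and element events that SAX2 reports. */
    static List<String> saxView() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        final List<String> events = new ArrayList<>();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(XML)),
            new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    events.add("mapping " + prefix + "=" + uri);
                }

                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    events.add("element " + local
                        + " attributes=" + atts.getLength());
                }
            });
        return events;
    }

    public static void main(String[] args) throws Exception {
        String[] dom = domView();
        System.out.println("DOM: [" + dom[0] + ", " + dom[1] + "]");
        for (String event : saxView()) {
            System.out.println("SAX: " + event);
        }
    }
}
```

In the DOM the declaration surfaces as an attribute in the
"http://www.w3.org/2000/xmlns/" namespace, while in SAX2 it surfaces
only through startPrefixMapping and the element reports no attributes.
Neither view matches the ["", "xmlns:foo"] pair that text/html parsing
produces.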

If we use the explicit mapping of DOM Level 3 to Infoset, the mapping
of XML onto Infoset and the mappings from XML into XOM or
namespace-aware SAX2, we have to conclude that when a DOM-oriented spec
talks about an attribute in the "http://www.w3.org/2000/xmlns/"
namespace, the concept maps to the namespace mapping API surface of
SAX2 and XOM and, on the other hand, when an attribute is not in the
"http://www.w3.org/2000/xmlns/" namespace according to a DOM-oriented
spec, it doesn't map to the namespace mapping API surface of XOM and
namespace-aware SAX2.

The above paragraph is relevant, because the dominant design of
text/html parsers for non-browser applications established by John
Cowan's TagSoup and adopted by HTML5 parsers is that they expose an XML
API so that the application-level code is written as if working with an
XML parser parsing an equivalent XHTML 1.0 or XHTML5 file (for HTML
4.01 and HTML5 respectively). This design of sharing above-parser
application-level code between text/html and application/xhtml+xml is
also in use in Gecko, WebKit and (based on black-box guess) Presto.

The internal API of Gecko differs from the DOM slightly: The DOM has
three datums: namespace URI, qname (aka. Level 1 node name) and local
name. Gecko's internal API also has three datums but slightly
differently: namespace URI, *prefix* and local name. None of these are
string data types in Gecko. The namespace URI is interned into a 32-bit
integer and prefix & local name are interned into a specific interned
string type that cannot be used directly where string types can be
used. It follows that for any natively implemented feature, it would be
highly undesirable to have to look 'inside' these values as strings as
opposed to merely comparing pointers or integers.

I'm not suggesting that there were any foreseeable native
implementation of RDFa-sensitive functionality in any Gecko-based
browsers.
However, I am suggesting that language design that would be a bad match
for established browser internals is architecturally unsound design in
case there's the slightest chance that the language might one day be
browser-sensitive.

Going back to the design of exposing text/html as if it were XML: As I
pointed out earlier, xmlns:foo in text/html parses, in existing
browsers and in the HTML5 parsing algorithm as drafted today, into a
[namespace, local] pair where the local part is not an NCName. This
characteristic alone (i.e. without even considering the part that is
spelled "xmlns") is enough to render the [namespace, local] pair
unrepresentable in XML 1.0 + Namespaces. This poses the following
problems:

1) A local name that is not an NCName cannot be serialized as XML 1.0
   in such a way that parsing the resulting XML document with a
   namespace-aware parser round-trips the non-NCName local name
   properly.

2) Namespace-wise strictly correct XML tree implementations throw if
   you try to set an attribute that can't be serialized as XML 1.0 +
   Namespaces. (A demo that makes XOM throw is included below my
   signature.)

3) Even if the API contract of an XML API could be violated and a local
   name that is impossible in XML 1.0 + Namespaces could be passed
   through, this representation would be *different* from the way an
   XML parser would expose an attribute spelled "xmlns:foo" through the
   same API. Thus, the application-layer code would have to differ for
   text/html and application/xhtml+xml.

The options are thus:

1) Letting the application-layer code differ for text/html and
   application/xhtml+xml (provided that you can make the infrastructure
   not throw). This would violate the DOM Consistency design principle
   in HTML Design Principles. (For the general purpose of
   application-layer code reuse, "DOM" here should be understood to
   mean any API between the parser and application layers.) Experience
   with dealing with the lang vs. xml:lang issue should show that going
   down this road leads to divergent code paths in many places, which
   is bug-prone and bad software architecture.

2) Changing text/html parsing to parse "xmlns:foo" into
   ["http://www.w3.org/2000/xmlns/", "foo"]. This would be inconsistent
   with the behavior of existing Gecko, WebKit and Presto releases.

3) Changing RDFa not to use attributes spelled "xmlns:foo" in either
   text/html or application/xhtml+xml. (Failing to do this for
   application/xhtml+xml would still lead to the problem of different
   code paths in application-layer code.) This could be achieved with
   an erratum changing CURIEs to full URIs.

4) Not using RDFa in text/html at all.

Due to the above considerations, I think that a vocabulary that uses
attributes spelled "xmlns:foo" on (X)HTML elements is in architectural
error.

> P.S., I realise that this involves at least three additional
> communities, but the TAG seems like the logical place for the initial
> discussion and eventual coordination of this issue.

Since Steven already CCed two of those three and Julian forwarded your
email to the third, I've CCed all three in addition to the TAG here.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

import nu.xom.Attribute;
import nu.xom.Element;

public class XomTest {
    public static void main(String[] args) {
        Element elt = new Element("html", "http://www.w3.org/1999/xhtml");
        // Expected to throw: XOM does not accept "xmlns:foo" as the
        // name of an attribute in no namespace.
        elt.addAttribute(new Attribute("xmlns:foo", "bar"));
    }
}
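
A comparable check can be sketched against the JDK DOM, one of the
other target APIs named above. This is an illustration only, assuming
the DOM implementation bundled with the JDK and its default error
checking; the class name JdkDomXmlnsTest and its helper method are
hypothetical. It asks the namespace-aware setter to store the
["", "xmlns:foo"] pair and reports the DOMException code, which the DOM
Level 2 rules for setAttributeNS suggest should be NAMESPACE_ERR.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class JdkDomXmlnsTest {
    /**
     * Tries to set an attribute named "xmlns:foo" in no namespace.
     * Returns the DOMException code, or -1 if nothing was thrown.
     */
    static short trySetNoNamespaceXmlnsFoo() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element elt = doc.createElementNS("http://www.w3.org/1999/xhtml", "html");
        try {
            // A prefixed qualified name with a null namespace URI.
            elt.setAttributeNS(null, "xmlns:foo", "bar");
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) throws Exception {
        short code = trySetNoNamespaceXmlnsFoo();
        System.out.println(code == DOMException.NAMESPACE_ERR
            ? "rejected with NAMESPACE_ERR"
            : "result code: " + code);
    }
}
```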
Received on Friday, 27 February 2009 12:58:21 UTC