- From: Mark Birbeck <mark.birbeck@webbackplane.com>
- Date: Fri, 4 Sep 2009 12:54:45 +0100
- To: James Graham <jgraham@opera.com>
- Cc: Henri Sivonen <hsivonen@iki.fi>, Anne van Kesteren <annevk@opera.com>, Manu Sporny <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Hi James,

I think you're really going to have to be more specific.

You say things like "one will soon run into the following problem", but you don't say what the problem is. You say "clearly the tree[s] produced...will require different processing", but you don't say why they require different processing.

(Let's leave aside that you think something is being swept under some carpet, otherwise we'll never get through this discussion.)

So let's take this in two steps. First I'll show a typical JavaScript algorithm for obtaining prefix mappings from an element in an RDFa parser, and then we'll look at whether that algorithm can be justified from the point of view of the RDFa spec.

Here's some typical code:

  function getMappingsFromElement(element, mappingList) {
    var attributes = element.attributes, attrName, i;

    if (attributes) {
      for (i = 0; i < attributes.length; i++) {
        attrName = attributes[i].nodeName;
        if (attrName.substring(0, 5) === "xmlns") {
          if (attrName.length === 5) {
            // Plain "xmlns": the default mapping, stored under "".
            mappingList.add("", attributes[i].nodeValue);
          } else if (attrName.substring(5, 6) === ':') {
            // "xmlns:foo": store under the prefix after the colon.
            mappingList.add(attrName.substring(6), attributes[i].nodeValue);
          }
        }
      }
    }
  }

It takes as input an [Element] (i.e., a DOM element), and a list of mappings to keep updated. The code iterates through the element's attributes, adding any mappings to the list. Duplicate values are simply overwritten.

Now, since this algorithm only requires DOM Level 1 features (the attributes property on element, and the nodeName and nodeValue properties on an attribute), it's safe to say that it will work regardless of whether it's HTML4, HTML5, XHTML, SVG, and so on.

Of course, it's written in JavaScript, but since it's using standard DOM1 features, it's also clear that the language is irrelevant, and this could be implemented in any language that has a DOM1 library.

However, the big issue is whether we *should* be doing something more if we are in XML mode.
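
To make the mapping-list side of this concrete, here is a minimal sketch of what such a list might look like, run against a stand-in for a DOM1 element. The names MappingList, get and fakeElement are hypothetical and not taken from any RDFa library; the only thing the function above actually requires is an add(prefix, uri) method. The function itself is repeated so that the sketch runs on its own.

```javascript
// Hypothetical mapping list: a thin wrapper around a Map keyed by prefix.
function MappingList() {
  this.mappings = new Map();
}

// Adding a prefix that is already present simply overwrites the old URI.
MappingList.prototype.add = function (prefix, uri) {
  this.mappings.set(prefix, uri);
};

// Returns the URI for a prefix, or undefined if there is no mapping.
MappingList.prototype.get = function (prefix) {
  return this.mappings.get(prefix);
};

// Same logic as the function above, repeated to keep the sketch self-contained.
function getMappingsFromElement(element, mappingList) {
  var attributes = element.attributes, attrName, i;

  if (attributes) {
    for (i = 0; i < attributes.length; i++) {
      attrName = attributes[i].nodeName;
      if (attrName.substring(0, 5) === "xmlns") {
        if (attrName.length === 5) {
          mappingList.add("", attributes[i].nodeValue);
        } else if (attrName.substring(5, 6) === ":") {
          mappingList.add(attrName.substring(6), attributes[i].nodeValue);
        }
      }
    }
  }
}

// Stand-in for a DOM Level 1 element (an assumption for illustration):
// only `attributes`, `nodeName` and `nodeValue` are consulted, so a
// plain object with an array of attribute-like objects is enough.
var fakeElement = {
  attributes: [
    { nodeName: "xmlns", nodeValue: "http://www.w3.org/1999/xhtml" },
    { nodeName: "xmlns:foo", nodeValue: "http://foo.example" },
    { nodeName: "xmlnsfoo", nodeValue: "http://ignored.example" },
    { nodeName: "class", nodeValue: "note" }
  ]
};

var list = new MappingList();
getMappingsFromElement(fakeElement, list);
```

After the call, the list holds a mapping for the empty prefix and one for "foo"; the "xmlnsfoo" and "class" attributes are skipped because neither is exactly "xmlns" nor starts with "xmlns:".
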
The question, in other words, is this: am I using a sleight-of-hand here to achieve compatibility across different kinds of DOM by only using DOM1 features, when actually the RDFa spec requires that I should be using DOM2 features if possible?

The answer is no, as I've explained in the other threads, but I'll emphasise the main points here.

The RDFa parsing algorithm does not say "go get the currently in-scope namespace mappings", it simply says "get the values of xmlns-based attributes and crack them open".

As it happens, even if it did say "go get the currently in-scope namespace mappings", you couldn't do that, even in DOM2. But in any case, the spec simply says "get the values of xmlns-based attributes and crack them open, and we'll keep track of scoping ourselves".

This is no surprise, since the algorithm was written with the express purpose of allowing other prefixing mechanisms to be used. Had we relied on namespaces explicitly, then adding a new prefix mechanism such as @prefix or @token would suddenly have required us to start tracking the scope of the mappings ourselves. By building that tracking into the algorithm from the start, we retained flexibility.

Which means that to support @prefix or @token (or any other prefix mapping proposal), the only thing that would need to change in our 'typical' RDFa parser would be the function that I wrote above, which maintains the list of prefix mappings.

And since the algorithm for converting a CURIE to a full URI uses the currently in-scope URI mappings -- and *not* the currently in-scope namespaces -- the rest of the RDFa parsing algorithm 'just works'.

Regards,

Mark

On Fri, Sep 4, 2009 at 11:52 AM, James Graham <jgraham@opera.com> wrote:
> Mark Birbeck wrote:
>
>> The original objection was that different processing is required for
>> different DOMs, and I think we've shown that's not the case; all that
>> is required is to iterate through the list of attributes, and pull
>> out those that begin "xmlns:".
>
> It seems to me this is empirically untrue. Consider the case where one tries
> to write an RDFa processor in python using lxml and html5lib with the lxml
> treebuilder. One will soon run into the following problem:
>
> >>> from lxml import etree
> >>> root = etree.fromstring("<html xmlns='http://www.w3.org/1999/xhtml' xmlns:foo='http://foo.example'></html>")
> >>> root.tag
> '{http://www.w3.org/1999/xhtml}html'
> >>> root.attrib
> {}
> >>> root.nsmap
> {None: 'http://www.w3.org/1999/xhtml', 'foo': 'http://foo.example'}
>
>
> >>> import html5lib
> >>> tree = html5lib.parse("<html xmlns='http://www.w3.org/1999/xhtml' xmlns:foo='http://foo.example'></html>", treebuilder="lxml")
> >>> root = tree.getroot()
> >>> root.tag
> '{http://www.w3.org/1999/xhtml}html'
> >>> root.attrib
> {'xmlns': 'http://www.w3.org/1999/xhtml', 'xmlnsU0003Afoo':
> 'http://foo.example'}
> >>> root.nsmap
> {None: 'http://www.w3.org/1999/xhtml'}
>
> Clearly the tree produced using XML and the tree produced using html5lib
> will require different processing. Using a non-namespace aware XML processor
> would still result in problems since the tag name would be different in the
> two cases.
>
> Obviously this is not, as stated, strictly a "DOM" consistency issue since
> it uses lxml rather than DOM for its tree model. Nevertheless, it does
> demonstrate why one cannot pretend that the use of xml namespaces to
> establish prefix bindings is an unimportant detail that can be swept under
> the carpet.
>

--
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)
Received on Friday, 4 September 2009 11:55:30 UTC