- From: Mark Birbeck <mark.birbeck@x-port.net>
- Date: Thu, 13 Mar 2008 14:00:51 +0000
- To: "W3C RDFa task force" <public-rdf-in-xhtml-tf@w3.org>
- Cc: www-rdf-interest@w3.org, "Jeremy J. Carroll" <jjc@hpl.hp.com>
Hello all,

During our discussions last week, I suggested that there are a number of ways that we could tackle the rdf:XMLLiteral question. However, the more I've delved into this, the more I've had to conclude that we can't solve it, at least not in a very straightforward way.

I've presented the details below, and I'm also copying to the RDF interest list, because I believe there is an issue of interpretation here, in relation to RDF Concepts [1], that may affect our resolution. (In particular, there may be a view that we can be more liberal than I am being, in which case we might be able to add more explicit support after all.)

I'm also CCing Jeremy because he wrote some interesting comments on XML literals when reviewing the early RDFa drafts, and if anyone can find a way through this, it will be him! (No pressure... ;) )

CONTEXT

If we run a Last Call conformant RDFa parser over the following:

  <h2 property="dc:title" datatype="rdf:XMLLiteral">
    E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
  </h2>

we get an XML literal that obviously contains XHTML, but doesn't have the XHTML namespace anywhere. To be correct according to RDF Concepts, the parsed output would need to be:

  <> dc:title "E = mc<sup xmlns="http://www.w3.org/1999/xhtml">2</sup>: ...
    ... The Most Urgent Problem of Our Time"^^rdf:XMLLiteral .

Note the addition of the default namespace.

EXCLUSIVE CANONICALISATION

The RDF Concepts document says that an XML literal needs to be "exclusive Canonical XML". The algorithm for this is obtained from the Exclusive XML Canonicalization spec [2], and essentially dictates that currently in-scope namespaces must be placed on the apex node, and that all 'visibly utilised' namespaces must appear on the most appropriate start tag, if that namespace has not been defined on an ancestor. For example, the Exclusive Canonicalization of this:

  <div>
    <svg:rect ...>
      <xf:input ...>...</xf:input>
      <img ... />
    </svg:rect>
  </div>

would be this:

  <div xmlns="...">
    <svg:rect xmlns:svg="..." ...>
      <xf:input xmlns:xf="..." ...>...</xf:input>
      <img ... />
    </svg:rect>
  </div>

The root <div> is the 'apex node'.

PROBLEMS FOR IMPLEMENTATIONS

The problems that we have with this in RDFa parsers fall into two categories: those that simply involve implementing the algorithm, and those that relate to the data having to be interpreted as an XPath data model.

PROBLEMS: ALGORITHM

From the algorithm's point of view, the easy part is that the apex node must contain all currently active namespaces; we have these, because they are the currently in-scope prefix mappings in our processing rules. We could therefore easily 'dump' those onto the apex node.

However, the next part is slightly more tricky: any "visibly utilised" namespace must be added to the correct start tag, if it's not already on an ancestor. Actually, it's stronger than that, in that the namespace must *not* appear if it has been defined by an ancestor. The following would therefore be incorrect:

  <div xmlns="...">
    <svg:rect xmlns:svg="..." ...>
      <xf:input xmlns:xf="..." ...>
        <xf:label xmlns:xf="..." ...>...</xf:label>
      </xf:input>
      <img ... />
    </svg:rect>
  </div>

The reason why this would be 'wrong' (so to speak) is that the XForms label element does not need the XForms namespace, since it is already present on the XForms input control. (As explained at the end, I think this is an unnecessary restriction, and has unfortunate consequences.)

PROBLEMS: XPATH DATA MODEL

But the bigger problem I foresee is that the XML literal must be processed using the XPath data model, which means sorting out things like entities, removing comments, and so on. This seems to imply that an RDFa parser would need to include an XML parser, which seems an unfortunate requirement.

ARE THERE ANY EASY SOLUTIONS?

I'm afraid that I don't believe there are any easy solutions.
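To make the 'visibly utilised' placement rule from PROBLEMS: ALGORITHM more concrete, here is a minimal Python sketch. It is not a canonicaliser: the `Element` class and `exclusive_decls` function are hypothetical, attributes and text are ignored, and the namespace URIs for SVG and XForms are filled in as assumptions where the examples above say "...". It models only the visibly-utilised rule of Exclusive XML Canonicalization, so it does not add any other in-scope prefixes to the apex node.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator

# Assumed namespace URIs standing in for the "..." in the examples above.
XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
XFORMS = "http://www.w3.org/2002/xforms"


@dataclass
class Element:
    """Hypothetical, simplified element: attributes and text are ignored."""
    prefix: str   # "" stands for the default namespace
    local: str
    uri: str      # namespace URI bound to `prefix` in the source document
    children: list[Element] = field(default_factory=list)

    @property
    def qname(self) -> str:
        return f"{self.prefix}:{self.local}" if self.prefix else self.local


def exclusive_decls(node: Element, rendered: dict[str, str] | None = None
                    ) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (qname, declarations) for each element in document order.

    A declaration is emitted only for the prefix the element visibly
    utilises, and only if no output ancestor already rendered that prefix
    with the same URI. Calling with rendered=None treats `node` as an apex
    node, where only the empty default namespace is implicitly in scope.
    """
    context = dict(rendered) if rendered is not None else {"": ""}
    decls: dict[str, str] = {}
    if context.get(node.prefix) != node.uri:
        decls[node.prefix] = node.uri
        context[node.prefix] = node.uri
    yield node.qname, decls
    for child in node.children:
        yield from exclusive_decls(child, context)


# The <div> example from the email, including the nested <xf:label>.
tree = Element("", "div", XHTML, [
    Element("svg", "rect", SVG, [
        Element("xf", "input", XFORMS, [Element("xf", "label", XFORMS)]),
        Element("", "img", XHTML),
    ]),
])

for qname, decls in exclusive_decls(tree):
    print(qname, decls)
```

Run over the example, `xf:label` and `img` get no declarations, because their prefixes were already rendered on `xf:input` and `div`. Calling `exclusive_decls(Element("", "sup", XHTML))` treats the `<sup>` inside the `<h2>` as its own apex node, which is why the default namespace has to appear on it in the XML literal shown under CONTEXT.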
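The XPath data model point from PROBLEMS: XPATH DATA MODEL can be illustrated with Python's standard library. Note that `xml.etree.ElementTree.canonicalize` (Python 3.8+) implements Canonical XML 2.0, not the Exclusive XML Canonicalization that RDF Concepts requires. This sketch therefore only shows the kind of work involved: the literal's content has to pass through a real XML parser, which resolves character references and drops comments before anything is serialised.

```python
import xml.etree.ElementTree as ET

# Content as it might appear inside an element carrying
# datatype="rdf:XMLLiteral": an escaped ampersand, a comment,
# and a numeric character reference.
source = '<p>a &amp; b<!-- note --> &#169;</p>'

# canonicalize() parses the text; by default comments are removed,
# character references are replaced by the characters they denote,
# and '&' is re-escaped on output.
print(ET.canonicalize(source))

# Attribute order and empty-element syntax are normalised as well.
print(ET.canonicalize('<img src="x.png" alt="x"/>'))
```

None of this can be done by simply copying the source text, which is why a 'dumb' literal and a canonical one will generally differ.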
If we explicitly say that we are creating XML literals, then I don't see any way that they can't be 'proper' XML literals, as laid down by the RDF Concepts document, and that means Exclusive Canonicalisation. In turn, that means namespaces have to be sorted out, entities have to be encoded/decoded/etc., and so on.

So...my gut feeling is that RDFa should not 'support' XML literals in this release. However, we _should_ reserve all of the necessary architecture, such as saying that @datatype="rdf:XMLLiteral" is reserved but undefined, that @property with no @content but with child elements is undefined, and so on.

Of course, for the sake of producing useful software, implementers would be advised to create a 'dumb' XML literal, by simply copying the inner content of the child elements. We can say something like "we'll look for implementer experience to help guide this part of the spec in a future version".

But the main point is that I don't think we can say we are properly supporting XML literals unless we support Exclusive Canonicalisation, and that is quite a burden.

SIDE NOTES

My feeling is that this is not a problem of our making, and that XML literals are just pretty badly defined. The problem, in my view, is not that they rely on Exclusive Canonicalisation, but that they do so in the wrong way.

Any comparison that takes place between values would have to be achieved by parsing those values in an XML parser anyway (as RDF Concepts also says), and making a comparison at the level of the infoset. This means that these two fragments of XML would match when compared in this way:

  <div xmlns="...">
    <svg:rect xmlns:svg="..." ...>
      <xf:input xmlns:xf="..." ...>
        <xf:label xmlns:xf="..." ...>...</xf:label>
      </xf:input>
      <img ... />
    </svg:rect>
  </div>

  <div xmlns="..." xmlns:svg="..." xmlns:xf="...">
    <svg:rect ...>
      <xf:input ...>
        <xf:label ...>...</xf:label>
      </xf:input>
      <img ... />
    </svg:rect>
  </div>

However, the first fragment is not strictly 'exclusively canonicalised', due to the extra namespace declaration. So the process should be to canonicalise, and then compare.

But what RDF Concepts does is to say (effectively) that we should canonicalise the XML, and then store it. Then later on, if we want to compare, we already have the canonicalised form. The big problem with this is that we are no longer able to simply store structured mark-up that we want to round-trip, without comparing it to anything.

What RDF Concepts should have done, in my opinion, is to use the idea of an XML literal simply to indicate the datatype, as a kind of flag, and then leave the Exclusive Canonicalisation stuff to the act of comparison. If data is simply being stored for later retrieval, then why go to lots of effort to store it in an 'unambiguous' way? In particular, why require that all RDF applications must support an XML parser?

But since this is not in our power to control, I think punting it to a future version of RDFa makes some sense. In the short term, implementers can add 'dumb' support to their parsers.

(I've not really discussed other possible solutions, such as inventing our own XHTML datatype, since I think they are the wrong way to go, and I didn't get the sense on the call that anyone was completely enthusiastic about that route. But there are some angles to it, if people really feel we must have a solution now, rather than postponing this to a future version of RDFa.)

Regards,

Mark

[1] <http://www.w3.org/TR/rdf-concepts/>
[2] <http://www.w3.org/TR/xml-exc-c14n/>

--
Mark Birbeck

mark.birbeck@x-port.net | +44 (0) 20 7689 9232
http://www.x-port.net | http://internet-apps.blogspot.com

x-port.net Ltd. is registered in England and Wales, number 03730711
The registered office is at:

  2nd Floor
  Titchfield House
  69-85 Tabernacle Street
  London
  EC2A 4RR
Received on Thursday, 13 March 2008 14:01:35 UTC