- From: Ben Adida <ben@adida.net>
- Date: Tue, 13 Nov 2007 10:00:15 -0800
- To: public-rdf-in-xhtml-tf@w3.org
Earlier, Ivan said:

> A full canonicalization in Python is also not that easy. Getting the
> repeated white space characters out and stripping the first and last
> whitespace is a breeze. The rest becomes a real headache unless the
> underlying XML library does it (eg, ordering the attributes). I wonder
> whether we should really require that. What do we gain?

A good question, I wonder what we gain...

Manu said:

> This means that we can (and should, IMHO) preserve all of the formatting
> in the original document for XML Literals.
>
> Sorry for the previous post stating that we didn't have a choice, I had
> not considered getting at the original document using XMLHTTPRequest.

I think this breaks down if the page is the result of a POST. And you
don't want to resubmit a POST, of course.

Having read the full thread, let me first write down what we agree on:
plain literals should be canonicalized according to XPath
normalize-space(), which is Mark's proposal.

Now, what to do with XML Literals. Here's my proposal, which is going to
sound a lot like punting:

"Where possible, an RDFa parser should preserve the exact white space
and characters of the XML Literal. However, it is also acceptable for an
RDFa parser to apply browser-based canonicalization."

The assumption is that we're dealing with the host language here,
XHTML1.1, and if an XML Literal is canonicalized in a way that preserves
how it renders in XHTML, then who cares? I understand this may limit the
round-trippiness of RDFa->RDF->RDFa, but that may simply be a limitation
of what browsers and the DOM do in XHTML1.1.

I suppose this makes writing test cases problematic... I suspect we
should write the tests to preserve white space and characters, and judge
each browser canonicalization individually.

Thoughts?

-Ben
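
For reference, the plain-literal rule agreed on above could be sketched in
Python roughly as follows. This is only an illustration, not part of any
RDFa parser: the function name normalize_space is hypothetical, and it
assumes "white space" means the four XML whitespace characters (space, tab,
CR, LF), as XPath normalize-space() does. It covers plain literals only and
does nothing about XML Literal canonicalization.

```python
import re

# XML whitespace as used by XPath normalize-space(): space, tab, CR, LF.
# (Assumption: other Unicode spaces such as U+00A0 are left untouched.)
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(text: str) -> str:
    """Hypothetical helper approximating XPath normalize-space().

    Collapses each run of XML whitespace to a single space, then
    strips the leading and trailing space.
    """
    return _XML_WS.sub(" ", text).strip(" ")


if __name__ == "__main__":
    print(repr(normalize_space("  Hello \n\t  world  ")))
```

Python's bare str.split() treats a wider set of Unicode characters as
whitespace, which is why the sketch uses an explicit character class.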
Received on Tuesday, 13 November 2007 18:00:33 UTC