Re: [Fwd: ISSUE-63: White-Space Canonicalization of XML Literals]

Earlier, Ivan said:
> A full canonicalization in Python is also not that easy. Getting the
> repeated white space characters out and stripping the first and last
> whitespace is a breeze. The rest becomes a real headache unless the
> underlying XML library does it (eg, ordering the attributes). I wonder
> whether we should really require that. What do we gain?

A good question. I wonder what we gain, too.

Manu said:
> This means that we can (and should, IMHO) preserve all of the formatting
> in the original document for XML Literals.
> 
> Sorry for the previous post stating that we didn't have a choice, I had
> not considered getting at the original document using XMLHTTPRequest.

I think this breaks down if the page is the result of a POST: re-fetching
the original document would mean resubmitting the POST, and you don't
want to do that, of course.

Having read the full thread, let me first write down what we agree on:
plain literals should be canonicalized according to XPath
normalize-space(), which is Mark's proposal.
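
As a rough illustration of what that means for plain literals, here is a
minimal Python sketch. It assumes XPath 1.0 semantics (collapse each run of
XML white space to a single space, then strip the ends); the function name
normalize_space is made up for this example and is not taken from any RDFa
parser.

```python
import re

# XML 1.0 white space is only space, tab, carriage return and line feed.
# Python's str.split() would also treat characters such as U+00A0 as white
# space, so an explicit character class is used instead.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(text):
    """Approximate XPath normalize-space() for a plain literal (sketch)."""
    # Collapse every run of XML white space into one space...
    collapsed = _XML_WS.sub(" ", text)
    # ...then drop the single leading/trailing space that may remain.
    return collapsed.strip(" ")


if __name__ == "__main__":
    print(repr(normalize_space("  Hello \n\t world  ")))
```

Note that a non-breaking space (U+00A0) is left untouched by this sketch,
since it is not XML white space.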

Now, what to do with XML Literals? Here's my proposal, which is going to
sound a lot like punting:

"Where possible, an RDFa parser should preserve the exact white space
and characters of the XML Literal. However, it is also acceptable for an
RDFa parser to apply browser-based canonicalization."

The assumption is that we're dealing with the host language here,
XHTML 1.1, and if an XML Literal is canonicalized in a way that preserves
how it renders in XHTML, then who cares? I understand this may limit the
round-trippiness of RDFa->RDF->RDFa, but that may simply be a limitation
of what browsers and the DOM do in XHTML 1.1.

I suppose this makes writing test cases problematic... I suspect we
should write the tests to preserve white space and characters, and judge
each browser canonicalization individually.

Thoughts?

-Ben

Received on Tuesday, 13 November 2007 18:00:33 UTC