Re: [Fwd: ISSUE-63: White-Space Canonicalization of XML Literals]

Hi Ben!


On 11/13/07, Ben Adida <ben@adida.net> wrote:

> Manu said:
> > This means that we can (and should, IMHO) preserve all of the formatting
> > in the original document for XML Literals.
> >
> > Sorry for the previous post stating that we didn't have a choice, I had
> > not considered getting at the original document using XMLHTTPRequest.
>
> I think this breaks down if the page is the result of a POST. And you
> don't want to resubmit a POST, of course.

Very true.


> Having read the full thread, let me first write down what we agree on:
> plain literals should be canonicalized according to XPath
> normalize-space(), which is Mark's proposal.

Yes.

To be a little picky, though: do we mean *plain literals* as in
non-typed (but possibly language-tagged), or also e.g. xsd:string
literals? For plain literals I see this only as a feature (a very
reasonable one). But I think "formally" any typed literal should be
left untouched... (or am I bringing back the entire problem?)
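
To make the distinction concrete, here is a minimal sketch in Python,
not taken from any RDFa parser. It assumes normalize-space() semantics
as defined in XPath 1.0 (trim, then collapse runs of XML whitespace),
and the helper name canonicalize_literal and its datatype parameter
are hypothetical, chosen only for illustration:

```python
import re
from typing import Optional

# XML whitespace (the "S" production in XML 1.0): space, tab, CR, LF.
# Other Unicode spaces such as U+00A0 are deliberately not matched.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(text: str) -> str:
    """Approximate XPath normalize-space(): collapse runs, then trim."""
    return _XML_WS.sub(" ", text).strip(" ")


def canonicalize_literal(value: str, datatype: Optional[str] = None) -> str:
    """Hypothetical helper: normalize plain literals, leave typed ones alone."""
    if datatype is None:
        # Plain literal (with or without a language tag).
        return normalize_space(value)
    # Typed literal, e.g. xsd:string: returned untouched.
    return value
```

Under this reading, a plain literal "  a \n b " would come out as
"a b", while the same string typed as xsd:string would pass through
unchanged.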


> Now, what to do with XMLLiterals. Here's my proposal, which is going to
> sound a lot like punting:
>
> "Where possible, an RDFa parser should preserve the exact white space
> and characters of the XML Literal. However, it is also acceptable for an
> RDFa parser to apply browser-based canonicalization."
>
> The assumption is that we're dealing with the host language here,
> XHTML1.1, and if an XML Literal is canonicalized in a way that preserves
> how it renders in XHTML, then who cares? I understand this may limit the
> round-trippiness of RDFa->RDF->RDFa, but that may simply be a limitation
> of what browsers and the DOM does in XHTML1.1.

Yes, I think this may be good enough.

Although I wonder whether the problem in fact occurs only in
non-*x*html-aware implementations. So the actual issue is with IE
only, since it doesn't properly handle XHTML as XML. In practice,
isn't the original whitespace available in all of Firefox, Safari and
Opera?

If so, client code running in IE is effectively an "HTML+RDFa" parser,
making it only partially capable of handling "XHTML 1.1 + RDFa"
anyway. This is then mostly input to what can be required of such
"HTML + RDFa" parsers, and shouldn't directly affect the current
goal. Or am I missing something?

(That said, if someone wants to continue testing (I'm afraid I have no
time right now): if there is *any* way in IE to get the plain source
of the current page, resorting to IE- and ActiveX-specific XML
components may be another solution. Or even trying what tricks like
<http://www.w3.org/MarkUp/2004/xhtml-faq#ie> can do.)


> I suppose this makes writing test cases problematic... I suspect we
> should write the tests to preserve white space and characters, and judge
> each browser canonicalization individually.

I agree.
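
One possible shape for such a test check, as a rough sketch only: the
expected XML Literal keeps its exact white space, and a parser's
output is classified rather than simply passed or failed. The function
name classify_literal and the three result labels are assumptions made
for this example, and the comparison only detects white-space
differences, not other browser rewrites such as attribute quoting:

```python
import re

# XML whitespace: space, tab, CR, LF.
_XML_WS = re.compile(r"[ \t\r\n]+")


def _collapse(text: str) -> str:
    """Collapse runs of XML whitespace to one space and trim the ends."""
    return _XML_WS.sub(" ", text).strip(" ")


def classify_literal(expected: str, actual: str) -> str:
    """Hypothetical check: how does a parser's literal differ from the test?"""
    if actual == expected:
        return "exact"  # white space and characters preserved
    if _collapse(actual) == _collapse(expected):
        return "whitespace-only"  # candidate for case-by-case judgment
    return "different"  # content changed, not just white space
```

A "whitespace-only" result would then be the case to judge per
browser, while "different" would count as a plain failure.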


Best regards,
Niklas

Received on Tuesday, 13 November 2007 20:13:06 UTC