Re: ISSUE-13: History of rdf:XMLLiteral from Richard Cyganiak on 2011-11-10 (public-rdf-wg@w3.org from November 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 10 Nov 2011 19:10:06 +0000
To: Ivan Herman <ivan@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, Jeremy Carroll <jeremy@topquadrant.com>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <29E6FC16-589F-433F-B923-1EBAB539D157@cyganiak.de>

On 10 Nov 2011, at 18:19, Ivan Herman wrote:
> If we keep xml literals, my preferred approach would be that the canonicalization should be done by the parser. In other words, the lexical space is any valid xml, the value space is its canonicalized equivalent.

Those are different things. Currently, canonicalization is either done by the parser or by the author, depending on the syntax spec. (RDF/XML has the parser do it, the other syntaxes have the author do it.)

What you describe – lexical form is any valid XML, value space is canonicalized – means that parsers (or authors) do *not* have to canonicalize. Only whoever wants to do a *value-based comparison* of two XML literals would have to canonicalize. This is just how all the XSD datatypes work, btw.
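
To make the "canonicalize only when comparing" idea concrete, here is a rough sketch in Python. It is an illustration under assumptions, not anything a spec prescribes: it uses the standard library's C14N 2.0 routine as a stand-in for the exclusive canonicalization that rdf:XMLLiteral refers to, and the helper names (xml_value, same_xml_value) and the dummy wrapper element are made up for this example.

```python
from xml.etree.ElementTree import canonicalize


def xml_value(lexical: str) -> str:
    """Map a lexical form (any well-formed XML fragment) to a canonical string.

    The fragment is wrapped in a dummy root element (an assumption of this
    sketch) so that content with several top-level nodes still parses.
    """
    return canonicalize("<wrapper>" + lexical + "</wrapper>")


def same_xml_value(a: str, b: str) -> bool:
    """Value-based comparison: canonicalize both sides only when asked to compare."""
    return xml_value(a) == xml_value(b)


# Two lexical forms that differ in attribute order, quoting, and whitespace.
first = '<b class="x" id="1">hi</b>'
second = "<b id='1'   class='x'>hi</b>"

lexically_equal = first == second            # plain string comparison
value_equal = same_xml_value(first, second)  # comparison in the value space
```

In this arrangement neither the author nor the parser touches the literal; the cost of canonicalization falls only on code that calls the comparison.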

> It puts some burden on parser writers, but the burden should be theirs and not the authors.

If you really want *parsers* to carry the burden, then we'd have to update all syntax specs to demand that they canonicalize when parsing. The thing is, this would make rdf:XMLLiteral handling the single most complicated part in a Turtle parser. And the thought of requiring a J-Triples or N-Triples parser to canonicalize XML strikes me as absurd – it's just not going to happen.

I think that the burden should be only on those who actually want to compare XML literals.

>> As it stands, they are *entirely unusable* in any non-XML-based format, including Turtle and SPARQL. So why should *anyone* bother implementing it?
> 
> For the sake of argument (without being a great fan of xml literals) I am not sure I agree. If I take the example of RSS, it makes perfect sense that the object of the content predicate would contain an html extract, with all the elements and their attributes included.

Sure.

> Whether this is in Turtle or anything else is beside the point.

How is that beside the point? Any XML literal in Turtle *MUST* be in canonical form as things stand today. Which explains why no one publishes XML literals in Turtle (at least not correctly).

> But RSS producers should not go through the hurdle of performing canonicalization, 

I agree.

> the Jena and RDFlib-s of this world should do it when they store the value in their internal representation.

Well, but there are different ways of doing this. For example, Jena currently doesn't *have* to implement xsd:dateTime equality rules *unless* it wants to implement D-Entailment. But it *has* to implement rdf:XMLLiteral equality rules in order to be able to claim basic RDF conformance.
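
The difference between the two kinds of equality can be sketched as follows. This is a simplified illustration, not how Jena works: literals are modelled as plain (lexical form, datatype IRI) tuples, the function names are hypothetical, and Python's datetime parsing only approximates the xsd:dateTime lexical space.

```python
from datetime import datetime

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def term_equal(a: tuple, b: tuple) -> bool:
    """Term equality: same lexical form and same datatype IRI, no datatype knowledge."""
    return a == b


def value_equal(a: tuple, b: tuple) -> bool:
    """Value equality for xsd:dateTime, the kind of rule datatype-aware entailment needs.

    Falls back to term equality for any other datatype in this sketch.
    """
    if a[1] != XSD_DATETIME or b[1] != XSD_DATETIME:
        return term_equal(a, b)
    # Offset-aware datetimes compare as instants, not as strings.
    return datetime.fromisoformat(a[0]) == datetime.fromisoformat(b[0])


# The same instant written with two different timezone offsets.
utc_form = ("2011-11-10T19:10:06+00:00", XSD_DATETIME)
cet_form = ("2011-11-10T20:10:06+01:00", XSD_DATETIME)

same_term = term_equal(utc_form, cet_form)
same_value = value_equal(utc_form, cet_form)
```

A store can stop at term_equal for xsd:dateTime and still be a conforming RDF implementation; for rdf:XMLLiteral, as things stand, the value-level rule is not optional.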

Best,
Richard

Received on Thursday, 10 November 2011 19:10:35 UTC