Re: ISSUE-13: History of rdf:XMLLiteral

On 10 Nov 2011, at 18:59, Richard Cyganiak <richard@cyganiak.de> wrote:

> Ivan,
> 
> On 10 Nov 2011, at 16:44, Ivan Herman wrote:
>> I think we need clarification. I remember a long discussion in the RDFa WG a few years ago. The question that arose was: what exactly is the XML Literal that an RDFa processor should produce as its output? And that was not clear from the document.
>> 
>> *My* interpretation was that if a processor outputs an RDF graph in a serialized format, then the literal can be any valid XML, not necessarily in canonical form (i.e., the attributes can be in any order), because canonicalization comes into the picture only when the datatype values are compared, i.e., when graphs are compared. Others had a different reading of the document.
> 
> I'm pretty sure you are mistaken on this.
> 
> From the point of view of a serialization format, rdf:XMLLiteral is a typed literal like any other. That means, the string that goes into the serialized document is exactly the lexical form. The lexical form of rdf:XMLLiteral must be canonicalized – and so must be the string in the serialized document.
> 

You might be right; I have not checked lately (and I am not close to my machine to do it now). But all this emphasizes that there is room for misunderstanding.

If we keep XML literals, my preferred approach would be that the canonicalization should be done by the parser. In other words, the lexical space is any valid XML, the value space is its canonicalized equivalent. It puts some burden on parser writers, but the burden should be theirs and not the authors'.
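
As a rough illustration of what parser-side canonicalization could look like, here is a minimal sketch in Python. It is only a sketch under assumptions: the function name canonical_xml_literal is made up for this example, it uses the standard library's C14N 2.0 implementation as a stand-in for whichever canonicalization the spec would mandate, and it only handles a literal that is a single well-formed element.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+


def canonical_xml_literal(lexical: str) -> str:
    """Map an author-written XML string to a canonical form.

    Hypothetical helper: ElementTree's canonicalize() implements C14N 2.0,
    used here as a stand-in for the canonicalization rdf:XMLLiteral requires.
    The input must be a single well-formed element.
    """
    return canonicalize(xml_data=lexical)


# Two lexical forms an author might write; they differ only in
# attribute order and quoting style.
a = canonical_xml_literal('<p id="intro" class="lead">Hello</p>')
b = canonical_xml_literal("<p class='lead' id='intro'>Hello</p>")

# After canonicalization by the parser, both denote the same value.
print(a == b)
```

With something along these lines in the parser, the author can write either form and the stored values still compare equal.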

> RDF/XML is an exception because it has “syntactic sugar” for rdf:XMLLiteral, and it explicitly states that canonicalization happens when that sugar is used. Therefore, in RDF/XML, you can write any valid XML.
> 
> This does *not* apply to any other serialization format, unless it explicitly handles rdf:XMLLiteral in a special way.
> 
> The current design of rdf:XMLLiteral leaves the choice to the serialization format: either the format defines that the parser performs canonicalization, or the document author has to perform it. To the best of my knowledge, every format except RDF/XML does the latter, making rdf:XMLLiteral totally unusable.
> 
>> I do not think we should go into the mess of changing the XML Literals. Clearly they are not widely used, although there are cases where they are (a typical case is the content in an RSS 1.0 feed). But we need a clearer description of when, and under what circumstances, canonicalization is necessary.
> 
> As it stands, they are *entirely unusable* in any non-XML-based format, including Turtle and SPARQL. So why should *anyone* bother implementing it?
> 

For the sake of argument (and without being a great fan of XML literals), I am not sure I agree. If I take the example of RSS, it makes perfect sense that the object of the content predicate would contain an HTML extract, with all the elements and their attributes included. Whether this is in Turtle or anything else is beside the point. But RSS producers should not have to go through the hurdle of performing canonicalization; the Jenas and RDFLibs of this world should do it when they store the value in their internal representation.

> I'd say it either needs to be fixed, or it needs to go on the archaic list. As it stands, it's nothing but a useless burden to implementers (and much worse than reification or Alt/Bag/Seq in that regard, because implementing it properly is actually costly).

I do not think I would lose sleep over this, but I am not sure it is unused, so we have to be careful. I would prefer to fix it to make things clearer. I think it is possible.

Ivan


> 
> Best,
> Richard

Received on Thursday, 10 November 2011 18:19:35 UTC