Re: ISSUE-13: History of rdf:XMLLiteral from Charles Greer on 2011-11-10 (public-rdf-wg@w3.org from November 2011)

From: Charles Greer <cgreer@marklogic.com>
Date: Thu, 10 Nov 2011 10:57:10 -0800
To: Ivan Herman <ivan@w3.org>
CC: Richard Cyganiak <richard@cyganiak.de>, Andy Seaborne <andy.seaborne@epimorphics.com>, Jeremy Carroll <jeremy@topquadrant.com>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4EBC1E86.4090405@marklogic.com>
I am a huge fan of XMLLiteral and have used it extensively, but from 
this thread it looks as though I may have misunderstood its purpose.

There's plenty of XML content out there that stands on its own as 
marked-up content.  You can't canonicalize it.  You can't assume it's 
graph data.  But you can assume it to be parse-able well-formed XML, a 
most excellent thing if you're managing documents.

Charles

On 11/10/2011 10:19 AM, Ivan Herman wrote:
> On 10 Nov 2011, at 18:59, Richard Cyganiak <richard@cyganiak.de> wrote:
>
>> Ivan,
>>
>> On 10 Nov 2011, at 16:44, Ivan Herman wrote:
>>> I think we need clarification. I remember a long discussion in the RDFa WG a few years ago. The question that arose was: what exactly is the XML Literal an RDFa processor should produce on its output? And it was not clear from the document.
>>>
>>> *My* interpretation was that if a processor outputs an RDF graph in a serialized format, then it can be any valid XML, not necessarily in canonical form (i.e., the attributes can be in any order), because canonicalization comes into the picture only when the datatype values are compared, i.e., when graphs are compared. Others had a different reading of the document.
>> I'm pretty sure you are mistaken on this.
>>
>> From the point of view of a serialization format, rdf:XMLLiteral is a typed literal like any other. That means, the string that goes into the serialized document is exactly the lexical form. The lexical form of rdf:XMLLiteral must be canonicalized – and so must be the string in the serialized document.
>>
> You might be right, I have not checked lately (and I am not close to my machine to do it now). But all this emphasizes that there is room for misunderstanding.
>
> If we keep XML literals, my preferred approach would be that the canonicalization should be done by the parser. In other words, the lexical space is any valid XML, the value space is its canonicalized equivalent. It puts some burden on parser writers, but the burden should be theirs and not the authors'.
>
>> RDF/XML is an exception because it has “syntactic sugar” for rdf:XMLLiteral, and it explicitly states that canonicalization happens when that sugar is used. Therefore, in RDF/XML, you can write any valid XML.
>>
>> This does *not* apply to any other serialization format, unless it explicitly handles rdf:XMLLiteral in a special way.
>>
>> The current design of rdf:XMLLiteral leaves the choice to the serialization format: either the format defines that the parser performs canonicalization, or the document author has to perform canonicalization. To the best of my knowledge, every format except RDF/XML does the latter, making rdf:XMLLiteral totally unusable.
>>
>>> I do not think we should go into the mess of changing the XML Literals. Clearly they are not widely used, although there are cases when they are (a typical case is the content in an RSS 1.0 feed). But we need a clearer description of when, and under what circumstances, canonicalization is necessary.
>> As it stands, they are *entirely unusable* in any non-XML-based format, including Turtle and SPARQL. So why should *anyone* bother implementing it?
>>
> For the sake of argument (without being a great fan of XML literals) I am not sure I agree. If I take the example of RSS, it makes perfect sense that the object of the content predicate would contain an HTML extract, with all the elements and their attributes included. Whether this is in Turtle or anything else is beside the point. But RSS producers should not have to go through the hurdle of performing canonicalization; the Jena and RDFlib-s of this world should do it when they store the value in their internal representation.
>
>> I'd say it either needs to be fixed, or it needs to go on the archaic list. As it stands, it's nothing but a useless burden to implementers (and much worse than reification or Alt/Bag/Seq in that regard, because implementing it properly is actually costly).
> I do not think I would lose sleep over this, but I am not sure it is unused. So we have to be careful. I would prefer to fix it to make things clearer. I think it is possible.
>
> Ivan
>
>
>> Best,
>> Richard
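
A minimal sketch of the parser-side canonicalization proposed in the quoted thread, where the author writes any well-formed XML and the parser derives a canonical form before storing or comparing values. It rests on assumptions: the helper name canonical_lexical_form is hypothetical, the literal is taken to have a single root element, and Python's standard library implements C14N 2.0 rather than whichever canonicalization variant the rdf:XMLLiteral definition requires, so this only illustrates the idea.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def canonical_lexical_form(xml_text: str) -> str:
    """Hypothetical helper: canonicalize author-written XML at parse time.

    Uses C14N 2.0 from the standard library as a stand-in for the
    canonicalization that the rdf:XMLLiteral definition calls for.
    """
    return canonicalize(xml_text)


# Two author-written literals that differ only in attribute order,
# quote style, and whitespace inside the start tag.
authored_a = '<p id="intro" class="lead">Hello <b>world</b></p>'
authored_b = "<p class='lead'   id='intro'>Hello <b>world</b></p>"

form_a = canonical_lexical_form(authored_a)
form_b = canonical_lexical_form(authored_b)

# The raw strings differ, but their canonical forms compare equal.
print(authored_a == authored_b)
print(form_a == form_b)
```

With this split, the burden of canonicalization falls on the toolkit rather than on whoever produces the Turtle or RSS document.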


-- 
Charles Greer
Senior Engineer
MarkLogic Corporation
charles.greer@marklogic.com
Phone: +1 707 408 3277
www.marklogic.com

Received on Thursday, 10 November 2011 18:57:43 UTC