- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Thu, 10 Nov 2011 17:59:55 +0000
- To: Ivan Herman <ivan@w3.org>
- Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, Jeremy Carroll <jeremy@topquadrant.com>, RDF Working Group WG <public-rdf-wg@w3.org>
Ivan,

On 10 Nov 2011, at 16:44, Ivan Herman wrote:

> I think we need clarification. I remember a long discussion in the RDFa WG a few years ago. The question arising was: what is exactly the XML Literal an RDFa processor should produce on its output. And it was not clear from the document.
>
> *My* interpretation was that if a processor outputs an RDF graph in a serialized format, then it can be any valid XML, not necessarily in canonical form (ie, the attributes can be in any order), because canonicalization comes into the picture only when the datatype values are compared, ie, when graphs are compared. Others had a different reading of the document.

I'm pretty sure you are mistaken on this.

From the point of view of a serialization format, rdf:XMLLiteral is a typed literal like any other. That means the string that goes into the serialized document is exactly the lexical form. The lexical form of rdf:XMLLiteral must be canonicalized – and so must be the string in the serialized document.

RDF/XML is an exception because it has “syntactic sugar” for rdf:XMLLiteral, and it explicitly states that canonicalization happens when that sugar is used. Therefore, in RDF/XML, you can write any valid XML. This does *not* apply to any other serialization format, unless it explicitly handles rdf:XMLLiteral in a special way.

The current design of rdf:XMLLiteral leaves the choice to the serialization format: either the format defines that the parser performs canonicalization, or the document author has to perform it. To the best of my knowledge, every format except RDF/XML does the latter, making rdf:XMLLiteral totally unusable.

> I do not think we should go into the mess of changing the XML Literals. Clearly they are not widely used, although there are cases when they are (typical case is the content in an RSS 1.0 feed). But we need a clearer description on when, under what circumstances canonicalization is necessary.

As it stands, they are *entirely unusable* in any non-XML-based format, including Turtle and SPARQL. So why should *anyone* bother implementing it?

I'd say it either needs to be fixed, or it needs to go on the archaic list. As it stands, it's nothing but a useless burden to implementers (and much worse than reification or Alt/Bag/Seq in that regard, because implementing it properly is actually costly).

Best,
Richard
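
P.S. A minimal sketch of the lexical-form point above, in Python. It is an illustration under stated assumptions, not how any RDF toolkit is implemented: the two XML fragments and both helper function names are made up for the example, and the standard library's `xml.etree.ElementTree.canonicalize` implements C14N 2.0, which is only a stand-in here for the exclusive canonicalization that rdf:XMLLiteral actually refers to.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical serializations of the same XML fragment.
# Only the attribute order differs.
FRAGMENT_A = '<span class="x" id="y">hello</span>'
FRAGMENT_B = '<span id="y" class="x">hello</span>'


def same_by_lexical_form(left: str, right: str) -> bool:
    """Compare the strings exactly as written in the serialized document.

    This is what a format does when it treats rdf:XMLLiteral as a typed
    literal like any other and leaves canonicalization to the author.
    """
    return left == right


def same_after_canonicalization(left: str, right: str) -> bool:
    """Canonicalize both strings first, as a parser could choose to do.

    Assumption: C14N 2.0 from the standard library is used as a stand-in
    for the canonical form required of rdf:XMLLiteral lexical forms.
    """
    return canonicalize(left) == canonicalize(right)


if __name__ == "__main__":
    print("equal as written:      ", same_by_lexical_form(FRAGMENT_A, FRAGMENT_B))
    print("equal after canonical.:", same_after_canonicalization(FRAGMENT_A, FRAGMENT_B))
```

If the parser does not canonicalize, the two fragments are different literals unless the author has already written them in canonical form, which is the burden described above.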
Received on Thursday, 10 November 2011 18:00:26 UTC