Re: Exact format for XML Literals? from Bijan Parsia on 2009-09-14 (public-rdf-dawg@w3.org from July to September 2009)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Mon, 14 Sep 2009 09:40:08 +0100
To: Ivan Herman <ivan@w3.org>
Cc: "Seaborne, Andy" <andy.seaborne@hp.com>, Axel Polleres <axel.polleres@deri.org>, W3C SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <BB73A70B-9509-4084-9300-596F16A8B1C9@cs.man.ac.uk>

On 14 Sep 2009, at 05:58, Ivan Herman wrote:

> Andy,
>
> Here is a concrete example. Say our data is:
>
> <rdf:RDF xmlns:rdf="..." xmlns:ex="...">
> <rdf:Description rdf:about="">
>   <ex:p rdf:parseType="Literal">
>      <ex:bla1   a="something" q="and" b="something else"    />
>   </ex:p>
> </rdf:Description>
> </rdf:RDF>
>
> My question is: what is the result of
>
> PREFIX ex: <...>
> ASK WHERE {
>   ?a ex:p
>     "<ex:bla1 q="and"
>         b="something else"     a="something"/>"^^rdf:XMLLiteral .
> }
>
> My feeling is that the answer should be 'true', regardless of the  
> fact that the two literals are different in the order of the  
> attributes and the usage of white spaces.

Since comparisons are normally in "term" space, i.e., lexical space,  
my feeling is different.

> The RDF/XML spec explicitly says that, in the case above, the XML  
> part is transformed into the 'correct' lexical form when creating  
> the abstract RDF triple (which is defined in terms of  
> canonicalized XML).

That seems to be a bug in RDF/XML, frankly. The lexical space of  
XMLLiteral is *not* the canonicalized form and I don't see why the  
parse phase should say anything about it. (Do systems generally adhere  
to this part of the spec?) No other datatype, to my knowledge,  
*requires* canonicalization (though XML Schema 1.1 provides for a  
canonicalization for all of them, I believe).

http://www.w3.org/TR/2003/WD-rdf-concepts-20030123/#dfn-rdf-XMLLiteral

"""The lexical space contains all pairs ( string, lang ) where lang is  
any language identifier [RFC-3066] in lowercase, and string is well- 
balanced, self-contained XML element content [XML], for which the XML  
document corresponding to the pair is a well-formed XML document [XML]  
that also conforms to XML Namespaces [XML-NS]."""

But even if you buy that coming from RDF/XML you'll end up with  
canonicalized lexical forms, not every source must do that. AFAICT,  
SPARQL is silent on canonicalization...XMLLiteral is just another  
datatyped literal. So those would definitely not match.
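
To make the distinction concrete, here is a minimal Python sketch that compares the two literals from the example both ways. It rests on assumptions not in the thread: the namespace URI is hypothetical (the original elides it as "..."), and the standard library's `canonicalize` implements C14N 2.0, which only stands in here for the canonicalization the RDF specs refer to.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

# Hypothetical namespace URI; the original message elides it as "...".
EX = "http://example.org/ns#"

# The literal as written in the RDF/XML data.
data_form = f'<ex:bla1 xmlns:ex="{EX}"   a="something" q="and" b="something else"    />'

# The literal as written in the query: different attribute order and whitespace.
query_form = f'<ex:bla1 xmlns:ex="{EX}" q="and"\n        b="something else"     a="something"/>'


def same_lexical_form(a: str, b: str) -> bool:
    """Term-level comparison: the two strings must be identical."""
    return a == b


def same_canonical_form(a: str, b: str) -> bool:
    """Compare after canonicalizing both sides (C14N 2.0 as a stand-in)."""
    return canonicalize(a) == canonicalize(b)


print(same_lexical_form(data_form, query_form))
print(same_canonical_form(data_form, query_form))
```

Under a purely lexical comparison the two forms differ, so the ASK would come out false; they only coincide if something canonicalizes both sides first.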

> Does the SPARQL spec say the same?
>
> Note that this is _not_ the case as if we replaced the two literals  
> with, say, 1.0 and 1.00 declaring both to be floats. The way XML  
> Literal is currently defined is such that the lexical form (not the  
> value space!) is the canonical XML version.

This is false. See above.

If it were true, then semantically the first graph would have a not- 
well-formed literal, thus, semantically, would not be an instance of  
rdfs:Literal.

> Ie, by referring to the fact that the comparison of literal should  
> be done in the value space does not cover the XML Literal case.

? Er...you mean that the comparison should be done in the lexical  
space cuts no ice? But surely it does :)

How about errata on RDF/XML Syntax and RDF Concepts to change, in the  
former case, the parsing to simply check for well-formedness (with  
namespaces) and, in the latter, to make the value and the lexical spaces  
identical? We can then add functions such as  
"equalUnderCanonicalization" which would apply to *any* datatype.

I know your default reply ("it's impossible") but is it really? Does  
anyone really think these things *aren't* bugs (at the very least, the  
generality of the lexical form is in tension with the strictness of  
the parsing spec)? Plus, it *removes* code.
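
As a rough illustration of what a datatype-generic "equalUnderCanonicalization" might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name `equal_under_canonicalization`, the `CANONICALIZERS` registry, and the choice of canonicalizers are hypothetical and not defined by SPARQL or any RDF spec. The `xsd:double` entry uses Python's `repr(float(...))` rather than the XML Schema canonical form, and C14N 2.0 again stands in for XML canonicalization.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical registry: datatype IRI -> function mapping a lexical form
# to some canonical string. These are stand-ins, not the spec-defined forms.
CANONICALIZERS = {
    XSD + "double": lambda lexical: repr(float(lexical)),
    RDF + "XMLLiteral": canonicalize,
}


def equal_under_canonicalization(lex_a: str, lex_b: str, datatype: str) -> bool:
    """Compare two lexical forms of the same datatype after canonicalizing.

    Falls back to plain string comparison when no canonicalizer is known.
    """
    canon = CANONICALIZERS.get(datatype)
    if canon is None:
        return lex_a == lex_b
    return canon(lex_a) == canon(lex_b)


print(equal_under_canonicalization("1.0", "1.00", XSD + "double"))
print(equal_under_canonicalization(
    '<b x="1" y="2"/>', '<b  y="2"   x="1" />', RDF + "XMLLiteral"))
```

The point of the design is that ordinary matching stays in lexical space for every datatype, and canonical comparison becomes an explicit, opt-in function call rather than something the parser does silently.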

Cheers,
Bijan.

Received on Monday, 14 September 2009 08:40:50 UTC