- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Mon, 14 Sep 2009 12:25:27 +0100
- To: Ivan Herman <ivan@w3.org>
- Cc: "Seaborne, Andy" <andy.seaborne@hp.com>, Axel Polleres <axel.polleres@deri.org>, W3C SPARQL Working Group <public-rdf-dawg@w3.org>
On 14 Sep 2009, at 11:06, Ivan Herman wrote:

> Hi Bijan,
>
> Bijan Parsia wrote:
>> On 14 Sep 2009, at 05:58, Ivan Herman wrote:
[snip]
> (I just realized that I wanted to use rdf:XMLLiteral in the example.
> Sorry about that...)

No worries.

>>> My feeling is that the answer should be 'true', regardless of the fact
>>> that the two literals are different in the order of the attributes and
>>> the usage of white spaces.
>>
>> Since comparisons are normally in "term" space, i.e., lexical space, my
>> feeling is different.
>>
>
> Hm. We really do have different feelings:-).
>
> So if I have the data as
>
> <> ex:b "1.00"^^xsd:float .
>
> then
>
> ASK WHERE { ?a ex:b "1.0"^^xsd:float . }
>
> should return false?

One needs to pick whether one is going to be picky or coercing about such matters. XPath, for example, aggressively atomizes, so you can be sloppy in lots of places and it'll compare things at "the right" level. Perl too (I think). This is good for some things and bites you for others.

I believe graph matching is defined on the literal form, which, if I understand it correctly, at least *involves* comparing on the lexical form. So yes. Some operators act on the value space, so they would behave differently. Furthermore, in an entailment regime, the value space plays a role.

> Is it then in the realm of the entailment regimes
> in the sense that it would require D-entailment to be able to say
> 'true'?

Sure. The answer is TRUE under an OWL 2 entailment regime, for example.

Consider the following query:

ASK WHERE { ?a ex:b ?lv . FILTER (?lv = "1.0"^^xsd:float) }

I believe the answer is TRUE there as well. (But false in the prior one... I hope I'm right... not achieving controlling spec text joy at the moment... still weekended.) It partially depends, as well, on whether your triple store canonicalizes the input.

But also consider the following graph:

:a :p "abc"^^rdf:XMLLiteral.
:a :p :b.
:a :q :c.
:b rdf:type xsd:positiveInteger.
:c rdf:type xsd:negativeInteger.
And the following queries:

ASK WHERE { :a :p ?x . FILTER isLiteral(?x) }
  <-- returns TRUE in SPARQL 1.0; under RDF entailment... it's not a literal. I.e.,

ASK WHERE { :a :p ?x . ?x rdf:type rdfs:Literal }
  <-- returns FALSE, but I expect that the first query still returns TRUE.

ASK WHERE { :a :p ?x . :a :q ?y . FILTER (?x > ?y) }
  <-- Not sure what it returns... perhaps an error? Under an appropriate D-entailment (or OWL Full) it presumably could be true, as :b is greater than :c.

> That may well be the answer (and we may want to think about this
> when discussing entailment regimes)...
[snip]
> (you seem to have referred to the WD).

Yeah. Sucks to be me.

> I am not arguing at this point whether this is right or wrong (see
> below).

I am :)

> And indeed you are right that no other datatype requires some
> sort of a canonicalization.
>
>> But even if you buy that coming from RDF/XML you'll end up with
>> canonicalized lexical forms, not every source must do that. AFAICT,
>> SPARQL is silent on canonicalization... XMLLiteral is just another
>> datatyped literal. So those would definitely not match.
>>
>
> O.k., I agree with your analysis that SPARQL is silent on that. Then my
> question is, in fact: is this o.k.? Shouldn't SPARQL do the same as
> RDF/XML that explicitly refers to canonicalization?

No. RDF/XML should change.

> If not, the only way I could get a 'yes' answer to my original question
> would be to canonicalize the whole thing myself, ie:

A function could do the job as well.

>>> Does the SPARQL spec say the same?
>>>
>>> Note that this is _not_ the case as if we replaced the two literals
>>> with, say, 1.0 and 1.00 declaring both to be floats. The way XML
>>> Literal is currently defined is such that the lexical form (not the
>>> value space!) is the canonical XML version.
>>
>> This is false. See above.
>
> I am not sure what 'This' refers to here...

My bad.
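The float case in this thread (an ASK with the constant "1.0"^^xsd:float against stored "1.00"^^xsd:float, versus the FILTER (?lv = ...) variant) can be sketched in a few lines. This is a minimal Python illustration, not any triple store's implementation: `TypedLiteral`, `term_equal`, `value_equal`, `ask_constant` and `ask_filter` are hypothetical names, and Python's `float()` is assumed as a stand-in for the xsd:float lexical-to-value mapping.

```python
from dataclasses import dataclass

# Datatype IRI for xsd:float.
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical model of an RDF typed literal: (lexical form, datatype IRI)."""
    lexical: str
    datatype: str


def term_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    """Term-space comparison: same lexical form and same datatype."""
    return a == b


def value_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    """Value-space comparison, roughly what a numeric FILTER (?x = ?y) does."""
    if a.datatype == XSD_FLOAT and b.datatype == XSD_FLOAT:
        # Assumption: float() approximates the xsd:float mapping (it is
        # double precision and accepts spellings XSD does not).
        return float(a.lexical) == float(b.lexical)
    # Simplification: SPARQL raises a type error for differing terms of
    # unknown datatypes; this sketch just falls back to term equality.
    return term_equal(a, b)


def ask_constant(objects, constant):
    """ASK WHERE { ?a ex:b <constant> . } : plain graph-pattern matching."""
    return any(term_equal(o, constant) for o in objects)


def ask_filter(objects, constant):
    """ASK WHERE { ?a ex:b ?lv . FILTER (?lv = <constant>) } : value comparison."""
    return any(value_equal(o, constant) for o in objects)


objects = [TypedLiteral("1.00", XSD_FLOAT)]  # objects of the ex:b triples
query = TypedLiteral("1.0", XSD_FLOAT)

print(ask_constant(objects, query))  # False: "1.00" and "1.0" are different terms
print(ask_filter(objects, query))    # True: both denote the same float value
```

A store that canonicalizes numeric lexical forms on load would make the first answer TRUE as well, which is the "partially depends" caveat above.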
>> If it were true, then semantically the first graph would have a
>> not-well-formed literal, thus, semantically, would not be an instance of
>> rdfs:Literal.
>>
>
> I am not sure I understand that. :-(

Ok, in RDF/XML it all works out due to the horrible hack in the parse phase. My guess is that other serializations get this wrong. N-Triples, for example.

Consider the following XMLLiteral serialization:

1) "abc"^^rdf:XMLLiteral

"abc" is not well formed, ergo 1 does *not* denote an XMLLiteral (according to RDF interpretations). Now consider:

2) "<a b='foo' c='bar' />"^^rdf:XMLLiteral

If this appears inside a parseType=Literal, it will be coerced by the parser into an XMLLiteral. However, if you cut and paste it into an N-Triples (or Turtle?) file, it will *not* be coerced (since the parser doesn't touch what's between the quotation marks) and thus will *not* denote an XMLLiteral (or, indeed, any literal at all).

Actually, it may be the case that if you assert

rdfs:Resource rdfs:subClassOf rdfs:Literal.

together with a graph containing 2, you'll get a contradiction.

In fact, this is another way that RDF/XML cannot encode RDF graphs:

3) :a :p "<a b='foo' c='bar' />"^^rdf:XMLLiteral.

is a legal RDF graph that is not RDF-equivalent to:

4) <rdf:Description rdf:about="a">
     <p rdf:parseType="Literal">
       <a b='foo' c='bar' />
     ...

and, in fact, I don't think it can be represented by any RDF/XML document.

We need a clear story, in the end. The story can be a bit complicated, but it should be clear. Ideally, a list of rules ;)

[snip]

> Reporting a bug to the RDF document is perfectly possible.

Or we could create a Rec that supersedes the definition of rdf:XMLLiteral.

> But this
> should be done by trying to understand how this part of the REC was
> created, ie, contacting the original editors,

Meh. If the utility isn't apparent to practitioners today, then I'd say that whatever reasons there were then are moot.

> and probably refer to the
> community in some way or other.

This is key.
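The canonical-form issue behind example 2 can be checked mechanically. The sketch below is only an approximation: it assumes Python 3.8+ `xml.etree.ElementTree.canonicalize`, which implements Canonical XML 2.0, whereas the 2004 RDF definition of rdf:XMLLiteral uses Exclusive XML Canonicalization. `c14n` and `in_canonical_form` are hypothetical helper names, and the sketch only handles single-rooted fragments.

```python
import xml.etree.ElementTree as ET


def c14n(fragment: str) -> str:
    """Canonicalize a single-rooted XML fragment (hypothetical helper).

    Uses ElementTree's Canonical XML 2.0 support, not the Exclusive
    XML Canonicalization that RDF specifies, so results are approximate.
    """
    return ET.canonicalize(fragment)


def in_canonical_form(fragment: str) -> bool:
    """True if canonicalizing the fragment leaves it unchanged."""
    try:
        return c14n(fragment) == fragment
    except ET.ParseError:
        # This sketch does not handle text-only or multi-rooted content.
        return False


pasted = "<a b='foo' c='bar' />"     # lexical form as typed into N-Triples
reordered = "<a  c='bar' b='foo'/>"   # same element, other order and spacing

print(c14n(pasted))                     # <a b="foo" c="bar"></a>
print(in_canonical_form(pasted))        # False
print(c14n(pasted) == c14n(reordered))  # True
```

Under this approximation, the single-quoted, self-closing form only becomes a canonical lexical form if a parser rewrites it (as RDF/XML does for parseType=Literal); an N-Triples parser that passes the string through untouched leaves it non-canonical.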
I'd like to know what extant triplestores do and, if they *do* canonicalize, whether they'd be willing to loosen it. The smallest possible change would be to relax the lexical space, but leave RDF/XML parsing untouched. That wouldn't even change the number of graphs that RDF/XML can't serialize (correctly :)).

> It is, however, on the borderline
> whether this is a bug or a change in the Rec; the latter may become more
> touchy indeed.

XML 5th edition opens a world of possibilities ;)

> Personally, I do not really understand the reasons of this definition
> either. I am not a very good expert in XML, but I would have expected
> the lexical space of XMLLiteral to be well formed XML, and the value
> space to be the canonical XML version, or maybe even the Infoset as an
> abstract representation of the XML content. But, again, I am not an
> expert on all the details of XML:-(

The real killer is getting rid of "extra" namespaces, which means you can't round-trip, e.g., XSLT through RDF/XML successfully.

Cheers,
Bijan.
Received on Monday, 14 September 2009 11:20:53 UTC