- From: Nuutti Kotivuori <naked@iki.fi>
- Date: Sat, 05 Jul 2008 18:06:09 +0300
- To: public-sparql-dev@w3.org, public-rdf-dawg@w3.org
*** I apologize; I pressed the send key too soon by accident. Please ignore my earlier mail on the subject. ***

Hello,

I am having some trouble matching literals with SPARQL. It seems that just about every implementation I tried gives me a different set of answers for a very simple query. I have tried to verify this against the specification, but I haven't been able to find a conclusive answer there. I believe this could be an important interoperability blocker for several applications, as the problem is easily triggered by the simplest of graph patterns.

The issue is very simple to describe, but to be exact so that it can be discussed reasonably, I will write it out here somewhat verbosely.

Assume the following RDF graph:

***
@prefix dt: <http://example.org/datatype#> .
@prefix ns: <http://example.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ns:a ns:p "value" .
ns:b ns:p "value"^^xsd:string .
ns:c ns:p "value"^^dt:datatype .
ns:d ns:p "value"@en .
***

Now, if this graph is queried with the simple SPARQL query:

    SELECT ?x where { ?x ns:p "value" }

What should the result set be? It seems that some implementations return just ns:a here, whereas others return ns:a and ns:b.

The second case is the SPARQL query:

    SELECT ?x where { ?x ns:p "value"^^xsd:string }

In this case too, some implementations return just ns:b and others return ns:a and ns:b. There does not seem to be a clear ruling on this in the specification, even though the feel of the specification in general (in my opinion) seems to indicate that returning both ns:a and ns:b to these queries would not be the right solution.

The third case is a slightly more complex SPARQL query:

    SELECT ?x where { ?x ns:p ?y FILTER (?y = "value") }

This is somewhat trickier. The SPARQL specification defines that fn:compare is used for comparisons between pairs of plain literals and also between pairs of xsd:string literals.
I am not sure whether the specification defines how a string literal in a filter clause should be interpreted - that is, whether "value" is actually a plain literal or just some ephemeral string type. If the specification defines that "value" is a plain literal, then when comparing against ns:b, we have a comparison between an xsd:string and a plain literal - which is not defined by SPARQL. The fallback comparison operator is RDFterm-equal, which states that for literals, the comparison is done by RDF Concepts literal equality, which clearly defines that "value" and "value"^^xsd:string should not compare equal. But it also defines that comparisons between two literals can result in type errors. I am at a loss here as to what the specification actually requires when a plain literal is compared against a typed literal. But again in this case, implementations differ in whether they return only ns:a or both ns:a and ns:b.

The fourth case is similar to the one before:

    SELECT ?x where { ?x ns:p ?y FILTER (?y = "value"^^xsd:string) }

And likewise, implementations differ in whether they return only ns:b or both ns:a and ns:b.

The fifth case uses yet another operator:

    SELECT ?x where { ?x ns:p ?y FILTER (sameTerm(?y, "value")) }

This case seems to be a clear-cut decision in my opinion. The specification clearly defines sameTerm to use the RDF Concepts comparison, which compares "value" and "value"^^xsd:string as not equal. The only confusion can come from the question of whether "value" is to be interpreted exactly as a plain literal, or whether it could be just a string argument to a function without being an RDF term at all. However, even in this case, I found some implementations which return both ns:a and ns:b instead of just ns:a. These I would personally classify as non-conforming implementations.

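To make the term-equality reading of the fifth case concrete, here is a minimal Python sketch of RDF Concepts literal equality as described above, applied to the example graph. It is an illustration only, under the assumption that "value" in the filter is read as a plain literal; the names Literal, same_term, and match are hypothetical and are not taken from any SPARQL implementation.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
DT_DATATYPE = "http://example.org/datatype#datatype"


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of an RDF literal: lexical form plus optional
    datatype URI or language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def _norm_lang(lang: Optional[str]) -> Optional[str]:
    # Language tags are compared case-insensitively; None means "no tag".
    return None if lang is None else lang.lower()


def same_term(a: Literal, b: Literal) -> bool:
    """Literal equality in the sense described in this mail: lexical forms,
    datatypes, and language tags must all agree (or both be absent)."""
    return (
        a.lexical == b.lexical
        and a.datatype == b.datatype
        and _norm_lang(a.lang) == _norm_lang(b.lang)
    )


# The example graph: subject -> object of ns:p.
GRAPH = {
    "ns:a": Literal("value"),
    "ns:b": Literal("value", datatype=XSD_STRING),
    "ns:c": Literal("value", datatype=DT_DATATYPE),
    "ns:d": Literal("value", lang="en"),
}


def match(graph: dict, pattern: Literal) -> list:
    """Return the subjects whose object is the same term as `pattern`."""
    return sorted(s for s, o in graph.items() if same_term(o, pattern))


if __name__ == "__main__":
    print(match(GRAPH, Literal("value")))
    print(match(GRAPH, Literal("value", datatype=XSD_STRING)))
```

Under this reading, a plain "value" matches only ns:a and "value"^^xsd:string matches only ns:b; an implementation that returns both is applying some additional equivalence between the two forms.
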
The sixth case is a variation of the one before:

    SELECT ?x where { ?x ns:p ?y FILTER (sameTerm(?y, "value"^^xsd:string)) }

In this case, there should be no question as to whether "value"^^xsd:string is a typed literal or not. Even so, some implementations return both ns:a and ns:b instead of just ns:b in this case.

Summary: I have no clue as to how the specification wants string literal matching done in SPARQL implementations - and it seems that neither do many of the implementors. Hopefully some clarity can be brought to this matter.

Similar issues remain in matching other datatypes as well - but those are more easily discussed once this issue has been dealt with.

Thank you for your time,
-- Naked
Received on Saturday, 5 July 2008 15:06:47 UTC