RE: SPARQL and string literal matching woes - spec inconclusive - try 2 from Seaborne, Andy on 2008-07-06 (public-rdf-dawg-comments@w3.org from July 2008)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Sun, 6 Jul 2008 18:07:56 +0000
To: Nuutti Kotivuori <naked@iki.fi>, "public-sparql-dev@w3.org" <public-sparql-dev@w3.org>, "public-rdf-dawg-comments@w3.org" <public-rdf-dawg-comments@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA34272AE6076@GVW1118EXC.americas.hpqcorp.net>
> -----Original Message-----
> From: public-sparql-dev-request@w3.org [mailto:public-sparql-dev-
> request@w3.org] On Behalf Of Nuutti Kotivuori
> Sent: 05 July 2008 16:06
> To: public-sparql-dev@w3.org; public-rdf-dawg@w3.org
> Subject: SPARQL and string literal matching woes - spec inconclusive - try 2
>
> Hello,

Hi there,

>
> I am having some trouble matching literals with SPARQL. It seems that
> just about every implementation I tried manages to give me a differing
> set of answers for a very simple query. I have tried to verify this
> against the specification, but I haven't been able to find a
> conclusive answer there.

The formal definition is section 12.3
http://www.w3.org/TR/rdf-sparql-query/#BasicGraphPattern


It is based on simple entailment.  SPARQL is defined for simple entailment but it is also possible that the engines you try provide more expressive entailments, such as partial or full D-entailment, that allow value matching.  It's important to note that, in general, with more expressive entailments, additional answers become possible.

Extending SPARQL Basic Graph Matching:
http://www.w3.org/TR/rdf-sparql-query/#sparqlBGPExtend


Simple entailment and D-entailment:
http://www.w3.org/TR/rdf-mt/


>
> I believe this could be an important interoperability blocker for
> several applications as the problem is easily triggered by the
> simplest of graph patterns.
>
> The issue is very simple to describe, but to be exact so it can be
> discussed reasonably, I will write it here somewhat verbosely:
>
> Assume the following RDF graph:
>
> ***
> @prefix dt:   <http://example.org/datatype#> .
> @prefix ns:   <http://example.org/ns#> .
> @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
>
> ns:a ns:p "value" .
> ns:b ns:p "value"^^xsd:string .
> ns:c ns:p "value"^^dt:datatype .
> ns:d ns:p "value"@en .
> ***
>
> Now, if this graph is queried with the simple SPARQL query:
>
>   SELECT ?x where { ?x ns:p "value" }
>
> What should the result set be?

Under simple entailment:

--------
| x    |
========
| ns:a |
--------

> It seems that some implementations
> return just ns:a here, whereas some implementations return ns:a and
> ns:b.

Because they go beyond simple entailment and match "value" and "value"^^xsd:string.

Rules xsd 1a and xsd 1b in RDF semantics.

--------
| x    |
========
| ns:b |
| ns:a |
--------
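The difference between the two result tables can be sketched in a few lines of Python. This is only an illustration, not how any particular engine works: literals are modelled as (lexical form, datatype, language) tuples, and the names same_term, string_value_equal and select_x are made up for the example. The extended matcher is an assumption standing in for "more than simple entailment" on plain literals and xsd:string.

```python
# Literals are modelled as (lexical form, datatype IRI or None, language tag or None).
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
DT = "http://example.org/datatype#datatype"

GRAPH = [
    ("ns:a", "ns:p", ("value", None, None)),        # "value"
    ("ns:b", "ns:p", ("value", XSD_STRING, None)),  # "value"^^xsd:string
    ("ns:c", "ns:p", ("value", DT, None)),          # "value"^^dt:datatype
    ("ns:d", "ns:p", ("value", None, "en")),        # "value"@en
]


def same_term(a, b):
    """Simple entailment: two literals match only if they are the same RDF term."""
    return a == b


def string_value_equal(a, b):
    """Hypothetical extension: untagged plain literals and xsd:string match by value."""
    def as_string(lit):
        lexical, datatype, lang = lit
        if lang is None and datatype in (None, XSD_STRING):
            return lexical
        return None

    sa, sb = as_string(a), as_string(b)
    if sa is not None and sb is not None:
        return sa == sb
    return same_term(a, b)


def select_x(graph, predicate, obj, match):
    """Evaluate { ?x predicate obj } using the given literal-matching function."""
    return [s for (s, p, o) in graph if p == predicate and match(o, obj)]


plain = ("value", None, None)
print(select_x(GRAPH, "ns:p", plain, same_term))           # simple entailment
print(select_x(GRAPH, "ns:p", plain, string_value_equal))  # extended matching
```

Neither ns:c nor ns:d is matched in either mode, since the sketch only relates plain literals without a language tag to xsd:string.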

>
> Second case is where the SPARQL query is:
>
>   SELECT ?x where { ?x ns:p "value"^^xsd:string }
>
> In this case too, we get implementations that return just ns:b and
> implementations that return ns:a and ns:b.
>
> There does not seem to be a clear ruling on this by the specification,
> even though the feel of the specification in general (in my opinion)
> seems to indicate that returning both ns:a and ns:b to these queries
> would not be the right solution.
>
> The third case is a bit more complex SPARQL query:
>
>   SELECT ?x where { ?x ns:p ?y FILTER (?y = "value") }
>
> This is somewhat trickier.

Not trickier - uses a different part of the spec :-)

See sec 11 operator dispatch table.
http://www.w3.org/TR/rdf-sparql-query/#OperatorMapping


Under (minimal) SPARQL,

=   ("value"^^xsd:string , "value" )

has no entry in that table.  So it's a type error and the FILTER is false.

Extensibility of the operator table:
http://www.w3.org/TR/rdf-sparql-query/#operatorExtensibility


so an implementation can add a row for

A = B     A: xsd:string  B: simple literal

and return true causing a match.  This is an extended SPARQL implementation but it is a valid one.
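As a rough sketch of that idea in Python (not taken from any implementation): the "=" operator is modelled as a lookup table keyed on the kinds of the two arguments, a missing row is treated as a type error, and a type error makes the FILTER reject the solution. Only the string rows are modelled, and the names kind, MINIMAL_TABLE, EXTENDED_TABLE and filter_passes are assumptions for the example.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class FilterTypeError(Exception):
    """Raised when no operator row applies to the argument kinds."""


def kind(lit):
    """Classify a (lexical, datatype, language) literal for table lookup."""
    _lexical, datatype, lang = lit
    if datatype is None and lang is None:
        return "simple"
    if datatype == XSD_STRING:
        return "xsd:string"
    return "other"


def lexical_equal(a, b):
    return a[0] == b[0]


# Rows of the unextended table that matter here: like compared with like.
MINIMAL_TABLE = {
    ("simple", "simple"): lexical_equal,
    ("xsd:string", "xsd:string"): lexical_equal,
}

# An extended implementation only adds rows; it never removes or changes one.
EXTENDED_TABLE = dict(MINIMAL_TABLE)
EXTENDED_TABLE[("xsd:string", "simple")] = lexical_equal
EXTENDED_TABLE[("simple", "xsd:string")] = lexical_equal


def equals(table, a, b):
    op = table.get((kind(a), kind(b)))
    if op is None:
        raise FilterTypeError("no operator row for these argument kinds")
    return op(a, b)


def filter_passes(table, a, b):
    """A FILTER that raises a type error rejects the solution."""
    try:
        return equals(table, a, b)
    except FilterTypeError:
        return False


typed = ("value", XSD_STRING, None)
plain = ("value", None, None)
print(filter_passes(MINIMAL_TABLE, typed, plain))   # unextended table
print(filter_passes(EXTENDED_TABLE, typed, plain))  # with the added rows
```

Because the extended table is a superset of the minimal one, any comparison that was true before stays true; the only change is that some former type errors now produce an answer.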

> The SPARQL specification defines that
> operator fn:compare is used to match between plain literal pairs and
> also between xsd:string pairs. I am not sure if the specification
> defines how a string literal in a filter clause should be interpreted
> - that is, if "value" is actually a plain literal or just some
> ephemeral string type. If the specification defines that "value" is a
> plain literal, then when comparing ns:b, we have a comparison between
> xsd:string and a plain literal - which is not defined by SPARQL. The
> common comparison operator is RDFTerm-equal, which states that for
> literals, the comparison is done by RDF Concepts literal equality,
> which clearly defines that "value" and "value"^^xsd:string should not
> compare equal. But it also defines that comparisons between two
> literals can result in type errors. I am at a loss here as to what the
> specification actually signifies in the case of a plain literal
> against a typed literal.
>
> But again in this case, implementations differ in whether they return
> ns:a or both ns:a and ns:b.

They differ in a safe manner: more solutions become possible.  The operator extension framework is monotonic, so answers that were previously true don't become false.

>
> The fourth case is again similar to the one before:
>
>   SELECT ?x where { ?x ns:p ?y FILTER (?y = "value"^^xsd:string) }
>
> And likewise, implementations differ in whether they return only ns:b
> or both ns:a and ns:b.

As above - add a row to the operator table for (simple literal, xsd:string).

>
> The fifth case uses a yet another new operator:
>
>   SELECT ?x where { ?x ns:p ?y FILTER (sameTerm(?y, "value")) }
>
> This case seems to be a clear cut decision in my opinion. The
> specification clearly defines sameTerm to use the RDF Concepts
> comparison, which compares "value" and "value"^^xsd:string as not
> equal. The only confusion can be from the question if "value" is to be
> interpreted exactly as a plain literal, or if it could be just a
> string argument to a function without being an RDF term at all.

Definition of sameTerm:
http://www.w3.org/TR/rdf-sparql-query/#func-sameTerm

which points to RDF Concepts.

sameTerm can't be extended - only new entries in the operator table do that.

"value" is a plain literal in RDF.

One answer

--------
| x    |
========
| ns:a |
--------

> However, even in this case, I found some implementations which return
> both ns:a and ns:b instead of just ns:a. These I would personally
> classify as non-conforming implementations.

I can't speak for other implementers; you'll have to engage in a dialogue with them.  It would probably help if you said who you mean.

(For ARQ: use strict mode if you want the exact, unextended spec; otherwise the graph may be doing plain literal/xsd:string value indexing and various other things.)

>
> The sixth case is again a variation of the one before:
>
>   SELECT ?x where { ?x ns:p ?y FILTER (sameTerm(?y, "value"^^xsd:string)) }

One answer

--------
| x    |
========
| ns:b |
--------

sameTerm does not permit anything else (that's the whole point of it).

>
> In this case, there should be no question as to whether
> "value"^^xsd:string is a typed literal or not.
>
> Even still, some implementations return both ns:a and ns:b instead of
> just ns:b in this case.

There is another factor: an implementation may do some processing when the data is loaded.  The query does not specify the dataset (via FROM or the protocol request), so the dataset comes from the execution environment.  This has nothing to do with SPARQL, because the scope of SPARQL does not extend to how a graph is loaded by some system before the query is issued (the data is already there beforehand).
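To make the load-time point concrete, here is a small Python sketch under a loud assumption: a hypothetical loader that stores xsd:string literals as plain literals. No real system is being described; load, its normalise_strings flag and select_same_term are invented for the illustration. The comparison itself is strict term equality throughout, so any change in the answers comes purely from what the loader put in the graph.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# The data as written in the file: (subject, predicate, (lexical, datatype, language)).
DATA = [
    ("ns:a", "ns:p", ("value", None, None)),
    ("ns:b", "ns:p", ("value", XSD_STRING, None)),
]


def load(triples, normalise_strings):
    """Hypothetical loader: optionally stores xsd:string literals as plain literals."""
    stored = []
    for s, p, (lexical, datatype, lang) in triples:
        if normalise_strings and datatype == XSD_STRING:
            datatype = None  # the stored term is no longer the term that was written
        stored.append((s, p, (lexical, datatype, lang)))
    return stored


def select_same_term(graph, constant):
    """{ ?x ns:p ?y FILTER (sameTerm(?y, constant)) } with strict term equality."""
    return [s for (s, _p, o) in graph if o == constant]


plain = ("value", None, None)
typed = ("value", XSD_STRING, None)

as_written = load(DATA, normalise_strings=False)
normalised = load(DATA, normalise_strings=True)

print(select_same_term(as_written, plain))  # graph holds the terms as written
print(select_same_term(normalised, plain))  # graph was changed at load time
print(select_same_term(normalised, typed))  # the typed term is no longer in the graph
```

In this sketch the query evaluation is identical in all three calls; only the graph being queried differs.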

>
> Summary:
>
> I have no clue as to how the specification wants string literals
> matching done in SPARQL implementations - and it seems that neither do
> many of the implementors. Hopefully some clarity can be brought in to
> this matter.
>
> Similar issues remain on matching other datatypes as well - but
> those issues are more easily discussed once this issue has been dealt
> with.

The same issues arise - entailment and data processing are factors.  Extension can occur in the graph matching and in the operator table.

>
> Thank you for your time,
> -- Naked

I hope I have clarified the situation for you,

        Andy

I changed the address line: please send comments on the spec to
public-rdf-dawg-comments@w3.org, where they can be tracked,
rather than to the working group list (public-rdf-dawg).
Received on Sunday, 6 July 2008 18:08:29 UTC