Re: Semantic Web Phase 2 Activity - Protocol - Query Language

From: Jeen Broekstra <jeen@aduna.biz>
Date: Wed, 12 Nov 2003 18:50:53 +0100
Message-ID: <3FB272FD.6070502@aduna.biz>
To: "Seaborne, Andy" <Andy_Seaborne@hplb.hpl.hp.com>
Cc: 'Steve Harris' <S.W.Harris@ecs.soton.ac.uk>, www-rdf-interest@w3.org, sesame-devel@lists.sourceforge.net

Seaborne, Andy wrote:

>>From: Jeen Broekstra <mailto:jeen@aduna.biz>
>>Date: 12 November 2003 15:40
[snip]
>>I think the example query is ill-defined: regardless of the actual
>>dataset being queried it is not possible to assign an unambiguous
>>semantics to it. The problem is that a variable is shared across
>>several optional path matches, none of which is guaranteed to assign
>>the variable a value.
> 
> 
> The query is only ill-defined because of the data - if the data had the
> pattern to match then it would be fine.

I disagree - the query itself can be interpreted in more than one way,
and this does not depend on the data. Even if the data contains a
matching pattern, e.g. the following triples:

<x> <p> <y>.
<x> <q> <y>.
<x> <q> <z>.

It is still not clear, IMHO, what the query should return (Should it 
return the third triple? It matches the second optional path, and while 
that makes the first one fail, that one is optional, so that shouldn't 
matter for the query result. Or should it?).
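
As a rough illustration of the ambiguity, here is a minimal Python sketch.
It is not SeRQL's (or any engine's) actual semantics: it assumes the snipped
query consists of the two optional paths sharing ?a that are paraphrased
further down in this message, and it uses a naive left-to-right evaluation
in which a failed optional simply leaves the binding untouched. The names
extend and evaluate are hypothetical.

```python
# Triples from the message; "x", "p", ... stand in for <x>, <p>, ...
TRIPLES = [("x", "p", "y"), ("x", "q", "y"), ("x", "q", "z")]

# Two optional paths sharing ?a (assumed shape of the snipped query).
OPT_P = ("x", "p", "?a")
OPT_Q = ("x", "q", "?a")


def extend(pattern, binding):
    """Return each extension of `binding` under which `pattern` equals a triple."""
    results = []
    for triple in TRIPLES:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(candidate)
    return results


def evaluate(optionals):
    """Naive left-to-right evaluation: a failed optional keeps the binding as-is."""
    bindings = [{}]
    for pattern in optionals:
        bindings = [ext for b in bindings for ext in (extend(pattern, b) or [b])]
    return {b.get("?a") for b in bindings}


p_first = evaluate([OPT_P, OPT_Q])
q_first = evaluate([OPT_Q, OPT_P])
```

Under these assumptions, evaluating the <p> path first yields only y for ?a,
while evaluating the <q> path first yields both y and z: the answer depends
on evaluation order even though the data contains a matching pattern.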

> The idea of "ill-defined"
> depending on the data isn't a way I'd like to think about it.
> I'd like to be able to tell if a query is legal (will not throw a parsing
> error at least) before issuing the query.  It should also be that order
> should not matter and that for any given solution, all the elements must be
> true independently and together so the induced graph is a subgraph of the
> original data.

I agree.

>>>?? What does it return?
>>
>>I'm heavily leaning towards saying that the query engine should
>>return a "malformed query" error.
>>
>>In the discussion on sesame-devel, Jacco van Ossenbruggen came up
>>with a constraint on optional path expressions: reuse of a variable
>>across optional paths is allowed if and only if the variable is also
>>used in a non-optional path in the same query. This disambiguates
>>the use of the shared variable in optionals, since optional paths
>>will no longer have to instantiate the shared variable, they will
>>only have to validate the current instantiation.
> 
> That is compatible with Steve's reply which was (my paraphrasing) to
> interpret:
> 
> [ <x> <p> ?a ]
> [ <x> <q> ?a ]
> 
> as a single optional expression
> 
> [ ( <x> <p> ?a ) ( <x> <q> ?a ) ] 
> 
> because of the shared variable and which has no matches (although I think
> Steve had one match of ?a = undefined).

Steve's answer is indeed a slightly different solution to the same 
problem: disambiguation of the original query pattern.
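
The constraint attributed to Jacco van Ossenbruggen above lends itself to a
static check that needs no data at all. The following Python sketch is one
possible reading of it, not the Sesame implementation: paths are modelled as
plain tuples, and the function names are hypothetical.

```python
from collections import Counter


def variables(path):
    """Return the variables (terms starting with '?') used in one path."""
    return {term for term in path if term.startswith("?")}


def unanchored_shared_variables(required, optionals):
    """Variables reused across optional paths but absent from every non-optional path."""
    required_vars = set().union(*(variables(p) for p in required))
    # Count in how many optional paths each variable occurs.
    counts = Counter(v for p in optionals for v in variables(p))
    return {v for v, n in counts.items() if n > 1 and v not in required_vars}


# The disputed pattern: ?a is shared by two optionals and anchored nowhere.
disputed = unanchored_shared_variables(
    required=[], optionals=[("x", "p", "?a"), ("x", "q", "?a")]
)

# Same optionals, but ?a also occurs in a non-optional path (hypothetical <r>).
anchored = unanchored_shared_variables(
    required=[("x", "r", "?a")], optionals=[("x", "p", "?a"), ("x", "q", "?a")]
)
```

A query for which the returned set is non-empty would be rejected as
malformed before it is ever evaluated; here disputed contains ?a and
anchored is empty.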

>>In the example case, the query could be reformulated in several
>>ways, but the most obvious one would be:
>>
>>[<x> <p> ?y]
>>[<x> <q> ?z]
>>?y = ?z
> 
> I don't see why that should be different from the original case.  A query
> optimizer might very well wish to use ?y = ?z (if that means "the same") to
> eliminate one variable from the query execution.

It is different because it is, like Steve's solution, a disambiguation:
clearly the equality of ?y and ?z is not optional now (which was
ambiguous in the original). Regardless of the order in which you
instantiate the patterns, the query should give the same answer. How an
optimizer handles this is a separate issue from the semantics of the
query, I think.
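
To make the order-independence claim concrete, here is a small Python sketch
that evaluates the reformulated query in both possible orders against the
three example triples. It is an illustration under stated assumptions, not
engine code: optionals are applied naively left to right, and a solution in
which ?y or ?z is left unbound is assumed to fail the non-optional
?y = ?z test. The names extend and solve are hypothetical.

```python
from itertools import permutations

TRIPLES = [("x", "p", "y"), ("x", "q", "y"), ("x", "q", "z")]
OPTIONALS = [("x", "p", "?y"), ("x", "q", "?z")]


def extend(pattern, binding):
    """Return each extension of `binding` under which `pattern` equals a triple."""
    results = []
    for triple in TRIPLES:
        candidate = dict(binding)
        if all(
            candidate.setdefault(term, value) == value
            if term.startswith("?")
            else term == value
            for term, value in zip(pattern, triple)
        ):
            results.append(candidate)
    return results


def solve(order):
    """Apply the optionals in the given order, then the non-optional ?y = ?z."""
    bindings = [{}]
    for pattern in order:
        bindings = [ext for b in bindings for ext in (extend(pattern, b) or [b])]
    return {
        (b["?y"], b["?z"])
        for b in bindings
        if "?y" in b and "?z" in b and b["?y"] == b["?z"]
    }


answers = [solve(order) for order in permutations(OPTIONALS)]
```

Both orders produce the same single solution, with ?y and ?z both bound
to y, which is the behaviour the reformulation is meant to guarantee.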

But perhaps I'm thinking about this slightly too much with my 
SeRQL-engine-engineering cap on :)

Jeen
Received on Wednesday, 12 November 2003 12:52:49 GMT
