RE: Semantic Web Phase 2 Activity - Protocol - Query Language

> From: Jeen Broekstra <mailto:jeen@aduna.biz>
> Date: 12 November 2003 15:40
> 
> [Cc also sent to sesame-devel, since we discussed this problem there
>   today]
> 
> Seaborne, Andy wrote:
> 
> > Hi there - I have an example query with optional triples and I
> > wondered what the various systems do with it:
> > 
> > Thanks to Jeremy Carroll for this example.
> > 
> > Consider the data:
> > 
> > <x> <p> <y> .
> > <x> <q> <z> .
> > 
> > and the query:
> > 
> > [ <x> <p> ?a ]
> > [ <x> <q> ?a ]
> > 
> > where [] is an optional match.
> > 
> > ?? Does the query match the data?
> 
> I have earlier sent a reply on this to Andy alone, on how SeRQL
> currently handles it, but I've spent some time thinking about this
> and discussed it on the sesame-devel list a bit further.
> 
> I think the example query is ill-defined: regardless of the actual
> dataset being queried it is not possible to assign an unambiguous
> semantics to it. The problem is that a variable is shared across
> several optional path matches, none of which definitely assigns
> the variable a value.

The query is only ill-defined because of the data - if the data contained
triples that both patterns could match with the same value of ?a, it would
be fine.  The idea of "ill-defined" depending on the data isn't a way I'd
like to think about it.

I'd like to be able to tell whether a query is legal (at the very least,
that it will not throw a parsing error) before issuing it.  The order of the
patterns should also not matter, and for any given solution all the elements
must be true both independently and together, so that the induced graph is a
subgraph of the original data.
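
To illustrate the ordering point, here is a rough sketch in Python of what a
naive evaluator might do if it treated each optional independently, left to
right.  It is not how any of the systems discussed here is implemented; the
triple representation and the names match and eval_optionals are made up for
this example.

```python
# Hypothetical sketch: triples are 3-tuples of strings, variables start with "?".
DATA = {("x", "p", "y"), ("x", "q", "z")}

def match(pattern, triple, binding):
    """Return a copy of binding extended so pattern matches triple, else None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def eval_optionals(patterns, data):
    """Naive left-to-right evaluation: an optional pattern extends a solution
    if it can, and otherwise leaves that solution unchanged."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            extended = [m for t in sorted(data)
                        if (m := match(pattern, t, binding)) is not None]
            next_solutions.extend(extended or [binding])
        solutions = next_solutions
    return solutions

P = ("x", "p", "?a")
Q = ("x", "q", "?a")
print(eval_optionals([P, Q], DATA))  # [{'?a': 'y'}]
print(eval_optionals([Q, P], DATA))  # [{'?a': 'z'}]
```

Under this (assumed) evaluation strategy the answer depends on which optional
is tried first, which is exactly the behaviour I would like to rule out.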

> 
> > ?? What does it return?
> 
> I'm heavily leaning towards saying that the query engine should
> return a "malformed query" error.
> 
> In the discussion on sesame-devel, Jacco van Ossenbruggen came up
> with a constraint on optional path expressions: reuse of a variable
> across optional paths is allowed if and only if the variable is also
> used in a non-optional path in the same query. This disambiguates
> the use of the shared variable in optionals, since optional paths
> will no longer have to instantiate the shared variable, they will
> only have to validate the current instantiation.

That is compatible with Steve's reply which was (my paraphrasing) to
interpret:

[ <x> <p> ?a ]
[ <x> <q> ?a ]

as a single optional expression

[ ( <x> <p> ?a ) ( <x> <q> ?a ) ] 

because of the shared variable, and that expression has no matches (although
I think Steve had one match, with ?a = undefined).
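
The constraint Jacco proposed looks like something that could be checked
before the query is run.  A minimal sketch in Python, assuming a query is
given as a list of required triple patterns plus a list of optional groups
(each group a list of patterns); the function names are hypothetical and
this is only my reading of the rule:

```python
from collections import Counter

def variables(pattern):
    """Variables (terms starting with "?") in one triple pattern."""
    return {term for term in pattern if term.startswith("?")}

def unanchored_shared_variables(required, optionals):
    """Variables used in more than one optional group but in no required
    pattern.  A non-empty result means the query breaks the constraint."""
    anchored = set().union(*(variables(p) for p in required))
    counts = Counter()
    for group in optionals:
        # Count each variable once per optional group it appears in.
        counts.update(set().union(*(variables(p) for p in group)))
    return {v for v, n in counts.items() if n > 1 and v not in anchored}

P = ("x", "p", "?a")
Q = ("x", "q", "?a")

# Two separate optionals sharing ?a: rejected.
print(unanchored_shared_variables([], [[P], [Q]]))   # {'?a'}
# One optional group holding both patterns: accepted.
print(unanchored_shared_variables([], [[P, Q]]))     # set()
```

With the two patterns placed in one optional group the check passes, which is
the sense in which I read the two proposals as compatible.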

> 
> In the example case, the query could be reformulated in several
> ways, but the most obvious one would be:
> 
> [<x> <p> ?y]
> [<x> <q> ?z]
> ?y = ?z

I don't see why that should be different from the original case.  A query
optimizer might very well wish to use ?y = ?z (if that means "the same") to
eliminate one variable from the query execution.

Not an issue for this case, but what is the truth value of "undef = undef"?
:-)  I wouldn't want the number of results expanded with solutions in which
the optional parts are all undefined.

> 
> (which would unambiguously return no results)
> 
> As far as I can tell, this constraint does not limit expressivity of
> the language, but then again, I can't really tell for sure since I
> am not quite certain what the original query was supposed to express
> in the first place :)

It was just a test case reduced to its essential point.

> 
> Jeen

Received on Wednesday, 12 November 2003 11:34:48 UTC