Re: Refining Optionals from Seaborne, Andy on 2005-07-07 (public-rdf-dawg@w3.org from July to September 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 07 Jul 2005 16:27:28 +0100
To: Geoff Chappell <geoff@sover.net>
CC: "'RDF Data Access Working Group'" <public-rdf-dawg@w3.org>, Pat Hayes <phayes@ihmc.us>
Message-ID: <42CD49E0.3090609@hp.com>
Geoff Chappell wrote:
> 
>>-----Original Message-----
>>From: Seaborne, Andy [mailto:andy.seaborne@hp.com]
>>Sent: Tuesday, July 05, 2005 5:00 AM
>>To: Geoff Chappell
>>Cc: 'RDF Data Access Working Group'
>>Subject: Re: Refining Optionals
>>
> 
> [...] 
> 
>>
>>FILTERs could be made to attach to a graph pattern but that doesn't really
>>chnage anything because that pattern could be group.  FILTERs could be
>>attached
>>to basic patterns but that would be a chnage in the possible answers (e.g.
>>FILTER on a UNION where all branches define the variable tested).
>>
> 
> 
> I think something like this might make work.... come up with a set of
> rewrite rules to transform the query into something like disjunctive normal
> form, but that treats basic patterns with constraints as atomic (call it
> disjunctive graph normal form or ...) -  i.e. something like this:
> 
> P1 P2 = P2 P1
> P1 UNION P2 = P2 UNION P1
> OPTIONAL(A) = A UNION NOT(A)
> (P1 UNION P2)P3 = P1 P3 UNION P2 P3
> NOT(P1 UNION P2) = NOT(P1) NOT(P2)
> NOT(P1 NOT(P2)) = NOT(P1) UNION P2
> 
> The last rule is really the difference since it separates the positive
> factors from the negative without breaking apart the positives (i.e. it
> keeps graphs together rather than treating them as a conjunction of triple
> patterns and filters). 
> 
> The result hopefully is that you don't end up with negated filters in
> isolation, but they're instead kept together with the graphs they're
> intended to constrain. E.g:
> 
> 	A OPTIONAL (B FILTER C)
> 
> ->
> 	A (B FILTER C UNION NOT(B FILTER C))
> ->
> 	A B FILTER C UNION A NOT(B FILTER C)
> 
> As opposed to:
> 
> 	A B FILTER C UNION A NOT(B) UNION A NOT(FILTER C)
> 
> which is a case that introduces the unwanted solutions since NOT (FILTER C)
> will always succeed if B is needed to bind the vars used in FILTER C (as in
> the OPTIONAL-FILTER test case).
> 
> So the rules then for determining valid solutions would be (as opposed to a
> prescription for how to come up with them):
> 
> 1. transform into DGNF using rewrite rules
> 2. each disjunct has a solution set; for each disjunct, any var that is not
> mentioned in a positive pattern is unbound for solutions of that disjunct
> 
> I haven't tried it in code, but is seems to work on paper :-) 
> 
> See any obvious holes in it?
> 
> -Geoff

The more I think about it, the more I'm unsure we can sort this out in time.

We have a way of talking about the "Junk" case - although as ?v may appear in a
different position, creating a set of spurious terms from the model, I am a 
little nervous of this still.

It's the unbound case that I still can't get straight.  We are talking about
whether a negated pattern involving unbound variables does or does not match a
graph.  As match is defined as subgraph, anything involving variables never
matches.  The negation always is true - the !P case always admits a solution.

As I read your expansion rules above:
optional { :x :p ?v }
{ :x :p ?v } UNION NOT { :x :p ?v }

but ?v = unbound is not a solution to  { :x :p ?v } so NOT { :x :p ?v } is true
and hence that is a solution.  I must be missing something.


This approach is based on filtering the space of all possible solutions to find
the solutions that match the query.  There was the alternative approach for
defining SPARQL which to build the solutions up from basic patterns.  Unbound
variables never appear in basic pattern matches.

Instead of approaching the set of solutions from above (considering all possible
solutions and narrowing it down to the ones that match), it is approaching the
set of solutions from below (starting with nothing and creating more complex
sets of solutions).

Both approachs should get to the same place.

The biggest difference is in the treatment of optionals.  They are better
handled as a binary operator "A opt B", capturing the idea that optionals extend
another solution where possible.

A compositional approach would be:

--------------

Basic Patterns (no change)
Subgraph match.  Only include the variables that matter in a solution.

Union: (no change)
Union of the sets of solutions of subpatterns.

Optionals:
S is a solution to A opt B if:
     S is a solution of A.B else S is a solution n of A

The use of "else" here forces the choice.

Group:
The composition of the subpatterns.

The empty group {} has one solution with no variable bindings.
This is the seed condition and defines { optional {...} }

A group of one pattern has solutions of that one pattern.

Define G(A,B) as the group combination of two patterns.
    if B is a basic pattern, union, another group:
       G(A,B) has solution S if S is a solution to A and to B.
    if B is an optional Opt(P)
       G(A,B) has solution S
          if is a solution of G(A,P)
          else S is a solution of A and not of P.
    if B is a filter F
       G(A,F) has solution S if S is a solution to A and F(S) is true.

Define a solution S of group pattern GP = G1, G2, ... as G(group{G1, G2, ..},Gn)

--------------

That is order dependent.  That could be relaxed by defining it as

      G(group{Gi i!=j}, Gj)

which is every order.  Implementation like that is not advised :-) 
Realistically, doing all the fixed parts, then the optionals, spotting any order 
dependences (optionals with shared variables that do not occur in a fixed part) 
is doable.

Filters "out of order" would give nothing (undefined variables) with the 
exception of FILTER bound(?x).


It leaves it open to provide an alternative set of definitions that are based on
the framework you outline.  It will take time to get consensus around such an
approach (especially with negation and unbound variables being very significant
in how they work).

	Andy
Received on Thursday, 7 July 2005 15:29:19 UTC