Re: Updated definitions from Seaborne, Andy on 2005-05-19 (www-archive@w3.org from May 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 19 May 2005 11:43:22 +0100
To: Dan Connolly <connolly@w3.org>
CC: www-archive@w3.org, Pat Hayes <phayes@ihmc.us>, "Eric Prud'hommeaux" <eric@w3.org>
Message-ID: <428C6DCA.4000804@hp.com>
Dan,

Thanks for spending the time on this - your comments are very helpful.

This message responses to the graph pattern comments.  I'll rework the solution 
processing sections along the lines you suggest.

Dan Connolly wrote:
> On Wed, 2005-05-18 at 14:53 +0100, Seaborne, Andy wrote:
> 
>>Formal Definitions from SPARQL Query Language for RDF
> 
> 
>>Definition: Query Variable 
>>
>>A named query variable is a member of the set V where V is infinite
>> and disjoint from RDF-T. V is otherwise arbitrary.
>>
>>An unnamed query variable is a blank node in the query pattern and is
>>disjoint from any blank node in the RDF graphs being matched.
> 
> 
> That 2nd sentence seems like it belongs elsewhere.
> Ah... I think it's best to leave it out of the abstract
> definitions, and note in/near the grammar (or just
> outside the box) that a parser
> needs to come up with a variable name that's distinct
> from all the others when it parses the [] construct.

Agreed.

A general point that comes up several times is that the definitions and 
description shoudl largely be independent of the concrete syntax and the parser. 
  There may be other concrete syntaxes; queries may come from other places (e.g. 
build programmatically).

Here, the key point is that blank nodes ([] or _:a form) must be disjoin from 
the dataset.  It kinda falls out naturally because the fundamental matching is 
entailment so I will move the second part out of the definition.   I do think 
that the definition of "query variable" needs to cover blanks nodes (hence 
"named").  Does that work for you?

> 
> 
> 
>>Definition: Graph Pattern 
>>
>>A Graph Pattern is constructed from triple patterns and value
>>constraints, using the following operations to combine patterns:
>>
>>      * Basic Graph Pattern, set set of triple patterns 
>>      * Group Pattern, where each of a set of graph patterns must all
>>        match 
>>      * Optional Graph Pattern, where a graph patterns may extend the
>>        solution 
>>      * Union Graph Pattern, where one of a set of graph patterns must
>>        match 
>>      * Patterns on Named Graphs, where patterns are matched against
>>        named graphs
> 
> 
> I think it suffices to say:
> 
> a Graph Pattern is either a Basic Graph Pattern,
> a Group Pattern, and Optional Graph Pattern, ... 
> or a Pattern on Named Graphs.

OK - I think you advocate a terse style of definition.  I can move the list 
outside the definition (in the last working draft it was but that was only 
because the HTML was broken!).

> 
> add to this list: a Constraint Pattern; more below.

See comments below.  Not sure about the combination with another pattern.

> 
> 
> 
>>Definition: SPARQL Query 
>>
>>A SPARQL query is a tuple consisting of a graph pattern, an RDF
>>dataset description, a number of solution sequence modifiers and a
>>result form.
>>
> 
> Dataset description? Not just a dataset?
> hmm. I wonder where that comes in...
> It doesn't seem to show up anywhere else.

Could do - the query has dataset, and FROM/FROM NAMED are a description.
How the description gets turns into the datset for the query is left open.  I 
want to avoid talking about GET (if http://) or what aggregators might do with URIs.

> As I think I suggested earlier, the semantics of FROM and FROM NAMED
> should be specified by reference to the protocol, not as part of
> the query language formalism.

The QL should not depend on the protocol:

1/ The QL can be used without the protocol
2/ If the protocol does not specify the dataset, it comes from the query syntax 
so we end out looping here.

> 
> That reminds me... the protocol spec needs to say how
> to get from the default-graph-uri and the
> named-graph-uri's to the corresponding graphs.
> I'll try to remember to send mail about that separately.
> 
> 
>>Definition: Query Pattern 
>>
>>A query has one main graph pattern called the Query Pattern.
>>
> 
> A SPARQL Query has...
> 
> Or:
> 
> If (GP, D, M, R) is a SPARQL Query, then GP is its Query Pattern.
> 
> This whole definition might be left out, as it's reasonably
> clearly implicit in the defintion of SPARQL Query.

Agreed.

> 
> 
>>Definition: Triple Pattern 
>>
>>A triple pattern is member of the set:
>>(RDF-T union V) x (RDF-U union V) x (RDF-T union V)
>>
>>Definition: Solution 
> 
> 
> Solution or Substitution?

Mixup - I'll read for consistency.

> 
>>A substitution is a function from a subset of the set of variables to
>>the set of RDF terms, RDF-T.
>>
>>The result of replacing every v in a graph pattern P by S(v) is
>>written S(P).
>>
>>We say that S is a solution of P if P matches G with substitution S.
>>
>>Definition: Basic Graph Pattern 
>>
>>A Basic Graph Pattern is a set of Triple Patterns.
>>
>>A basic graph pattern matches on graph G with substitution S if S(GP)
>>is entailed by G.
>>
>>Definition: Query Solution 
>>
>>A Query Solution is a Pattern Solution for the Query Pattern. A
>>substitution in a query solution only contains variables mentioned in
>>the query.
>>
> 
> suggest:
> 
> Given a Query Pattern, a Query Solution is one of its Pattern Solutions
> whose substitutions only contains variables mentioned in the Query
> Pattern.

I don't see the need for a restriction to only contain variables mentioned in 
the query.  Is there a technical reason for this?

Allowing other variables/values in a solution does no harm as far as I can see 
and allows for later extension to nested queries.

I have to change the Query Results from saying plain "all" because of variable 
inference capabilities.

> 
> or:
> 
> Given a Query Q = (QP, D, M, R), S is a Query Solution for Q iff
> S is a Pattern Solution for QP and every variable in the domain
> of S is mentioned in QP.

I'll adopt that tuple-defintion of query.

> 
> 
> 
>>Definition: Query Results 
>>
>>The Query Results, for a given graph pattern GP on graph G, is written
>>R(GP,G), and is the set of all query solutions S such that S is a
>>solution for GP on G.
> 
> 
> hmm... rather: of all pattern solutions S, no?
> 
> I'm looking for the definition of Pattern Solution and I don't see it.
> Hmm.

It was merged into substitutions because substitutions are solutions - different 
words for different points of view.  Must be some junk around still.  I see if I 
can pick one over the other - if not, I'll put back a defn for pattern solution

> 
> 
>>R(GP, G) may be the empty set. 
>>
>>Definition: Value Constraint 
>>
>>A graph pattern may involve a value constraint, which is a
>>boolean-valued expression of variables and RDF Terms that restricts
>>query solutions.
> 
> 
> Umm... "may involve"... the definition of Graph Pattern given
> didn't involve value constraints.

It did in the first sentence:

A Graph Pattern is constructed from triple patterns and value
constraints, using the following operations to combine patterns:

(list of pattern operators follows).

If I put the list outside the defn, have a sentence list of all possibilities, 
that should be clearer by being more uniform in the treatment of constraints.


> Suggest rather adding
> Constraint Pattern above and:
> 
> Definition: Value Constraint
> 
> A Value Constraint is a boolean-valued expression of variables
> and RDF Terms.
> 
> Definition: Constraint Pattern
> 
> A Constraint Pattern is a pair of a Value Constraint
> and a Graph Pattern. S is a solution for a Constraint
> Pattern (C, GP) iff C(S) is true and S is a solution for GP.

Hmm ... thinks ...

How about defining a value constraint C as accepting S if C(S) is true?

----
Definition: Value Constraint

A Value Constraint is a boolean-valued expression of variables
and RDF Terms.

For constriant C, a solution S matches C is C(S) is true.
----

The problem is that we have used the term "matches" - a solution causes a 
pattern to match and it's a bit odd using "match" but using the same work seems 
better.

Tieing to a pattern does not work out completely:

Example 1:

SELECT WHERE
{
   ?x rdf:type :foo
   :y :q ?v
   { ?x :p ?v } UNION { FILTER ?v < 3 }
}

either the triple { ?x :p ?v } is in the graph or the filter is true.

Example 2:

SELECT * WHERE {
   { ?x :p ?v }
   { FILTER ?v < 3 }
}

where the FILTER is in a group of one, no pattern in the group.

Otherwise, it might be better to have a grammar to require FILTER to be 
associated with a pattern of some kind.  Something like:

     pattern constraint*

[unchecked if that grammar change works]

but now we have order in the query and this is syntactically illegal:

SELECT * WHERE {
     FILTER ?v < 3
     ?x :p ?v
}

and the UNION example also fails.

Our filters are not quite general patterns:
   1/ Evaluation is according to XPath/XQuery F&O rules
   2/ May be n-ary

you don't write binary operators as
    (?x 3) math:lessMath ?result
which is what it woudl be for the generalization to n-ary

Constraints may be implemented as graph patterns - but they may not.

> 
>>Definition: Group Graph Pattern 
>>
>>A Group Graph Pattern GP is a set of graph patterns, GPi.
>>
>>A solution of Group Graph Pattern GP on graph G is any solution S such
>>that, for every element GPi of GP, S is a solution of GPi.
>>
>>Definition: Optional Graph Pattern 
>>
>>An Optional Graph Pattern is a graph pattern that can create new
>>solutions, but will always match.
>>
>>Given graph pattern GP, the Optional Graph Pattern Opt(GP) matches
>>with solution S if S is a solution for a match of GP, or S is not a
>>solution for GP.
> 
> 
> That doesn't work with my suggested clarification of Graph Pattern.
> I think you had it better earlier, when an Optional Graph Pattern
> had two parts:
> 
> An Optional Graph Pattern is a pair of graph patterns (R, O).
> S is a solution for the optional graph pattern (R, O) iff
>   - S is a solution for R and S is not a solution for O, or
>   - S is a solution for the merge of R and O

Tieing to another pattern was something I was trying to avoid when making a change.

Maybe explaining this matching can be explained outside definition.


> 
> 
> 
>>Definition: Union Graph Pattern 
>>
>>A union graph pattern is a set of graph patterns GP i .
>>
>>A union graph pattern matches a graph G with substitution S if there
>>is some GP i such that GP i matches G with substitution S.
> 
> 
> OK.
> 
> 
>>Definition: RDF Dataset 
>>
>>An RDF dataset is a set = { G, (u1, G1), (u2, G2), . . . (un, Gn) }
>>where G and each Gi are graphs, and each ui is a URI. Each ui is
>>distinct.
>>
>>G is called the default graph. Gi are named graphs.
> 
> 
> OK
> 
> 
>>Definition: DataSet Graph Pattern 
>>
>>If D is a dataset {G, (<u1> G1), ...}, and P is a graph pattern then S
>>is a pattern solution of GRAPH(g, P) if:
>>
>>g is a URI where g = <ui> for some i, and S is pattern solution of P
>>on Gi or g is a variable, S maps the variable g to <ui> and S is a
>>pattern solution of P on Gi.
> 
> 
> OK
> 
> Hmm... all the other definitions of graph pattern solution are
> given w.r.t. just a graph, not a whole dataset. Ooops.
> We should specify how they work with a whole DataSet.

Agreed.  Need to say that is a graph that is being matched against and GRAPH 
chnages it.

I'll deal with the solution sequence comments separately.

<snip/>

 Andy
Received on Thursday, 19 May 2005 10:46:17 UTC