W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > January to March 2011

Re: Review SPARQL Query 1.1, Section 18 (algebra)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 18 Mar 2011 13:52:55 +0000
Message-ID: <4D8363B7.7050104@epimorphics.com>
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>, Steve Harris <steve.harris@garlik.com>
> The algebra operations in this section should be moved to
> Section 18, where it should be clarified how these algebra operations
> are introduced in the translation from the abstract syntax to algebra
> objects. I still have a major concern here and that is really a no-go
> issue IMO: The pattern in FILTER [NOT] EXISTS pattern is not
> translated into algebra objects. Thus, the Substitute and Exists
> functions are not even used yet.

Added a translation step for (NOT) EXISTS in FILTERs.

Moved the definitions into sec 18 (algrebra and evaluation)

> I also believe the Substitute should be defined such that it only
> applies to terms in the domain of μ. E.g.,
> substitute(pattern, μ) = the pattern formed by replacing every
> occurrence of a variable v in pattern for which μ is defined by μ(v).


> In general, there is quite a mix of different pseudo code style:
> IF ... THEN
> ...
> FI
> If ... Then
> ...
> End If
> If ... Then
> ...
> (no end for the if)
> If ... (no then)
> ...
> Totally mixed use of upper and lower case and even mix of styles within
> one pseudo code section. This should really be unified.

It should be "If Then .. End" except where it is sufficiently short that 
indentation makes it clear.

> 18.1.1: IRIs are a subset of RDF URI References that omit (no s I
> believe) the use of spaces.

I read it as "subset ... omits"

> 18.1.7 Property Path (no s) Patterns


> *A* Property Path is a path in *a* graph G if each ti is a triple of
> G.


> Definition: Property Path Expression
> A property path expression is an expression used to match properties
> in a graph formed after translation of the path syntax as defined
> <link>above</link>.
> The link just links to that very definition.

It's a broken link so your browser does not move focus.  Fixed link.

> I also assume you mean
> below ( Even then, this definition is not clear to me. I
> would define a Property Path Expression (PPE) inductively as follows:
> iri, !(iri_1|...|iri_n), !(^iri_1|...|^iri_n),
> !(iri_1|...|iri_i|^iri_i+1|...|^iri_n)
> are PPEs and, for P, P1, and P2 PPEs:
> ^P, !P, P_1 | P_2, P_1 / P_2, ...
> are PPEs.

It now links to  I don't want to duplicate the translation table.

> Also note that Section 9.1 is called Property Path expressions,

Changed to "Property Path Syntax".  The text already says that elt can 
be composed of path syntax constructs.

> 18.2 The SPARQL query string is parsed and the abbreviations for IRIs
> and triple patterns given in *a* (delete) section 4 are applied.


> The table for the syntax tree misses BINDINGS as solution modifier and
> Extends should be Extend (no s).

I put it under "patterns" -- it isn't a solution modifier.  It is a 
concrete RDF terms table so it is most like a pattern but I added an 
"Other" column.

> 18.2.1 We define a variable to be in-scope if there is a way for a
> variable to be in the domain *of* a solution mapping...


> In-scope definition:
> I suggest to be specific and use "Property Path Expression (PPP)" and
> "v occurs in PPP" for the second entry.

Variable scope is defined in terms of syntax.

Property Path Expression is after syntax translation.

> OPTIONAL{P} misses a space: OPTIONAL {P}


> Below the table: In SELECT expressions, the variable may *be* use*d*
> in an expression later in the same SELECT clause and may not be *be* (duplicate)
> assigned again in the same SELECT clause. May and may not could/should
> be marked as RFC 2119 terms. I would even tend to say MUST NOT be
> assigned again.


> Notes: ...The order of forms URI and ^URI in negated
> property sets is not relevant.
> URI should be IRI
> The parsing step interprete*s* (not interpreted) triple patterns as
> property paths of length 1.


> I suggest the notation of the table is unified with the earlier
> example section, where you use elt instead of path.

I think it is OK as it is - thje emphasis is changing to the path being 

> Also :iri in the
> table needs no :, but apart from row 1 all other iris are :iris.
> In general, I don't believe you can get away with not having a
> recursive translation. I suggest to define something like:
> Given a PPE X pp Y, we define the translation T(X, pp, Y) as follows
> and then give a table with left-hand side listing possible T(x, pp, Y)
> and the right-hand side the result, which is recursive:
> T(X, iri, Y) = X iri Y
> T(X, !(iri_1|...|iri_n), Y) = NPS(X, {iri_1, ..., iri_n, Y)
> T(X, !(^iri_1|...|^iri_n), Y) = T(X, ^(!(iri_1, ..., iri_n)), Y)
> T(X, !(iri_1|...|iri_i | ^iri_i+1|...|^iri_n), Y) = { T(X, !(iri_1,
> ..., iri_i), Y) } UNION { T(X, !(^iri_i+1|...|^iriiri_nn), Y) }
> T(X, ^path, Y) = T(Y, path, X)
> T(X, path1 / path2, Y) = T(X, path1, ?v) . T(?v, path2, Y)
> etc

This is what it is trying to say, using the column headings:

Syntax Form (path)	translate(path)

Added "It is applied recursively to the path syntax."

> If E is *of* the form MINUS {P}
> IMO, the patterns in [NOT] EXISTS filters must be translated here
> too.


> extra spacing before the second simplification form

Done (the wonders of xmlpsec in Emacs)

> 18.2.3
> Some examples have just one translation whereas others have two. I
> suggest to add a sentence explaining this, e.g.:
> If just one algebra translations is given, then this is the simplified
> one, whereas if two translations are given, the first one is the
> non-simplified one and the second one is the simplified one.

Already has
The second form of a rewrite example is the first with empty group joins 
removed by the simplification step.

> Forth example: group consisting *of* a union of a union and a basic
> graph pattern:
> Fifth example: extra brace after the second join

Closes the inner union

> Example: Pattern involving BIND, Extend(...) is written all lower case
> Minus example: closing brace missing
> Subquery example, should be:
> Join(
>    Bgp(...),
>    ToMultiSet(
>      Distinct(Project(Bgp(...), {?o}))
>    )
> )
> Currently it misses the closing brace and the proper translation of
> Project.
> Note that I use a set here for the project variables, which is what
> the Project operation expects, but the algebra translation produces a
> list at the moment.


> P should probably be a set not a list since tis is expected later in
> the algebra evaluation and makes more sense. On that note, is SELECT
> ?x ?x allowed in SPARQL? Should it just skip one ?x, produce two
> result columns for ?x or give an error?

Defn of project is a set of variables so changed.

> The note following the algorithm should also mention that an error
> arises when variable as named target occurred in a previous AS
> assignment. Extend in the algorithm is written all lower case (extend(...)
> instead of Extend(...)).


> has still [@@ link]


> also note that here the projection vars are expected to be a set not a
> list and i should say "all named variables *that are in-scope* in the
> query if SELECT * is used


> If the clause is ... in the co*r*responding position


> 18.3.1
> Basic graph patterns form the basis of SPARQL pattern matching.
> That is no longer true since now also PPEs are basic elements that
> require matching and generate solution mappings.

I think the sentence is still true in intent.  But as this section is 
more formal, it's been removed.

More on the path implications later (separate email).

> It might make sense to move the PPE stuff from 18.4 to Section 18.3
> too since so far the structure is to first treat basic elements,
> i.e., operations that generate solutions, and then introduce the
> higher level operations that work with the solutions generated from
> the basic steps.
> It might also make sens to label 18.3 and 18.4 "Basic Algebra
> Expressions" and "Complex Algebra Expressions", respectively.

BGPs are being made special for entailment extensions.  Specifically, 
they are not combinations of more basic operations.  This structure is 
from SPARQL 1.0 and I think its better to leave it as it is.

> 18.4 Definition: Filter. This no longer works for [NOT] EXISTS since
> evaluating the expressions needs the dataset with its active graph
> D(G).

See earlier.

> "There are 4 property path operators in the SPARQL algebra"
> but then only three definitions follow: NegatedPropertySet is
> missing.

Added.  Defn was in the eval part.

> I find the  given definitions not very clear. They are not really
> defining anything; it is more an intuitive explanation. I suggest to
> define the evaluation properly here (or in 18.3 if they are moved to
> 18.3), i.e., move the definitions of the evaluation of these functions
> here. This would make the structure more consistent.

It's a balance. It is consistent to define the names here, even if the 
evaluation has the details, along with all SPARQL operators

> Definition: Extend
> The function in the definition and in some other places is written all
> lower case, to be consistent it should be Extend(...)
> extend is undefined when var in dom(μ).
> is the only item not indented as the others
> @@ Define the case for var in dom(μ) (does not arise in SELECT
> expressions)
> is done as far as I can tell.

> 18.5
> We define eval(D(G), graph pattern)
> should probably be
> We define eval(D(G), algebra object) since we don't work on graph
> patterns here. Even of that was to mean algebra object for a graph
> pattern that's not really correct since the section also defines the
> evaluation of Project etc.


> I also find the order very random, I suggest to start with Bgp(...)
> and the PPEs, just referring back to Section 18.3 (where I would
> define not only BGP evaluation but also PPE evaluation)

Partly done.  It's from SPARQL 1..  Moved BGP first.

PPE are not treated like BGPs - see separate thread.

> Also note that for Bgp(...) the link says
> See section 12.3 Basic Graph Patterns
> whereas it should say 18.3 (the link is correct)


> Definition: Evaluation of ZeroLengthPath
> I suggest to define Nodes(G) for a graph G above the ZeroLengthPath
> definition since it is also used elsewhere, e.g., in the evaluation of
> OneOrMorePath.

Good idea - done.

It would be so much easier is an RDF graph were defined in the same way 
as a mathematical graph : node set and edges.

> The notion xxx:type is used here for the first time and it might b
> worth saying something like: For a term (or variable) x, x:t
> denotes that x is of type t.
> term in nodes(G) should be term in nodes(D(G))
> card[ ] is not clear
> is the last card[ ] not 0 assuming it denotes the cardinality for each
> mapping in the solution?

Yes - corrected.

(well, strictly it can be anything, as the domain is empty!)

> I can't fully understand the function ALP. I guess eval(x, path) needs
> D(G) and then does something like Bgp(x path ?y) evaluated over
> D(G). The path here is a normal predicate I think and it might help to
> make that clear.

I've added that evaluation is in the active graph, rather than write 
that out each time.

> Definition: Evaluation of ZeroOrMorePath
> Should produce multisets of solution mappings, i.e.,
> { { (vy, n) } | n in ALP(x, path) }
> (note extra curly brackets) and not
> { (vy, n) | n in ALP(x, path) }
> same for all eval defs in the definition


> Definition: Evaluation of OneOrMorePath
> eval(D(G), OneOrMorePath(X, path, Y)
> is missing the final closing brace


> still has
> @@Change to algorithmic form??


> Definition: Evaluation of NegatedPropertySet
> where X and Y *are* (not for) variables or RDF terms


> In this definition and the remaining ones, D(G) is required, but only
> D is written


> The second
> Definition: Evaluation of ToList
> should be
> Definition: Evaluation of Subquery

Steve? Is SubQuery used anywhere?  Isn't this ToMultiSet?
@@Steve inserted.

> 18.6
> The overall SPARQL design can be used for queries which assume a more
> elaborate form of entailment than simple entailment, by re-writing the
> matching conditions for basic graph patterns.
> This is no longer true due to PPEs. I am not happy about this at all
> and I assumed that PPEs are optional features. If they are not, it is
> quite unfortunate that the so far existing extension point no longer
> really is one and something has to be done at least to clarify this!

Separate email.

Received on Friday, 18 March 2011 13:53:31 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:01:03 UTC