Re: SPARQL / Language spec ready for review

Reviewing

http://www.w3.org/2001/sw/DataAccess/rq23/

$Log: Overview.html,v $
Revision 1.77  2004/10/03 13:06:28  eric

Completed.  I'll now take a look at 1.77->1.79 changes


Dave


Items that I think must be fixed before publication
---------------------------------------------------

See also MUSTFIX in detailed notes below.  Summarising:

* First sentence in 1. Introduction is wrong.  RDF is a set of triples.

* Consistency in use of individuals, sets of individuals examples:
  b in B used ok however T defined as a set and used as a member of
  that set, also defined as tp.  T in GP should be tp in GP.

  See comments on definitions of Triple Pattern, Triple Pattern
  Matching, Graph Pattern, Graph Pattern Matching

* Initial Binding definition baffles me, I need more explanation.


General Comments
----------------

A thorough spellcheck is needed.

Label all examples with Numbers, titles and add anchors.
Add all example queries, data files as separate files with URIs, link
to them.  Add them to the test suite.
Add labels and anchors to all definitions.

Do not use underlining in the html style when it isn't a link.

In query results, some of the tables use ?x and some use bare x.
Some results use both!

Suggest global s/<tt>OPTIONAL</tt>/optional/ since the OPTIONAL
keyword is never explained in the document and only appears in the
grammar.


Detailed Comments
------------------
These should be fixed but are not critical.


Title: SPARQL title does not mention protocol despite the 'P' in the
name.  

Later on the document suggests that protocol is a separate document.


Abstract
typo: "end users [missing words] to write"

ToC
missing 4.3
8 "Chosing What to Query" to match document capitals
12.2 ditto
Appendices labelled 1,2 actually A, B in doc

suggest removing see also, old material.  It's not ToC.


1 Introduction

MUSTFIX: First sentence is wrong.

The abstract syntax for RDF is not a "graph of nodes and arcs, often
expressed as triples".  It is a set of triples called an RDF graph
formally defined in RDF semantics.  It can be and is often described
as a graph of nodes and arcs but RDF is not nodes+arcs; that was an
RDF core decision closely argued.

I'd prefer graph "created dynamically" to "partly calculated on demand".


(un-numbered section) Document Outline
@@variables bound@@, @@bindings@@ can be linked to forward references

"10 - Summary" doesn't match the style of the other paragraphs - no
explanation


2 Making Simple Patterns
last sentence preference to "[Simple] patterns can be ..."

[All graph pictures are unreadable when printed out, too dark.
Please re-compose on a light background or with much greater
contrast. black on gray doesn't work.]

First example.  I suggest not using _:1 _:2 since it's not legal in
N3, Turtle, N-Triples for blank node labels.  I think a small edit
can make the first example executable, testable.

I'd prefer full names for variables, for ease of readability
especially by non-native English speakers.  So 'address' not 'addr'
and something else instead of 'addrm'


2.1
P2
URIref expand to URI Reference for first use. Or use the
correct definition RDF URI Reference and link to it.
grammar - "XML. Qname" - delete the "."
Link to QName in XML specs.
datatype URIRef not URI

Para "Because.."
here and later I see "URIs used" - check for consistency.  I suggest
s/URI/URIref/ throughout

N3/Turtle used without a reference, explanation.

Spellings "intpretted"

Para "Prefixes are..."
referring to an earlier query, but it doesn't say which of the three
previous it means.  Suggest "same query as the previous one"


2.2 Triple Examples

P1 grammar s/for for/for/

P2 "bnodes" introduced without explanation.  Should
be "blank node labels" [ref RDF docs] abbreviated to BNodes.
Doesn't say which positions bnodes can be used in.


Definition RDF Term

This implies that query variables are in the RDF data model, since
they are listed along with U, L and BN.  I suggest moving V to another
block since it is not used till later.  Maybe after/near Query Variable?

Definition Query Variable
This defines an individual, whereas all the RDF Term definitions are sets.
No letter is assigned for referring to it.
Suggest "A query variable qv".  OR define the set Q.

Defn. Triple Pattern
(spelling, grammar)
"A triple pattern is [a] triple of 3 slots subject, predicate, object .."

MUSTFIX: "union Q" <- Q is never defined.  Q presumably is a set of
  Query Variables, in which case it is NOT Q, but a set of qv, or
  define Q as a set of qv.

This also defines 'ground' but that is not pulled out.  Suggest
make it a separate 'Definition: Ground' block.


Definition Binding
suggest use B for variable, as they are used uppercase elsewhere too.
Suggest give an example for the convention for writing down a binding
such as (f, "value") or ?f="value" or the tabular form
---------
|  ?f   |
---------
|"value"|
---------

Suggest give an example of a set of bindings such as
{?f="value", ?g="value2"} or the tabular form given later.

Definition A substitution
suggest uppercase "Substitution"
Suggest not using B as a set of Bindings, but use SB or something
to differ from lowercase 'b' as an individual binding.
So this is a mapping S(set of b)

How can a set of bindings define a substitution?
Suggest rewording
"A substitution S(B) on a set of bindings B maps a triple pattern ..."
suggest ... "by the corresponding [variable] value"

Suggest putting a subst() example.
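
A minimal sketch of the kind of subst() example I mean, in Python
rather than in the spec's notation.  The string encoding of terms, the
leading "?" for variables and the helper names is_variable and
is_ground are assumptions made for illustration; none of them come
from the draft.

```python
from typing import Dict, Tuple

# Assumption for this sketch: a term is a plain string and a query
# variable is any term written with a leading "?".
Term = str
TriplePattern = Tuple[Term, Term, Term]
Bindings = Dict[str, Term]


def is_variable(term: Term) -> bool:
    return term.startswith("?")


def subst(tp: TriplePattern, bindings: Bindings) -> TriplePattern:
    """Replace each bound variable in tp by its value; keep everything else."""
    return tuple(bindings.get(t, t) if is_variable(t) else t for t in tp)


def is_ground(tp: TriplePattern) -> bool:
    """A triple pattern is ground when no slot holds a variable."""
    return not any(is_variable(t) for t in tp)


tp = ("?person", "foaf:name", "?name")
B = {"?name": '"Alice"'}
result = subst(tp, B)
```

Here subst(tp, B) leaves ?person in place, so the result is still a
triple pattern and is not ground.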


Definition Triple Pattern Matching

MUSTFIX: I think there is a triple pattern/set of triple pattern
  issue here unless you are solely comparing a graph with one triple.

  T was earlier defined as a set of triple patterns. So subst(T, b in
  B) is not a substitution of a triple pattern, but of a set of
  triple patterns (and a binding b in B).  Could re-use tp in T which
  was used in defining ground, and define subst(tp in T, b in B).
  Then edit to match such as 'Triple Pattern tp matches ...'

Use of entails, reference/link to RDF entailment.

rdfs: prefix is used in the second data, this was not defined as
convention earlier.  brql/sparql predefines rdf: but not rdfs:?


2.3 Graph patterns

P1 "There are bNodes"  No, there is 1.
grammar: "not in the RDF graph [nor in] any query"

Para "The next query.." but there is no query following.  Confused.
Does that mean the query just given?
Also grammar:
  "one or more triple patterns which must all match for the graph
  pattern to match."
- the 'all' and 'one or more' say different things.  Is it all or 1?

Maybe the definition following explains better, remove?


Definition: Graph pattern

MUSTFIX:
 "A conjunctive Graph Pattern GP is a set of triple patterns T."

  T was earlier defined as;
    "let T be the set of triple patterns := A x A x A"

  So GP=T ?

  Not quite what was meant.  GP is set of tp, where 
  tp is a Triple Pattern in T.

Maybe triple pattern & triple patterns are too hard to use and make
nice sentences.  Other suggestions ; triple pattern set.


Defn: Graph Pattern - Conjunction

Defines "conjunctive Graph Pattern" not the title of the definition.
html - underlining doesn't match either


Defn: Graph pattern Matching

Hmm, confused by "same" in:
"For a graph pattern to match, each triple pattern must match with
each query variable having the same value whereever it occurs."

suggestions

"For a graph pattern GP to match, all triple patterns tp in GP must
 match with all query variables in all tp having the same value."

This actually defines "Graph Pattern GP matches", not
"Graph Pattern Matching"

Using T in GP which is a (set of triple patterns).  Probably should
be tp in GP.

MUSTFIX:
  [[ 
  For all T in GP, subst(T, B) is a triple entailed by G.
  subst(GP, B) is the graph pattern formed by subst(T, B) for all T in GP.
  subst(GP, B) is a subgraph entailed by G if all triple patterns are grounded.
  ]]

  This is reusing subst(t in TP, b in B) redefined over graphs
  I suggest changing the name to graphsubst(GP, B) to distinguish it.
  subst(T in TP, b in B) returns a triple pattern, may not be ground.

  Suggestion:
    For all tp in GP, subst(tp, B) is a triple pattern entailed by G.
    graphsubst(GP, B) is the graph pattern formed by subst(tp, B) for
      all tp in GP. 
    graphsubst(GP, B) is a subgraph entailed by G if all triple
      patterns are grounded.
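
To make the subst/graphsubst distinction concrete, a rough Python
sketch follows.  It assumes terms are strings, variables start with
"?", and "entailed by G" is approximated by simple containment in G
(no RDF entailment rules are applied).  The name gp_matches is
hypothetical.

```python
def subst(tp, bindings):
    """Triple-level substitution: returns a triple pattern, maybe not ground."""
    return tuple(bindings.get(t, t) for t in tp)


def graphsubst(gp, bindings):
    """Graph-level substitution: apply subst to every tp in GP."""
    return frozenset(subst(tp, bindings) for tp in gp)


def is_ground(gp):
    """True when no triple pattern in gp still contains a variable."""
    return not any(t.startswith("?") for tp in gp for t in tp)


def gp_matches(gp, bindings, graph):
    # Assumption: entailment is reduced to "is a subgraph of G".
    substituted = graphsubst(gp, bindings)
    return is_ground(substituted) and substituted <= graph


G = {
    ("_:a", "foaf:name", '"Alice"'),
    ("_:a", "foaf:mbox", "<mailto:alice@example.org>"),
}
GP = {("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")}
B = {"?x": "_:a", "?name": '"Alice"', "?mbox": "<mailto:alice@example.org>"}

full = gp_matches(GP, B, G)
partial = gp_matches(GP, {"?x": "_:a"}, G)
```

With the complete B every tp in GP becomes a triple in G; with only ?x
bound the substituted pattern is not ground, so it cannot be a
subgraph.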


2.4 Multiple Matches

  "The results of query are all the ways a query can match the graph
  being queried.  Each match is one solution to the query and there
  may be zero, one or multiple solutions to a query, depending on the
  data."

This uses "results", "solutions" and "matches", not in the same way
as previously defined. I suggest using results only, and use match
to mean graph matches, triple matches as used above:

  "2.4 Multiple results

  The results of query are all the ways a query can match the graph
  being queried.  Each result is one solution to the query and there
  may be zero, one or multiple results to a query, depending on the
  data."

Aside: A query actually hasn't been defined yet.  It's hinted that it
is something to do with graph pattern, but it hasn't been said so
far. i.e. no.

Or if sticking with "matching" make it clearer what the difference
between a result and a solution is.

Example query has commas between variables.  Die.

  "When the query can match the data in more than one way, each
  possibility is returned as a solution to the query.  In addition, we
  have more than one selected variable so each solution contains two
  bindings of variables to values."

so now there are results, query matches, solutions and possibilities :)
Query matching data hasn't been discussed.  Graph patterns matching
Graphs has been, could be reused. Could also refer to sets of bindings.

... and now Query Solution is given.

definition Query Solution:
  "For conjucntion graph pattern GP, subst(GP, B), has no variables."
spelling: conjunction. 
Also could add ".. and is a set of ground triple patterns" or possibly
define a Ground Graph Pattern.
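
One way to pin the terminology down is to show how results could be
enumerated.  The sketch below is an assumption-laden illustration, not
the spec's algorithm: terms are strings, variables start with "?",
matching is plain triple equality, and match_triple and solutions are
hypothetical names.

```python
def match_triple(tp, triple, bindings):
    """Extend bindings so that tp equals triple, or return None."""
    extended = dict(bindings)
    for p, v in zip(tp, triple):
        if p.startswith("?"):
            # A variable must keep the same value wherever it occurs.
            if extended.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return extended


def solutions(gp, graph, bindings=None):
    """Yield one set of bindings per way the whole graph pattern matches."""
    bindings = {} if bindings is None else bindings
    if not gp:
        yield bindings
        return
    first, rest = gp[0], gp[1:]
    for triple in graph:
        extended = match_triple(first, triple, bindings)
        if extended is not None:
            yield from solutions(rest, graph, extended)


graph = {
    ("_:a", "foaf:name", '"Alice"'),
    ("_:a", "foaf:mbox", "<mailto:alice@example.org>"),
    ("_:b", "foaf:name", '"Bob"'),
    ("_:b", "foaf:mbox", "<mailto:bob@example.org>"),
}
gp = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]
results = sorted(solutions(gp, graph), key=lambda b: b["?name"])
```

Each element of results is one set of bindings under which every
triple pattern in the graph pattern is ground and present in the data.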


3 Constraining Values

(Here the query uses selected variables without a comma)


Definition: Value Constraint
  "A value constraint is a boolean expression that can be applied to
  restrict graph pattern solutions."
For me that doesn't read as an expression that can refer to
non-boolean things as parts of the expression but which has a boolean
value.


Definition: Query Stage (partial definition).

  "Graph Pattern (set of triple patterns) + set of Value
  Constraints. QS : GP x VR"

+ and x ? + doesn't mean addition here but...?  You cannot
join/merge a set of triple patterns and a set of value constraints.

VR is not defined.  Presumably means a set of value constraints.
Later on VC seems to be used for that.

spelling in comment: [[ operations [like] "source"  ]]

I prefer Query Block.


4 Including Optional Values

grammar
"The graph matching and value constraints [presented] so far ..."

[here select vars have no commas]

html/spelling "there is [an] mbox" - make mbox <tt> too, like in
previous para

"Failure to match does not ..."
suggest
"failure to match any of the triples in the optional block does not ..."

spelling "optional block" not bock


4.2 Multiple Optional Blocks

"Multiple OPTIONAL blocks "
so far the OPTIONAL keyword has not been mentioned, and indeed it is
not given in this section either.  Suggest s/<tt>OPTIONAL</tt>/optional/
in 4.2

The constraints on variables seem to allow the same optional variable
to be bound in different nested optional blocks, as long as they are
not at the "given level of nesting" or "in the same containing block".

Those two constraints seem to clash, or at least constrain it in two
ways, and I'm not sure the result is complete.  Level of nesting presumably
doesn't mean anywhere inside 2 []s.

How about these:

Graph Pattern 1:
( ?q :a :a )
[ ( ?q :b ?x) ]
[ ( ?q :b ?y) ]
[ ( ?q :b ?x) ] <- same level of nesting, same containing block FORBIDDEN

Graph Pattern 2:
( ?q :a :a )
[
 [ ( ?q :b ?x) ]
 [ ( ?q :b ?y) ]
]
[ ( ?q :b ?x) ] <- different level of nesting, containing block, allowed?



4.3 Optional Matching

Definition: Initial Binding

  "The result of a query stage,QS = (GP, VC), with an initial binding
  B, has Query Result where all the bindings in B are valid (the graph
  pattern and any value constraints in QS).

  B extended with addition bindings given by matching subst(GP, B)
  and constraining with VC."

VC is used here, never defined.  Presumably refers to Value Constraint
However Query Stage was earlier (partially) defined as QS: GP x VR

grammar: "has [a] Query Result", "B [is] extended with addition[al] .."

MUSTFIX: More substantially; after several re-readings, I don't
understand this definition.  Can I ask for some more explanation
please?


Definition: Graph Pattern - Optional Match

  "An optional match of QS, with initial binding B, the match of QS
  with initial binding B if there exists at least one solution, and is
  B otherwise."

grammar: "binding B, [is] the match..."

That seems to define an optional match of a query stage, not of a
graph pattern.  Is the definition title correct?
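
For what it's worth, my reading of the optional match definition can
be sketched as below.  This is a guess at the intent: the query stage
is reduced to a single triple pattern with no value constraints, terms
are strings, variables start with "?", and match_pattern and
optional_match are hypothetical names.

```python
def match_pattern(tp, graph, initial):
    """Yield every extension of `initial` under which tp is a triple in graph."""
    for triple in graph:
        extended = dict(initial)
        ok = True
        for p, v in zip(tp, triple):
            if p.startswith("?"):
                if extended.setdefault(p, v) != v:
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok:
            yield extended


def optional_match(tp, graph, initial):
    """The matches starting from `initial` if any exist, else `initial` itself."""
    found = list(match_pattern(tp, graph, initial))
    return found if found else [dict(initial)]


graph = {
    ("_:a", "foaf:name", '"Alice"'),
    ("_:a", "foaf:mbox", "<mailto:alice@example.org>"),
    ("_:b", "foaf:name", '"Bob"'),
}
optional = ("?x", "foaf:mbox", "?mbox")

alice = optional_match(optional, graph, {"?x": "_:a", "?name": '"Alice"'})
bob = optional_match(optional, graph, {"?x": "_:b", "?name": '"Bob"'})
```

Alice's initial binding is extended with ?mbox; Bob's comes back
unchanged because the optional part has no solution for him.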


5 Nested Patterns

Nesting was already mentioned in 4.2.

Definition: Graph Pattern - Nesting

This definition I note, excludes nested VC  - good!

The example query uses ()s for nesting (you should mention it before the
example what the extra ones are for (which is like lisp (like this)))

  "Since this definition makes a inner pattern just be a conjunctive
  element of the outher pattern, and because a graph patterns of
  triple patterns is also the conjunction, this is the same as:"

spelling: outher=>outer
grammar: "because [] graph patterns of [graph] patterns [are] also []
  conjunctions ..."


  "Optional blocks can be nested. The outer optional must match for any
  inner ones to apply.  That is, the outer optional triple patterns is
  fixed for the purposes of any inner optional block."

s/triple patterns/graph pattern/
grammar: "optional [block]"
Let me use that to read from:
  "Optional blocks can be nested. The outer optional block must match
  for any inner ones to apply.  That is, the outer optional graph
  pattern is fixed for the purposes of any inner optional block."

So it means that nested optional patterns are essentially
subqueries, where the outer optional graph pattern is used as a
must-match graph pattern and the inner optional blocks are, relative to
that, optional graph patterns.

Query result has typo in gname result #3: "EveE" should be "Eve"

grammar: "... query only access[es] these ..."

This example does hint at the usefulness of the nested patterns
however I think the details of the operation and restrictions on
binding with optionals are incomplete.  Maybe add more words to the
intro status for this section re completeness.


Sections 6-7: Placeholders
Not reviewed


8 Choosing What to Query

Definition: Target graph

"The target graph of a query."

Ok, this must be a sketch.  Especially with the current discussion of
graphs.  Maybe expand a little,  "... to which a query may be applied".
I recall that we discussed these words and ended up pruning them.

[[
SELECT ...
FROM <uri1>, <uri2>
]]

Commas, die

grammar: "Implementations [may] provide "


9 Querying the Origin of Statements

The status here probably needs expanding to "under discussion and will change"

"the following term."
I guess "term" should be triple pattern or nested graph pattern?
Those are the two choices I think.


Note " As with OPTIONAL, a variable that is bound to NULL must not
  match another variable that is bound to NULL. "

seems to be worthy of being in the body rather than parenthetical to
the main text.

Can you delete the red text?  All that was notes from 2 FTFs in the
past, we've discussed a lot more things since then and have an issues
list to track things too.


10 Summary of Query Patterns

Link to the definitions of all the terms here

Suggest you use QP for query pattern rather than GP - confuses with
graph pattern.

I don't think it's possible to apply the term 'matches' to
all the elements given here.  match is only defined for triple
patterns and graph patterns.

Could just add a status note to this section that it is initial
draft.


11 Query Forms

+ status note?

  "These result forms use the bindings in the query results to form
  result sets or RDF graphs."

what's a result set?  there are Query Results (set of bindings)
and Query Solution.  This is the first mention of result set.
Is it not a set of solutions?


spelling: "Returns either [an] RDF graph that ..."


11.1 Choosing which Variables to Return
SELECT DISTINCT

  "The result set can be modified by adding the DISTINCT keyword which
  ensures that every set of variables for a query solution is different
  from the other sets of variables returned.  Thought of as a table,
  each row is differen"

"set of variables" should be Query Result; it's the variable names
and values that matter (Bindings)
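
A small Python sketch of the point about names and values.  It assumes
a result row is a dict from variable name to value and that DISTINCT
keeps the first occurrence of each row; the function name distinct is
hypothetical.

```python
def distinct(rows):
    """Drop rows whose (variable, value) bindings were already seen."""
    seen = set()
    kept = []
    for row in rows:
        # Compare on the bindings themselves, not on variable names alone.
        key = frozenset(row.items())
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"?name": '"Alice"'},
    {"?name": '"Alice"'},
    {"?name": '"Bob"'},
]
unique = distinct(rows)

# Same value under a different variable name is a different binding.
mixed = distinct([{"?x": "a"}, {"?y": "a"}])
```

All three input rows have the same set of variables, so comparing
"sets of variables" alone would collapse them to one row.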


11.2 Constructing an Output Graph

  "If no pattern is supplied, instead "*" is used,"

s/pattern/graph template/

That might be better as
  "*" indicates an empty graph template is supplied.

however that isn't quite right, as when an empty graph template is
used, the variables are instead substituted into the query pattern.
So maybe should be
  "*" indicates that the graph template is the query pattern.
2 paragraphs later, this is spelt out in more detail.

"... each matching of the query pattern."
=> each solution?

  "The form CONSTRUCT * WHERE {query pattern} is shorthand for
  CONSTRUCT {pattern} WHERE {pattern}, that is, the query pattern is
  the same as the construct pattern."

Consistency here and elsewhere in 11.2 - use of graph template and
construct pattern for the same thing.

WHERE {.. }s should be real examples and not using {}s

Prefer re-ordering to:
 "... signifies the construct pattern[graph template] is the query  pattern"
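
A hedged Python sketch of how I read the "*" shorthand.  It assumes
terms are strings, variables start with "?", a solution is a dict of
bindings, and template triples that are still not ground after
substitution are simply skipped.  The names construct and
construct_star are hypothetical.

```python
def construct(template, solutions):
    """Build an output graph by substituting each solution into the template."""
    graph = set()
    for bindings in solutions:
        for tp in template:
            triple = tuple(bindings.get(t, t) for t in tp)
            # Assumption: only fully ground triples enter the output graph.
            if not any(t.startswith("?") for t in triple):
                graph.add(triple)
    return graph


def construct_star(query_pattern, solutions):
    """CONSTRUCT *: the graph template is the query pattern itself."""
    return construct(query_pattern, solutions)


query_pattern = [("?x", "foaf:name", "?name")]
solutions = [
    {"?x": "_:a", "?name": '"Alice"'},
    {"?x": "_:b", "?name": '"Bob"'},
]
output = construct_star(query_pattern, solutions)
```

So "*" does not stand for an empty template; the output is one copy of
the query pattern per solution, merged into a single graph.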


11.3 Descriptions of Resources
placeholder text.

syntax - n3 needs adding proper example namespace URIs


11.4 Asking "yes or no" questions

Add a Query Result with either YES or NO suggested format


12 Testing Values
placeholder text.


12.2 Extending Value Testing
placeholder text.



A. SPARQL Grammar

Some of my previous comments in [1] still apply such as:
 * Die CommaOpt
 * Use FOO+ not FOO FOO? for one or more
 * OPTIONAL keyword
 * A ::= B with only one use of A (all non-terminals) should be inlined
 * E/BNF used has no reference.  Preference to XML's

Additional:

What does SOURCE * mean ?

Add some comments to say why NCNAME, NCCHAR1 are done like this.
Pattern Literal needs expanding too

No idea what (~[">"," "])* means without consulting some EBNF
documentation; where's that from?  complement of set?


B. References

W3C style fixes needed - expanding to have URIs, latest versions,
dates, organisations.

Check they are cited in the document

Received on Monday, 4 October 2004 14:23:51 UTC