Re: SPARQL / Language spec ready for review

On Fri, 01 Oct 2004 17:28:35 +0100, "Seaborne, Andy" <andy.seaborne@hp.com> wrote:

> Dave, Steve, Howard,
> 
> The SPARQL language doc is ready for review in preparation for the telcon next 
> Tuesday.  Version v1.73 (or later) of:
> 
>      http://www.w3.org/2001/sw/DataAccess/rq23/
> 
> The intention is to publish this rough-and-ready version, complete with editors 
> notes and comments.  This will enable early review, showing where we are going 
> and enable feedback from the more dedicated part of the community.  There is 
> still much to do in the document but we hope that this public working draft will 
> indicate the directions we are taking even if the detail is still to be done.

This is my initial set of comments, not complete but requested by Eric
as I had made it available to him as I drafted it.  I may/will change my
opinion on some items after I have a readthrough when I get to the end
in a few more hours.

Dave

----

General:
A thorough spellcheck is needed.

Label all examples with Numbers, titles and add anchors.
Add all example queries, data files as separate files with URIs, link
to them.  Add them to the test suite.
Add labels and anchors to all definitions.

Do not use underlining in the html style when it isn't a link.

In query results, some of the tables use ?x and some use bare x.
Some results use both!


Must Fix

Consistency in use of individuals, sets of individuals
examples:
  b in B used ok
  T defined as a set and used as a member of that set, also defined
  as tp.  T in GP should be tp in GP.

See also MUSTFIX below


Editorial comments - should fix


Title: SPARQL title does not mention protocol despite the 'P' in the
name.  

Later on the document suggests that protocol is a separate document.


Abstract
typo: "end users [missing words] to write"

ToC
missing 4.3
8 "Chosing What to Query" to match document capitals
12.2 ditto
Appendices labelled 1,2 actually A, B in doc

suggest removing see also, old material.  It's not ToC.


1 Introduction

MUSTFIX: First sentence is wrong.

The abstract syntax for RDF is not a "graph of nodes and arcs, often
expressed as triples".  It is a set of triples called an RDF graph
formally defined in RDF semantics.  It can be and is often described
as a graph of nodes and arcs but RDF is not nodes+arcs; that was an
RDF core decision closely argued.

I prefer "created dynamically" to "partly calculated on demand" for the graph.


(un-numbered section) Document Outline
@@variables bound@@, @@bindings@@ can be linked to forward references

"10 - Summary" doesn't match the style of the other paragraphs - no
explanation


2 Making Simple Patterns
Last sentence: I prefer "[Simple] patterns can be ..."

[All graph pictures are unreadable when printed out, too dark.
Please re-compose on a light background or with much greater
contrast. black on gray doesn't work.]

First example.  I suggest not using _:1 _:2 since it's not legal in
N3, Turtle, N-Triples for blank node labels.  I think a small edit
can make the first example executable, testable.

I'd prefer full names for variables, for ease of reading,
especially by non-native English speakers.  So 'address' not 'addr'
and something else instead of 'addrm'.


2.1
P2
URIref expand to URI Reference for first use. Or use the
correct definition RDF URI Reference and link to it.
grammar - "XML. Qname" - delete the "."
Link to QName in XML specs.
datatype URIRef not URI

Para "Because.."
here and later I see "URIs used" - check for consistency.  I suggest
s/URI/URIref/ throughout

N3/Turtle used without a reference, explanation.

Spellings "intpretted"

Para "Prefixes are..."
referring to an earlier query, but it doesn't say which of the three
previous it means.  Suggest "same query as the previous one"


2.2 Triple Examples

P1 grammar s/for for/for/

P2 "bnodes" introduced without explanation.  Should
be "blank node labels" [ref RDF docs] abbreviated to BNodes.
Doesn't say which positions bnodes can be used in.


Definition RDF Term

This implies that query variables are in the RDF data model since
they are along with U, L and BN.  I suggest moving to another
block since V is not used till later.  Maybe after/near Query Variable?

Definition Query Variable
This defines an individual, all the RDF Term definitions are sets.
No letter is assigned for referring to it.
Suggest "A query variable qv".  OR define the set Q.

Defn. Triple Pattern
(spelling, grammar)
"A triple pattern is [a] triple of 3 slots subject, predicate, object .."

MUSTFIX: "union Q" <- Q is never defined.  Q presumably is a set of
  Query Variables, in which case it is NOT Q, but a set of qv, or
  define Q as a set of qv.

This also defines 'ground' but that is not pulled out.  Suggest
make it a separate 'Definition: Ground' block.


Definition Binding
Suggest using B for the variable, as they are uppercase elsewhere too.
Suggest give an example for the convention for writing down a binding
such as (f, "value") or ?f="value" or the tabular form
---------
|  ?f   |
---------
|"value"|
---------

Suggest give an example of a set of bindings such as
{?f="value", ?g="value2"} or the tabular form given later.
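
To make the suggested writing conventions concrete, here is a minimal
Python sketch.  The representation (a dict from variable name to value)
and the helper name format_bindings are illustrative assumptions, not
anything taken from the draft.

```python
# Hypothetical representation: a set of bindings is a dict mapping a
# variable name (without the leading '?') to its bound value.
def format_bindings(bindings):
    """Render bindings in the {?f="value", ?g="value2"} convention."""
    parts = ['?%s="%s"' % (name, value)
             for name, value in sorted(bindings.items())]
    return "{" + ", ".join(parts) + "}"


example = {"f": "value", "g": "value2"}
print(format_bindings(example))
```

A single binding is then just a one-entry mapping, which keeps the
individual/set distinction visible in the notation.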

Definition A substitution
suggest uppercase "Substitution"
Suggest not using B as a set of Bindings, but use SB or something
to differ from lowercase 'b' as an individual binding.
So this is a mapping S(set of b)

How can a set of bindings define a substitution?
Suggest rewording
"A substitution S(B) on a set of bindings B maps a triple pattern ..."
suggest ... "by the corresponding [variable] value"

Suggest putting a subst() example.
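
One possible shape for such an example, as a Python sketch only.  The
Var type, the tuple form of a triple pattern and the predicate string
"ex:p" are assumptions made for illustration; the draft defines none
of them.

```python
from typing import NamedTuple


class Var(NamedTuple):
    """A query variable, e.g. Var("f") for ?f (illustrative type)."""
    name: str


def subst(tp, bindings):
    """Replace each bound variable in triple pattern tp by its value.

    Unbound variables are left in place, so the result may not be ground.
    """
    return tuple(
        bindings.get(slot.name, slot) if isinstance(slot, Var) else slot
        for slot in tp
    )


tp = (Var("x"), "ex:p", Var("f"))
print(subst(tp, {"f": "value"}))
print(subst(tp, {"x": "_:a", "f": "value"}))
```

The first call leaves ?x in place, which shows why subst() on its own
returns a triple pattern rather than a triple.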


Definition Triple Pattern Matching

MUSTFIX: I think there is a triple pattern/set of triple pattern
  issue here unless you are solely comparing a graph with one triple.

  T was earlier defined as a set of triple pattern. So subst(T, b in
  B) is not a substitution of a triple pattern, but of a set of
  triple patterns (and a binding b in B).  Could re-use tp in T which
  was used in defining ground, and define subst(tp in T, b in B).
  Then edit to match such as 'Triple Pattern tp matches ...'

Use of entails, reference/link to RDF entailment.

rdfs: prefix is used in the second data, this was not defined as
convention earlier.  brql/sparql predefines rdf: but not rdfs:?


2.3 Graph patterns

P1 "There are bNodes"  No, there is 1.
grammar: "not in the RDF graph [nor in] any query"

Para "The next query.." but there is no query following.  Confused.
Does that mean the query just given?
Also grammar:
  "one or more triple patterns which must all match for the graph
  pattern to match."
- the 'all' and 'one or more' say different things.  Is it all or 1?

Maybe the definition following explains better, remove?


Definition: Graph pattern

MUSTFIX:
 "A conjunctive Graph Pattern GP is a set of triple patterns T."

  T was earlier defined as:
    "let T be the set of triple patterns := A x A x A"

  So GP=T ?

  Not quite what was meant.  GP is set of tp, where 
  tp is a Triple Pattern in T.

Maybe triple pattern & triple patterns are too hard to use and make
nice sentences.  Other suggestions ; triple pattern set.


Defn: Graph Pattern - Conjunction

Defines "conjunctive Graph Pattern" not the title of the definition.
html - underlining doesn't match either


Defn: Graph pattern Matching

Hmm, confused by "same" in:
"For a graph pattern to match, each triple pattern must match with
each query variable having the same value whereever it occurs."

suggestions

"For a graph pattern GP to match, all triple patterns tp in GP must
 match with all query variables in all tp having the same value."

This actually defines "Graph Pattern GP matches", not
"Graph Pattern Matching"

Using T in GP which is a (set of triple patterns).  Probably should
be tp in GP.

MUSTFIX:
  [[ 
  For all T in GP, subst(T, B) is a triple entailed by G.
  subst(GP, B) is the graph pattern formed by subst(T, B) for all T in GP.
  subst(GP, B) is a subgraph entailed by G if all triple patterns are grounded.
  ]]

  This is reusing subst(t in TP, b in B) redefined over graphs
  I suggest changing the name to graphsubst(GP, B) to distinguish it.
  subst(T in TP, b in B) returns a triple pattern, may not be ground.

  Suggestion:
    For all tp in GP, subst(tp, B) is a triple pattern entailed by G.
    graphsubst(GP, B) is the graph pattern formed by subst(tp, B) for
      all tp in GP. 
    graphsubst(GP, B) is a subgraph entailed by G if all triple
      patterns are grounded.
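
The split between subst() and graphsubst() can be sketched in Python as
follows.  This is an illustration under assumptions: Var, the tuple
form of a triple pattern, the ex: terms and is_ground() are all made
up here, and the entailment check against G is not attempted.

```python
from typing import NamedTuple


class Var(NamedTuple):
    """A query variable such as ?x (illustrative type)."""
    name: str


def subst(tp, bindings):
    """Substitute bound variables in one triple pattern."""
    return tuple(
        bindings.get(slot.name, slot) if isinstance(slot, Var) else slot
        for slot in tp
    )


def graphsubst(gp, bindings):
    """Apply subst to every triple pattern tp in the graph pattern gp."""
    return {subst(tp, bindings) for tp in gp}


def is_ground(gp):
    """True when no triple pattern in gp still contains a variable."""
    return not any(isinstance(slot, Var) for tp in gp for slot in tp)


gp = {(Var("x"), "ex:name", Var("n")), (Var("x"), "ex:mbox", Var("m"))}
full = graphsubst(gp, {"x": "_:a", "n": "Alice", "m": "mailto:a@example.org"})
partial = graphsubst(gp, {"x": "_:a"})
print(is_ground(full), is_ground(partial))
```

Only the fully substituted pattern is ground, so only it could be
compared with G as a subgraph.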


2.4 Multiple Matches

  "The results of query are all the ways a query can match the graph
  being queried.  Each match is one solution to the query and there
  may be zero, one or multiple solutions to a query, depending on the
  data."

This uses "results", "solutions" and "matches", not in the same way
as previously defined. I suggest using results only, and use match
to mean graph matches, triple matches as used above:

  "2.4 Multiple results

  The results of query are all the ways a query can match the graph
  being queried.  Each result is one solution to the query and there
  may be zero, one or multiple results to a query, depending on the
  data."

Aside: A query actually hasn't been defined yet.  It's hinted that it
is something to do with graph pattern, but it hasn't been said so
far. i.e. no.

Or if sticking with "matching" make it clearer what the difference
between a result and a solution is.
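
To illustrate "zero, one or multiple results", here is a small Python
sketch that enumerates every set of bindings under which all triple
patterns in a graph pattern match.  It assumes plain triple-for-triple
matching rather than entailment, and the data, the ex: terms and all
function names are invented for the example.

```python
from typing import NamedTuple


class Var(NamedTuple):
    """A query variable such as ?x (illustrative type, not from the draft)."""
    name: str


def match_tp(tp, triple, bindings):
    """Return bindings extended so tp equals triple, or None if impossible."""
    extended = dict(bindings)
    for slot, term in zip(tp, triple):
        if isinstance(slot, Var):
            # A variable must keep the same value wherever it occurs.
            if extended.setdefault(slot.name, term) != term:
                return None
        elif slot != term:
            return None
    return extended


def results(gp, graph):
    """All sets of bindings under which every tp in gp matches a triple."""
    found = [{}]
    for tp in gp:
        step = []
        for bindings in found:
            for triple in graph:
                extended = match_tp(tp, triple, bindings)
                if extended is not None:
                    step.append(extended)
        found = step
    return found


graph = [
    ("_:a", "ex:name", "Alice"),
    ("_:b", "ex:name", "Bob"),
    ("_:a", "ex:mbox", "mailto:alice@example.org"),
]
name = (Var("x"), "ex:name", Var("n"))
mbox = (Var("x"), "ex:mbox", Var("m"))
age = (Var("x"), "ex:age", Var("y"))

for gp in ([name], [name, mbox], [name, age]):
    print(len(results(gp, graph)))
```

The three graph patterns give two, one and zero results respectively,
each result being one set of bindings.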

Example query has commas between variables.  Die.

  "When the query can match the data in more than one way, each
  possibility is returned as a solution to the query.  In addition, we
  have more than one selected variable so each solution contains two
  bindings of variables to values."

so now there are results, query matches, solutions and possibilities :)
Query matching data hasn't been discussed.  Graph patterns matching
Graphs has been, could be reused. Could also refer to sets of bindings.

... and now Query Solution is given.

definition Query Solution:
  "For conjucntion graph pattern GP, subst(GP, B), has no variables."
spelling: conjunction. 
Also could add ".. and is a set of ground triple patterns" or possibly
define a Ground Graph Pattern.


3 Constraining Values

(Here the query uses selected variables without a comma)


Definition: Value Constraint
  "A value constraint is a boolean expression that can be applied to
  restrict graph pattern solutions."
For me that doesn't read as an expression that can refer to
non-boolean things as parts of the expression but which has a boolean
value.
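
The distinction can be shown with a short Python sketch: the parts of
the expression are not boolean (here a number and a comparison), only
its overall value is.  The variable names and data are invented for
illustration and do not come from the draft.

```python
def value_constraint(solution):
    """Boolean-valued expression built from non-boolean parts."""
    return solution["price"] < 30


# Hypothetical graph pattern solutions, each a set of bindings.
solutions = [
    {"title": "A", "price": 42},
    {"title": "B", "price": 23},
]
kept = [s for s in solutions if value_constraint(s)]
print(kept)
```

Only the solution whose price is below 30 survives the constraint.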


Definition: Query Stage (partial definition).

  "Graph Pattern (set of triple patterns) + set of Value
  Constraints. QS : GP x VR"

+ and x ? + doesn't mean addition here but...?  You cannot
join/merge a set of triple patterns and a set of value constraints.

spelling in comment: [[ operations [like] "source"  ]]

I prefer Query Block.


4 Including Optional Values

.... Review to continue from here ...

Received on Monday, 4 October 2004 12:45:07 UTC