comments on section 12 (and a little more) from Pat Hayes on 2007-03-19 (public-rdf-dawg@w3.org from January to March 2007)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 18 Mar 2007 23:50:28 -0500
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623090dc22353b62869@[192.168.1.2]>

Overall comment (important).

There is a disconnect between the ideas of 
dataset and graph, which I think needs to be 
fixed. Section 8 discusses datasets in great 
detail with many examples, but it nowhere 
actually defines explicitly which RDF graph is 
determined to be the one that BGPs are required 
to match against. Section 12.3.2 defines matching 
for BGPs, but speaks of matching to a dataset 
(mea culpa). Section 12.5 finally introduces and 
uses the terminology "active graph", but it does 
not formally define this notion or say how it is 
computed. (See detailed comments on 12.5 below.) 
In any case, it is far too late in the document 
for this idea to be defined.

"Active graph" is a basic concept which should be 
defined in section 8, which should give clear 
criteria for how to determine it given a query 
and a dataset. Then 12.3.2 should use this term 
when defining BGP matching, and the references in 
12.3.2 and 12.5 should have internal links to the 
definition in section 8.

--------
Comments on Section 12

"query as as string" -> "query as a string"
"abstract query comprises operators" -> "abstract query comprising operators"
"this can then be evaluated"  Should we only say 
'can' here? Suggest "this is then evaluated".

"This section defines the correct behavior for 
evaluation of graph patterns and solution 
modifiers, given a query string and an RDF 
dataset.  "

But you just said we would cover the process 
starting with the abstract syntax, not the 
string. Correct one of these statements.

This whole process seems awfully complicated and 
unmotivated. Can you give some guidelines on what 
the differences are between abstract syntax and 
abstract query and SPARQL algebra(?). This 
whole topic of converting from one form to 
another isn't mentioned again until 12.2, and 
none of the intervening definitions in 12.1 seem 
to be relevant to it.

In fact, I would suggest switching sections 12.1 
and 12.2, and maybe merging 12.1 with 12.3.

12.1.1

"IRIs, a subset of RDF URI References that omits 
spaces." Is that *really* the definition of an 
IRI? Suggest providing a link to the IRI 
publication.

The link on the word 'updated' is to 
http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref. 
Is this the most appropriate link?

First definition: et -> Let

12.1.2
"each <ui> is an IRI. Each <ui> is distinct."  -> 
"each  <ui> is a distinct IRI."

The notation used here seems odd to me. Usually, 
ordered pairs are indicated using <> as brackets. 
Why do you need them round the IRI names? Wouldn't 
it make sense to write it this way:
"An RDF dataset is a set:
{G, <u1, G1>, ...}..."

12.1.3
"...is a member of an infinite set V which is disjoint from RDF-T. "

You don't give a definition for BGP (it's a set of 
triple patterns, yes?) I suggest it should be 
between 12.1.4 and 12.1.5. Right now there is no 
connection between the material up to 12.1.4 and 
the stuff starting at 12.1.5. Also, the internal 
link 
http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#rBasicGraphPattern
is broken.

12.1.5 seems a bit bare. Where do we find out 
more about these various kinds of pattern?  Can 
you provide links? (As you do in 12.1.7)

12.1.6. (Question about terminology. Is *every* 
such mapping a "solution" mapping? Or only the 
ones which actually are solutions? Right now we 
say the first, which seems a bit odd to me, 
because a solution mapping might not be a 
solution. (Later. Was it me who suggested this 
terminology? If I did, mea culpa again.))

No need to say from V to T since these are globally fixed.
-> "A solution mapping μ is a partial function μ : V -> T"

The note about multisets seems out of place here, 
since we haven't mentioned matching graph patterns 
yet, and nothing has been said about there being 
multiple answers. Suggest moving this to 12.2, 
and omitting the last sentence "It is 
described..." which reads like an implementation 
suggestion and seems out of place. Readers will 
likely know what a bag is in any case, right?

12.1.7 What is a solution sequence, that we can 
have a modifier of it? Is there a missing 
definition of 'solution sequence' ?

----
12.2

"an SPARQL query" -> "a SPARQL query"

This is hard to follow. After parsing, the syntax 
tree is composed of .. a table?? What is the 
'query form' in this table? Is it part of the 
syntax tree, or just there for reference?

"uses theses symbols" -> "uses these symbols"

What exactly is meant by "mapping" in "The result 
of mapping a SPARQL query..." ? This mapping idea 
hasn't been mentioned previously or defined 
(unless you mean solution mapping? Surely not.) 
Is this mapping the same as "converting"? The 
early material in the beginning of the section 12 
talks about a series of 'steps' and of 'turning 
into', but does not say 'mapping' or 
'converting'. Suggest choosing a uniform 
terminology and sticking to it throughout. Might 
also be a good idea to review that early material 
here (unless you put 12.2 before 12.1, as I 
suggested above)

What is a 'result form' in the definition of 
abstract query? The internal link is broken.

12.2.1

What does the title of this section mean? (Mapping graph patterns to what?)

Step 2
second line, remove comma after "GroupGraphPattern"
"replace with a sequence of nested union 
operators:" => "replace with nested union 
operators, associated to the left:"
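
To make "associated to the left" concrete, here is a minimal Python sketch. The tuple encoding ("Union", left, right) and the function name nest_unions are made up for illustration; they are not taken from the draft.

```python
from functools import reduce


def nest_unions(patterns):
    """Fold a non-empty list of alternatives into left-associated Union nodes.

    ("Union", left, right) is an illustrative encoding of the algebra
    operator, not notation from the draft.
    """
    return reduce(lambda left, right: ("Union", left, right), patterns)


# Three alternatives A, B, C nest as Union(Union(A, B), C).
print(nest_unions(["A", "B", "C"]))
```

With a single alternative the fold returns it unchanged, so no Union node is introduced.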

Step 3. Odd change of font. Is it meaningful?

Does "Map ... to ..." mean the same as "replace 
... by ...."? Suggest use consistent terminology 
in describing these steps. "Replace ..by.." seems 
nicely unambiguous.

Step 4.

What is the point of the link from the cryptic 
word "Constraint" in parentheses, without 
explanation?

What does "Write: "A" for an algebra expression" 
mean? The earlier steps have been instructions to 
do something: is this an instruction (imperative) 
also? If not, what is it? If it is, where does 
one write "A" exactly?

In box:
"for i := 0 ; i < length(SP); i++"  Yechhh, do we 
really want to use C++ in the formal spec? 
Couldn't you write this in some kind of readable 
pseudocode?

BTW, what is the scope of this iteration? Is the 
"If F is nonempty" inside it or after it?

"LeftJoin(G , A, true)" -> "LeftJoin(G, A, true)" (no space after G)

"SP := List " -> "SP := list "

"If G = Join(A1, A2) then G := Filter(F, Join(A1, 
A2)" -> "If G = Join(A1, A2) then G := Filter(F, 
Join(A1, A2))" (extra paren at end)
----------

This step 4 is incomprehensible as written, I 
have to say. I have no idea what it is telling me 
to do. If that stuff in the box is a procedure, 
where is A initialized? I can't see how G can 
ever get rid of a LeftJoin; is this right?

What does "Map all sub-patterns contained in this 
group" mean? Sub-pattern hasn't been defined, and 
contain hasn't been defined.

step 5. "join({}, A)" -> "join({ }, A)" (space added)

12.2.3 What is this doing? A word or two would be helpful.

Step 1 "There is no implied ordering to the 
sequence" OK, but does it have to be fixed? That 
is, is ToList a real function?

This step says "set M =". Earlier parts of this 
section have used assignment := or said "replace 
... by ..." Later steps in the subsection omit 
"set" and are written using equality, which is 
misleading if read as an equation. Suggest using 
uniform notation and terminology.

Step 2. Where does the list of order conditions come from?

Step 3. What is a 'named variable' ? Suggest 
rephrase as "all variables occurring in the query"

Step 5. "If the query mentions.." Does this mean 
the same as "If the query contains.." ? If so, 
suggest use consistent wording.

"defaults to the (size(M)-start)" -> "defaults to (size(M)-start)"

--------------

12.3 ( The definitions in this section seem to 
continue directly on from those in section 12.1, 
and not be very connected to those in section 
12.2.)

"for multiset" -> "for the multiset"

Definition of Compatible mappings. I'd suggest 
defining merge explicitly, rather than talking 
about the set-union of two mappings (tricky idea 
to get right):

Definition: The merge(mu1, mu2) of two compatible 
mappings is the mapping which is identical to mu1 
on dom(mu1) and to mu2 on dom(mu2).
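
As a rough sketch of what this definition amounts to, assuming mappings are represented as Python dicts from variable names to terms (a representation chosen here only for illustration):

```python
def compatible(mu1, mu2):
    """True when the two mappings agree on every variable they share."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def merge(mu1, mu2):
    """Mapping identical to mu1 on dom(mu1) and to mu2 on dom(mu2)."""
    if not compatible(mu1, mu2):
        raise ValueError("merge is only defined for compatible mappings")
    # Safe to combine: any shared variable has the same value in both.
    return {**mu1, **mu2}


print(merge({"?x": "a"}, {"?x": "a", "?y": "b"}))
```

Requiring compatibility up front is what makes the result well defined: on the overlap of the two domains it does not matter which mapping supplies the value.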

Delete "Following the terminology of RDF semantics [RDF-MT]"

Make   "* An [RDF instance mapping]"  with [ ] a 
hyperlink to 
http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#definst

(Because the rest of the terminology defined here *isn't* in RDF-MT :-)

12.3.1

Delete second sentence (no longer true of the 
material in this section). Could replace it with 
a forward reference to section 12.6

Solution mapping has already been defined, omit definition here.

Delete "P(x) = μ(σ(x))"

Definition of BGP Matching, change to:

-----
Let BGP be a basic graph pattern and G be an RDF 
graph. <mu> is a <em>solution</em> for BGP from G 
when there is a pattern instance mapping P such 
that P(BGP) is a subset of G and <mu> is the 
restriction of P to the query variables in BGP.
-----
A <em>solution sequence</em> is some total 
ordering of the multiset of all solutions for BGP 
from G, each derived from a distinct pattern 
instance mapping.
----

(NOTE. I hope this last bit is still right :-)
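
A small Python sketch of how I read the proposed definition. The encoding is an assumption made purely for illustration: terms are strings, query variables start with "?", blank nodes in the pattern start with "_:", and a graph is a set of 3-tuples. All function names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def is_bnode(term):
    return term.startswith("_:")


def match_triple(pattern, triple, binding):
    """Extend binding so that pattern maps onto triple, or return None."""
    extended = dict(binding)
    for p_term, g_term in zip(pattern, triple):
        if is_var(p_term) or is_bnode(p_term):
            # Variables and pattern blank nodes must be mapped consistently.
            if extended.setdefault(p_term, g_term) != g_term:
                return None
        elif p_term != g_term:
            return None
    return extended


def pattern_instance_mappings(bgp, graph, binding=None):
    """Yield each mapping P such that P(BGP) is a subset of graph."""
    if binding is None:
        binding = {}
    if not bgp:
        yield dict(binding)
        return
    for triple in graph:
        extended = match_triple(bgp[0], triple, binding)
        if extended is not None:
            yield from pattern_instance_mappings(bgp[1:], graph, extended)


def solutions(bgp, graph):
    """Restrict each P to the query variables; duplicates are kept."""
    return [{k: v for k, v in p.items() if is_var(k)}
            for p in pattern_instance_mappings(bgp, graph)]


graph = {("a", "knows", "b"), ("a", "knows", "c")}
bgp = [("?x", "knows", "_:y")]
print(solutions(bgp, graph))
```

In this example there are two distinct pattern instance mappings (the blank node goes to b or to c), and both restrict to the same solution for ?x, which is why the result has to be a multiset rather than a set.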

12.3.2

"as identifying nodes in the dataset." -> "as 
identifying nodes in the active graph of the 
dataset."

"understood to be not from DS itself," -> 
"understood to be not from the active graph of DS 
itself,"

"which is graph-equivalent to DS but shares no" 
-> "which is graph-equivalent to the active graph 
of DS but shares no"

"SPARQL adopts a simple subgraph matching 
criterion for this. A SPARQL answer is the 
restriction of a SPARQL instance mapping M to 
query variables, where M(BGP) is a subset of the 
scoping graph. There is one answer for each 
distinct such SPARQL instance mapping M."
->
"SPARQL uses the subgraph match criterion to 
determine the multiset of answers. There is one 
answer for each distinct pattern instance mapping 
from the basic graph pattern to a subset of the 
active graph."

Next para,

"when the dataset is lean" -> "when the active graph of the dataset is [lean]"

and put a hyperlink on [lean] to http://www.w3.org/TR/rdf-mt/#deflean

------------
12.4

Definitions here all refer to 'mappings'. As we 
have defined a number of different mappings, say 
which one of them is intended.

Defn of filter: "an expression that has a boolean 
effective value of true"  Is this verbiage really 
necessary? You haven't used the phrase "boolean 
effective value" elsewhere. Why not just say "an 
expression with the value true" ?

Is "card[Filter(expr, Ω)](μ) = card[Ω](μ)" 
really true? Surely the filter can reduce the 
cardinality, no??

Defn Join: "sum over μ in (Ω1 set-union Ω2), 
card[Ω1](μ1)*card[Ω2](μ2)" What does this mean? 
The sum expression doesn't contain μ.
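
For comparison, here is one reading under which the Join cardinality would make sense: the cardinality of a result mapping is the sum, over every compatible pair whose merge is that mapping, of the product of the two input cardinalities. This is only a guess at the intent, sketched in Python with multisets as collections.Counter objects and mappings frozen into hashable keys; none of these names come from the draft.

```python
from collections import Counter


def as_key(mapping):
    """Freeze a dict mapping so it can be used as a Counter key."""
    return frozenset(mapping.items())


def join(omega1, omega2):
    """Multiset join: add up card1 * card2 over compatible pairs."""
    result = Counter()
    for key1, card1 in omega1.items():
        for key2, card2 in omega2.items():
            mu1, mu2 = dict(key1), dict(key2)
            if all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys()):
                # Different pairs may merge to the same mapping, so accumulate.
                result[as_key({**mu1, **mu2})] += card1 * card2
    return result


omega1 = Counter({as_key({"?x": "a"}): 2})
omega2 = Counter({as_key({"?x": "a", "?y": "b"}): 3})
print(join(omega1, omega2))
```

On this reading the sum ranges over pairs of input mappings rather than over the result mapping itself, which would explain why the summand mentions only the two inputs.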

Defn. Diff; again, is that equation for the cardinality really true?
Similarly for the union case: surely one only 
gets the sum of the cardinalities when the 
original sets are disjoint.

Is the C in [x | C]  a condition on the sequence 
or on the elements of the sequence?

---------
12.5

What is the range of eval? It's hard to read 
expressions like "Join(eval(D(G), P1), eval(D(G), 
P2))" without knowing this :-)

What is the "active graph" exactly? (See first comment.)

It's not clear (to me) what it means to say that 
the active graph is "initially" the default 
graph. (Initially? How did time get into the 
question?)

Suggest
"eval(D(G), BGP) = multiset of solution mappings" 
-> "eval(D(G), BGP) = multiset of all distinct 
solution mappings for BGP from G"  (assuming that 
the earlier suggested changes are made so this 
makes sense.)

Defn of Evaluation of a Union Pattern. "join" is 
written in lower case. Should this be "Join" ?

BTW, this would all be a lot easier to understand 
if you used some systematic way of distinguishing 
the evaluation function from the SPARQL algebra 
term, say by a font change or something? But it's 
getting late, so never mind....

---------
12.6

"needless of inappropriate" -> "needless or inappropriate"

"... if and only if the triple (" ends a line, which is a pity.

"consistent source document SD is uniquely 
specified and is E-equivalent to SD."
->
"consistent active graph AG is uniquely specified and is E-equivalent to AG."

"For any basic graph pattern BGP and pattern solution P"
->
"For any basic graph pattern BGP and pattern solution mapping P"

"and answer set {P1 ... Pn} " -> "and answer sequence <P1 ... Pn>"

"and where {BGP1 .... BGPn} is a set of basic 
graph patterns" -> "and where <BGP1 .... BGPn> is 
a sequence of basic graph patterns"

"guarantee that every BGP and SD" -> "guarantee that every BGP and AG"

"(a) SG will often be graph equivalent to SD" -> 
"(a) SG will often be graph equivalent to AG"

"that SG share no blank nodes with SD or BGP. In 
particular, it allows SG to actually be SD."
->
"that SG share no blank nodes with AG or BGP. In 
particular, it allows SG to actually be AG."

"graph-equivalent to SD but shares no blank nodes with SD or BGP"
->
"graph-equivalent to AG but shares no blank nodes with AG or BGP"

-----------

Phew.

Pat

-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 19 March 2007 04:50:43 UTC