- From: Pat Hayes <phayes@ihmc.us>
- Date: Sun, 18 Mar 2007 23:50:28 -0500
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Overall comment (important).
There is a disconnect between the ideas of
dataset and graph, which I think needs to be
fixed. Section 8 discusses datasets in great
detail with many examples, but it nowhere
actually defines explicitly which RDF graph is
determined to be the one that BGPs are required
to match against. Section 12.3.2 defines matching
for BGPs, but speaks of matching to a dataset
(mea culpa). Section 12.5 finally introduces and
uses the terminology "active graph", but it does
not formally define this notion or say how it is
computed. (See detailed comments of 12.5 below)
In any case, it is far too late in the document
for this idea to be defined.
"Active graph" is a basic concept which should be
defined in section 8, which should give clear
criteria for how to determine it given a query
and a dataset. Then 12.3.2 should use this term
when defining BGP matching, and the references in
12.3.2 and 12.5 should have internal links to the
definition in section 8.
--------
Comments on Section 12
"query as as string" -> "query as a string"
"abstract query comprises operators" -> "abstract query comprising operators"
"this can then be evaluated" Should we only say
'can' here? Suggest "this is then evaluated".
"This section defines the correct behavior for
evaluation of graph patterns and solution
modifiers, given a query string and an RDF
dataset. "
But you just said we would cover the process
starting with the abstract syntax, not the
string. Correct one of these statements.
This whole process seems awfully complicated and
unmotivated. Can you give some guidelines on what
the differences are between abstract syntax and
abstract query and SPARQL algebra (?). This
whole topic of converting from one form to
another isn't mentioned again until 12.2, and
none of the intervening definitions in 12.1 seem
to be relevant to it.
In fact, I would suggest switching sections 12.1
and 12.2, and maybe merging 12.1 with 12.3.
12.1.1
"IRIs, a subset of RDF URI References that omits
spaces." Is that *really* the definition of an
IRI? Suggest provide a link to the IRI
publication.
The link on the word 'updated' is to
http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref.
Is this the most appropriate link?
First definition: et -> Let
12.1.2
"each <ui> is an IRI. Each <ui> is distinct." ->
"each <ui> is a distinct IRI."
The notation used here seems odd to me. Usually,
ordered pairs are indicated using <> as brackets.
Why do you need them round the IRI names? Wouldn't
it make sense to write it this way:
"An RDF dataset is a set:
{G, <u1, G1>, ...}..."
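As a rough illustration of the structure that notation describes, here is a minimal Python sketch. The names Dataset, default and named are hypothetical choices for this example, not terms from the draft; triples are modelled as plain string tuples. Keying the named graphs by IRI is one way to get "each IRI is distinct" for free.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: a graph is a frozenset of (s, p, o) string tuples.
Graph = frozenset


@dataclass
class Dataset:
    """Hypothetical model of {G, <u1, G1>, ..., <un, Gn>}."""
    default: Graph                              # the default graph G
    named: dict = field(default_factory=dict)   # IRI ui -> named graph Gi


ds = Dataset(
    default=Graph({("ex:a", "ex:p", "ex:b")}),
    named={"ex:g1": Graph({("ex:c", "ex:q", "ex:d")})},
)
```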
12.1.3
"...is a member of an infinite set V which is disjoint from RDF-T. "
You don't give a definition for BGP (it's a set of
triple patterns, yes?) I suggest it should be
between 12.1.4 and 12.1.5. Right now there is no
connection between the material up to 12.1.4 and
the stuff starting at 12.1.5. Also, the internal
link
http://www.w3.org/2001/sw/DataAccess/rq23/rq25.html#rBasicGraphPattern
is broken.
12.1.5 seems a bit bare. Where do we find out
more about these various kinds of pattern? Can
you provide links? (As you do in 12.1.7)
12.1.6. (Question about terminology. Is *every*
such mapping a "solution" mapping? Or only the
ones which actually are solutions? Right now we
say the first, which seems a bit odd to me,
because a solution mapping might not be a
solution. (Later. Was it me who suggested this
terminology? If I did, mea culpa again.))
No need to say from V to T since these are globally fixed.
-> "A solution mapping μ is a partial function μ : V -> T"
The note about multisets seems out of place here,
since we haven't mentioned matching graph patterns
yet, and nothing has been said about there being
multiple answers. Suggest moving this to 12.2,
and omitting the last sentence "It is
described..." which reads like an implementation
suggestion and seems out of place. Readers will
likely know what a bag is in any case, right?
12.1.7 What is a solution sequence, that we can
have a modifier of it? Is there a missing
definition of 'solution sequence' ?
----
12.2
"an SPARQL query" -> "a SPARQL query"
This is hard to follow. After parsing, the syntax
tree is composed of .. a table?? What is the
'query form' in this table? Is it part of the
syntax tree, or just there for reference?
"uses theses symbols" -> "uses these symbols"
What exactly is meant by "mapping" in "The result
of mapping a SPARQL query..." ? This mapping idea
hasn't been mentioned previously or defined
(unless you mean solution mapping? Surely not.)
Is this mapping the same as "converting"? The
early material in the beginning of the section 12
talks about a series of 'steps' and of 'turning
into', but does not say 'mapping' or
'converting'. Suggest choosing a uniform
terminology and sticking to it throughout. Might
also be a good idea to review that early material
here (unless you put 12.2 before 12.1, as I
suggested above)
What is a 'result form' in the definition of
abstract query? The internal link is broken.
12.2.1
What does the title of this section mean? (Mapping graph patterns to what?)
Step 2
second line, remove comma after "GroupGraphPattern"
"replace with a sequence of nested union
operators:" => "replace with nested union
operators, associated to the left:"
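To spell out what "associated to the left" means here, a small Python sketch. The tuple encoding ("Union", left, right) and the function name union_left are assumptions made for this example only.

```python
from functools import reduce


def union_left(patterns):
    """Fold [P1, P2, P3] into Union(Union(P1, P2), P3).

    Algebra terms are modelled as plain tuples; a single pattern is
    returned unchanged because there is nothing to union it with.
    """
    return reduce(lambda acc, p: ("Union", acc, p), patterns)


nested = union_left(["P1", "P2", "P3"])
```

With three alternatives this yields ("Union", ("Union", "P1", "P2"), "P3"), that is, the first two are grouped before the third is added.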
Step 3. Odd change of font. Is it meaningful?
Does "Map ... to ..." mean the same as "replace
... by ...."? Suggest use consistent terminology
in describing these steps. "Replace ..by.." seems
nicely unambiguous.
Step 4.
What is the point of the link from the cryptic
word "Constraint" in parentheses, without
explanation?
What does "Write: "A" for an algebra expression"
mean? The earlier steps have been instructions to
do something: is this an instruction (imperative)
also? If not, what is it? If it is, where does
one write "A" exactly?
In box:
"for i := 0 ; i < length(SP); i++" Yechhh, do we
really want to use C++ in the formal spec?
Couldn't you write this in some kind of readable
pseudocode?
BTW, what is the scope of this iteration? Is the
"If F is nonempty" inside it or after it?
"LeftJoin(G , A, true)" -> "LeftJoin(G, A, true)" (no space after G)
"SP := List " -> "SP := list "
"If G = Join(A1, A2) then G := Filter(F, Join(A1,
A2)" -> "If G = Join(A1, A2) then G := Filter(F,
Join(A1, A2))" (extra paren at end)
----------
This step 4 is incomprehensible as written, I
have to say. I have no idea what it is telling me
to do. If that stuff in the box is a procedure,
where is A initialized? I can't see how G can
ever get rid of a LeftJoin; is this right?
What does "Map all sub-patterns contained in this
group" mean? Sub-pattern hasn't been defined, and
contain hasn't been defined.
step 5. "join({}, A)" -> "join({ }, A)" (space added)
12.2.3 What is this doing? A word or two would be helpful.
Step 1 "There is no implied ordering to the
sequence" OK, but does it have to be fixed? That
is, is ToList a real function?
This step says "set M =". Earlier parts of this
section have used assignment := or said "replace
... by ..." Later steps in the subsection omit
"set" and are written using equality, which is
misleading if read as an equation. Suggest using
uniform notation and terminology.
Step 2. Where does the list of order conditions come from?
Step 3. What is a 'named variable' ? Suggest
rephrase as "all variables occurring in the query"
Step 5. "If the query mentions.." Does this mean
the same as "If the query contains.." ? If so,
suggest use consistent wording.
"defaults to the (size(M)-start)" -> "defaults to (size(M)-start)"
--------------
12.3 (The definitions in this section seem to
continue directly on from those in section 12.1,
and not be very connected to those in section
12.2.)
"for multiset" -> "for the multiset"
Definition of Compatible mappings. I'd suggest
defining merge explicitly, rather than talking
about the set-union of two mappings (tricky idea
to get right):
Definition: The merge(mu1, mu2) of two compatible
mappings is the mapping which is identical to mu1
on dom(mu1) and to mu2 on dom(mu2).
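A minimal Python sketch of that definition, assuming mappings are modelled as dicts from variable names to terms. The function names compatible and merge follow the wording above, but the encoding is only an illustration.

```python
def compatible(mu1, mu2):
    """True when mu1 and mu2 agree on every variable in both domains."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def merge(mu1, mu2):
    """Mapping identical to mu1 on dom(mu1) and to mu2 on dom(mu2).

    Only defined for compatible mappings, so an incompatible pair is
    rejected rather than silently letting one side win.
    """
    if not compatible(mu1, mu2):
        raise ValueError("mappings are not compatible")
    return {**mu1, **mu2}


merged = merge({"?x": "ex:a", "?y": "ex:b"}, {"?y": "ex:b", "?z": "ex:c"})
```

Stating merge this way avoids having to argue that the set-union of two functions is itself a function.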
Delete "Following the terminology of RDF semantics [RDF-MT]"
Make "* An [RDF instance mapping]" with [ ] a
hyperlink to
http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#definst
(Because the rest of the terminology defined here *isn't* in RDF-MT :-)
12.3.1
Delete second sentence (no longer true of the
material in this section). Could replace it with
a forward reference to section 12.6
Solution mapping has already been defined, omit definition here.
Delete "P(x) = μ(σ(x))"
Definition of BGP Matching, change to:
-----
Let BGP be a basic graph pattern and G be an RDF
graph. <mu> is a <em>solution</em> for BGP from G
when there is a pattern instance mapping P such
that P(BGP) is a subset of G and <mu> is the
restriction of P to the query variables in BGP.
-----
A <em>solution sequence</em> is some total
ordering of the multiset of all solutions for BGP
from G, each derived from a distinct pattern
instance mapping.
----
(NOTE. I hope this last bit is still right :-)
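To check the suggested wording against something executable, here is a rough Python sketch of subgraph matching. It makes a simplifying assumption: the BGP contains no blank nodes, so the pattern instance mapping P and the solution mapping coincide. Variables are strings starting with "?", graphs are sets of string triples, and the names extend and solutions are hypothetical.

```python
def extend(mu, pattern, triple):
    """Try to extend mu so that mu(pattern) == triple; None on a clash."""
    candidate = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing binding.
            if candidate.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return candidate


def solutions(bgp, graph):
    """All mappings mu with mu(BGP) a subset of graph (no blank nodes in BGP)."""
    found = [{}]
    for pattern in bgp:
        found = [m for mu in found for triple in graph
                 if (m := extend(mu, pattern, triple)) is not None]
    return found


g = {("a", "p", "b"), ("b", "p", "c")}
chain = solutions([("?x", "p", "?y"), ("?y", "p", "?z")], g)
```

Under that assumption each distinct mapping is found exactly once, so any ordering of the returned list is a solution sequence in the sense above.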
12.3.2
"as identifying nodes in the dataset." -> "as
identifying nodes in the active graph of the
dataset."
"understood to be not from DS itself," ->
"understood to be not from the active graph of DS
itself,"
"which is graph-equivalent to DS but shares no"
-> "which is graph-equivalent to the active graph
of DS but shares no"
"SPARQL adopts a simple subgraph matching
criterion for this. A SPARQL answer is the
restriction of a SPARQL instance mapping M to
query variables, where M(BGP) is a subset of the
scoping graph. There is one answer for each
distinct such SPARQL instance mapping M."
->
"SPARQL uses the subgraph match criterion to
determine the multiset of answers. There is one
answer for each distinct pattern instance mapping
from the basic graph pattern to a subset of the
active graph."
Next para,
"when the dataset is lean" -> "when the active graph of the dataset is [lean]"
and put a hyperlink on [lean] to http://www.w3.org/TR/rdf-mt/#deflean
------------
12.4
Definitions here all refer to 'mappings'. As we
have defined a number of different mappings, say
which one of them is intended.
Defn of filter: "an expression that has a boolean
effective value of true" Is this verbiage really
necessary? You haven't used the phrase "boolean
effective value" elsewhere. Why not just say "an
expression with the value true" ?
Is "card[Filter(expr, Ω)](μ) = card[Ω](μ)"
really true? Surely the filter can reduce the
cardinality, no??
Defn Join: "sum over μ in (Ω1 set-union Ω2),
card[Ω1](μ1)*card[Ω2](μ2)" What does this mean?
The sum expression doesn't contain μ.
Defn. Diff; again, is that equation for the cardinality really true?
Similarly for the union case: surely one only
gets the sum of the cardinalities when the
original sets are disjoint.
Is the C in [x | C] a condition on the sequence
or on the elements of the sequence?
---------
12.5
What is the range of eval? It's hard to read
expressions like "Join(eval(D(G), P1), eval(D(G),
P2))" without knowing this :-)
What is the "active graph" exactly? (See first comment.)
It's not clear (to me) what it means to say that
the active graph is "initially" the default
graph. (Initially? How did time get into the
question?)
Suggest
"eval(D(G), BGP) = multiset of solution mappings"
-> "eval(D(G), BGP) = multiset of all distinct
solution mappings for BGP from G" (assuming that
the earlier suggested changes are made so this
makes sense.)
Defn of Evaluation of a Union Pattern. "join" is
written in lower case. Should this be "Join" ?
BTW, this would all be a lot easier to understand
if you used some systematic way of distinguishing
the evaluation function from the SPARQL algebra
term, say by a font change or something? But it's
getting late, so never mind....
---------
12.6
"needless of inappropriate" -> "needless or inappropriate"
"... if and only if the triple (" ends a line, which is a pity.
"consistent source document SD is uniquely
specified and is E-equivalent to SD."
->
"consistent active graph AG is uniquely specified and is E-equivalent to AG."
"For any basic graph pattern BGP and pattern solution P"
->
"For any basic graph pattern BGP and pattern solution mapping P"
"and answer set {P1 ... Pn} " -> "and answer sequence <P1 ... Pn>"
"and where {BGP1 .... BGPn} is a set of basic
graph patterns" -> "and where <BGP1 .... BGPn> is
a sequence of basic graph patterns"
"guarantee that every BGP and SD" -> "guarantee that every BGP and AG"
"(a) SG will often be graph equivalent to SD" ->
"(a) SG will often be graph equivalent to AG"
"that SG share no blank nodes with SD or BGP. In
particular, it allows SG to actually be SD."
->
"that SG share no blank nodes with AG or BGP. In
particular, it allows SG to actually be AG."
"graph-equivalent to SD but shares no blank nodes with SD or BGP"
->
"graph-equivalent to AG but shares no blank nodes with AG or BGP"
-----------
Phew.
Pat
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Monday, 19 March 2007 04:50:43 UTC