- From: Graham Klyne <GK@ninebynine.org>
- Date: Sat, 10 Sep 2005 13:56:31 +0100
- To: public-rdf-dawg-comments@w3.org
[Apologies for being late with these, but I'm hoping better late than
never...]
Reviewing:
http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050721/
Overview: I find the specification (or what I think it says) to be
generally sound and sensible, but I see a number of areas where the
explanations seem less clear than they might be.
I think this will be a very important specification for a range of RDF
users and developers, so I think making it as clear as possible is a
goal worth pursuing.
...
General, definitions:
I am finding the "Definitions" given in the text are less helpful than I
feel they should be. I discern two main reasons for this:
(a) although couched in a kind of formal language, they don't seem to be
constructed with the rigour I would associate with such language. The
definitions seem to be incomplete and/or ambiguous (or open to different
interpretation), so the expected benefit of formality is not being
realized. In the notes below, I pick out some problems I have identified.
(b) it's not easy to find definitions. My (printed) copy of the
document contains no collected list of definitions, even though the
table of contents and change log indicate this should be present. (ToC
has this between the references and the change log.)
(If I had the time, I'd like to try coding up the formal definitions in
Haskell, which I think would quickly flush out any problems, but I don't
see me having time in the next month.)
...
General, presentation of concepts:
I have the feeling that this document has been drafted by people who
have experience of constructing query implementations (I know Eric and
Andy have), and that some of the important concepts and ideas are conveyed
implicitly rather than explicitly, and hence that some of the ideas are
not fully explained for a person approaching this topic afresh. I have
tried to point out cases where I see them, but having myself implemented
RDF query systems I may easily have overlooked others.
An example of this might be section 8.3 (restriction by bound
variables): I think I understand what is being described based on my own
past experience, but I can't tell if I would otherwise be able to do so.
(I appreciate this comment doesn't readily admit a specific response,
and I don't expect one but, by mentioning it, maybe I can help sensitize
people to some possible issues.)
...
General, prefixes in IRI results:
I think there is an awkward tension between theoretical requirements for
correct application functioning, and practical usability issues, in the way
that IRIs are returned in query results. In theory, all that is needed
is the IRI, but most SWeb applications I have seen go to some lengths to
preserve the prefixes used in the original data so that human-readable
qname values can be reconstructed.
As far as I can tell (also looking at
http://www.w3.org/TR/2005/WD-rdf-sparql-XMLres-20050801/), there is no
provision for returning prefixes. I think that practical considerations
suggest that there should be an optional mechanism for query processors
to return prefix information with variable binding results.
...
Section 3.1, "Matching integers" (and nearby) [editorial]:
"The pattern in the following query has a solution :x ..." is not
explicit that it refers to a solution when matched against the preceding
data. An immediate fix would be to add "in the above RDF data" after
":x", but maybe a more comprehensive approach would be to add a brief
paragraph, just after the sample data, along the lines of:
[[
This RDF data is the target for query examples in the following sections.
]]
...
Section 3.3, Boolean [editorial nit]:
It is my understanding (and my dictionary agrees) that "Boolean" in
prose text should be capitalized, being named after Boole.
...
Section 3.3, definition [editorial]:
I found the last part of this definition was hard to follow. I suggest
something like: "For value constraint C, a solution S matches C if S(C)
is true, where S(C) is the Boolean-valued expression obtained by
substitution of variables mentioned in C."
...
Section 3.3, error conditions [functionality query]:
Has the full impact of the stated handling of errors been considered in
depth? While I think this is probably OK, I have a niggling concern
that there may be some classes of errors that may prove difficult to
catch in this way. For sure, I think that an "error condition" that is
caused by unanticipated values in the target graph should be handled as
described here, but in other cases, when the error is clearly in the way
the query has been constructed, it would be acceptable to simply return
a failure. For example, a regex filter containing an invalid regex.
My concern here, I think, is that it is not clear how broadly the term
"error condition" should be interpreted.
...
Section 4: 1st bullet, "Basic Graph Patterns" [editorial]
I think this should be cross-referenced to section 2.5.
I note the phrase is hyperlinked (or assume so, as it is underlined),
but as I am reviewing a paper copy of the document, I have no idea where
the hyperlink actually leads.
...
Section 4: 2nd bullet, "Group Pattern" [unclear]
I found the phrasing "must all match" was insufficient. Suggest
something like: "where each of a set of graph patterns must match using
the same variable substitution".
...
Section 4, general [editorial]:
There seems to be a good deal of overlap between this section and section 2.5,
with maybe some muddling of the concepts (notably "basic graph pattern"
and "Group graph pattern" seem to be somewhat tangled). For
specification purposes, I think it would be easier to treat a "basic
graph pattern" as a group of "triple patterns".
Thus, I think that merging sections 2.5 and 4 could create a simpler,
easier to follow description with less scope for misinterpretation.
It seems strange that the start of section 4 contains a bulleted list of
topics that are described in sections 2.5, 4, 5 and 6. So I would
expand my previous suggestion to suggest a single section covering all of
these, starting with the list of various patterns described. A
preceding section could deal with matching of single triples, literals,
bnodes, etc.
...
Section 4.1, "For any solution ..." [editorial]:
I found this paragraph was potentially confusing, being an example of
the muddle I allude to in the preceding comment.
...
Section 5.1, para 1 [query correctness]:
"... OPTIONAL keyword applied to a graph pattern." Should this be "...
applied to a group pattern"? I ask this because section 4.1 indicates
braces as introducing a group pattern.
...
Section 5.1, example [incomplete spec]:
What happens if the triple
_:a foaf:mbox <mailto:alice@work.example> .
is added to the example data?
I think this should lead to two solutions that bind "name" to "Alice",
but that's not clear to me from the description here.
...
Section 5.4, formal definition [error?]:
I think this formal definition may be wrong or incomplete.
Preamble: it refers to "S is a solution", but I see no definition of
solution. (Section 2.4 has "Pattern Solution" and "Query Solution". I'm
guessing the latter is meant.)
Consider the example data:
[[
_:a rdf:type foaf:Person .
_:a foaf:name "Alice" .
_:a foaf:mbox <mailto:alice@work.example> .
]]
and the query pattern from section 5.1:
[[
WHERE { ?x foaf:name ?name .
OPTIONAL { ?x foaf:mbox ?mbox }
}
]]
This is an instance of OPT(A,B), where:
A = { ?x foaf:name ?name }
B = { ?x foaf:mbox ?mbox }
The substitution:
[ x/_:a, name/"Alice", mbox/<mailto:alice@work.example> ]
is a solution for both A and B, hence is a solution for OPT(A,B).
But also consider the substitution:
[ x/_:a, name/"Alice", mbox/<mailto:alice@home.example> ]
This is a solution for A but is not a solution for A and B, hence
according to the definition given it is a solution for OPT(A,B).
This means that the solution set should include:
[ x/_:a, name/"Alice", mbox/<mailto:alice@work.example> ]
[ x/_:a, name/"Alice", mbox/<mailto:alice@home.example> ]
and any other possible substitution for mbox, which is clearly not what
is intended.
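The problem can be checked mechanically. The following Python sketch is an illustration only: it encodes the definition as it is read in this comment (S solves A and B, or S solves A but not A and B), and the triple representation and the helper names is_solution and naive_opt are hypothetical, not taken from the draft.

```python
# Example data and patterns from the comment above; variables start with "?".
DATA = {
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@work.example"),
}
A = [("?x", "foaf:name", "?name")]
B = [("?x", "foaf:mbox", "?mbox")]

def is_solution(subst, pattern, data=DATA):
    """True if every triple pattern, after substitution, is a triple in data."""
    for triple in pattern:
        ground = tuple(subst.get(term, term) for term in triple)
        if ground not in data:
            return False
    return True

def naive_opt(subst, a, b):
    """OPT(A,B) as read above: S solves A and B, or S solves A but not A and B."""
    both = is_solution(subst, a + b)
    return both or (is_solution(subst, a) and not both)

work = {"?x": "_:a", "?name": "Alice", "?mbox": "mailto:alice@work.example"}
home = {"?x": "_:a", "?name": "Alice", "?mbox": "mailto:alice@home.example"}

print(naive_opt(work, A, B))  # True
print(naive_opt(home, A, B))  # True, although no such mbox triple exists
```

Under this reading any value for ?mbox is accepted once A is satisfied, which is the unintended result described above.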
...
Section 5.5, 1st para [editorial]:
I think this is confusing, or not making sense, as the inner optional
pattern is (syntactically) a part of the optional outer pattern. Thus
it might be expected that a match of the outer pattern must also match
the inner pattern.
Suggest:
[[
Optional patterns can occur inside any group graph pattern, including a
group graph pattern which itself is optional, forming a nested pattern.
Any non-optional part of the outer optional graph pattern must be
matched if any variable bindings from the nested optional pattern are
returned. Thus, for a nested optional pattern OPT(A,OPT(B,C)), B and
possibly C are matched only when A is matched.
]]
...
Section 6 [editorial]:
I think it might be helpful to include a test case that shows that:
OPT(A,B)
and
UNION(A,{A B})
are *not* equivalent.
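A possible shape for such a test case, sketched in Python. This assumes OPT behaves like a left outer join and that the results of a union are the solutions of either branch (which, as noted for section 6.2, the draft does not spell out). The matcher and the names match_triple, match, opt and union are hypothetical helpers written for this illustration.

```python
DATA = [
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@work.example"),
]
A = [("?x", "foaf:name", "?name")]
B = [("?x", "foaf:mbox", "?mbox")]

def match_triple(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None  # variable already bound to a different term
        elif p != t:
            return None
    return result

def match(patterns, data, binding=None):
    """All bindings that match every triple pattern against data."""
    solutions = [binding or {}]
    for pattern in patterns:
        solutions = [ext for s in solutions for triple in data
                     if (ext := match_triple(pattern, triple, s)) is not None]
    return solutions

def opt(a, b, data):
    """Assumed left-join reading: keep each solution of a, extended by b if possible."""
    out = []
    for s in match(a, data):
        extended = match(b, data, s)
        out.extend(extended if extended else [s])
    return out

def union(p, q, data):
    """Assumed reading: solutions of either branch."""
    return match(p, data) + match(q, data)

opt_result = opt(A, B, DATA)
union_result = union(A, A + B, DATA)
print(len(opt_result), len(union_result))  # 1 2
```

With this data, OPT(A,B) yields only the solution that binds ?mbox, while UNION(A,{A B}) additionally yields the solution for A alone with ?mbox unbound.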
...
Section 6.2 [incomplete spec]:
The formal definition does not explain what are the results from
matching a union pattern.
...
Section 7, "Definition of RDF Dataset Graph Pattern" [incomplete spec]:
This definition doesn't actually tell me what a "RDF Dataset Graph
Pattern" is.
...
Section 7, [clarification]:
From reading this, I think that the pattern
GRAPH ?g { (pat) }
does not match if (pat) is matched only in the default graph. Is this
what is intended? I think a brief explanation and test case would be in
order here.
...
Section 7.1 [superfluous content]:
I think the following text is superfluous and serves no useful purpose
over the examples given.
[[
Two useful arrangements are:
* to have information in the default graph that includes provenance
information about the named graphs
* to include the information in the named graphs in the default
graph as well.
]]
Suggest: remove this.
...
Section 7.1, example 2 [clarification]:
I'm not sure what is meant by "contain the same information as before".
I think it should be "contain the same triples as before".
...
Section 8, general [clarification]:
Following section 7, I'm not sure if this section adds anything other
than explanatory content. If there is any additional normative content
here, I think it should be highlighted. If it is purely explanatory,
then I think it would better be subsection(s) of sect 7, and the text
tweaked to show that it follows from what has been specified (e.g. a
subsection headed "Examples of Dataset Queries").
...
Section 9, general [grumble]:
I still feel (as I mentioned once previously) that the FROM clauses
don't really belong in the query language, but in the protocol. I think
of a query as being something like a regex that stands alone,
independently of the target data to which it is applied.
That said, I feel that the specification given is sufficiently flexible
that it doesn't force implementations to do anything onerous (and might
even be ignored if the Dataset is assembled by other means), so I won't
complain too loudly.
...
Section 9.2, example [clarification?]:
Is the intent here that the default graph is empty, or unspecified?
...
Section 10.1, "The effect of applying ..." [clarification]:
I feel the text here only implicitly indicates that more than one
solution sequence modifier can be applied. Also, I think a cross
reference to the syntax terms showing how multiple modifiers may be
included would be helpful.
Does this paragraph refer to the order in which the modifiers are given
above in the document text, or in the query itself?
...
Section 10.1, "Order by", ordering of IRIs [clarification?]:
I'm wondering if anything needs to be said about the ordering of IRIs
that use different combining forms (cf section 2.1, and my comment in a
previous message). It seems life would be easier, in theory at least,
if IRI ordering were based on a normalized form, so that, e.g.,
different combining forms don't lead to effectively equivalent IRIs
having different ordering.
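To make the concern concrete, here is a small Python sketch. It assumes code-point ordering of the IRI strings and picks Unicode NFC as the normalized form purely for illustration; neither choice is taken from the draft, and the example.org IRIs are made up.

```python
import unicodedata

precomposed = "http://example.org/caf\u00e9"   # e-acute as a single code point
decomposed = "http://example.org/cafe\u0301"   # "e" followed by a combining acute
other = "http://example.org/cafz"

def nfc(iri):
    """Normalize an IRI string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", iri)

iris = [precomposed, other, decomposed]

# Plain code-point ordering places the unrelated IRI between the two equivalents.
raw_order = sorted(iris)

# Ordering on the normalized form keeps the equivalent IRIs adjacent.
nfc_order = sorted(iris, key=nfc)

print(raw_order == [decomposed, other, precomposed])  # True
print(nfc_order[0] == other)                          # True
```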
I see this is an awkward topic, and I don't feel I know the right answer.
...
Section 10.2, References, result format [clarification]:
Is http://www.w3.org/TR/rdf-sparql-XMLres/ missing from the normative
references? It is linked directly from section 10.2, and appears in the
informative references. Maybe it should be normative because it is
needed to fully implement a processor for the SPARQL query
specification, per section 10.2.
Hmmm... Reading more closely the text in 10.2 ("Result sets can be
accessed..."), I think there is some confusion (maybe on my part) about
what the spec is describing: a query language? a query protocol? a
query API?
Taking a cue from the specification title, I'd say the current
specification has it about right, but that the text in section 10.2
should maybe be a little bit more explicit about what is not being
described; e.g. replace the 2 paras from "Results can be thought of as
..." with:
[[
This specification does not define exactly how such results are
returned, as a query may be used in different contexts (e.g. query
protocol, query API) for which different forms are appropriate.
Results can be thought of as a table with one row per query solution,
and a column for each variable in the query. Some cells may be empty
because a variable is not bound in that particular solution.
The SPARQL Query Results XML Format [ref] form of the above result set
gives:
]]
...
Section 10.3, para 2 [clarification]:
The reference to "a warning may be generated" has me wondering how it is
expected such a warning might be returned. Does the protocol spec have
a means to return (a) warning(s) along with query results?
...
Section 10.4, general [grumble]:
I don't see why DESCRIBE is included in this specification, since what
it returns is so vague as to defy any prospect of interoperability.
I think it would be better to provide an extensibility mechanism for
additional result formats, which could be used by applications wishing
to use DESCRIBE functionality without having to hopelessly overload a
single query language element. If a common resource description format
should be developed in future, it could then become a standardized
extension to the query language.
...
Section 11, para 1 [editorial]:
I found the introductory description of value testing was awkward and
convoluted, with its focus on "effective Boolean values", and the
subsequent and separate discussion of type errors. The whole area of
handling type errors (which I think is a slight misnomer, as in SPARQL
terms they're not really errors, just mismatches with predicted
results) seems to add an unnecessary layer of complexity to the
description of filters.
I think it would be easier to follow a description formulated in terms
of "satisfying" a value test, and go on to explain when expressions
containing type mismatches may or may not be satisfied. Maybe,
introduce an "undefined" value that propagates through expressions in a
predicted fashion, which I think would lead to a simpler and more
complete explanation of how type mismatches affect query results.
(Also: section 11.2)
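One way such an "undefined" value might propagate, sketched in Python. The propagation rules below are one possible choice (three-valued logic in the style of Kleene) and are an assumption for illustration, not something stated in the draft; all names are hypothetical.

```python
UNDEFINED = None  # stands for "no value", e.g. the result of a type mismatch

def logical_not(a):
    return UNDEFINED if a is UNDEFINED else not a

def logical_and(a, b):
    if a is False or b is False:
        return False          # a definite False decides the result
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True

def logical_or(a, b):
    if a is True or b is True:
        return True           # a definite True decides the result
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False

def less_than(x, y):
    """Numeric comparison; operands of mismatched type yield UNDEFINED."""
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return x < y
    return UNDEFINED

def satisfies(value):
    """A value test is satisfied only by a definite True."""
    return value is True

mismatch = less_than("abc", 5)                      # UNDEFINED
print(satisfies(logical_or(mismatch, True)))        # True
print(satisfies(logical_and(mismatch, True)))       # False
print(satisfies(logical_not(mismatch)))             # False
```

Under these rules a type mismatch never needs to be described as an error: the solution simply fails to satisfy the test unless another operand decides the outcome.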
...
Section 11.1 [typo]:
s/constituant/constituent/
...
Sect 11.2.3.1, "known to have the same value" [clarification]:
The discussion of sop:RDFterm-Equal seems to have some ambiguity, since
it depends upon how much the query processor knows about the datatypes
concerned. What happens if a datatype is used that the query processor
doesn't know how to test for equivalent values?
Reading this, I'm reminded of the introduction of D-interpretations
(datatyped interpretations) in the RDF semantics specification.
...
Section 11.2.3.6, sop:regex, general [implementation concern]:
I've mixed feelings about inclusion of this function, as it seems to
impose a non-trivial complication on SPARQL processor implementations in
environments that don't already include a conforming REGEX
functionality. Is this really essential to a significant majority of
applications?
...
That's about it. I hope it helps, and apologies again for being late
with my comments.
#g
--
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Saturday, 10 September 2005 12:56:07 UTC