- From: Graham Klyne <GK@ninebynine.org>
- Date: Sat, 10 Sep 2005 13:56:31 +0100
- To: public-rdf-dawg-comments@w3.org
[Apologies for being late with these, but I'm hoping better late than never...]

Reviewing: http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050721/

Overview:
I find the specification (or what I think it says) to be generally sound and sensible, but I see a number of areas where the explanations seem less clear than they might be. I think this will be a very important specification for a range of RDF users and developers, so I think making it as clear as possible is a goal worth pursuing.

...

General, definitions:
I am finding the "Definitions" given in the text less helpful than I feel they should be. I discern two main reasons for this:

(a) although couched in a kind of formal language, they don't seem to be constructed with the rigour I would associate with such language. The definitions seem to be incomplete and/or ambiguous (or open to different interpretations), so the expected benefit of formality is not being realized. In the notes below, I pick out some problems I have identified.

(b) it's not easy to find definitions. My (printed) copy of the document contains no collected list of definitions, even though the table of contents and change log indicate this should be present. (ToC has this between the references and the change log.)

(If I had the time, I'd like to try coding up the formal definitions in Haskell, which I think would quickly flush out any problems, but I don't see me having time in the next month.)

...

General, presentation of concepts:
I have the feeling that this document has been drafted by people who have experience of constructing query implementations (I know Eric and Andy have), and that some of the important concepts and ideas are conveyed implicitly rather than explicitly, and hence that some of the ideas are not fully explained for a person approaching this topic afresh. I have tried to point out cases where I see them, but having myself implemented RDF query systems I may easily have overlooked others.
An example of this might be section 8.3 (restriction by bound variables): I think I understand what is being described based on my own past experience, but I can't tell if I would otherwise be able to do so. (I appreciate this comment doesn't readily admit a specific response, and I don't expect one but, by mentioning it, maybe I can help sensitize people to some possible issues.)

...

General, prefixes in IRI results:
I think there is an awkward tension between theoretical requirements for correct application functioning, and practical usability issues, in the way that IRIs are returned in query results. In theory, all that is needed is the IRI, but most SWeb applications I have seen go to some lengths to preserve the prefixes used in the original data so that human-readable qname values can be reconstructed. As far as I can tell (also looking at http://www.w3.org/TR/2005/WD-rdf-sparql-XMLres-20050801/), there is no provision for returning prefixes. I think that practical considerations suggest that there should be an optional mechanism for query processors to return prefix information with variable binding results.

...

Section 3.1, "Matching integers" (and nearby) [editorial]:
"The pattern in the following query has a solution :x ..." is not explicit that it refers to a solution when matched against the preceding data. An immediate fix would be to add "in the above RDF data" after ":x", but maybe a more comprehensive approach would be to add a brief paragraph, just after the sample data, along the lines of:
[[
This RDF data is the target for query examples in the following sections.
]]

...

Section 3.3, Boolean [editorial nit]:
It is my understanding (and my dictionary agrees) that "Boolean" in prose text should be capitalized, being named after Boole.

...

Section 3.3, definition [editorial]:
I found the last part of this definition was hard to follow.
I suggest something like: "For value constraint C, a solution S matches C if S(C) is true, where S(C) is the Boolean-valued expression obtained by substitution of variables mentioned in C."

...

Section 3.3, error conditions [functionality query]:
Has the full impact of the stated handling of errors been considered in depth? While I think this is probably OK, I have a niggling concern that there may be some classes of errors that may prove difficult to catch in this way. For sure, I think that an "error condition" that is caused by unanticipated values in the target graph should be handled as described here, but in other cases, when the error is clearly in the way the query has been constructed, it would be acceptable to simply return a failure. For example, a regex filter containing an invalid regex. My concern here, I think, is that it is not clear how broadly the term "error condition" should be interpreted.

...

Section 4: 1st bullet, "Basic Graph Patterns" [editorial]:
I think this should be cross-referenced to section 2.5. I note the phrase is hyperlinked (or assume so, as it is underlined), but as I am reviewing a paper copy of the document, I have no idea where the hyperlink actually leads.

...

Section 4: 2nd bullet, "Group Pattern" [unclear]:
I found the phrasing "must all match" was insufficient. Suggest something like: "where each of a set of graph patterns must match using the same variable substitution".

...

Section 4, general [editorial]:
There seems to be a good deal of overlap between this section and section 2.5, with maybe some muddling of the concepts (notably "basic graph pattern" and "Group graph pattern" seem to be somewhat tangled). For specification purposes, I think it would be easier to treat a "basic graph pattern" as a group of "triple patterns". Thus, I think that merging sections 2.5 and 4 could create a simpler, easier to follow description with less scope for misinterpretation.
It seems strange that the start of section 4 contains a bulleted list of topics that are described in sections 2.5, 4, 5 and 6. So I would expand my previous suggestion to suggest a single section covering all of these, starting with the list of various patterns described. A preceding section could deal with matching of single triples, literals, bnodes, etc.

...

Section 4.1, "For any solution ..." [editorial]:
I found this paragraph was potentially confusing, being an example of the muddle I allude to in the preceding comment.

...

Section 5.1, para 1 [query correctness]:
"... OPTIONAL keyword applied to a graph pattern." Should this be "... applied to a group pattern"? I ask this because section 4.1 indicates braces as introducing a group pattern.

...

Section 5.1, example [incomplete spec]:
What happens if the triple
_:a foaf:mbox <mailto:alice@work.example> .
is added to the example data? I think this should lead to two solutions that bind "name" to "Alice", but that's not clear to me from the description here.

...

Section 5.4, formal definition [error?]:
I think this formal definition may be wrong or incomplete.

Preamble: it refers to "S is a solution", but I see no definition of solution. (Section 2.4 has "Pattern Solution" and "Query Solution". I'm guessing the latter is meant.)

Consider the example data:
[[
_:a rdf:type foaf:Person .
_:a foaf:name "Alice" .
_:a foaf:mbox <mailto:alice@work.example> .
]]
and the query pattern from section 5.1:
[[
WHERE { ?x foaf:name ?name .
        OPTIONAL { ?x foaf:mbox ?mbox }
      }
]]

This is an instance of OPT(A,B), where:
A = { ?x foaf:name ?name }
B = { ?x foaf:mbox ?mbox }

The substitution:
[ x/_:a, name/"Alice", mbox/<mailto:alice@work.example> ]
is a solution for both A and B, hence is a solution for OPT(A,B).
But also consider the substitution:
[ x/_:a, name/"Alice", mbox/<mailto:alice@home.example> ]
This is a solution for A but is not a solution for A and B, hence according to the definition given it is a solution for OPT(A,B).

This means that the solution set should include:
[ x/_:a, name/"Alice", mbox/<mailto:alice@work.example> ]
[ x/_:a, name/"Alice", mbox/<mailto:alice@home.example> ]
and any other possible substitution for mbox, which is clearly not what is intended.

...

Section 5.5, 1st para [editorial]:
I think this is confusing, or not making sense, as the inner optional pattern is (syntactically) a part of the optional outer pattern. Thus it might be expected that a match of the outer pattern must also match the inner pattern. Suggest:
[[
Optional patterns can occur inside any group graph pattern, including a group graph pattern which itself is optional, forming a nested pattern. Any non-optional part of the outer optional graph pattern must be matched if any variable bindings from the nested optional pattern are returned. Thus, for a nested optional pattern OPT(A,OPT(B,C)), B and possibly C are matched only when A is matched.
]]

...

Section 6 [editorial]:
I think it might be helpful to include a test case that shows that:
OPT(A,B)
and
UNION(A,{A B})
are *not* equivalent.

...

Section 6.2 [incomplete spec]:
The formal definition does not explain what the results from matching a union pattern are.

...

Section 7, "Definition of RDF Dataset Graph Pattern" [incomplete spec]:
This definition doesn't actually tell me what an "RDF Dataset Graph Pattern" is.

...

Section 7, [clarification]:
From reading this, I think that the pattern
GRAPH ?g { (pat) }
does not match if (pat) is matched only in the default graph. Is this what is intended? I think a brief explanation and test case would be in order here.

...

Section 7.1 [superfluous content]:
I think the following text is superfluous and serves no useful purpose over the examples given.
[[
Two useful arrangements are:
* to have information in the default graph that includes provenance information about the named graphs
* to include the information in the named graphs in the default graph as well.
]]
Suggest: remove this.

...

Section 7.1, example 2 [clarification]:
I'm not sure what is meant by "contain the same information as before". I think it should be "contain the same triples as before".

...

Section 8, general [clarification]:
Following section 7, I'm not sure if this section adds anything other than explanatory content. If there is any additional normative content here, I think it should be highlighted. If it is purely explanatory, then I think it would better be subsection(s) of sect 7, and the text tweaked to show that it follows from what has been specified (e.g. a subsection headed "Examples of Dataset Queries").

...

Section 9, general [grumble]:
I still feel (as I mentioned once previously) that the FROM clauses don't really belong in the query language, but in the protocol. I think of a query as being something like a regex that stands alone, independently of the target data to which it is applied. That said, I feel that the specification given is sufficiently flexible that it doesn't force implementations to do anything onerous (and might even be ignored if the Dataset is assembled by other means), so I won't complain too loudly.

...

Section 9.2, example [clarification?]:
Is the intent here that the default graph is empty, or unspecified?

...

Section 10.1, "The effect of applying ..." [clarification]:
I feel the text here only implicitly indicates that more than one solution sequence modifier can be applied. Also, I think a cross reference to the syntax terms showing how multiple modifiers may be included would be helpful. Does this paragraph refer to the order in which the modifiers are given above in the document text, or in the query itself?

...
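On the section 10.1 question about modifier order: one reason it matters is that applying the same modifiers in a different order can give different results. A minimal Python sketch, using made-up solutions and treating ORDER BY and LIMIT as plain list operations (an assumption for illustration, not the specification's definition of these modifiers):

```python
# Made-up solution sequence; each solution binds ?name.
solutions = [{"name": "Carol"}, {"name": "Alice"}, {"name": "Bob"}]


def order_by(seq, var):
    """Sort solutions by the string value bound to var."""
    return sorted(seq, key=lambda s: s[var])


def limit(seq, n):
    """Keep the first n solutions of the sequence."""
    return seq[:n]


order_then_limit = limit(order_by(solutions, "name"), 2)
limit_then_order = order_by(limit(solutions, 2), "name")

print([s["name"] for s in order_then_limit])  # ['Alice', 'Bob']
print([s["name"] for s in limit_then_order])  # ['Alice', 'Carol']
```

The two pipelines differ only in the order of application, so the text needs to say which order is meant.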
Section 10.1, "Order by", ordering of IRIs [clarification?]:
I'm wondering if anything needs to be said about the ordering of IRIs that use different combining forms (cf section 2.1, and my comment in a previous message). It seems life would be easier, in theory at least, if IRI ordering were based on a normalized form, so that, e.g., different combining forms don't lead to effectively equivalent IRIs having different ordering. I see this is an awkward topic, and I don't feel I know the right answer.

...

Section 10.2, References, result format [clarification]:
Is http://www.w3.org/TR/rdf-sparql-XMLres/ missing from the normative references? It is linked directly from section 10.2, and appears in the informative references. Maybe it should be normative because it is needed to fully implement a processor for the SPARQL query specification, per section 10.2.

Hmmm... Reading more closely the text in 10.2 ("Result sets can be accessed..."), I think there is some confusion (maybe on my part) about what the spec is describing: a query language? a query protocol? a query API? Taking a cue from the specification title, I'd say the current specification has it about right, but that the text in section 10.2 should maybe be a little bit more explicit about what is not being described; e.g. replace the 2 paras from "Results can be thought of as ..." with:
[[
This specification does not define exactly how such results are returned, as a query may be used in different contexts (e.g. query protocol, query API) for which different forms are appropriate.

Results can be thought of as a table with one row per query solution, and a column for each variable in the query. Some cells may be empty because a variable is not bound in that particular solution.

The SPARQL Query Results XML Format [ref] form of the above result set gives:
]]

...
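The table view in the suggested text for section 10.2 can be sketched as follows. This is a minimal Python illustration only: the function name `as_table`, the dictionary representation of a solution, and the sample bindings for `_:b` and "Bob" are assumptions, not taken from the specification.

```python
def as_table(variables, solutions):
    """Return one row per solution and one column per variable.

    A variable that is not bound in a solution gives None (an empty cell).
    """
    return [[s.get(v) for v in variables] for s in solutions]


variables = ["x", "name", "mbox"]
solutions = [
    {"x": "_:a", "name": "Alice", "mbox": "<mailto:alice@work.example>"},
    {"x": "_:b", "name": "Bob"},  # hypothetical solution in which ?mbox is unbound
]

for row in as_table(variables, solutions):
    print(row)
```

How such a table is then serialized (XML results format, API objects, and so on) is a separate matter, which is the distinction the suggested text tries to draw.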
Section 10.3, para 2 [clarification]:
The reference to "a warning may be generated" has me wondering how it is expected such a warning might be returned. Does the protocol spec have a means to return (a) warning(s) along with query results?

...

Section 10.4, general [grumble]:
I don't see why DESCRIBE is included in this specification, since what it returns is so vague as to defy any prospect of interoperability. I think it would be better to provide an extensibility mechanism for additional result formats, which could be used by applications wishing to use DESCRIBE functionality without having to hopelessly overload a single query language element. If a common resource description format should be developed in future, it could then become a standardized extension to the query language.

...

Section 11, para 1 [editorial]:
I found the introductory description of value testing was awkward and convoluted, with its focus on "effective Boolean values", and the subsequent and separate discussion of type errors. The whole area of handling type errors (which I think is a slight misnomer, as in SPARQL terms they're not really errors, just mismatches with predicted results) seems to add an unnecessary layer of complexity to the description of filters.

I think it would be easier to follow a description formulated in terms of "satisfying" a value test, and go on to explain when expressions containing type mismatches may or may not be satisfied. Maybe, introduce an "undefined" value that propagates through expressions in a predictable fashion, which I think would lead to a simpler and more complete explanation of how type mismatches affect query results. (Also: section 11.2)

...

Section 11.1 [typo]:
s/constituant/constituent/

...

Sect 11.2.3.1, "known to have the same value" [clarification]:
The discussion of sop:RDFterm-Equal seems to have some ambiguity, since it depends upon how much the query processor knows about the datatypes concerned.
What happens if a datatype is used that the query processor doesn't know how to test for equivalent values? Reading this, I'm reminded of the introduction of D-interpretations (datatyped interpretations) in the RDF semantics specification.

...

Section 11.2.3.6, sop:regex, general [implementation concern]:
I've mixed feelings about inclusion of this function, as it seems to add a non-trivial complication to SPARQL processor implementations in environments that don't already include a conforming REGEX functionality. Is this really essential to a significant majority of applications?

...

That's about it. I hope it helps, and apologies again for being late with my comments.

#g
--
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Saturday, 10 September 2005 12:56:07 UTC