- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Mon, 4 Oct 2004 15:14:03 +0100
- To: Dan Connolly <connolly@w3.org>
- Cc: Andy Seaborne <andy.seaborne@hp.com>, Dave Beckett <Dave.Beckett@bristol.ac.uk>, Howard Katz <howardk@fatdog.com>, Steve Harris <S.W.Harris@ecs.soton.ac.uk>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Reviewing http://www.w3.org/2001/sw/DataAccess/rq23/
$Log: Overview.html,v $
Revision 1.77 2004/10/03 13:06:28 eric

Completed. I'll now take a look at 1.77->1.79 changes

Dave


Items that I think must be fixed before publication
---------------------------------------------------

See also MUSTFIX in detailed notes below. Summarising:

* First sentence in 1. Introduction is wrong. RDF is a set of triples.

* Consistency in use of individuals, sets of individuals
  examples: b in B used ok
  however T defined as a set and used as a member of that set,
  also defined as tp. T in GP should be tp in GP.
  See comments on definitions of Triple Pattern, Triple Pattern
  Matching, Graph Pattern, Graph Pattern Matching

* Initial Binding definition baffles me, I need more explanation.


General Comments
----------------

A thorough spellcheck is needed.

Label all examples with Numbers, titles and add anchors.

Add all example queries, data files as separate files with URIs, link to them. Add them to the test suite.

Add labels and anchors to all definitions.

Do not use underlining in the html style when it isn't a link.

In query results, some of the tables use ?x and some use bare x. Some results use both!

Suggest global s/<tt>OPTIONAL</tt>/optional/ since the OPTIONAL keyword is never explained in the document and only appears in the grammar.


Detailed Comments
------------------

These should be fixed but are not critical.

Title:
SPARQL title does not mention protocol despite the 'P' in the name. Later on the document suggests that protocol is a separate document.

Abstract
typo: "end users [missing words] to write"

ToC
missing 4.3
8 "Chosing What to Query" to match document capitals
12.2 ditto
Appendices labelled 1,2 actually A, B in doc
suggest removing see also, old material. It's not ToC.

1 Introduction

MUSTFIX: First sentence is wrong. The abstract syntax for RDF is not a "graph of nodes and arcs, often expressed as triples".
It is a set of triples called an RDF graph formally defined in RDF semantics. It can be and is often described as a graph of nodes and arcs but RDF is not nodes+arcs; that was an RDF core decision closely argued.

preference to graph "created dynamically" than "partly calculated on demand"

(un-numbered section) Document Outline

@@variables bound@@, @@bindings@@ can be linked to forward references

"10 - Summary" doesn't match the style of the other paragraphs - no explanation

2 Making Simple Patterns

last sentence
preference to "[Simple] patterns can be ..."

[All graph pictures are unreadable when printed out, too dark. Please re-compose on a light background or with much greater contrast. black on gray doesn't work.]

First example.
I suggest not using _:1 _:2 since it's not legal in N3, Turtle, N-Triples for blank node labels. I think a small edit can make the first example executable, testable.

I'd prefer full names for variables, for ease of readability especially by non-native English speakers. So 'address' not 'addr' and something else instead of 'addrm'

2.1

P2
URIref expand to URI Reference for first use. Or use the correct definition RDF URI Reference and link to it.

grammar - "XML. Qname" - delete the "."
Link to QName in XML specs.

datatype URIRef not URI

Para "Because.." here and later
I see "URIs used" - check for consistency. I suggest s/URI/URIref/ throughout

N3/Turtle used without a reference, explanation.

Spellings
"intpretted"

Para "Prefixes are..."
referring to an earlier query, but it doesn't say which of the three previous it means. Suggest "same query as the previous one"

2.2 Triple Examples

P1 grammar s/for for/for/

P2 "bnodes" introduced without explanation. Should be "blank node labels" [ref RDF docs] abbreviated to BNodes.

Doesn't say which positions bnodes can be used in.

Definition RDF Term
This implies that query variables are in the RDF data model since they are along with U, L and BN.
I suggest moving to another block since V is not used till later. Maybe after/near Query Variable?

Definition Query Variable
This defines an individual, all the RDF Term definitions are sets. No letter is assigned to typically use it. Suggest "A query variable qv". OR define the set Q.

Defn. Triple Pattern
(spelling, grammar)
"A triple pattern is [a] triple of 3 slots subject, predicate, object .."

MUSTFIX: "union Q" <- Q is never defined. Q presumably is a set of Query Variables, in which case it is NOT Q, but a set of qv, or define Q as a set of qv.

This also defines 'ground' but that is not pulled out. Suggest make it a separate 'Definition: Ground' block.

Definition Binding
suggest use B for variable, as they are used uppercase elsewhere too.

Suggest give an example for the convention for writing down a binding such as (f, "value") or ?f="value" or the tabular form
  ---------
  | ?f    |
  ---------
  |"value"|
  ---------

Suggest give an example of a set of bindings such as {?f="value", ?g="value2"} or the tabular form given later.

Definition A substitution
suggest uppercase "Substitution"

Suggest not using B as a set of Bindings, but use SB or something to differ from lowercase 'b' as an individual binding. So this is a mapping S(set of b)

How can a set of bindings define a substitution?
Suggest rewording
"A substitution S(B) on a set of bindings B maps a triple pattern ..."

suggest ... "by the corresponding [variable] value"

Suggest putting a subst() example.

Definition Triple Pattern Matching

MUSTFIX: I think there is a triple pattern/set of triple pattern issue here unless you are solely comparing a graph with one triple.

T was earlier defined as a set of triple patterns. So subst(T, b in B) is not a substitution of a triple pattern, but of a set of triple patterns (and a binding b in B). Could re-use tp in T which was used in defining ground, and define subst(tp in T, b in B). Then edit to match such as 'Triple Pattern tp matches ...'
Use of entails, reference/link to RDF entailment.

rdfs: prefix is used in the second data, this was not defined as convention earlier. brql/sparql predefines rdf: but not rdfs:?

2.3 Graph patterns

P1 "There are bNodes"
No, there is 1.

grammar: "not in the RDF graph [nor in] any query"

Para "The next query.." but there is no query following. Confused. Does that mean the query just given?

Also grammar: "one or more triple patterns which must all match for the graph pattern to match." - the 'all' and 'one or more' say different things. Is it all or 1? Maybe the definition following explains better, remove?

Definition: Graph pattern

MUSTFIX: "A conjunctive Graph Pattern GP is a set of triple patterns T."

T was earlier defined as;
"let T be the set of triple patterns := A x A x A"

So GP=T ? Not quite what was meant. GP is set of tp, where tp is a Triple Pattern in T.

Maybe triple pattern & triple patterns are too hard to use and make nice sentences. Other suggestions ; triple pattern set.

Defn: Graph Pattern - Conjunction
Defines "conjunctive Graph Pattern" not the title of the definition.
html - underlining doesn't match too

Defn: Graph pattern Matching

Hmm, confused by "same" in:
"For a graph pattern to match, each triple pattern must match with each query variable having the same value whereever it occurs."
suggestions
"For a graph pattern GP to match, all triple patterns tp in GP must match with all query variables in all tp having the same value."

This actually defines "Graph Pattern GP matches", not "Graph Pattern Matching"

Using T in GP which is a (set of triple patterns). Probably should be tp in GP.

MUSTFIX:
[[
For all T in GP, subst(T, B) is a triple entailed by G.
subst(GP, B) is the graph pattern formed by subst(T, B) for all T in GP.
subst(GP, B) is a subgraph entailed by G if all triple patterns are grounded.
]]

This is reusing subst(t in TP, b in B) redefined over graphs. I suggest changing the name to graphsubst(GP, B) to distinguish it.
subst(T in TP, b in B) returns a triple pattern, may not be ground.

Suggestion:
  For all tp in GP, subst(tp, B) is a triple pattern entailed by G.
  graphsubst(GP, B) is the graph pattern formed by subst(tp, B) for all tp in GP.
  graphsubst(GP, B) is a subgraph entailed by G if all triple patterns are grounded.

2.4 Multiple Matches

"The results of query are all the ways a query can match the graph being queried. Each match is one solution to the query and there may be zero, one or multiple solutions to a query, depending on the data."

This uses "results", "solutions" and "matches", not in the same way as previously defined. I suggest using results only, and use match to mean graph matches, triple matches as used above:

"2.4 Multiple results

The results of query are all the ways a query can match the graph being queried. Each result is one solution to the query and there may be zero, one or multiple results to a query, depending on the data."

Aside: A query actually hasn't been defined yet. It's hinted that it is something to do with graph pattern, but it hasn't been said so far. i.e. no.

Or if sticking with "matching" make it clearer what the difference between a result and a solution is.

Example query has commas between variables. Die.

"When the query can match the data in more than one way, each possibility is returned as a solution to the query. In addition, we have more than one selected variable so each solution contains two bindings of variables to values."

so now there are results, query matches, solutions and possibilities :)

Query matching data hasn't been discussed. Graph patterns matching Graphs has been, could be reused. Could also refer to sets of bindings.

... and now Query Solution is given.

definition Query Solution:
"For conjucntion graph pattern GP, subst(GP, B), has no variables."
spelling: conjunction.
Also could add ".. and is a set of ground triple patterns" or possibly define a Ground Graph Pattern.
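
To make the suggested split between subst(tp, B) and graphsubst(GP, B) concrete, here is a minimal sketch in Python. The tuple/dict representation, the helper names is_var, is_ground and matches, and the sample data are all assumptions for illustration and are not taken from the draft. Plain set membership stands in for entailment, which is a simplification.

```python
# Hypothetical sketch: representation and helper names are assumptions.
# A triple pattern tp is a 3-tuple; query variables are strings starting with "?".
# A set of bindings B is a dict from variable name to value.

def is_var(term):
    """True if term is a query variable (by the "?" naming assumption)."""
    return isinstance(term, str) and term.startswith("?")

def subst(tp, B):
    """Replace each bound variable in ONE triple pattern tp by its value."""
    return tuple(B.get(term, term) if is_var(term) else term for term in tp)

def is_ground(tp):
    """A triple pattern is ground when it contains no variables."""
    return not any(is_var(term) for term in tp)

def graphsubst(GP, B):
    """Apply subst to every tp in the graph pattern GP (a set of triple patterns)."""
    return {subst(tp, B) for tp in GP}

def matches(GP, B, G):
    """GP matches G under B if graphsubst(GP, B) is ground and contained in G.
    Set containment stands in for 'entailed by G' (a simplification)."""
    result = graphsubst(GP, B)
    return all(is_ground(t) for t in result) and result <= G

# Sample data (made up for this sketch).
G = {
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
}
GP = {("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")}
B = {"?x": "_:a", "?name": "Alice", "?mbox": "mailto:alice@example.org"}
```

With these definitions, subst works on a single tp and may return a pattern that is not ground (for example when B binds only ?x), while graphsubst works on the whole GP, which is the distinction the comments above ask the draft to draw.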
3 Constraining Values

(Here the query uses selected variables without a comma)

Definition: Value Constraint
"A value constraint is a boolean expression that can be applied to restrict graph pattern solutions."

For me that doesn't read as an expression that can refer to non-boolean things as parts of the expression but which has a boolean value.

Definition: Query Stage (partial definition).
"Graph Pattern (set of triple patterns) + set of Value Constraints.
QS : GP x VR"

+ and x ? + doesn't mean addition here but...? You cannot join/merge a set of triple patterns and a set of value constraints.

VR is not defined. Presumably means a set of value constraints. Later on VC seems to be used for that.

spelling in comment:
[[ operations [like] "source" ]]

I prefer Query Block.

4 Including Optional Values

grammar "The graph matching and value constraints [presented] so far ..."

[here select vars have no commas]

html/spelling "there is [an] mbox" - make mbox <tt> too, like in previous para

"Failure to match does not ..."
suggest
"failure to match any of the triples in the optional block does not ..."

spelling "optional block" not bock

4.2 Multiple Optional Blocks

"Multiple OPTIONAL blocks "
so far the OPTIONAL keyword has not been mentioned, and indeed it is not given in this section either. Suggest s/<tt>OPTIONAL</tt>/optional/ in 4.2

The constraints on variables seem to allow the same optional variable to be bound in different nested optional blocks, as long as they are not at the "given level of nesting" or "in the same containing block". Those two constraints seem to clash or at least constrain it in two ways of which I'm not sure is complete. Level of nesting presumably doesn't mean, anywhere inside 2 []s.
How about these:

Graph Pattern 1:
  ( ?q :a :a )
  [ ( ?q :b ?x) ]
  [ ( ?q :b ?y) ]
  [ ( ?q :b ?x) ] <- same level of nesting, same containing block FORBIDDEN

Graph Pattern 2:
  ( ?q :a :a )
  [ [ ( ?q :b ?x) ]
    [ ( ?q :b ?y) ] ]
  [ ( ?q :b ?x) ] <- different level of nesting, containing block, allowed?

4.3 Optional Matching

Definition: Initial Binding
"The result of a query stage,QS = (GP, VC), with an initial binding B, has Query Result where all the bindings in B are valid (the graph pattern and any value constraints in QS). B extended with addition bindings given by matching subst(GP, B) and constraining with VC."

VC is used here, never defined. Presumably refers to Value Constraint. However Query Stage was earlier (partially) defined as QS: GP x VR

grammar: "has [a] Query Result", "B [is] extended with addition[al] .."

MUSTFIX: More substantially; after several re-readings, I don't understand this definition. Can I ask for some more explanation please?

Definition: Graph Pattern - Optional Match
"An optional match of QS, with initial binding B, the match of QS with initial binding B if there exists at least one solution, and is B otherwise."

grammar: "binding B, [is] the match..."

That seems to define an optional match of a query stage, not of a graph pattern. Is the definition title correct?

5 Nested Patterns

Nesting was already mentioned in 4.2.

Definition: Graph Pattern - Nesting
This definition I note, excludes nested VC - good!

The example query uses ()s for nesting (you should mention it before the example what the extra ones are for (which is like lisp (like this)))

"Since this definition makes a inner pattern just be a conjunctive element of the outher pattern, and because a graph patterns of triple patterns is also the conjunction, this is the same as:"
spelling: outher=>outer
grammar: "because [] graph patterns of [graph] patterns [are] also [] conjunctions ..."

"Optional blocks can be nested. The outer optional must match for any inner ones to apply.
That is, the outer optional triple patterns is fixed for the purposes of any inner optional block."
s/triple patterns/graph pattern/
grammar: "optional [block]"

Let me use that to read from:
"Optional blocks can be nested. The outer optional block must match for any inner ones to apply. That is, the outer optional graph pattern is fixed for the purposes of any inner optional block."

So it means nested optional patterns are essentially subqueries, where the outer optional graph pattern is used as a must-match graph pattern and the inner optional blocks, relative to that, as optional graph patterns.

Query result has typo in gname result #3: "EveE" should be "Eve"

grammar: "... query only access[es] these ..."

This example does hint at the usefulness of the nested patterns however I think the details of the operation and restrictions on binding with optionals are incomplete. Maybe add more words to the intro status for this section re completeness.

Sections 6-7: Placeholders
Not reviewed

8 Choosing What to Query

Definition: Target graph
"The target graph of a query."

Ok, this must be a sketch. Especially with the current discussion of graphs. Maybe expand a little, "... to which a query may be applied". I recall that we discussed these words and ended up pruning them.

[[
SELECT ...
FROM <uri1>, <uri2>
]]
Commas, die

grammar: "Implementations [may] provide "

9 Querying the Origin of Statements

The status here probably needs expanding to "under discussion and will change"

"the following term."
I guess "term" should be triple pattern or nested graph pattern? Those are the two choices I think.

Note " As with OPTIONAL, a variable that is bound to NULL must not match another variable that is bound to NULL. " seems to be worthy of being in the body rather than parenthetical to the main text.

Can you delete the red text? All that was notes from 2 FTFs in the past, we've discussed a lot more things since then and have an issues list to track things too.
10 Summary of Query Patterns

Link to the definitions of all the terms here

Suggest you use QP for query pattern rather than GP - confuses with graph pattern.

I don't think it's possible to apply the term 'matches' to all the elements given here. match is only defined for triple patterns and graph patterns.

Could just add a status note to this section that it is initial draft.

11 Query Forms

+ status note?

"These result forms use the bindings in the query results to form result sets or RDF graphs."

what's a result set? there are Query Results (set of bindings) and Query Solution. This is the first mention of result set. Is it not a set of solutions?

spelling: "Returns either [an] RDF graph that ..."

11.1 Choosing which Variables to Return

SELECT DISTINCT
"The result set can be modified by adding the DISTINCT keyword which ensures that every set of variables for a query solution is different from the other sets of variables returned. Thought of as a table, each row is differen"

"set of variables" should be Query Result; it's the variable names and values that matter (Bindings)

11.2 Constructing an Output Graph

"If no pattern is supplied, instead "*" is used,"
s/pattern/graph template/

That might be better as
"*" indicates an empty graph template is supplied.
however that isn't quite right, as when an empty graph template is used, the variables are instead substituted into the query pattern. So maybe should be
"*" indicates that the graph template is the query pattern.
2 paragraphs later, this is spelt out in more detail.

"... each matching of the query pattern."
=> each solution?

"The form CONSTRUCT * WHERE {query pattern} is shorthand for CONSTRUCT {pattern} WHERE {pattern}, that is, the query pattern is the same as the construct pattern."

Consistency here and elsewhere in 11.2 - use of graph template and construct pattern for the same thing.

WHERE {.. }s should be real examples and not using {}s

Prefer re-ordering to:
"...
signifies the construct pattern[graph template] is the query pattern"

11.3 Descriptions of Resources

placeholder text.
syntax - n3 needs adding
proper example namespace URIs

11.4 Asking "yes or no" questions

Add a Query Result with either YES or NO suggested format

12 Testing Values

placeholder text.

12.2 Extending Value Testing

placeholder text.

A. SPARQL Grammar

Some of my previous comments in [1] still apply such as:
* Die CommaOpt
* Use FOO+ not FOO FOO? for one or more
* OPTIONAL keyword
* A ::= B with only one use of A (all non-terminals) should be inlined
* E/BNF used has no reference. Preference to XML's

Additional:

What does SOURCE * mean ?

Add some comments to say why NCCAME, NCCHAR1 is done like this.

Pattern Literal needs expanding too

No idea what (~[">"," "])* means without consulting some EBNF documentation; where's that from? complement of set?

B. References

W3C style fixes needed - expanding to have URIs, latest versions, dates, organisations.

Check they are cited in the document
Received on Monday, 4 October 2004 14:23:51 UTC