- From: Fred Zemke <fred.zemke@oracle.com>
- Date: Thu, 12 Jan 2006 09:47:40 -0800
- To: public-rdf-dawg-comments@w3.org
2.1.1 Syntax of IRI terms
The word "term" appears here without introduction or definition. It might help to say that the primary lexical constituents of triple patterns are terms, of which there are n varieties (fill in the correct number). At a cursory glance, I see IRI terms, literal terms, and variables. Much later in the specification, blank nodes appear as another kind of term.

2.1.5 Examples of query syntax
It says "nor do prefix names in queries need to be the same prefixes as used for data". But this specification does not provide a language for describing or entering data. Compare with section 2.1.6 "Data descriptions used in this document", which says that this specification uses Turtle to portray RDF data. Perhaps there are other representations for RDF, some of which also provide prefixes. Perhaps the statement could be fixed by changing it to "nor do prefix names in queries need to be the same as prefixes used in some language for portraying RDF data, such as Turtle."

2.1.7 Result descriptions used in this document
This introduces the phrase "RDF term", which is not defined until section 2.2 "Initial definitions". A hot link to the definition might be useful. In addition, section 2.1.1 used the word "term", evidently to mean a lexical token of SPARQL. This is potentially confusing. Possibly one of the following ideas would be useful:
-- change "term" to "token" in section 2.1.1
-- change "term" to "SPARQL term" in section 2.1.1, to create a contrast with "RDF term".

2.2 Initial definitions
The definition of "query variable" does not mention the lexical requirements of VARNAME in Appendix A.

2.4 Pattern solutions
It says "The result of replacing every member v of W in a graph pattern P by S(v) is written S(P)". But what is S? I think you mean, "If S is a pattern solution, then...". Or you could reword the sentence "A pattern solution is a substitution function..." to define S in that sentence.
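
To make the 2.4 point concrete, here is a minimal Python sketch of a pattern solution S treated as a substitution function and applied to a graph pattern P. The tuple-of-strings representation of triple patterns, the sample terms, and the name apply_solution are illustrative assumptions, not anything defined by the specification.

```python
def apply_solution(solution, pattern):
    """Return S(P): replace each term of P that is in the domain of S by S(term).

    `solution` is a dict from variables (or blank node labels) to RDF terms;
    `pattern` is a list of (subject, predicate, object) tuples.
    Terms outside the domain of S are left unchanged.
    """
    return [tuple(solution.get(term, term) for term in triple)
            for triple in pattern]


# Hypothetical pattern solution S and graph pattern P, for illustration only.
S = {"?x": "<http://example.org/alice>", "?name": '"Alice"'}
P = [("?x", "foaf:name", "?name"),
     ("?x", "foaf:mbox", "?mbox")]

result = apply_solution(S, P)
```

In this sketch S has to be given before S(P) means anything, which is the gap in the quoted sentence. Because the same mapping is applied to every triple pattern, a term such as ?x cannot be replaced by different RDF terms in different triples.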

2.5 Basic graph patterns
It says "The SPARQL syntax uses the keyword WHERE to introduce the Query pattern." But the grammar in Appendix A shows that WHERE is optional. The reality appears to be "the SPARQL grammar uses curly braces to enclose the query pattern. Optionally the keyword WHERE may be used immediately prior to the opening curly brace." An alternative solution would be to make WHERE mandatory in the grammar. However, it would still be good to state that graph patterns are enclosed in curly braces.

2.6 Multiple matches
It says "The results of a query are all the ways a query can match the graph being queried". But you have introduced formal terminology; why are you not using it? In your formal terminology, what you mean is "The result of a query is the set of all pattern solutions that match the dataset of the query." It is probably okay to include the informal translation too.

2.7.1 Blank nodes and queries
There are no examples in this section. The reader is left with the impression that section 2.7.2 "Blank nodes and query results" is intended to provide the examples for section 2.7.1. But I think that the two sections are actually orthogonal topics. Section 2.7.1 talks about blank nodes in the query, such as SELECT ?x WHERE { _:a foaf:name ?x .}, whereas section 2.7.2 talks about blank nodes in the result. Perhaps the solution is to delete section 2.7.1, which appears to be completely redundant with section 2.8.3 "Blank nodes".

2.7.1 Blank nodes and query results
It says "A blank node in a query may match any RDF term". I think this wording is too loose. One might think that this means that a blank node is a wildcard that may match different RDF terms in different triple patterns. Example:
    SELECT ?x ?y ?z WHERE { ?x _:a ?y . ?x _:a ?z . }
One might think that one can bind the blank node _:a to one RDF term in one triple and a different RDF term in a different triple of the dataset (as if the example were
    SELECT ?x ?y ?z WHERE { ?x * ?y . ?x * ?z . }
using * to indicate a wildcard for the verb in each triple). However, the definition of pattern solution in section 2.4 seems to indicate that the same mapping of a blank node to an RDF term is required for each triple pattern. This should be reiterated here.

2.8.1 Object lists
It would be helpful to show an example that uses both predicate-object lists and object lists, for example
    ?x v:erb1 ?z, ?w ; v:erb2 ?r, ?s .
is equivalent to
    ?x v:erb1 ?z .
    ?x v:erb1 ?w .
    ?x v:erb2 ?r .
    ?x v:erb2 ?s .

2.8.3 Blank nodes
What is the relationship between this section and section 2.7.1 "Blank nodes and queries"? Perhaps they can be combined or one of them can be deleted (probably section 2.7.1, which has no examples and is completely redundant with section 2.8.3).

2.8.4 RDF collections
This section is too terse. Because the example shows an RDF collection with exactly three items, the reader might infer that an RDF collection is a triple constructor. However, the syntax in Appendix A indicates that much more than a single triple can be written within an RDF collection. It would be good to discuss the available syntactic options, with examples.

2.9 Querying reification vocabulary
Typo in second sentence: "... can be queried be..." should read "...can be queried by...".

3.1.4 Matching with RDF D-entailment
An example would be helpful. For example, knowing that "42"^^xsd:integer and "042"^^xsd:integer are equivalent literals, the query SELECT ?x WHERE { ?x a 42 } will match a triple in a dataset x a "042"^^xsd:integer.

3.2 Value constraints
This topic does not seem to be subordinate to the overall topic of section 3, "Working with RDF literals". Perhaps sections 3.2 and 3.3 should be transferred to section 4, "Graph patterns". Note that section 4 begins with a list of ways to build complex graph patterns, among which is value constraints, yet value constraints are not described in any subsection of section 4.
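
Returning to the 2.8.1 comment above, the equivalence between the abbreviated form and the four separate triple patterns can be sketched in Python. This is only an illustration of the expansion, assuming a predicate-object list is handed over as a list of (predicate, objects) pairs; the function name expand and that input shape are hypothetical and do not reflect how any SPARQL parser is required to work.

```python
def expand(subject, predicate_object_list):
    """Expand `s p1 o1, o2 ; p2 o3, o4 .` into one triple pattern per object.

    `predicate_object_list` is a list of (predicate, [objects]) pairs,
    in the order they appear in the abbreviated form.
    """
    return [(subject, predicate, obj)
            for predicate, objects in predicate_object_list
            for obj in objects]


# The abbreviated form from the 2.8.1 comment: ?x v:erb1 ?z, ?w ; v:erb2 ?r, ?s .
triples = expand("?x", [("v:erb1", ["?z", "?w"]),
                        ("v:erb2", ["?r", "?s"])])

for s, p, o in triples:
    print(s, p, o, ".")
```

The semicolon repeats the subject and the comma repeats both subject and predicate, which is why the nested loop yields exactly one triple pattern per object.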

3.4 Matching values and RDF D-entailment
This section is redundant with section 3.1.4, "Matching with RDF D-entailment".

4.1 Group graph patterns
The defined term appears to be "group graph pattern". Consequently occurrences of "group pattern" should be replaced by "group graph pattern". (For example, last sentence of first paragraph following the box.)

4.1 Group graph patterns
It would be helpful to move the last sentence ("In a SPARQL query string, a group graph pattern is delimited by braces") earlier in this section. Before I reached that sentence, I had a very hard time deciphering the following sentence: "this query has a group pattern (sic, 'group graph pattern' is meant) of one basic graph pattern as the query pattern". It just seemed like you were running around in circles.

4.1 Group graph patterns
It would be helpful to show an example with two consecutive group graph patterns. The example already in this section is equivalent to
    PREFIX foaf: etc
    SELECT ?name ?mbox
    WHERE { { ?x foaf:name ?name } { ?x foaf:mbox ?mbox } }

5. Including optional values
It says "RDF is semi-structured". Actually, RDF is highly structured, especially compared to XML (which is routinely called semi-structured), since RDF consists entirely of triples. This makes RDF even more structured than relational databases (though RDF is weakly typed compared to most relational databases). This sentence is not necessary and can be deleted. The remainder of the paragraph is still true (for example, in relational database terms, you are talking about outer joins, which are a highly useful feature that was absent from the earliest formulations of relational database technology). Being structured is irrelevant to the utility of optional matching.

7. RDF dataset
Typo, first sentence: "comprising of" -> "comprising" or "consisting of".

7. RDF dataset
In the last definition, of "RDF dataset graph pattern", items 1 and 2 refer to "dataset {Gi, (<u1>, G1), ... }".
This is confusing because the preceding definition refers to dataset {G, (<u1>, G1), ... }. The reader is left wondering whether this is a typo, but if so, what is the role of the <ui>'s and Gi's? I think what you are trying to say in items 1 and 2 is that in the second definition, the Gi gets treated like the default dataset does in the first definition. But in that case, why not unravel the logic for the reader? The first definition translates matching a pattern P (other than an RDF dataset graph pattern) down to matching the default graph. Why not just use that language in the second definition as well? The two items would read:
    "1. g is an IRI where g = <ui> for some i, and P matches Gi with solution S.
     2. g is a variable, S maps the variable g to <ui>, and P matches Gi with solution S."

10.1 Solution sequences and result forms
First sentence: "each solution being a function from variables to RDF terms". Actually, you mean, "a function from variables and blank nodes to RDF terms." See section 2.4 "Pattern solutions". (However, my preferred resolution is to remove blank nodes from the language, as noted in a separate comment.)

10.1.1 Projection
You should also note that blank nodes are always projected out of the solution sequence. In terms of section 10.1.2 "DISTINCT", this means that it is possible to get duplicates in the result even if all variables are retained. Example:
    SELECT ?x WHERE { [] v:loves ?x }
finds all RDF terms that are the object of the verb v:loves. If the dataset consists of
    "Bob" v:loves "Alice" .
    "Carl" v:loves "Alice" .
then I think the solution sequence is { ([] = "Bob", ?x = "Alice"), ([] = "Carl", ?x = "Alice") }. After projecting away the blank node, the sequence is { "Alice", "Alice" }.

10.1.3 ORDER BY
Issues in the five-point arbitrary ordering:
1. What is meant by a "plain literal before an RDF literal with type xsd:string of the same lexical form"?
The inscrutable terms here are "plain literal", "before" (points 1 through 5 prescribe an ordering, so "before" presumably does not indicate the ordering; it must mean something else), and "same lexical form".
2. Is there any order to RDF literals? Note that there is a paragraph following the five-point ordering which explains that "IRIs are ordered by comparing the character strings making up each IRI".
3. Does language tag have any influence on ordering of RDF literals?
4. What is the relative ordering of two literals that have types of incomparable categories (for example, comparing a numeric and a dateTime, a numeric and an xsd:string, or a dateTime and an xsd:string)?

My conjectured resolution is that point 5 "A plain literal..." should be eliminated, and point 4 should be amplified with a follow-on paragraph to clarify the ordering of all RDF literals. Such a follow-on paragraph might say, for example, "Two RDF literals L1 and L2 are ordered as follows:
1. If L1 and L2 are both numeric, both xsd:dateTime, or both xsd:string, then they are ordered according to the operator '<' in the Operator mapping table.
2. Otherwise, let LF1 and LF2 be the lexical forms of L1 and L2 (i.e., the portion of the literal enclosed in single or double quotes, after replacing any escape characters by their equivalents). LF1 and LF2 are compared using Unicode code point order, applied lexicographically, to determine the order of L1 and L2. The language tags of L1 and L2, if any, are ignored. If LF1 = LF2, L1 has no datatype and L2 has type xsd:string, then L1 < L2."
This still leaves unanswered what is the relative ordering of the following pairs:
    "12"^^xsd:integer and "12"
    "12"^^xsd:integer and "12"^^xsd:string

10.1.3 ORDER BY
The specification uses the ordering of types numeric, dateTime and xsd:string, but not xsd:boolean. Maybe this is not a problem, since "false" precedes "true" in an alphabetic ordering of the xsd:boolean type anyway.
Still, it raised my eyebrows that the ordering of xsd:boolean was not used.

10.1.3 ORDER BY
It says "IRIs are ordered by comparing the character strings making up each IRI". Fine, but how does this ordering work? Perhaps it is Unicode code point order applied lexicographically to the IRI?

10.1.3 ORDER BY
Do language tags have any role in ordering? For example, what is the relative ordering of "the"@en and "the"@fr?

10.2 Selecting variables
It says that "The syntax SELECT * is an abbreviation that selects all of the named variables". What is a named variable? This term is not defined. I think all variables have names. Probably you can just delete "named".

10.3.2 Accessing graphs in the RDF dataset
It might also be interesting to the reader to note that CONSTRUCT can be used to construct a graph with IRIs that are different from the IRIs in the input graph. The technique is to create an xsd:string corresponding to the desired IRI and cast to IRI type.

10.3.2 Accessing graphs in the RDF dataset
Issues with the definition of graph template:
1. The term "triple pattern" is a poor choice of terminology, because in most of the specification, the word "pattern" refers to a pattern to be matched. A better term would be "triple template".
2. There is no definition of what S(tj) is. Perhaps the reader is supposed to recall the definition in section 3.3 "Value constraints - definition", which defines S(C) where C is a constraint. However, tj is not a constraint. It might be a good idea to define (or repeat the definition of) S(tj).

10.4.3 Description of resources
Typo, first sentence: "...is the determined by...": delete "the".
Typo after second box: "as well information which as name and other...": possibly you mean "as well as information such as name..." or possibly "as well as information which has name...".
Typo, formal definition, last sentence: "does not proscribe". "proscribe" means "prohibit"; you want "prescribe", which means "specify".

11. Testing values
It says "the operands of these functions and operators are the subset of XML Schema datatypes...". But the operands are values of these types, not the types themselves.

11.2.3.1 bound
The text around the examples contains misstatements. The text preceding the first sample query says "This query finds the people without a dc:date property" whereas in fact it finds the people who do have a dc:date. The sentence following the second sample query is also wrong. It says "Because Alice's mbox was known, "Alice" was not a solution to the query" but "Alice" does not have a mailbox and "Alice" is a solution to the query.

11.2.3.3 isBlank
The text before the sample query is a cut-and-paste error, a duplicate of the text in 11.2.3.2 "isIRI".

A.2 White space
Last sentence: "As a hint, rule names below in capitals indicate a possible choice of terminals". Who has this possible choice? There are two consumers of this document: implementers and users. The entire grammar defines a space of choices for the language user, so I don't think this sentence is pitched at the user. I think you mean "possible choice of terminals for those who are constructing a SPARQL parser".

Appendix A.7, Grammar
Rules [43] "Expression" through [51] "UnaryExpression" follow a top-down pattern in the order of presentation. Then rule [51] "UnaryExpression" requires the definition of PrimaryExpression, which one expects to be the next BNF. Actually, PrimaryExpression occurs as rule [58], and many (though not all) of its constituents appear in rules [52] "BuiltinCall" through [57] "BracketedExpression". It would be better to rearrange the rules in the following order:
    [58] "PrimaryExpression"
    [57] "BracketedExpression"
    [52] "BuiltinCall"
    [53] "RegexExpression"
    [55] "IRIrefOrFunction"
    [56] "ArgList"
As for [54] "FunctionCall", this rule is used in [16] OrderCondition, but, as noted in a separate comment, I think that arbitrary expressions should be permitted in OrderCondition, not just function calls.

Appendix D, Collected formal definitions
This appendix is listed in the table of contents but is not present, not even as a to-be-done item. I would appreciate having this appendix. I suspect that the current formal definitions have omissions, inconsistencies, etc., but it is very hard to check currently.

No particular location
The terms "solution" and "pattern solution" are both in use in the document. For consistency it would be better to pick one of these terms and use it exclusively.

Fred Zemke
Received on Thursday, 12 January 2006 17:48:02 UTC