Re: editorial comments on SPARQL Query Lanuage for RDF [OK?] from Seaborne, Andy on 2006-02-10 (public-rdf-dawg-comments@w3.org from February 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 10 Feb 2006 11:58:04 +0000
To: Fred Zemke <fred.zemke@oracle.com>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <43EC7FCC.3050501@hp.com>
Fred Zemke wrote:
> 2.1.1 Syntax of IRI terms
> The word "term" appears here without introduction or definition.
> It might help to say that the primary lexical constituent of
> triple patterns is terms, of which there are n varieties (fill in
> the correct number.  At a cursory glance, I see IRI terms, literal
> terms, and variables.  Much later in the specification, blank nodes appear
> as another kind of term).

Removed the word term in 2.1.1 and 2.1.2
Linked to RDF term from the intro para of section 1.

> 2.1.5, Examples of query syntax
> It says "nor do prefix names in queries need to be the same prefixes
> as used for data".  But this specification does  not provide a language
> for describing or entering data.  Compare with section 2.1.6 "Data
> descriptions used in this document" which says that this specification
> uses Turtle to portray RDF data.  Perhaps there are other
> representations for RDF, some of which also provide prefixes.
> Perhaps the statement could be fixed by changing it to "nor do prefix
> names in queries need to be the same as prefixes used in some
> language for portraying RDF data, such as Turtle."

Changed to:

"""
nor do prefix names in queries need to be the same prefixes as used in a
serialization of the data.
"""


> 2.1.7 Result descriptions used in this document
> This introduces the phrase "RDF term", which is not defined until
> section 2.2 "Initial definitions".  A hot link to the definition
> might be useful.

Link added.

> 
> In addition, section 2.1.1 used the word "term", evidently to mean
> a lexical token of SPARQL.  This is potentially confusing.  Possibly
> one of the following ideas would be useful:
> -- change "term" to "token" in section 2.1.1
> -- change "term" to "SPARQL term" in section 2.1.1, to create a
> contrast with "RDF term".

See above.

> 
> 
> 2.2 Initial definitions
> The definition of "query variable" does not mention the lexical
> requirements of VARNAME in Appendix A.

Added to 2.1.3:
"""
The possible names for variables are given in the SPARQL grammar.
"""
with a link to VARNAME.

> 
> 
> 2.4 Pattern solutions
> It says "The result of replacing every member v of W in a graph pattern
> P by S(v) is written S(P)".  But what is S?  I think you mean,
> "If S is a pattern solution, then...".  Or you could reword the
> sentence "A pattern solution is a substitution function..." to define
> S in that sentence.

Inserted the name of the pattern solution:
"A pattern solution, S, is a variable substitution ...."

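As an illustration only (a Python sketch, not text from the specification), S(P) can be pictured with the pattern solution modelled as a dictionary. The function name `substitute`, the example IRI, and the use of plain strings for RDF terms are all assumptions made for this sketch.

```python
# Illustrative sketch: a pattern solution S modelled as a dict from
# variable names to RDF terms (terms are plain strings here).

def substitute(solution, pattern):
    """Return S(P): each variable covered by S is replaced by S(v)."""
    return [tuple(solution.get(term, term) for term in triple)
            for triple in pattern]

# Hypothetical solution and graph pattern, for illustration only.
S = {"?x": "<http://example.org/alice>", "?name": '"Alice"'}
P = [("?x", "foaf:name", "?name")]

print(substitute(S, P))
```

Terms that S does not cover (here the constant foaf:name) are left unchanged.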

> 2.5 Basic graph patterns
> It says "The SPARQL syntax uses the keyword WHERE to introduce the
> Query pattern."  But the grammar in Appendix A shows that WHERE is
> optional.  The reality appears to be "the SPARQL grammar uses curly
> braces to enclose the query pattern.  Optionally the keyword WHERE
> may be used immediately prior to the opening curly brace."  An alternative
> solution would be to make WHERE mandatory in the grammar.  However,
> it would still be good to state that graph patterns are enclosed in
> curly braces.

Removed - the sentence is no longer appropriate at this point.  And WHERE has
already been used in earlier examples.

> 2.6 Multiple matches
> It says "The results of a query are all the ways a query can match
> the graph being queried".  But you have introduced formal terminology;
> why are you not using it?  In your formal terminology, what you mean is
> "The results of a query is the set of all pattern solutions that match
> the dataset of the query."  It is probably okay to include the informal
> translation too.

Added the suggestion of "The results of a query is the set of all pattern
solutions..."

> 2.7.1 Blank nodes and queries
> There are no examples in this section.  The reader is left with the
> impression that section 2.7.2 "Blank nodes and query results" is
> intended to provide the examples for section 2.7.1.  But I think
> that the two sections are actually orthogonal topics.  Section 2.7.1
> talks about blank nodes in the query, such as
> SELECT ?x WHERE { _:a foaf:name ?x .}, whereas section 2.7.2 talks
> about blank nodes in the result. 
> 
> Perhaps the solution is to delete section 2.7.1, which appears to
> be completely redundant with section 2.8.3 "Blank nodes".
> 2.7.1 Blank nodes and query results
> It says "A blank node in a query may match any RDF term". 
> I think this wording is too loose.  One might think that this means
> that a blank node is a wildcard that may match different RDF terms
> in different triple patterns.  Example: 
> SELECT ?x ?y ?z WHERE { ?x _:a ?y . ?x _:a ?z . }
> One might think that one can bind the blank node _:a to one RDF term
> in one triple and a different RDF term in a different triple
> of the dataset
> (as if the example were SELECT ?x ?y ?z WHERE { ?x * ?y . ?x * ?z . }
> using * to indicate a wildcard for the verb in each triple).
> However, the definition of pattern solution in section 2.4 seems to
> indicate that the same mapping of a blank node to an RDF term is
> required for each triple pattern.  This should be reiterated here.

I have incorporated your suggestion: removed section 2.7.1, added a new 2.1.5
to the syntax of query terms, and made 2.7 purely about blank nodes in
results.

The revised section 2.5 details how blank nodes are involved in entailment.
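
The point about a blank node keeping one mapping across all triple patterns can be sketched in Python. This is a toy matcher written for this message, not the matching definition from the specification: the helper names, the two-triple graph, and the treatment of blank nodes as extra variables are assumptions, and the blank node sits in the predicate position only to mirror the example in the comment.

```python
def is_open(term):
    # Variables (?x) and blank nodes (_:a) both need a binding; constants do not.
    return term.startswith("?") or term.startswith("_:")

def match(pattern, graph, binding=None):
    """Yield every binding under which all triple patterns occur in the graph."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_open(p_term):
                # Reuse an existing binding; reject the triple if it disagrees.
                if candidate.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)

# Hypothetical data: one subject with two different predicates.
graph = [("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c")]
pattern = [("?x", "_:a", "?y"), ("?x", "_:a", "?z")]

# Keep only the variables, dropping the blank node from each binding.
solutions = [{k: v for k, v in b.items() if k.startswith("?")}
             for b in match(pattern, graph)]
print(solutions)
```

Because _:a must map to the same term in both triple patterns, no solution pairs ex:b with ex:c, which is what a per-triple wildcard would have allowed.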

> 2.8.1 Object lists
> It would be helpful to show an example that uses both predicate-object
> lists and object lists, for example
> ?x v:erb1 ?z, ?w ; v:erb2 ?r, ?s .
> is equivalent to
> ?x v:erb1 ?z .
> ?x v:erb1 ?w .
> ?x v:erb2 ?r .
> ?x v:erb2 ?s .

Example added.  I also noted that object lists are conjunctions in case it is
assumed they are disjunctions.
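
To illustrate the conjunction reading, here is a small Python sketch that expands the abbreviated form from the comment into its four triple patterns. The helper `expand` and its input layout are hypothetical, not anything defined by SPARQL.

```python
def expand(subject, predicate_object_lists):
    """Expand 's p1 o1, o2 ; p2 o3, o4 .' into one triple pattern per object."""
    return [(subject, predicate, obj)
            for predicate, objects in predicate_object_lists
            for obj in objects]

# ?x v:erb1 ?z, ?w ; v:erb2 ?r, ?s .
triples = expand("?x", [("v:erb1", ["?z", "?w"]), ("v:erb2", ["?r", "?s"])])
for triple in triples:
    print(" ".join(triple), ".")
```

Every triple pattern produced has to match; listing several objects adds requirements, it does not offer alternatives.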

> 2.8.3 Blank nodes
> What is the relationship between this section and section 2.7.1 "Blank
> nodes and queries"?
> Perhaps they can be combined or one of them can be deleted (probably
> section 2.7.1, which has no examples and is completely redundant with
> section 2.8.3).

As above - 2.7.1 deleted.

> 2.8.4 RDF collections
> This section is too terse.  Because the example shows an RDF collection
> with exactly three items, the reader might infer that
> an RDF collection is a triple constructor.  

Made it 4 items (2 looks a bit like a predicate-object pair).

>
> However, the syntax in
> Appendix A indicates that much more than a single triple can be written
> within an RDF collection.  It would be good to discuss the available
> syntactic options, with examples.

Included a mixed-form example with a nested collection.

> 2.9 Querying reification vocabulary
> typo in second sentence"... can be queried be..." should read
> "...can be queried by...".

Fixed

> 
> 
> 3.1.4 Matching with RDF D-entailment
> An example would be helpful.  For example, knowing that
> "42"^^xsd:integer and "042"^^xsd:integer are equivalent literals, the
> query SELECT ?x WHERE { ?x a 42 } will match a triple in a dataset
> x a "042"^^xsd:integer.

3.1.4 should have been removed before.  Now done.
> 
> 
> 3.2 Value constraints
> This topic does not seem to be subordinate to the overall topic of
> section 3, "Working with RDF literals".  Perhaps sections 3.2 and 3.3
> should be transferred to section 4, "Graph patterns".  Note that section
> 4 begins with a list of ways to build complex graph patterns, among
> which is value constraints, yet value constraints are not described in
> any subsection of section 4. 

Section 4 only covers group patterns; the other items in that list each have
their own section.  Groups are listed there because they can combine other
patterns.

> 
> 
> 3.4 Matching values and RDF D-entailment
> This section is redundant with section 3.1.4, "Matching
> with RDF D-entailment".
> 
> 
> 4.1 Group graph patterns
> The defined term appears to be "group graph pattern".  Consequently
> occurrences of "group pattern" should be replaced by "group graph pattern".
> (For example, last sentence of first paragraph following the box.)

Done

> 
> 
> 4.1 Group graph patterns
> It would be helpful to move the last sentence, ("In a SPARQL query string,
> a group graph pattern is delimited by braces") earlier in this
> section.  Before I reached that sentence, I had a very hard time
> deciphering the following sentence: "this query has a group pattern
> (sic, 'group graph pattern' is meant)
> of one basic graph pattern as the query pattern".  It just seemed like
> you were running around in circles.  

Done.

> 
> 
> 4.1 Group graph patterns
> It would be helpful to show an example with two consecutive group graph
> patterns.  The example already in this section is equivalent to
> PREFIX foaf: etc
> SELECT ?name ?mbox
> WHERE { { ?x foaf:name ?name } { ?x foaf:mbox ?mbox } }

Using the extra level of {} produces a query with two basic graph patterns
(each of one triple pattern).
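
A rough Python sketch of how the solutions of the two basic graph patterns could be combined, assuming they are combined by keeping pairs of solutions that agree on shared variables. The function names and the sample data are hypothetical.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Merge every compatible pair of solutions from the two patterns."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

# Hypothetical solutions for { ?x foaf:name ?name } and { ?x foaf:mbox ?mbox }.
names = [{"?x": "ex:alice", "?name": "Alice"},
         {"?x": "ex:bob", "?name": "Bob"}]
mboxes = [{"?x": "ex:alice", "?mbox": "mailto:alice@example.org"}]

joined = join(names, mboxes)
print(joined)
```

Under that assumption the result is the same as matching both triple patterns in a single basic graph pattern: only ex:alice has both a name and an mbox.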

> 5. Including optional values
> It says "RDF is semi-structured".  Actually, RDF is highly structured,
> especially compared to XML (which is routinely called semi-structured)
> since RDF consists entirely of triples.
> This makes RDF even more structured than relational databases
> (though RDF is weakly typed compared to most relational databases).
> This sentence is not necessary and can be deleted.  The
> remainder of the paragraph is still true (for example, in relational
> database terms, you are talking about outer joins, which are a
> highly useful feature that was absent from the earliest formulations
> of relational database technology.)  Being structured is irrelevant to the
> utility of optional matching.

RDF is referred to as "semi-structured", meaning above the triple level.  As
the text is no longer necessary here, I have removed it.

> 7. RDF dataset
> typo, First sentence: "comprising of" -> "comprising" or
> "consisting of".

Interesting - a web search for "comprising of" yields only UK based websites.

Changed to "consisting of".

> 
> 
> 7. RDF dataset
> The last definition, of "RDF dataset graph pattern", items 1 and 2
> refer to "dataset {Gi, (<u1>, G1), ... }". This is confusing because
> the preceding definition refers to dataset {G, (<u1>, G1), ... }.
> The reader is left wondering whether this is a typo, but if so, what
> is the role of the <ui>'s and Gi's?  I think what you are trying to say
> in items 1 and 2 is that in the second definition, the Gi gets treated
> like the default dataset does in the first definition.  But in that
> case, why not unravel the logic for the reader?  The first definition
> translates matching a pattern P (other than an RDF dataset graph pattern)
> down to matching the default graph.  Why not just use that language
> in the second definition as well?  The two items would read:
> "1. g is an IRI where g = <ui> for some i, and P matches Gi with solution S.
> 2. g is a variable, S maps the variable g to <ui>, and P matches Gi
> with solution S."

Already done in the editors' working draft from another comment.  Also, the
definition of "RDF Dataset Graph Pattern" has moved to the start of section 8.

> 10.1 Solution sequences and result forms
> First sentence: "each solution being a function from variables to
> RDF terms".  Actually, you mean, "a function from variables and
> blank nodes to RDF terms."  See section 2.4 "Pattern solutions".
> (However, my preferred resolution is to remove blank nodes from the language,
> as noted in a separate comment.)

Under the revised definition of solutions, they explicitly only include
variables now.

> 
> 
> 10.1.1 Projection
> You should also note that blank nodes are always projected out of the
> solution sequence.  In terms of section 10.1.2 "DISTINCT", this means
> that it is possible to get duplicates in the result even if all
> variables are retained.  Example:
> SELECT ?x WHERE { [] v:loves ?x }
> finds all RDF terms that are the object of
> the verb v:loves.  If the dataset consists of
> "Bob" v:loves "Alice" .
> "Carl" v:loves "Alice" .
> Then I think the solution sequence is
> { ([] = "Bob", ?x = "Alice"), ([] = "Carl", ?x = "Alice") }. 
> After projecting away the blank node, the sequence is { "Alice", "Alice" }.

As above.

> 
> 
> 10.1.3 ORDER BY
> Issues in the five-point arbitrary ordering:
> 1. what is meant by a "plain literal
> before an RDF literal with type xsd:string of the same lexical
> form"?  The inscrutable terms here are "plain literal", "before" (points
> 1 through 5 prescribe an ordering, so "before" presumably does not
> indicate the ordering, it must mean something else), and "same lexical
> form".

Plain literal is terminology defined in RDF concepts, and I've linked to it.

"before" replaced by "lower" for consistency.

> 2. is there any order to RDF literals?

Yes - where possible, the "<" operator is used.

> Note that there is a paragraph following
> the five-point ordering which explains that "IRIs are ordered by
> comparing the character strings making up each IRI".

Moved to before the arbitrary ordering list.  That should also emphasise that
the list is an arbitrary ordering where "<" and IRI ordering do not apply.


> 3. Does language tag have any influence on ordering of RDF literals?

Not directly.  Literals with language tags can't be compared for ordering but 
the application can choose, say, lexical order by ordering by str(?x).

ORDER BY ASC(str(?x))
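
The effect of ordering by str(?x) can be pictured with a short Python sketch. Literals are modelled as (lexical form, language tag) pairs; that representation and the sample values are assumptions for illustration.

```python
# Literals as (lexical_form, language_tag) pairs; sample data is made up.
literals = [("the", "fr"), ("chat", "fr"), ("the", "en"), ("cat", "en")]

def str_key(literal):
    """Mimic str(?x): keep the lexical form, drop the language tag."""
    lexical_form, _language = literal
    return lexical_form

ordered = sorted(literals, key=str_key)
print(ordered)
```

Literals with the same lexical form but different tags tie under this key, so their relative order is whatever the sort happens to produce.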


> 4. What is the relative ordering of two literals that have types
> of incomparable categories (for example, comparing a numeric and a dateTime,
> a numeric and an xsd:string, or a dateTime and an xsd:string)?

That would be covered by "If the ordering criteria do not specify the order of
values, then the ordering in the solution sequence is undefined."  That is
also necessary to fit in other datatypes, not from the set required by SPARQL,
which might have some natural ordering between datatypes.

> My conjectured resolution is that point 5 "A plain literal..."
> should be eliminated, and point 4 should be amplified with a follow-on
> paragraph to clarify the ordering of all RDF literals.  Such a
> follow-on paragraph might say, for example,
> "Two RDF literals L1 and L2 are ordered as follows:
> 1. If L1 and L2 are both numeric, both xsd:dateTime, or both
> xsd:string, then they are ordered according to the operator '<' in the
> Operator mapping table.
> 2. Otherwise, let LF1 and LF2 be the lexical forms of L1 and L2
> (ie, the portion of the literal enclosed in single or double quotes,
> after replacing any escape characters by their equivalents).  LF1 and
> LF2 are compared using Unicode code point order, applied lexicographically,
> to determine the order of L1 and L2.  The language tags of L1 and L2,
> if any, are ignored.  If LF1 = LF2, LF1 has no datatype and LF2 has
> type xsd:string, then L1 < L2.
> 
> This still leaves unanswered what is the relative ordering of the following
> pairs:
> "12"^^xsd:integer and "12"
> "12"^^xsd:integer and "12"^^xsd:string

That falls under the undefined clause because we might have:

      "12"^^app:base3 and "12"^^xsd:integer

where there is an ordering the implementation wishes to impose.
It would be expected that a query processor would only order things it knows
to be ordered when it is providing additional datatypes.
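
A Python sketch of that idea: a comparison that answers only where "<" is known to apply and otherwise reports "undefined". The set of numeric datatypes, the (value, datatype) representation, and the use of `None` for "undefined" are assumptions of the sketch, not definitions from the specification.

```python
# Assumed set of numeric datatypes, for illustration only.
NUMERIC = {"xsd:integer", "xsd:decimal", "xsd:double"}

def less_than(a, b):
    """Compare (value, datatype) pairs.

    Returns True or False where "<" applies, and None where the
    relative order is left undefined.
    """
    (value_a, type_a), (value_b, type_b) = a, b
    if type_a in NUMERIC and type_b in NUMERIC:
        return value_a < value_b
    if type_a == type_b and type_a in {"xsd:string", "xsd:dateTime"}:
        return value_a < value_b
    return None  # e.g. numeric vs. string, or an unknown datatype

print(less_than((12, "xsd:integer"), (12.5, "xsd:decimal")))   # "<" applies
print(less_than((12, "xsd:integer"), ("12", "xsd:string")))    # undefined
print(less_than(("12", "app:base3"), (12, "xsd:integer")))     # undefined
```

A processor that understands app:base3 could add its own case before the final `return None`; one that does not simply leaves the order undefined.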

> 10.1.3 ORDER BY
> The specification uses the ordering of types numeric, dateTime and
> xsd:string, but not xsd:boolean.  Maybe this is not a problem, since
> "false" precedes "true" in an alphabetic ordering of the xsd:boolean
> type anyway.  Still, it raised my eyebrows that the ordering of
> xsd:boolean was not used.

"<" does not apply to xsd:booleans.
Lexical order is not applied: SPARQL sorts by value, falls back to the rules
for the arbitrary ordering of classes, or does not define a relative order.

> 10.1.3 ORDER BY
> It says "IRIs are ordered by comparing the character strings making up
> each IRI".  Fine, but how does this ordering work?  Perhaps it is
> Unicode code point order applied lexicographically to the IRI?

Added "using the "<" operator."

> 
> 
> 10.1.3 ORDER BY
> Do language tags have any role in ordering?  For example, what is the
> relative ordering of "the"@en and "the"@fr?

Two RDF terms that have different language tags can't be compared: the "<"
operator does not apply.

> 
> 
> 10.2 Selecting variables
> It says that "The syntax SELECT * is an abbreviation that selects all
> of the named variables".  What is a named variable?  This term is not
> defined.  I think all variables have names.
> Probably you can just delete "named".

Deleted "named" - this is also in line with refining other work on matching
basic graph patterns.

> 10.3.2 Accessing graphs in the RDF dataset
> It might also be interesting to the reader to note that CONSTRUCT
> can be used to construct a graph with IRIs that are different from the
> IRIs in the input graph.  The technique is to create an xsd:string
> corresponding to the desired IRI and cast to IRI type. 

It is true that new IRIs can be used in the CONSTRUCT template (as shown in
the first example of 10.3).  But casts of a string can't be done in the
template, nor is there assignment for expressions in SPARQL (variables only
end up with RDF terms from the graphs matched).
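
A Python sketch of template instantiation under these constraints: constants in the template (including IRIs that never occur in the data) are copied as-is, and variables only receive terms from the solutions. The function name, the example IRI, and the choice to drop a triple that still contains an unbound variable are assumptions of this sketch.

```python
def instantiate(template, solutions):
    """Build a graph by substituting each solution into the template."""
    graph = set()
    for solution in solutions:
        for triple in template:
            built = tuple(solution.get(term, term) for term in triple)
            # Assumption: a triple with an unbound variable is left out.
            if not any(term.startswith("?") for term in built):
                graph.add(built)
    return graph

# The predicate is a new IRI written directly in the template.
template = [("?x", "<http://example.org/new#label>", "?name")]
solutions = [{"?x": "ex:alice", "?name": '"Alice"'},
             {"?x": "ex:bob"}]  # no ?name binding for ex:bob

graph = instantiate(template, solutions)
print(graph)
```

Nothing here computes a new term from an expression; the only new IRIs in the output are the ones spelled out in the template.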

> 
> 10.3.2 Accessing graphs in the RDF dataset
> issues with the definition of graph template:
> 1. The term "triple pattern" is a poor choice of terminology,
> because in most of the specification,
> the word "pattern" refers to a pattern to be matched.  A better term
> would be "triple template".

The key concept is the graph template.  Rather than define triple template for
the sole purpose of defining a graph template, it is better to use triple
pattern again because the same machinery of solutions and substitutions
applies.  Otherwise a whole parallel structure would have to be defined.

> 2. There is no definition of what S(tj) is.  Perhaps the reader is
> supposed to recall the definition
> in section 3.3 "Value constraints - definition",
> which defines S(C) where C is a constraint.  However, tj is not a
> constraint.

S is a solution and S(tj) is the triple pattern as before.  It's the same
machinery as in matching.

I've added "pattern" to solution to more strongly tie back to the earlier use.

> It might be a good idea to define (or repeat the definition) of S(tj).

From other comments and refinement by the working group, there have been other
changes that will make the definitions clearer.

> 
> 
> 10.4.3 Description of resources
> typo, first sentence: "...is the determined by...": delete "the"
> typo after second box: "as well information which as name and other...":
> possibly you mean "as well as information such as name..." or
> possibly "as well as information which has name...".
> typo, formal definition, last sentence: "does not proscribe".
> "proscribe" means "prohibit"; you want "prescribe", which means
> "specify".

Edits applied.

> 
> 
> 11. Testing values
> It says "the operands of these functions and operators are the subset
> of XML Schema datatypes...".  But the operands are values of these
> types, not the types themselves.

Changed to "The datatypes of the operands"

> 
> 11.2.3.1 bound
> The text around the examples contains misstatements.  The text
> preceding the first sample query says "This query finds the people
> without a dc:date property" whereas in fact it finds the people who
> do have a dc:date.  The sentence following the second sample query
> is also wrong.  It says "Because Alice's mbox was known, "Alice"
> was not a solution to the query" but "Alice" does not have a mailbox
> and "Alice" is a solution to the query.

Fixed

> 11.2.3.3 isBlank
> The text before the sample query is a cut-and-paste error, a
> duplicate of the text in 11.2.3.2 "isIRI". 

Fixed

> A.2 White space
> last sentence "As a hint, rule names below in capitals indicate a
> possible choice of terminals".  Who has this possible choice?
> There are two consumers of this document: implementers and users.
> The entire grammar defines a space of choices for the language user,
> so I don't think this sentence is pitched at the user.  I think you
> mean "possible choice of terminals for those who are constructing
> a SPARQL parser". 

It now says:

"""
Rule names below in capitals indicate where whitespace is significant; these
form a possible choice of terminals for constructing a SPARQL parser.
"""

> 
> 
> Appendix A.7, Grammar
> Rules [43] "Expression" through [51] "UnaryExpression" follow a top-down
> pattern in the order of
> presentation.  Then rule [51] "UnaryExpression" requires
> the definition of PrimaryExpression, which one expects to be
> the next BNF.  Actually, PrimaryExpression occurs as rule [58],
> and many (though not all) of its constituents appear in rules [52]
> "BuiltinCall" through [57] "BracketedExpression".  It would be better
> to rearrange the rules in the following order:
> [58] "PrimaryExpression"
> [57] "BracketedExpression"
> [52] "BuiltinCall"
> [53] "RegexExpression"
> [55] "IRIrefOrFunction"
> [56] "ArgList"

Reordered with "FunctionCall"/"ArgList" moved earlier.

> 
> As for [54] "FunctionCall", this rule is used in [16] OrderCondition,
> but, in a separate comment, I think that arbitrary expressions should
> be permitted in OrderCondition, not just function calls.

They are allowed - the syntax is "ORDER BY (1/?x)"
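
What ordering by an expression such as (1/?x) amounts to can be sketched in Python. The sample bindings, ascending order, and Python float division standing in for SPARQL numeric division are all assumptions here.

```python
# Hypothetical solutions with numeric bindings for ?x.
solutions = [{"?x": 4}, {"?x": 1}, {"?x": 2}]

# ORDER BY (1/?x): the sort key is the value of the expression, not ?x itself.
ordered = sorted(solutions, key=lambda s: 1 / s["?x"])
print([s["?x"] for s in ordered])
```

Sorting on 1/?x reverses the order of these positive values compared with sorting on ?x directly.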

> Appendix D, Collected formal definitions
> This appendix is listed in the table of content but is not present,
> not even as a to-be-done item.  I would appreciate having this
> appendix.  I suspect that the current formal definitions have
> omissions, inconsistencies, etc., but it is very hard to check currently.

A live link to the collected definitions is:

http://www.w3.org/2000/06/webdata/xslt?xslfile=http%3A%2F%2Fwww.w3.org%2F2001%2Fsw%2FDataAccess%2Frq23%2Fdefns.xsl&xmlfile=http%3A%2F%2Fwww.w3.org%2F2001%2Fsw%2FDataAccess%2Frq23%2F&transform=Submit

> No particular location
> The terms "solution" and "pattern solution" are both in use in the
> document.  For consistency it would be better to pick one of these terms
> and use it exclusively.

Could you say where it is confusing?  It can get rather repetitive to say
"pattern solution" every time.

> 
> Fred Zemke
> 

Thank you very much for the detailed comments. I hope this message responds to
all your editing comments - please let us know if it does.

	Andy
Received on Friday, 10 February 2006 11:58:31 UTC