
Re: query review, part 1 (ACTION-546)

From: Steve Harris <steve.harris@garlik.com>
Date: Thu, 24 Nov 2011 13:04:07 +0000
Cc: public-rdf-dawg@w3.org
Message-Id: <2C28727E-CE41-48DF-851D-42822F50ED93@garlik.com>
To: Andy Seaborne <andy.seaborne@epimorphics.com>
On 24 Nov 2011, at 12:23, Andy Seaborne wrote:

> More fixes.
> 
> About half the remainder done.
> 
> @@ for things outstanding.
> 
> CVS committed (when there is a link for this email) - will continue.
> 
> 	Andy
> 
> On 22/11/11 10:38, Steve Harris wrote:
>> Great, thanks Greg.
>> 
>> Some easy fixes inline. Some of the others I'm either not swapped in on, or will need discussion.
> 
> Which? (I don't see any)

Which don't you see?

I should actually have written "Some easy fixes indicated inline…"

>> On 2011-11-21, at 17:13, Gregory Williams wrote:
>> 
>>> Below is a review of the current Query draft, except for section 18 which I will send in a subsequent email.
>>> 
>>> thanks,
>>> .greg
>>> 
>>> 
>>> 
>>> === Section 1.1
>>> 
>>> The fragment id in the link to section 11 has a typo (#aggreates).
>> 
>> Fixed.
>> 
>>> The sentence/paragraph linking to section 14 should end with a period.
>> 
>> Done.
>> 
>>> === Section 1.2.1
>>> 
>>> The prefix sfn: is defined, but doesn't seem to ever be used.
>> 
>> I imagine that was supposed to be used for "native" SPARQL functions, e.g. STR; if so, it should be fixed by giving the URI for STR in the document.
> 
> Yes.  Sometime we need to add URIs for all the functions.

I can maybe do that tomorrow? That's probably my day on SPARQL text hacking for a while, as I have a week off (ha!) then will be /very/ busy for a week or more.

[ I have nothing to add to the below ]

- Steve

>>> === Section 1.2.2
>>> 
>>> What is meant by "explicitly" in: "This document uses the Turtle [TURTLE] data format to show each triple explicitly." Is this meant to include the turtle abbreviated forms (which I would think of as "showing" triples implicitly)?
> 
> IIRC (this is SPARQL 1.0 text):
> 
> The contrast is RDF/XML where triples are not as clear.
> 
>>> 
>>> === Section 2.2
>>> 
>>> "This is a basic graph pattern match; all the variables used in the query pattern must be bound in every solution."
>>> Is it clear enough that "the query pattern" in this sentence is constrained to mean only "basic graph pattern"? Because not all variables used in general graph patterns need be bound.
> 
> I'm comfortable with the current wording but if you have any better wording, we can use that.
> 
>>> 
>>> === Section 2.4
>>> 
>>> """Blank node labels are scoped to a result set (as defined in "SPARQL Query Results XML Format") """
>>> Should this reference be generalized to any of the sparql results formats?
> 
> Yes - added JSON (it applies to TSV as well but that's not going to be a REC).
> 
> "as defined in" ==> "see"
> 
>>> 
>>> === Section 2.5
>>> 
>>> The introductory paragraph should include a link to section 17.4.3.12 when mentioning string concatenation.
> 
> "The queries below show how the CONCAT function can be used" (with link)
> 
>>> FOAF is used in example data and queries in previous sections, but 2.5 uses "foaf data" without any reference to what foaf is.
> 
> I think FOAF is sufficiently well-known but I've made the first use (2.2) of
> 
> @prefix foaf:  <http://xmlns.com/foaf/0.1/> .
> 
> have a link as the URI.
> 
>>> 
>>> === Section 3
>>> 
>>> "SPARQL FILTERs restrict solutions to those for which the filter expression evaluates to TRUE."
>>> Is there a reason "TRUE" is set in emphasized text?
> 
> Not a strong reason but it's also <tt> as it's a keyword.
> 
>>> 
>>> === Section 3.1
>>> 
>>> s/FILTER functions/functions/ (?)
>>> 
>>> The link to the regex function is broken.
>> 
>> Fixed regex link.
>> 
>>> "regex matches only plain literals with no language tag."
>>> Why not use "simple literal" in this sentence?
> 
> Historical. "simple literal" wasn't around on day one.
> 
> And now wrong anyway as preparation for RDF 1.1.  Changed to "string literal" with link to the section discussing this.
> 
>>> 
>>> === Section 3.3
>>> 
>>> "17.3 Operator Mapping lists a set of test functions, including BOUND, isLITERAL and langMATCHES and accessors, including STR, LANG and DATATYPE."
>>> Why cite these specific functions as examples now that the function library is so large?
> 
> Worse, it's wrong because I removed the full list of functions from 17.3 (because the table would be too long ...)
> 
> [[
> In addition to numeric types, SPARQL supports
> types xsd:string, xsd:boolean and xsd:dateTime
> (see Operand Data Types).
> Section Operator Mapping describes the operators
> and section Function Definitions the functions that can be
> applied to RDF terms.
> ]]
> 
>>> 
>>> === Section 4.1.1
>>> 
>>> I think "irelative-ref" should be typeset with emphasis to make it (more) clear that it's a grammar rule in the referenced document.
> 
> Done
> 
>>> 
>>> === Section 4.1.2
>>> 
>>> In 'with language tag "fr"', the language tag is shown in quotes, but in section 2.3, a language tag is shown bare: 'a language tag en'.
> 
> It was <code> in 2.3.  Changed to "en"
> 
>>> 
>>> === Section 4.2
>>> 
>>> "Triple Patterns are written as a whitespace-separated list of a subject, predicate and object"
>>> I don't think this is true. For example, <a> <b> <c> is a fine triple pattern, right?
> 
> I'm fairly sure that phrasing is from the "other" syntax!
> 
> Changed to:
> 
> Triple Patterns are written as subject,
> predicate and object;
> 
> (I think the white-space rules are obvious by this point in the doc ... and Turtle will be a REC).
> 
>>> 
>>> === Section 5
>>> 
>>> The grammar rule number (13) used to identify the rule for WhereClause is wrong. It should be 17.
>> 
>> Fixed.
>> 
>>> === Section 5.1
>>> 
>>> "SPARQL graph pattern matching is defined in terms of combining the results from matching basic graph patterns."
>>> What about the new to 1.1 (non-BGP) methods of producing results that are combined (paths, assignment, bindings)?
>>> 
>>> "A sequence of triple patterns interrupted by a filter comprises a single basic graph pattern."
>>> What about an interruption of multiple filters?
> 
> s/a filter/filter expressions/
> 
>>> 
>>> === Section 9.1
>>> 
>>> In the table, "iri" is said to match "A IRI or a prefixed name," but the introduction to section 9.1 also indicates that the 'a' keyword is acceptable.
> 
> Changed to:
> 
> """
> In the description below, iri is either an IRI written in full or abbreviated by a prefixed name, or the keyword a.
> """
> and in the table
> 
> iri     An IRI. A path of length one.
> 
>>> 
>>> s/A path between n and m/A path of between n and m/
> 
> Done
> 
>>> 
>>> Is there a reason not to continue using the linguistic style of "A path of XXX" for the syntax forms elt{n}, elt{n,} and elt{,n}?
> 
> No - done.
> 
>>> 
>>> === Section 9.2
>>> 
>>> Why do these examples use different styling (in a table) than example query patterns in other parts of the document?
> 
> Because the HTML/CSS is <example>.
> 
> Large scale rewrite of the HTML done.
> 
>>> 
>>> === Section 9.3
>>> 
>>> Is "possible cyclic" meant to be "possibly cyclic"?
> 
> Done
> 
>>> 
>>> Typo (extra "a") in "... if an RDF term in the data would be a matched again ...".
> 
> Done
> 
>>> 
>>> === Section 11
>>> 
>>> s/rather that a single solution/rather than a single solution/
>> 
>> Fixed.
>> 
>>> === Section 11.1
>>> 
>>> "AS" should be in tt text in "using the keyword 'AS'".
>> 
>> Fixed.
>> 
>>> === Section 11.2
>>> 
>>> This seems to be the first place where a solution sequence is presented with inline text instead of the tabular form. Is there a reason not to show it in a table? If so, could the binding syntax ("?x->2") at least use a proper arrow character?
>> 
>> I would prefer to use a real arrow, but I changed it from that at someone else's request.
>> 
>> It was done that way to show that were dealing with algebra solutions, not results (results are ordered), but it's not a very strong preference.
>> 
>>> "GROUP BY" should be in tt text in "The GROUP BY function, Group((?x), S) gives G = ...".
>>> Also, "Group(...)" should be linked to the appropriate definition for the Group function, as should "GroupConcat" used in the final paragraph of this section.
>> 
>> Fixed.
>> 
>>> === Section 11.3
>>> 
>>> "HAVING" should be in tt text in "An example of the use of HAVING is given below."
>> 
>> Fixed.
>> 
>>> === Section 11.4
>>> 
>>> "In a query level which uses aggregates, only expressions consisting of aggregates and constants may be projected, with one exception."
>>> I don't think this is true. A variable may be projected if it is the alias of a complex group expression. This is actually used as an example later in 11.4.
>> 
>> What do you mean by alias? AS actually binds a variable to a value for that solution, so I think it's covered by the exception.
>> 
>>> === Section 11.5
>>> 
>>> s/builtin/built-in/
>> 
>> Fixed.
>> 
>>> === Section 11.7
>>> 
>>> This section should have some introductory text.
>> 
>> Added: “This section shows some examples of queries using aggregation, which demonstrate how errors are handled in results in the presence of aggregates.”
>> 
>>> === Section 12
>>> 
>>> "Project" should be in tt text in "Only variables projected by the Project function are visible to operations outside the ToMultiset call."
>> 
>> Fixed.
>> 
>>> === Section 13
>>> 
>>> Is "one of all of the named graphs" meant to be "one **or** all of the named graphs"?
>> 
>> I think that's correct as written, though the phraseology is a bit strange.
>> 
>>> === Section 13.2
>>> 
>>> The link to the protocol document will need to change whenever there's a FPWD of the 1.1 protocol document.
> 
> Done (there already are WD of protocol)
> 
>>> 
>>> === Section 13.2.1
>>> 
>>> The example data is said to be "stored at http://example.org/foaf/aliceFoaf". Is this too strong a claim? Can't it just be a representation of the resource <http://example.org/foaf/aliceFoaf>? This happens again in section 13.2.3, but not in 13.2.2.
> 
> s/stored/located/
> 
> (I don't think the exact language of "obtained from a representation of the resource ..." is helpful in a one-line comment)
> 
>>> 
>>> "the default graph is based on the RDF merge ..."
>>> The phrase "based on" seems needless here. The default graph *is* the RDF merge of the FROM graphs, correct?
> 
> done.
> 
>>> 
>>> === Section 13.3
>>> 
>>> "The use of GRAPH changes the active graph for matching basic graph patterns within part of the query."
>>> The active graph affects the matching of property paths, too, correct?
> 
> Easier to say all patterns (they pass down the active graph to BGPs and paths)
> 
> """
> The use of GRAPH changes the active graph for matching
> graph patterns within that part of the query. Outside the use of GRAPH,
> matching is done using the default graph.
> """
> 
>>> === Section 13.3.4
>>> 
>>> "The IRI for the date datatype has been abbreviated in the results for clarity."
>>> I don't think this sentence is necessary. Many other results are shown with abbreviations (such as using prefix names without prefix definitions) without comment.
> 
> Removed.
> 
>>> 
>>> === Section 15.3.1
>>> 
>>> "Specifically, each solution that binds the same variables to the same RDF terms as another solution is eliminated from the solution sequence."
>>> Wouldn't this imply all such solutions would be dropped instead of all but one of them?
> 
> """
> Only one solution that binds the same variables to the same RDF terms is returned from the query.
> """
> 
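(Aside, to illustrate the reading we want: among solutions binding the same variables to the same RDF terms, exactly one survives, not zero. A minimal Python sketch, with a dict-per-solution representation that is mine, not the spec's:)

```python
def distinct(solutions):
    """Keep the first of each group of duplicate solution mappings."""
    seen = set()
    kept = []
    for sol in solutions:
        key = frozenset(sol.items())   # order-insensitive var->term pairs
        if key not in seen:
            seen.add(key)
            kept.append(sol)
    return kept

sols = [{"x": "ex:a"}, {"x": "ex:b"}, {"x": "ex:a"}]
print(distinct(sols))  # [{'x': 'ex:a'}, {'x': 'ex:b'}]
```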
>>> === Section 15.5
>>> 
>>> "If the number of actual solutions is greater than the limit, then at most the limit number of solutions will be returned."
>>> This doesn't seem right (or seems misleading). Without an OFFSET clause, if the number of actual solutions is greater than the limit, then THE LIMIT NUMBER of solutions will be returned. If the wording is due to the possible presence of an OFFSET clause, this should be made explicit.
> 
> Added: " after <code>OFFSET</code> is applied"
> 
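(Aside: with OFFSET applied first, "at most the limit number" really can be fewer than LIMIT. A quick sketch, helper name mine:)

```python
def apply_offset_limit(solutions, offset=0, limit=None):
    """OFFSET first, then LIMIT, as the solution modifiers compose."""
    out = solutions[offset:]
    if limit is not None:
        out = out[:limit]          # at most `limit` solutions survive
    return out

sols = list(range(10))
print(apply_offset_limit(sols, offset=8, limit=5))  # [8, 9] - fewer than limit
print(apply_offset_limit(sols, offset=0, limit=5))  # [0, 1, 2, 3, 4]
```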
>>> === Section 16
>>> 
>>> "DESCRIBE Returns an RDF graph that describes the resources found."
>>> Is "found" the correct term for the "DESCRIBE<iri>" form (as opposed to queries describing terms 'found' by evaluating a graph pattern)?
> 
> I'm happy - it's a short, descriptive piece of text
> (or: it's trivially "found" like 1 is an expression.)
> 
> Do you have preferred other wording?
> 
>>> 
>>> === Section 16.1.1
>>> 
>>> "Result sets can be accessed by a local API but also can be serialized into either XML or an RDF graph."
>>> This should mention the other result formats (JSON, CSV/TSV).
> 
> In 16.1 I've put all the forms.
> 
> I've added a JSON example in 16.1.1.
> 
> Corrected name of XML results in one place.
> Removed vestige of RDF graph result sets.
> 
>>> 
>>> === Section 16.1.2
>>> 
>>> s/New variables can be also be used/New variables can also be used/
>> 
>> Fixed.
>> 
>>> === Section 16.2.1
>>> 
>>> "The use of variable x in the template, which in this example will be bound to blank nodes with labels _:a and _:b in the data, causes different blank node labels (_:v1 and _:v2) in the resulting RDF graph."
>>> This isn't required, correct? The resulting RDF graph *could* use the same blank node labels as are present in the RDF Dataset.
> 
> @@
> 
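(Aside: the relabelling a CONSTRUCT template *may* (but need not) perform looks roughly like this sketch, treating terms as plain strings with `_:` labels - a representation I've made up for illustration:)

```python
import itertools

def fresh_bnode_labels(triples):
    """Relabel blank nodes consistently with fresh labels (_:v1, _:v2, ...)."""
    counter = itertools.count(1)
    mapping = {}
    def relabel(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = "_:v%d" % next(counter)
            return mapping[term]
        return term
    return [tuple(relabel(t) for t in triple) for triple in triples]

data = [("_:a", "foaf:name", '"Alice"'), ("_:b", "foaf:name", '"Bob"')]
print(fresh_bnode_labels(data))
# [('_:v1', 'foaf:name', '"Alice"'), ('_:v2', 'foaf:name', '"Bob"')]
```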
>>> 
>>> === Section 16.2.2
>>> 
>>> Doesn't the use of an extension function in the second example needlessly complicate the example?
> 
> @@
> 
>>> 
>>> === Section 16.3
>>> 
>>> The query results should probably be "true" and "false", not "yes" and "no" (unless we're catering to objective-c programmers :)
> 
> @@
> 
>>> 
>>> === Section 16.4
>>> 
>>> This section is marked as informative, but there seems to be information here which isn't just informative. The text specifically calls out parts that aren't normative ("this data is not prescribed by a SPARQL query"). With this section being non-normative, there's no text beyond the grammar that talks normatively about how DESCRIBE is handled.
>>> I also notice there's no discussion about the allowable forms of the DESCRIBE form. For example, what does "DESCRIBE *" (without a graph pattern) mean?
> 
> @@
> 
>>> 
>>> === Section 16.4.3
>>> 
>>> "The RDF returned is determined by the information publisher."
>>> 
>>> This seems to imply that "the information publisher" == "the SPARQL query processor" (as described in 16.4).
> 
> @@
> 
>>> 
>>> "It is the useful information the service has about a resource."
>>> 
>>> This seems to prescribe what data is/should be returned for a DESCRIBE query (contradicting earlier text).
> 
> @@
> 
>>> 
>>> If we're going to use the foaf:mbox_sha1sum property in the example RDF, should we at least use a lexical form that's actually valid for SHA1?
> 
> @@
> 
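(Aside: a lexically valid foaf:mbox_sha1sum is the 40-hex-digit SHA1 of the mailto: IRI; the address below is just an example:)

```python
import hashlib

digest = hashlib.sha1(b"mailto:alice@example.org").hexdigest()
print(len(digest))                                   # 40
print(all(c in "0123456789abcdef" for c in digest))  # True
```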
>>> 
>>> "For a vocabulary such as FOAF, where the resources are typically blank nodes..."
>>> Is this true anymore? Does it matter?
>>> 
>>> "WHERE" should be in tt text in "In the example, the match to the WHERE clause was returned, but this is not required."
> 
> @@
> 
>>> 
>>> === Section 17
>>> 
>>> The section number in the link text "11.2.2 Effective Boolean Value" is wrong. It should be "17.2.2".
>> 
>> Fixed.
>> 
>>> The section number in the link text "section 11.3" is wrong. It should be "17.3".
>> 
>> Fixed.
>> 
>>> The section number in the link text "section 11.5" is wrong. It should be "17.5".
>> 
>> Fixed.
>> 
>>> "Operators introduced by this specification are indicated with the SPARQLoperator class."
>>> It took me some time to figure out that 'SPARQLoperator' was a class name being used in the underlying HTML, and that what this meant was that such operators are shown in the same styling (yellow background) as the text "SPARQLoperator class".
>>> The SPARQLoperator styling seems to be used inconsistently. It is sometimes applied to "logical-or" and "logical-and", but not always.
> 
> @@
> 
>>> 
>>> === Section 17.2
>>> 
>>> The section number in the text "EBV rules in section 11.2.2" is wrong. It should be "17.2.2".
>> 
>> Fixed.
>> 
>>> "Apart from BOUND, NOT EXISTS and EXISTS, all functions and operators operate on RDF Terms and will produce a type error if any arguments are unbound."
>>> I assume this list should include COALESCE?
>> 
>> Fixed.
>> 
>>> "Any expression other than logical-or (||) or logical-and (&&) that encounters an error will produce that error."
>>> Again regarding COALESCE.
>> 
>> Not quite, the situation with COALESCE is a bit more complex. The word "encounters" is vague enough that it covers that I think.
>> 
>>> === Section 17.2.2
>>> 
>>> "If the argument is a typed literal with a datatype of xsd:boolean, the EBV is the value of that argument."
>>> Since this comes immediately after the text says the EBV is false if the lexical form is invalid, I think this should be: "If the argument is a typed literal with a datatype of xsd:boolean and a valid lexical form, the EBV is the value of that argument." Similarly for the subsequent rule regarding numeric types.
> 
> @@
> 
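(Aside: under Greg's proposed wording, an invalid lexical form gives EBV false rather than the "value of the argument". A rough Python sketch of the two rules for two datatypes - function and structure mine, not normative text:)

```python
def ebv(datatype, lexical):
    """Sketch: EBV of a typed literal; invalid lexical forms give false."""
    if datatype == "xsd:boolean":
        if lexical in ("true", "1"):
            return True
        if lexical in ("false", "0"):
            return False
        return False                  # invalid lexical form -> EBV false
    if datatype == "xsd:integer":
        try:
            return int(lexical) != 0  # valid form -> EBV of the value
        except ValueError:
            return False              # invalid lexical form -> EBV false
    raise TypeError("EBV type error")

print(ebv("xsd:boolean", "true"))    # True
print(ebv("xsd:boolean", "maybe"))   # False (invalid form, not an error)
print(ebv("xsd:integer", "0"))       # False
```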
>>> 
>>> === Section 17.3
>>> 
>>> The section number in the link text "section 11.4" is wrong. It should be "17.4".
>> 
>> Fixed.
>> 
>>> === Section 17.3.1
>>> 
>>> "The consequence of this rule is that SPARQL extensions will produce at least the same solutions as an unextended implementation, and may, for some queries, produce more solutions."
>>> Is this always true? I want to say that with negation there are probably examples where changing a type error into a result could yield fewer results.
> 
> @@
> 
>>> 
>>> === Section 17.4.1.2
>>> 
>>> "Only one of expression2 and expression3 is evaluated."
>>> This contradicts section 17.2.1 which indicates that argument expressions are evaluated before the function or operator is invoked.
> 
> @@
> 
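(Aside: the "only one branch is evaluated" reading is lazy evaluation - branches are passed unevaluated and only one is forced, which is indeed not what 17.2.1's evaluate-arguments-first model says. Illustrative sketch, names mine:)

```python
def sparql_if(cond_thunk, then_thunk, else_thunk):
    """IF with lazy branches: only the selected branch is ever run."""
    return then_thunk() if cond_thunk() else else_thunk()

log = []
def branch(name, value):
    def thunk():
        log.append(name)   # record which branch actually ran
        return value
    return thunk

result = sparql_if(lambda: True, branch("then", 1), branch("else", 2))
print(result, log)  # 1 ['then'] - the else branch never ran
```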
>>> 
>>> === Section 17.4.1.4
>>> 
>>> Why does 'exists' appear in both lower and uppercase, and both quoted and unquoted in this section?
>>> 
>>> "Variables in the pattern pat that are not bound in the current solution mapping take part in pattern matching."
>>> There is no pattern 'pat'. This should read "Variables in the pattern ..." with 'pattern' in tt text.
> 
> @@
> 
>>> 
>>> === Section 17.4.1.5
>>> 
>>> The section number in the text "section 11.2, Filter Evaluation" is wrong. It should be "17.2".
>> 
>> Fixed.
>> 
>>> === Section 17.4.1.6
>>> 
>>> The section number in the text "section 11.2, Filter Evaluation" is wrong. It should be "17.2".
>> 
>> Fixed.
>> 
>>> === Section 17.4.1.7
>>> 
>>> The footnote link is broken. I was able to read the footnote text in the document source, but was confused that the footnote text only describes the case of custom datatyped literals, not any literal as the document text discusses. Neither the text nor the footnote explicitly discusses the case of comparing two non-equal literals in *supported* datatypes (e.g. xsd:integer).
> 
> @@
> 
>>> 
>>> "In this query for documents that were annotated on New Year's Day (2004 or 2005) ..."
>>> I find this text confusing, as the query is about a single, specific dateTime, not two different dates. The datetime with "2004" in its lexical value does *not* represent "New Year's Day" of 2004.
> 
> @@
> 
>>> 
>>> === Section 17.4.1.9
>>> 
>>> "Errors in comparisons cause the IN expression to raise an error if the RDF term being tested is not found elsewhere in the list of terms."
>>> This contradicts section 17.2.1 which indicates that if the evaluation of any argument results in an error, the entire invocation generates an error.
> 
> @@
> 
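(Aside: the 17.4.1.9 wording describes error-suppression-on-match, which a strict evaluate-all-arguments-first reading of 17.2.1 cannot produce. Sketch of the 17.4.1.9 behaviour, with erroring members modelled as thunks - my framing, not spec text:)

```python
def sparql_in(term, candidate_thunks):
    """Raise a comparison error only if no other list member matched."""
    error = None
    for thunk in candidate_thunks:
        try:
            if term == thunk():
                return True
        except ValueError as e:
            error = e           # remember the error, keep scanning
    if error is not None:
        raise error             # no match anywhere -> surface the error
    return False

def err():
    raise ValueError("type error in comparison")

print(sparql_in(2, [err, lambda: 2]))  # True - error suppressed by the match
```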
>>> 
>>> === Section 17.4.2.2
>>> 
>>> s/foaf:knows/dc:creator/
>> 
>> Fixed.
>> 
>>> === Section 17.4.2.5
>>> 
>>> Should the example REGEX be escaping the dot in "@work.example" pattern?
>> 
>> Possibly, yes, but maybe it should also have a $?
> 
> @@@@
> 
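(Aside, showing both fixes at once with Python's regex engine - the unescaped dot and the missing anchor each over-match; sample addresses are mine:)

```python
import re

addresses = ["alice@work.example", "bob@workXexample", "c@work.example.org"]

loose  = [a for a in addresses if re.search("@work.example", a)]
strict = [a for a in addresses if re.search(r"@work\.example$", a)]

print(loose)   # all three match: '.' is a wildcard and there's no anchor
print(strict)  # ['alice@work.example']
```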
>> 
>>> === Section 17.4.2.6
>>> 
>>> The new working draft of turtle allows mixed-case language tags, but the current turtle spec only allows lowercase languages. Also, RDF Concepts says literals have an optional language tag "normalized to lowercase" (I'm not sure if this means LANG() should *always* return lower-case values). I suggest the example in this section be changed to use lowercase language tags.
> 
> @@
> 
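(Aside: the normalization RDF Concepts describes is just case-folding of the tag, so the example fix is mechanical:)

```python
tag = "EN-GB"
normalized = tag.lower()   # the lowercase form RDF Concepts describes
print(normalized)          # en-gb
```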
>>> === Section 17.4.2.7
>>> 
>>> Need an "at risk" note talking about rdf:langString's relationship to the RDF WG.
> 
> @@
> 
>>> 
>>> === Section 17.4.2.8
>>> 
>>> "Passing any RDF term other than a simple literal or an IRI is an error."
>>> This contradicts the function signature as accepting xsd:string literals.
>> 
>> Added xsd:string to the text.
>> 
>>> === Section 17.4.2.9
>>> 
>>> Why not merge this with 17.4.2.8, just as isURI() is part of the section for isIRI?
> 
> @@
> 
>>> 
>>> === Section 17.4.2.10
>>> 
>>> Why not accept xsd:string arguments, as with IRI()?
>> 
>> It should probably list all the options, but RDF WG is about to make this redundant anyway.
> 
> @@
> 
>> 
>>> Why not allow language-tagged literals? It would seem very burdensome to jump through the necessary hoops to get "foo", "foo"@en and "foo"@es to produce different bnodes using this function…
>> 
>> Yes, they probably should take anything string-y, but why not accept numerical/datetime literals too?
> 
> @@
> 
>> 
>>> === Section 17.4.2.11
>>> 
>>> Should there be any mention of what happens (or should happen) if an invalid lexical form is used with STRDT() for a known/supported datatype?
> 
> @@
> 
>>> 
>>> === Section 17.4.2.12
>>> 
>>> Should there be any mention of the language tag conforming to RFC-3066?
> 
> @@
> 
>>> 
>>> === Section 17.4.3.1.1
>>> 
>>> s/fucntion/function/
>> 
>> Fixed.
>> 
>>> === Section 17.4.3.2
>>> 
>>> Should this section mention the impact of Unicode normalization on the return value of STRLEN? For example, the difference between "\u0075\u0308" (strlen 2) and its normalized equivalent "\u00fc" (strlen 1).
> 
> @@
> 
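(Aside, demonstrating Greg's example concretely: the decomposed and composed forms are canonically equivalent but differ in code-point count, hence in STRLEN:)

```python
import unicodedata

decomposed = "\u0075\u0308"                          # 'u' + combining diaeresis
composed = unicodedata.normalize("NFC", decomposed)  # the single char '\u00fc'

print(len(decomposed), len(composed))  # 2 1
```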
>>> 
>>> === Section 17.4.3.14
>>> 
>>> The text describing REGEX should mention the flags argument.
>> 
>> I think that's handled well enough by F&O. No point repeating bits of that doc.
>> 
>>> === Section 17.4.3.15
>>> 
>>> "It replaces each non-overlapping occurence of the regular express 'pattern' with ..."
>>> s/express/expression/
>> 
>> Fixed.
>> 
>>> s/'pattern'/pattern/ in tt text
>> 
>> Not sure about that one, Andy?
> 
> @@
> 
>> 
>>> === Section 17.4.4.1
>>> 
>>> Can ABS() produce an invalid lexical form (e.g. for xsd:negativeInteger)?
> 
> @@
> 
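(Aside, on why the question matters: abs of an xsd:negativeInteger value lands outside that datatype's value space, so an implementation that naively keeps the input datatype would emit an invalid literal. Sketch, helper name mine:)

```python
def in_negative_integer_space(v):
    """xsd:negativeInteger's value space: the integers strictly below zero."""
    return isinstance(v, int) and v < 0

value = -5            # a legal xsd:negativeInteger value
result = abs(value)   # 5: no longer in the xsd:negativeInteger value space
print(in_negative_integer_space(value), in_negative_integer_space(result))
# True False - keeping the input datatype on the result would be invalid
```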
>>> 
>>> === Section 17.4.6.1–17.4.6.5
>>> 
>>> "Hex digits should be in lower case."
>>> Should "should" be RFC2119-emphasized?
>> 
>> Agreed, fixed.
>> 
>>> === Section 19.5
>>> 
>>> s/must be conform/must conform/
>> 
>> Fixed.
>> 
>>> === Section 19.8
>>> 
>>> s/covering the the/covering the/
>>> s/which syntactically a/which is syntactically a/
>> 
>> Fixed.
>> 
>>> The anchor link in note 14 (link text "function call") is missing the '#' character to make it a link to a fragment id.
>> 
>> Fixed.
>> 
>>> === Section 20
>>> 
>>> "This specification is intended for use in conjunction with..."
>>> The JSON and CSV/TSV documents should be listed along with protocol and SPARQL/XML.
> 
> @@
> 
>>> 
>>> "Note that the SPARQL protocol describes an abstract interface as well as a network protocol, and the abstract interface may apply to APIs as well as network interfaces."
>>> This isn't true anymore.
> 
> @@
> 
>>> 
>>> === Section A
>>> 
>>> The protocol document citation will need to change whenever there's a FPWD of the 1.1 protocol document.
> 
> @@
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Thursday, 24 November 2011 13:04:52 GMT
